Add readthedocs for diffsynth-studio

* add conf docs

* add conf docs

* add index

* add index

* update ref

* test root

* add en

* test relative

* redirect relative

* add document

* test_document

* test_document
Hong Zhang
2026-02-10 19:51:04 +08:00
committed by GitHub
parent f6d85f3c2e
commit b3b63fef3e
68 changed files with 777 additions and 267 deletions

diffsynth/version.py (new file, +5)

@@ -0,0 +1,5 @@
+# Make sure to modify __release_datetime__ to release time when making official release.
+__version__ = '2.0.0'
+# default release datetime for branches under active development is set
+# to be a time far-far-away-into-the-future
+__release_datetime__ = '2099-10-13 08:56:12'
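The comments in `diffsynth/version.py` describe a convention: development branches keep a placeholder `__release_datetime__` far in the future, and an official release replaces it with the real release time. A minimal sketch of how a script could tell the two apart, assuming the `YYYY-MM-DD HH:MM:SS` format used above; `is_development_build` is a hypothetical helper, not part of DiffSynth-Studio.

```python
from datetime import datetime
from typing import Optional

# Format of __release_datetime__ as written in diffsynth/version.py.
RELEASE_DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def is_development_build(release_datetime: str, now: Optional[datetime] = None) -> bool:
    """Hypothetical helper: a release time still in the future marks a dev build.

    Development branches use a far-future placeholder such as
    '2099-10-13 08:56:12'; an official release sets the real time.
    """
    released_at = datetime.strptime(release_datetime, RELEASE_DATETIME_FORMAT)
    current = now if now is not None else datetime.now()
    return released_at > current
```

With the placeholder value from this commit, `is_development_build('2099-10-13 08:56:12')` returns `True` as long as the system clock is set before 2099.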

docs/en/.readthedocs.yaml (new file, +28)

@@ -0,0 +1,28 @@
+# .readthedocs.yaml
+# Read the Docs configuration file
+# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
+# Required
+version: 2
+# Set the OS, Python version and other tools you might need
+build:
+  os: ubuntu-22.04
+  tools:
+    python: "3.10"
+# Build documentation in the "docs/" directory with Sphinx
+sphinx:
+  configuration: docs/en/conf.py
+# Optionally build your docs in additional formats such as PDF and ePub
+# formats:
+#   - pdf
+#   - epub
+# Optional but recommended, declare the Python requirements required
+# to build your documentation
+# See https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
+python:
+  install:
+    - requirements: docs/requirements.txt

@@ -1,6 +1,6 @@
 # `diffsynth.core.attention`: Attention Mechanism Implementation
-`diffsynth.core.attention` provides routing mechanisms for attention mechanism implementations, automatically selecting efficient attention implementations based on available packages in the `Python` environment and [environment variables](/docs/en/Pipeline_Usage/Environment_Variables.md#diffsynth_attention_implementation).
+`diffsynth.core.attention` provides routing mechanisms for attention mechanism implementations, automatically selecting efficient attention implementations based on available packages in the `Python` environment and [environment variables](../../Pipeline_Usage/Environment_Variables.md#diffsynth_attention_implementation).
 ## Attention Mechanism
@@ -46,7 +46,7 @@ Note that the dimension of the Attention Score in the attention mechanism ( $\te
 * xFormers: [GitHub](https://github.com/facebookresearch/xformers), [Documentation](https://facebookresearch.github.io/xformers/components/ops.html#module-xformers.ops)
 * PyTorch: [GitHub](https://github.com/pytorch/pytorch), [Documentation](https://docs.pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html)
-To call attention implementations other than `PyTorch`, please follow the instructions on their GitHub pages to install the corresponding packages. `DiffSynth-Studio` will automatically route to the corresponding implementation based on available packages in the Python environment, or can be controlled through [environment variables](/docs/en/Pipeline_Usage/Environment_Variables.md#diffsynth_attention_implementation).
+To call attention implementations other than `PyTorch`, please follow the instructions on their GitHub pages to install the corresponding packages. `DiffSynth-Studio` will automatically route to the corresponding implementation based on available packages in the Python environment, or can be controlled through [environment variables](../../Pipeline_Usage/Environment_Variables.md#diffsynth_attention_implementation).
 ```python
 from diffsynth.core.attention import attention_forward
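The routing described in this file (use an installed efficient backend when one is available, let an environment variable override the choice, fall back to PyTorch) can be sketched as follows. This is a hypothetical illustration, not the code in `diffsynth.core.attention`: the backend labels, their priority order, the module names probed, and the variable name `DIFFSYNTH_ATTENTION_IMPLEMENTATION` (inferred from the link anchor) are all assumptions.

```python
import importlib.util
import os
from typing import Callable, Mapping

# Assumed priority order: (backend label, top-level module that provides it).
# PyTorch's scaled_dot_product_attention is the fallback and needs no extra package.
CANDIDATES = [
    ("flash_attention", "flash_attn"),
    ("sage_attention", "sageattention"),
    ("xformers", "xformers"),
]
FALLBACK = "torch"


def module_available(name: str) -> bool:
    """Return True if a top-level module can be imported, without importing it."""
    return importlib.util.find_spec(name) is not None


def select_attention_backend(
    env: Mapping[str, str] = os.environ,
    is_available: Callable[[str], bool] = module_available,
) -> str:
    # An explicit choice via the (assumed) environment variable wins.
    requested = env.get("DIFFSYNTH_ATTENTION_IMPLEMENTATION")
    if requested:
        return requested
    # Otherwise pick the first backend whose package is installed.
    for backend, module in CANDIDATES:
        if is_available(module):
            return backend
    return FALLBACK
```

Passing `env` and `is_available` explicitly keeps the selection logic testable without installing any attention package.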

@@ -8,9 +8,9 @@ This document introduces the model download and loading functionalities in `diff
 ### Downloading and Loading Models from Remote Sources
-Taking the model [DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Canny](https://www.modelscope.cn/models/DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Canny) as an example, after filling in `model_id` and `origin_file_pattern` in `ModelConfig`, the model can be automatically downloaded. By default, it downloads to the `./models` path, which can be modified through the [environment variable DIFFSYNTH_MODEL_BASE_PATH](/docs/en/Pipeline_Usage/Environment_Variables.md#diffsynth_model_base_path).
-By default, even if the model has already been downloaded, the program will still query the remote for any missing files. To completely disable remote requests, set the [environment variable DIFFSYNTH_SKIP_DOWNLOAD](/docs/en/Pipeline_Usage/Environment_Variables.md#diffsynth_skip_download) to `True`.
+Taking the model [DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Canny](https://www.modelscope.cn/models/DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Canny) as an example, after filling in `model_id` and `origin_file_pattern` in `ModelConfig`, the model can be automatically downloaded. By default, it downloads to the `./models` path, which can be modified through the [environment variable DIFFSYNTH_MODEL_BASE_PATH](../../Pipeline_Usage/Environment_Variables.md#diffsynth_model_base_path).
+By default, even if the model has already been downloaded, the program will still query the remote for any missing files. To completely disable remote requests, set the [environment variable DIFFSYNTH_SKIP_DOWNLOAD](../../Pipeline_Usage/Environment_Variables.md#diffsynth_skip_download) to `True`.
 ```python
 from diffsynth.core import ModelConfig
@@ -51,7 +51,7 @@ config = ModelConfig(path=[
 ### VRAM Management Configuration
-`ModelConfig` also contains VRAM management configuration information. See [VRAM Management](/docs/en/Pipeline_Usage/VRAM_management.md#more-usage-methods) for details.
+`ModelConfig` also contains VRAM management configuration information. See [VRAM Management](../../Pipeline_Usage/VRAM_management.md#more-usage-methods) for details.
 ## Model File Loading
@@ -103,11 +103,11 @@ print(hash_model_file([
 The model hash value is only related to the keys and tensor shapes in the state dict of the model file, and is unrelated to the numerical values of the model parameters, file saving time, and other information. When calculating the model hash value of `.safetensors` format files, `hash_model_file` is almost instantly completed without reading the model parameters. However, when calculating the model hash value of `.bin`, `.pth`, `.ckpt`, and other binary files, all model parameters need to be read, so **we do not recommend developers to continue using these formats of files.**
-By [writing model Config](/docs/en/Developer_Guide/Integrating_Your_Model.md#step-3-writing-model-config) and filling in model hash value and other information into `diffsynth/configs/model_configs.py`, developers can let `DiffSynth-Studio` automatically identify the model type and load it.
+By [writing model Config](../../Developer_Guide/Integrating_Your_Model.md#step-3-writing-model-config) and filling in model hash value and other information into `diffsynth/configs/model_configs.py`, developers can let `DiffSynth-Studio` automatically identify the model type and load it.
 ## Model Loading
-`load_model` is the external entry for loading models in `diffsynth.core.loader`. It will call [skip_model_initialization](/docs/en/API_Reference/core/vram.md#skipping-model-parameter-initialization) to skip model parameter initialization. If [Disk Offload](/docs/en/Pipeline_Usage/VRAM_management.md#disk-offload) is enabled, it calls [DiskMap](/docs/en/API_Reference/core/vram.md#state-dict-disk-mapping) for lazy loading. If Disk Offload is not enabled, it calls [load_state_dict](#model-file-loading) to load model parameters. If necessary, it will also call [state dict converter](/docs/en/Developer_Guide/Integrating_Your_Model.md#step-2-model-file-format-conversion) for model format conversion. Finally, it calls `model.eval()` to switch to inference mode.
+`load_model` is the external entry for loading models in `diffsynth.core.loader`. It will call [skip_model_initialization](../../API_Reference/core/vram.md#skipping-model-parameter-initialization) to skip model parameter initialization. If [Disk Offload](../../Pipeline_Usage/VRAM_management.md#disk-offload) is enabled, it calls [DiskMap](../../API_Reference/core/vram.md#state-dict-disk-mapping) for lazy loading. If Disk Offload is not enabled, it calls [load_state_dict](#model-file-loading) to load model parameters. If necessary, it will also call [state dict converter](../../Developer_Guide/Integrating_Your_Model.md#step-2-model-file-format-conversion) for model format conversion. Finally, it calls `model.eval()` to switch to inference mode.
 Here is a usage example with Disk Offload enabled:
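The hash paragraph in this file explains why hashing `.safetensors` files is nearly instant: the format begins with an 8-byte little-endian header length followed by a JSON header that lists each tensor's `dtype`, `shape`, and `data_offsets`, so keys and shapes can be read without touching the weights. A minimal stdlib sketch of a keys-and-shapes hash under that layout; the MD5-over-sorted-entries scheme and both function names are assumptions, and the result will not match the values `hash_model_file` prints.

```python
import hashlib
import json
import struct


def read_safetensors_header(path: str) -> dict:
    """Read only the JSON header of a .safetensors file (no tensor data)."""
    with open(path, "rb") as f:
        # First 8 bytes: header size as an unsigned little-endian 64-bit integer.
        (header_size,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_size))
    header.pop("__metadata__", None)  # optional string metadata, not a tensor
    return header


def hash_keys_and_shapes(path: str) -> str:
    """Hypothetical hash that depends only on tensor names and shapes."""
    header = read_safetensors_header(path)
    signature = sorted((key, tuple(info["shape"])) for key, info in header.items())
    return hashlib.md5(repr(signature).encode("utf-8")).hexdigest()
```

Two files with the same tensor names and shapes but different parameter values produce the same hash, which mirrors the property the paragraph describes.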

@@ -31,7 +31,7 @@ state_dict = load_state_dict(path, device="cpu")
 model.load_state_dict(state_dict, assign=True)
 ```
-In `DiffSynth-Studio`, all pretrained models follow this loading logic. After developers [integrate models](/docs/en/Developer_Guide/Integrating_Your_Model.md), they can directly load models quickly using this approach.
+In `DiffSynth-Studio`, all pretrained models follow this loading logic. After developers [integrate models](../../Developer_Guide/Integrating_Your_Model.md), they can directly load models quickly using this approach.
 ## State Dict Disk Mapping
@@ -57,10 +57,10 @@ state_dict = DiskMap(path, device="cpu") # Fast
 print(state_dict["img_in.weight"])
 ```
-`DiskMap` is the basic component of Disk Offload in `DiffSynth-Studio`. After developers [configure fine-grained VRAM management schemes](/docs/en/Developer_Guide/Enabling_VRAM_management.md), they can directly enable Disk Offload.
+`DiskMap` is the basic component of Disk Offload in `DiffSynth-Studio`. After developers [configure fine-grained VRAM management schemes](../../Developer_Guide/Enabling_VRAM_management.md), they can directly enable Disk Offload.
 `DiskMap` is a functionality implemented using the characteristics of `.safetensors` files. Therefore, when using `.bin`, `.pth`, `.ckpt`, and other binary files, model parameters are fully loaded, which causes Disk Offload to not support these formats of files. **We do not recommend developers to continue using these formats of files.**
 ## Replacable Modules for VRAM Management
-When `DiffSynth-Studio`'s VRAM management is enabled, the modules inside the model will be replaced with replacable modules in `diffsynth.core.vram.layers`. For usage, see [Fine-grained VRAM Management Scheme](/docs/en/Developer_Guide/Enabling_VRAM_management.md#writing-fine-grained-vram-management-schemes).
+When `DiffSynth-Studio`'s VRAM management is enabled, the modules inside the model will be replaced with replacable modules in `diffsynth.core.vram.layers`. For usage, see [Fine-grained VRAM Management Scheme](../../Developer_Guide/Enabling_VRAM_management.md#writing-fine-grained-vram-management-schemes).
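This file describes `DiskMap` as relying on the characteristics of `.safetensors` files. The relevant property is that each header entry's `data_offsets` gives that tensor's byte range, counted from the end of the header, so one tensor can be read without loading the rest of the file. A hypothetical stdlib sketch of that lazy-access idea, returning raw bytes rather than tensors; it is not `DiskMap`'s implementation.

```python
import json
import struct
from collections.abc import Mapping


class LazySafetensorsBytes(Mapping):
    """Hypothetical lazy view: reads a tensor's raw bytes only when accessed."""

    def __init__(self, path: str):
        self.path = path
        with open(path, "rb") as f:
            (header_size,) = struct.unpack("<Q", f.read(8))
            header = json.loads(f.read(header_size))
        header.pop("__metadata__", None)
        self.header = header
        # Tensor data begins right after the 8-byte length prefix and the header.
        self.data_start = 8 + header_size

    def __getitem__(self, key: str) -> bytes:
        begin, end = self.header[key]["data_offsets"]  # KeyError if absent
        with open(self.path, "rb") as f:
            f.seek(self.data_start + begin)
            return f.read(end - begin)

    def __iter__(self):
        return iter(self.header)

    def __len__(self) -> int:
        return len(self.header)
```

Only the header is parsed at construction time; each `[]` lookup opens the file, seeks to the tensor's offset, and reads just that range.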

@@ -1,6 +1,6 @@
 # Building a Pipeline
-After [integrating the required models for the Pipeline](/docs/en/Developer_Guide/Integrating_Your_Model.md), you also need to build a `Pipeline` for model inference. This document provides a standardized process for building a `Pipeline`. Developers can also refer to existing `Pipeline` implementations for construction.
+After [integrating the required models for the Pipeline](../Developer_Guide/Integrating_Your_Model.md), you also need to build a `Pipeline` for model inference. This document provides a standardized process for building a `Pipeline`. Developers can also refer to existing `Pipeline` implementations for construction.
 The `Pipeline` implementation is located in `diffsynth/pipelines`. Each `Pipeline` contains the following essential key components:
@@ -79,7 +79,7 @@ This includes the following parts:
 return pipe
 ```
-Developers need to implement the logic for fetching models. The corresponding model names are the `"model_name"` in the [model Config filled in during model integration](/docs/en/Developer_Guide/Integrating_Your_Model.md#step-3-writing-model-config).
+Developers need to implement the logic for fetching models. The corresponding model names are the `"model_name"` in the [model Config filled in during model integration](../Developer_Guide/Integrating_Your_Model.md#step-3-writing-model-config).
 Some models also need to load `tokenizer`. Extra `tokenizer_config` parameters can be added to `from_pretrained` as needed, and this part can be implemented after fetching the models.

@@ -1,6 +1,6 @@
 # Fine-Grained VRAM Management Scheme
-This document introduces how to write reasonable fine-grained VRAM management schemes for models, and how to use the VRAM management functions in `DiffSynth-Studio` for other external code libraries. Before reading this document, please read the document [VRAM Management](/docs/en/Pipeline_Usage/VRAM_management.md).
+This document introduces how to write reasonable fine-grained VRAM management schemes for models, and how to use the VRAM management functions in `DiffSynth-Studio` for other external code libraries. Before reading this document, please read the document [VRAM Management](../Pipeline_Usage/VRAM_management.md).
 ## How Much VRAM Does a 20B Model Need?
@@ -124,7 +124,7 @@ module_map={
 }
 ```
-In addition, `vram_config` and `vram_limit` are also required, which have been introduced in [VRAM Management](/docs/en/Pipeline_Usage/VRAM_management.md#more-usage-methods).
+In addition, `vram_config` and `vram_limit` are also required, which have been introduced in [VRAM Management](../Pipeline_Usage/VRAM_management.md#more-usage-methods).
 Call `enable_vram_management` to enable VRAM management. Note that the `device` when loading the model is `cpu`, consistent with `offload_device`:
@@ -171,7 +171,7 @@ The above code only requires 2G VRAM to run the `forward` of a 20B model.
 ## Disk Offload
-[Disk Offload](/docs/en/Pipeline_Usage/VRAM_management.md#disk-offload) is a special VRAM management scheme that needs to be enabled during the model loading process, not after the model is loaded. Usually, when the above code can run smoothly, Disk Offload can be directly enabled:
+[Disk Offload](../Pipeline_Usage/VRAM_management.md#disk-offload) is a special VRAM management scheme that needs to be enabled during the model loading process, not after the model is loaded. Usually, when the above code can run smoothly, Disk Offload can be directly enabled:
 ```python
 from diffsynth.core import load_model, enable_vram_management, AutoWrappedLinear, AutoWrappedModule
@@ -212,7 +212,7 @@ with torch.no_grad():
 output = model(**inputs)
 ```
-Disk Offload is an extremely special VRAM management scheme. It only supports `.safetensors` format files, not binary files such as `.bin`, `.pth`, `.ckpt`, and does not support [state dict converter](/docs/en/Developer_Guide/Integrating_Your_Model.md#step-2-model-file-format-conversion) with Tensor reshape.
+Disk Offload is an extremely special VRAM management scheme. It only supports `.safetensors` format files, not binary files such as `.bin`, `.pth`, `.ckpt`, and does not support [state dict converter](../Developer_Guide/Integrating_Your_Model.md#step-2-model-file-format-conversion) with Tensor reshape.
 If there are situations where Disk Offload cannot run normally but non-Disk Offload can run normally, please submit an issue to us on GitHub.
@@ -227,7 +227,7 @@ To make it easier for users to use the VRAM management function, we write the fi
 }
 ```# Fine-Grained VRAM Management Scheme
-This document introduces how to write reasonable fine-grained VRAM management schemes for models, and how to use the VRAM management functions in `DiffSynth-Studio` for other external code libraries. Before reading this document, please read the document [VRAM Management](/docs/en/Pipeline_Usage/VRAM_management.md).
+This document introduces how to write reasonable fine-grained VRAM management schemes for models, and how to use the VRAM management functions in `DiffSynth-Studio` for other external code libraries. Before reading this document, please read the document [VRAM Management](../Pipeline_Usage/VRAM_management.md).
 ## How Much VRAM Does a 20B Model Need?
@@ -351,7 +351,7 @@ module_map={
 }
 ```
-In addition, `vram_config` and `vram_limit` are also required, which have been introduced in [VRAM Management](/docs/en/Pipeline_Usage/VRAM_management.md#more-usage-methods).
+In addition, `vram_config` and `vram_limit` are also required, which have been introduced in [VRAM Management](../Pipeline_Usage/VRAM_management.md#more-usage-methods).
 Call `enable_vram_management` to enable VRAM management. Note that the `device` when loading the model is `cpu`, consistent with `offload_device`:
@@ -398,7 +398,7 @@ The above code only requires 2G VRAM to run the `forward` of a 20B model.
 ## Disk Offload
-[Disk Offload](/docs/en/Pipeline_Usage/VRAM_management.md#disk-offload) is a special VRAM management scheme that needs to be enabled during the model loading process, not after the model is loaded. Usually, when the above code can run smoothly, Disk Offload can be directly enabled:
+[Disk Offload](../Pipeline_Usage/VRAM_management.md#disk-offload) is a special VRAM management scheme that needs to be enabled during the model loading process, not after the model is loaded. Usually, when the above code can run smoothly, Disk Offload can be directly enabled:
 ```python
 from diffsynth.core import load_model, enable_vram_management, AutoWrappedLinear, AutoWrappedModule
@@ -439,7 +439,7 @@ with torch.no_grad():
 output = model(**inputs)
 ```
-Disk Offload is an extremely special VRAM management scheme. It only supports `.safetensors` format files, not binary files such as `.bin`, `.pth`, `.ckpt`, and does not support [state dict converter](/docs/en/Developer_Guide/Integrating_Your_Model.md#step-2-model-file-format-conversion) with Tensor reshape.
+Disk Offload is an extremely special VRAM management scheme. It only supports `.safetensors` format files, not binary files such as `.bin`, `.pth`, `.ckpt`, and does not support [state dict converter](../Developer_Guide/Integrating_Your_Model.md#step-2-model-file-format-conversion) with Tensor reshape.
 If there are situations where Disk Offload cannot run normally but non-Disk Offload can run normally, please submit an issue to us on GitHub.
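For the heading "How Much VRAM Does a 20B Model Need?", the weights alone give a lower bound: parameter count times bytes per parameter. A small worked calculation using the standard per-element sizes of these dtypes; it covers weights only and ignores activations, caches, and framework overhead.

```python
# Bytes per parameter for common storage dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "float8": 1}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Memory for model weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("float32", "bfloat16", "float8"):
    print(f"20B params in {dtype}: {weight_memory_gb(20e9, dtype):.0f} GB")
```

This prints 80 GB, 40 GB, and 20 GB. Even in bfloat16 the weights are about 40 GB, far above the 2G figure quoted in the hunk header above, so most of the weights must stay offloaded and be moved to the GPU only while they are in use.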

@@ -183,4 +183,4 @@ Loaded model: {
 ## Step 5: Writing Model VRAM Management Scheme
-`DiffSynth-Studio` supports complex VRAM management. See [Enabling VRAM Management](/docs/en/Developer_Guide/Enabling_VRAM_management.md) for details.
+`DiffSynth-Studio` supports complex VRAM management. See [Enabling VRAM Management](../Developer_Guide/Enabling_VRAM_management.md) for details.

@@ -1,6 +1,6 @@
 # Integrating Model Training
-After [integrating models](/docs/en/Developer_Guide/Integrating_Your_Model.md) and [implementing Pipeline](/docs/en/Developer_Guide/Building_a_Pipeline.md), the next step is to integrate model training functionality.
+After [integrating models](../Developer_Guide/Integrating_Your_Model.md) and [implementing Pipeline](../Developer_Guide/Building_a_Pipeline.md), the next step is to integrate model training functionality.
 ## Training-Inference Consistent Pipeline Modification

docs/en/Makefile (new file, +20)

@@ -0,0 +1,20 @@
+# Minimal makefile for Sphinx documentation
+#
+# You can set these variables from the command line, and also
+# from the environment for the first two.
+SPHINXOPTS ?=
+SPHINXBUILD ?= sphinx-build
+SOURCEDIR = .
+BUILDDIR = _build
+# Put it first so that "make" without argument is like "make help".
+help:
+	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+.PHONY: help Makefile
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
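The Makefile's catch-all rule forwards any target to Sphinx's make mode, so `make html` in `docs/en` runs `sphinx-build -M html . _build` plus any `SPHINXOPTS`. A small sketch that builds the same command line, for example to invoke the build from a Python script; the helper name is hypothetical, and actually running the command requires Sphinx and the packages in `docs/requirements.txt`.

```python
import shlex
from typing import List


def sphinx_make_command(
    target: str,
    source_dir: str = ".",
    build_dir: str = "_build",
    sphinx_build: str = "sphinx-build",
    sphinx_opts: str = "",
) -> List[str]:
    """Mirror the rule: $(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS)."""
    return [sphinx_build, "-M", target, source_dir, build_dir, *shlex.split(sphinx_opts)]


# Equivalent of `make html` inside docs/en (not executed here):
#   subprocess.run(sphinx_make_command("html"), cwd="docs/en", check=True)
print(sphinx_make_command("html"))
```

`shlex.split` splits an options string such as `"-W --keep-going"` the way a shell would, matching how Make expands `$(SPHINXOPTS)` into separate arguments.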

@@ -14,7 +14,7 @@ cd DiffSynth-Studio
 pip install -e .
 ```
-For more information about installation, please refer to [Install Dependencies](/docs/en/Pipeline_Usage/Setup.md).
+For more information about installation, please refer to [Install Dependencies](../Pipeline_Usage/Setup.md).
 ## Quick Start
@@ -98,14 +98,14 @@ graph LR;
 Special Training Scripts:
-* Differential LoRA Training: [doc](/docs/en/Training/Differential_LoRA.md), [code](/examples/flux/model_training/special/differential_training/)
-* FP8 Precision Training: [doc](/docs/en/Training/FP8_Precision.md), [code](/examples/flux/model_training/special/fp8_training/)
-* Two-stage Split Training: [doc](/docs/en/Training/Split_Training.md), [code](/examples/flux/model_training/special/split_training/)
-* End-to-end Direct Distillation: [doc](/docs/en/Training/Direct_Distill.md), [code](/examples/flux/model_training/lora/FLUX.1-dev-Distill-LoRA.sh)
+* Differential LoRA Training: [doc](../Training/Differential_LoRA.md), [code](/examples/flux/model_training/special/differential_training/)
+* FP8 Precision Training: [doc](../Training/FP8_Precision.md), [code](/examples/flux/model_training/special/fp8_training/)
+* Two-stage Split Training: [doc](../Training/Split_Training.md), [code](/examples/flux/model_training/special/split_training/)
+* End-to-end Direct Distillation: [doc](../Training/Direct_Distill.md), [code](/examples/flux/model_training/lora/FLUX.1-dev-Distill-LoRA.sh)
 ## Model Inference
-Models are loaded via `FluxImagePipeline.from_pretrained`, see [Loading Models](/docs/en/Pipeline_Usage/Model_Inference.md#loading-models).
+Models are loaded via `FluxImagePipeline.from_pretrained`, see [Loading Models](../Pipeline_Usage/Model_Inference.md#loading-models).
 Input parameters for `FluxImagePipeline` inference include:
@@ -143,7 +143,7 @@ Input parameters for `FluxImagePipeline` inference include:
 * `flex_control_stop`: Flex model control stop timestep.
 * `nexus_gen_reference_image`: Nexus-Gen model reference image.
-If VRAM is insufficient, please enable [VRAM Management](/docs/en/Pipeline_Usage/VRAM_management.md). We provide recommended low VRAM configurations for each model in the example code, see the table in the "Model Overview" section above.
+If VRAM is insufficient, please enable [VRAM Management](../Pipeline_Usage/VRAM_management.md). We provide recommended low VRAM configurations for each model in the example code, see the table in the "Model Overview" section above.
 ## Model Training
@@ -198,4 +198,4 @@ We have built a sample image dataset for your testing. You can download this dat
 modelscope download --dataset DiffSynth-Studio/example_image_dataset --local_dir ./data/example_image_dataset
 ```
-We have written recommended training scripts for each model, please refer to the table in the "Model Overview" section above. For how to write model training scripts, please refer to [Model Training](/docs/en/Pipeline_Usage/Model_Training.md); for more advanced training algorithms, please refer to [Training Framework Detailed Explanation](/docs/Training/).
+We have written recommended training scripts for each model, please refer to the table in the "Model Overview" section above. For how to write model training scripts, please refer to [Model Training](../Pipeline_Usage/Model_Training.md); for more advanced training algorithms, please refer to [Training Framework Detailed Explanation](/docs/Training/).
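Most of this commit rewrites site-absolute links such as `/docs/en/Training/Differential_LoRA.md` into paths relative to the page that contains them. A minimal sketch of that rewrite with `posixpath.relpath`, assuming the page above lives at `docs/en/Model_Details/FLUX.md` (the depth implied by its `../Training/...` links); `to_relative_link` is a hypothetical helper, not a script from this commit.

```python
import posixpath


def to_relative_link(link: str, page_path: str, docs_root: str = "/docs/en/") -> str:
    """Rewrite a site-absolute docs link as a path relative to `page_path`.

    `page_path` is the repository path of the Markdown file holding the link,
    e.g. 'docs/en/Model_Details/FLUX.md'. Links outside `docs_root` (such as
    '/examples/...') are returned unchanged.
    """
    if not link.startswith(docs_root):
        return link
    target, sep, anchor = link.partition("#")  # keep any '#section' anchor
    page_dir = posixpath.dirname(page_path)
    relative = posixpath.relpath(target.lstrip("/"), page_dir)
    return relative + sep + anchor


print(to_relative_link("/docs/en/Training/Differential_LoRA.md", "docs/en/Model_Details/FLUX.md"))
```

This prints `../Training/Differential_LoRA.md`, the same link the diff introduces. For targets in the page's own folder, `relpath` yields the shortest form (for example `vram.md` rather than `../../API_Reference/core/vram.md`); both resolve to the same file. Like the commit, the sketch leaves `/examples/...` links untouched.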

@@ -21,7 +21,7 @@ cd DiffSynth-Studio
 pip install -e .
 ```
-For more information about installation, please refer to [Install Dependencies](/docs/en/Pipeline_Usage/Setup.md).
+For more information about installation, please refer to [Install Dependencies](../Pipeline_Usage/Setup.md).
 ## Quick Start
@@ -69,14 +69,14 @@ image.save("image.jpg")
 Special Training Scripts:
-* Differential LoRA Training: [doc](/docs/en/Training/Differential_LoRA.md)
-* FP8 Precision Training: [doc](/docs/en/Training/FP8_Precision.md)
-* Two-stage Split Training: [doc](/docs/en/Training/Split_Training.md)
-* End-to-end Direct Distillation: [doc](/docs/en/Training/Direct_Distill.md)
+* Differential LoRA Training: [doc](../Training/Differential_LoRA.md)
+* FP8 Precision Training: [doc](../Training/FP8_Precision.md)
+* Two-stage Split Training: [doc](../Training/Split_Training.md)
+* End-to-end Direct Distillation: [doc](../Training/Direct_Distill.md)
 ## Model Inference
-Models are loaded via `Flux2ImagePipeline.from_pretrained`, see [Loading Models](/docs/en/Pipeline_Usage/Model_Inference.md#loading-models).
+Models are loaded via `Flux2ImagePipeline.from_pretrained`, see [Loading Models](../Pipeline_Usage/Model_Inference.md#loading-models).
 Input parameters for `Flux2ImagePipeline` inference include:
@@ -95,7 +95,7 @@ Input parameters for `Flux2ImagePipeline` inference include:
 * `tile_stride`: Tile stride during VAE encoding/decoding stages, default is 64, only effective when `tiled=True`, must be less than or equal to `tile_size`.
 * `progress_bar_cmd`: Progress bar, default is `tqdm.tqdm`. Can be disabled by setting to `lambda x:x`.
-If VRAM is insufficient, please enable [VRAM Management](/docs/en/Pipeline_Usage/VRAM_management.md). We provide recommended low VRAM configurations for each model in the example code, see the table in the "Model Overview" section above.
+If VRAM is insufficient, please enable [VRAM Management](../Pipeline_Usage/VRAM_management.md). We provide recommended low VRAM configurations for each model in the example code, see the table in the "Model Overview" section above.
 ## Model Training
@@ -148,4 +148,4 @@ We have built a sample image dataset for your testing. You can download this dat
 modelscope download --dataset DiffSynth-Studio/example_image_dataset --local_dir ./data/example_image_dataset
 ```
-We have written recommended training scripts for each model, please refer to the table in the "Model Overview" section above. For how to write model training scripts, please refer to [Model Training](/docs/en/Pipeline_Usage/Model_Training.md); for more advanced training algorithms, please refer to [Training Framework Detailed Explanation](/docs/Training/).
+We have written recommended training scripts for each model, please refer to the table in the "Model Overview" section above. For how to write model training scripts, please refer to [Model Training](../Pipeline_Usage/Model_Training.md); for more advanced training algorithms, please refer to [Training Framework Detailed Explanation](/docs/Training/).
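The `Flux2ImagePipeline` parameter list above requires `tile_stride` to be at most `tile_size` for tiled VAE encoding/decoding. The reason is generic to sliding-window tiling: a stride larger than the window leaves positions that no tile covers. A one-axis sketch of that constraint; it illustrates the general technique rather than the pipeline's actual tiling code, and aligning the last tile to the end of the axis is an assumption.

```python
from typing import List, Tuple


def tile_spans(length: int, tile_size: int, tile_stride: int) -> List[Tuple[int, int]]:
    """Split [0, length) into (start, end) windows that overlap or touch.

    Requires 0 < tile_stride <= tile_size; a larger stride would leave
    gaps between consecutive tiles.
    """
    if not 0 < tile_stride <= tile_size:
        raise ValueError("tile_stride must be positive and <= tile_size")
    if length <= tile_size:
        return [(0, length)]
    starts = list(range(0, length - tile_size, tile_stride))
    starts.append(length - tile_size)  # assumed: last tile ends exactly at `length`
    return [(s, s + tile_size) for s in starts]
```

For example, `tile_spans(300, 128, 64)` yields `[(0, 128), (64, 192), (128, 256), (172, 300)]`: every position is covered, and neighbouring tiles overlap so their seams can be blended.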

@@ -12,7 +12,7 @@ cd DiffSynth-Studio
 pip install -e .
 ```
-For more information about installation, please refer to [Installation Dependencies](/docs/en/Pipeline_Usage/Setup.md).
+For more information about installation, please refer to [Installation Dependencies](../Pipeline_Usage/Setup.md).
 ## Quick Start
@@ -83,7 +83,7 @@ write_video_audio_ltx2(
 ## Model Inference
-Models are loaded through `LTX2AudioVideoPipeline.from_pretrained`, see [Loading Models](/docs/en/Pipeline_Usage/Model_Inference.md#loading-models) for details.
+Models are loaded through `LTX2AudioVideoPipeline.from_pretrained`, see [Loading Models](../Pipeline_Usage/Model_Inference.md#loading-models) for details.
 Input parameters for `LTX2AudioVideoPipeline` inference include:
@@ -109,7 +109,7 @@ Input parameters for `LTX2AudioVideoPipeline` inference include:
 * `use_distilled_pipeline`: Whether to use distilled pipeline, default is `False`.
 * `progress_bar_cmd`: Progress bar, default is `tqdm.tqdm`. Can be set to `lambda x:x` to hide the progress bar.
-If VRAM is insufficient, please enable [VRAM Management](/docs/en/Pipeline_Usage/VRAM_management.md). We provide recommended low VRAM configurations for each model in the example code, see the table in the previous "Supported Inference Scripts" section.
+If VRAM is insufficient, please enable [VRAM Management](../Pipeline_Usage/VRAM_management.md). We provide recommended low VRAM configurations for each model in the example code, see the table in the previous "Supported Inference Scripts" section.
 ## Model Training
## Model Training ## Model Training

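Every changed line in these hunks follows one pattern: a root-absolute `/docs/en/...` link becomes a relative link with one `../` per directory between `docs/en` and the edited page (`../` in the one-level-deep `Model_Details` and `Pipeline_Usage` pages, `../../` in `API_Reference/core/attention.md`), while `/examples/...` links and `/docs/Training/` are left untouched. The commit does not say whether the rewrite was scripted. The Python sketch below only reproduces the pattern, assuming every edited page lives under `docs/en`; `relativize_links` is a hypothetical helper, not part of DiffSynth-Studio.

```python
import re
from pathlib import PurePosixPath

# Assumption: all pages being rewritten live under this directory.
DOCS_ROOT = PurePosixPath("docs/en")

# Match "/docs/en/" only when it directly follows "(" (a link target)
# or "[" (a link label that spells out the path).
ABSOLUTE_PREFIX = re.compile(r"(?<=[\[(])/docs/en/")


def relativize_links(file_path: str, text: str) -> str:
    """Hypothetical helper: rewrite /docs/en/ links relative to file_path."""
    rel_dir = PurePosixPath(file_path).relative_to(DOCS_ROOT).parent
    depth = len(rel_dir.parts)  # directories between docs/en and the page
    prefix = "../" * depth if depth else "./"
    return ABSOLUTE_PREFIX.sub(prefix, text)


if __name__ == "__main__":
    line = "see [Loading Models](/docs/en/Pipeline_Usage/Model_Inference.md#loading-models)."
    print(relativize_links("docs/en/Model_Details/Wan.md", line))
```

Anchoring the match on the `[` or `(` right before `/docs/en/` is what also rewrites the spelled-out link label in the `Model_Training.md` hunk (`[/docs/en/API_Reference/core/data.md#metadata]`), which the commit changed as well.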

@@ -2,7 +2,7 @@
 ## Qwen-Image
-Documentation: [./Qwen-Image.md](/docs/en/Model_Details/Qwen-Image.md)
+Documentation: [./Qwen-Image.md](../Model_Details/Qwen-Image.md)
 <details>
@@ -85,7 +85,7 @@ graph LR;
 ## FLUX Series
-Documentation: [./FLUX.md](/docs/en/Model_Details/FLUX.md)
+Documentation: [./FLUX.md](../Model_Details/FLUX.md)
 <details>
@@ -166,7 +166,7 @@ graph LR;
 ## Wan Series
-Documentation: [./Wan.md](/docs/en/Model_Details/Wan.md)
+Documentation: [./Wan.md](../Model_Details/Wan.md)
 <details>
@@ -286,6 +286,6 @@ graph LR;
 | [PAI/Wan2.2-Fun-A14B-Control](https://modelscope.cn/models/PAI/Wan2.2-Fun-A14B-Control) | `control_video`, `reference_image` | [code](/examples/wanvideo/model_inference/Wan2.2-Fun-A14B-Control.py) | [code](/examples/wanvideo/model_training/full/Wan2.2-Fun-A14B-Control.sh) | [code](/examples/wanvideo/model_training/validate_full/Wan2.2-Fun-A14B-Control.py) | [code](/examples/wanvideo/model_training/lora/Wan2.2-Fun-A14B-Control.sh) | [code](/examples/wanvideo/model_training/validate_lora/Wan2.2-Fun-A14B-Control.py) |
 | [PAI/Wan2.2-Fun-A14B-Control-Camera](https://modelscope.cn/models/PAI/Wan2.2-Fun-A14B-Control-Camera) | `control_camera_video`, `input_image` | [code](/examples/wanvideo/model_inference/Wan2.2-Fun-A14B-Control-Camera.py) | [code](/examples/wanvideo/model_training/full/Wan2.2-Fun-A14B-Control-Camera.sh) | [code](/examples/wanvideo/model_training/validate_full/Wan2.2-Fun-A14B-Control-Camera.py) | [code](/examples/wanvideo/model_training/lora/Wan2.2-Fun-A14B-Control-Camera.sh) | [code](/examples/wanvideo/model_training/validate_lora/Wan2.2-Fun-A14B-Control-Camera.py) |
-* FP8 Precision Training: [doc](/docs/en/Training/FP8_Precision.md), [code](/examples/wanvideo/model_training/special/fp8_training/)
-* Two-stage Split Training: [doc](/docs/en/Training/Split_Training.md), [code](/examples/wanvideo/model_training/special/split_training/)
-* End-to-end Direct Distillation: [doc](/docs/en/Training/Direct_Distill.md), [code](/examples/wanvideo/model_training/special/direct_distill/)
+* FP8 Precision Training: [doc](../Training/FP8_Precision.md), [code](/examples/wanvideo/model_training/special/fp8_training/)
+* Two-stage Split Training: [doc](../Training/Split_Training.md), [code](/examples/wanvideo/model_training/special/split_training/)
+* End-to-end Direct Distillation: [doc](../Training/Direct_Distill.md), [code](/examples/wanvideo/model_training/special/direct_distill/)


@@ -14,7 +14,7 @@ cd DiffSynth-Studio
 pip install -e .
 ```
-For more information about installation, please refer to [Install Dependencies](/docs/en/Pipeline_Usage/Setup.md).
+For more information about installation, please refer to [Install Dependencies](../Pipeline_Usage/Setup.md).
 ## Quick Start
@@ -102,10 +102,10 @@ graph LR;
 Special Training Scripts:
-* Differential LoRA Training: [doc](/docs/en/Training/Differential_LoRA.md), [code](/examples/qwen_image/model_training/special/differential_training/)
-* FP8 Precision Training: [doc](/docs/en/Training/FP8_Precision.md), [code](/examples/qwen_image/model_training/special/fp8_training/)
-* Two-stage Split Training: [doc](/docs/en/Training/Split_Training.md), [code](/examples/qwen_image/model_training/special/split_training/)
-* End-to-end Direct Distillation: [doc](/docs/en/Training/Direct_Distill.md), [code](/examples/qwen_image/model_training/lora/Qwen-Image-Distill-LoRA.sh)
+* Differential LoRA Training: [doc](../Training/Differential_LoRA.md), [code](/examples/qwen_image/model_training/special/differential_training/)
+* FP8 Precision Training: [doc](../Training/FP8_Precision.md), [code](/examples/qwen_image/model_training/special/fp8_training/)
+* Two-stage Split Training: [doc](../Training/Split_Training.md), [code](/examples/qwen_image/model_training/special/split_training/)
+* End-to-end Direct Distillation: [doc](../Training/Direct_Distill.md), [code](/examples/qwen_image/model_training/lora/Qwen-Image-Distill-LoRA.sh)
 DeepSpeed ZeRO Stage 3 Training: The Qwen-Image series models support DeepSpeed ZeRO Stage 3 training, which partitions the model across multiple GPUs. Taking full parameter training of the Qwen-Image model as an example, the following modifications are required:
@@ -114,7 +114,7 @@ DeepSpeed ZeRO Stage 3 Training: The Qwen-Image series models support DeepSpeed
 ## Model Inference
-Models are loaded via `QwenImagePipeline.from_pretrained`, see [Loading Models](/docs/en/Pipeline_Usage/Model_Inference.md#loading-models).
+Models are loaded via `QwenImagePipeline.from_pretrained`, see [Loading Models](../Pipeline_Usage/Model_Inference.md#loading-models).
 Input parameters for `QwenImagePipeline` inference include:
@@ -145,7 +145,7 @@ Input parameters for `QwenImagePipeline` inference include:
 * `tile_stride`: Tile stride during VAE encoding/decoding stages, default is 64, only effective when `tiled=True`, must be less than or equal to `tile_size`.
 * `progress_bar_cmd`: Progress bar, default is `tqdm.tqdm`. Can be disabled by setting to `lambda x:x`.
-If VRAM is insufficient, please enable [VRAM Management](/docs/en/Pipeline_Usage/VRAM_management.md). We provide recommended low VRAM configurations for each model in the example code, see the table in the "Model Overview" section above.
+If VRAM is insufficient, please enable [VRAM Management](../Pipeline_Usage/VRAM_management.md). We provide recommended low VRAM configurations for each model in the example code, see the table in the "Model Overview" section above.
 ## Model Training
@@ -199,4 +199,4 @@ We have built a sample image dataset for your testing. You can download this dat
 modelscope download --dataset DiffSynth-Studio/example_image_dataset --local_dir ./data/example_image_dataset
 ```
-We have written recommended training scripts for each model, please refer to the table in the "Model Overview" section above. For how to write model training scripts, please refer to [Model Training](/docs/en/Pipeline_Usage/Model_Training.md); for more advanced training algorithms, please refer to [Training Framework Detailed Explanation](/docs/Training/).
+We have written recommended training scripts for each model, please refer to the table in the "Model Overview" section above. For how to write model training scripts, please refer to [Model Training](../Pipeline_Usage/Model_Training.md); for more advanced training algorithms, please refer to [Training Framework Detailed Explanation](/docs/Training/).


@@ -14,7 +14,7 @@ cd DiffSynth-Studio
 pip install -e .
 ```
-For more information about installation, please refer to [Install Dependencies](/docs/en/Pipeline_Usage/Setup.md).
+For more information about installation, please refer to [Install Dependencies](../Pipeline_Usage/Setup.md).
 ## Quick Start
@@ -138,9 +138,9 @@ graph LR;
 | [PAI/Wan2.2-Fun-A14B-Control](https://modelscope.cn/models/PAI/Wan2.2-Fun-A14B-Control) | `control_video`, `reference_image` | [code](/examples/wanvideo/model_inference/Wan2.2-Fun-A14B-Control.py) | [code](/examples/wanvideo/model_training/full/Wan2.2-Fun-A14B-Control.sh) | [code](/examples/wanvideo/model_training/validate_full/Wan2.2-Fun-A14B-Control.py) | [code](/examples/wanvideo/model_training/lora/Wan2.2-Fun-A14B-Control.sh) | [code](/examples/wanvideo/model_training/validate_lora/Wan2.2-Fun-A14B-Control.py) |
 | [PAI/Wan2.2-Fun-A14B-Control-Camera](https://modelscope.cn/models/PAI/Wan2.2-Fun-A14B-Control-Camera) | `control_camera_video`, `input_image` | [code](/examples/wanvideo/model_inference/Wan2.2-Fun-A14B-Control-Camera.py) | [code](/examples/wanvideo/model_training/full/Wan2.2-Fun-A14B-Control-Camera.sh) | [code](/examples/wanvideo/model_training/validate_full/Wan2.2-Fun-A14B-Control-Camera.py) | [code](/examples/wanvideo/model_training/lora/Wan2.2-Fun-A14B-Control-Camera.sh) | [code](/examples/wanvideo/model_training/validate_lora/Wan2.2-Fun-A14B-Control-Camera.py) |
-* FP8 Precision Training: [doc](/docs/en/Training/FP8_Precision.md), [code](/examples/wanvideo/model_training/special/fp8_training/)
-* Two-stage Split Training: [doc](/docs/en/Training/Split_Training.md), [code](/examples/wanvideo/model_training/special/split_training/)
-* End-to-end Direct Distillation: [doc](/docs/en/Training/Direct_Distill.md), [code](/examples/wanvideo/model_training/special/direct_distill/)
+* FP8 Precision Training: [doc](../Training/FP8_Precision.md), [code](/examples/wanvideo/model_training/special/fp8_training/)
+* Two-stage Split Training: [doc](../Training/Split_Training.md), [code](/examples/wanvideo/model_training/special/split_training/)
+* End-to-end Direct Distillation: [doc](../Training/Direct_Distill.md), [code](/examples/wanvideo/model_training/special/direct_distill/)
 DeepSpeed ZeRO Stage 3 Training: The Wan series models support DeepSpeed ZeRO Stage 3 training, which partitions the model across multiple GPUs. Taking full parameter training of the Wan2.1-T2V-14B model as an example, the following modifications are required:
@@ -149,7 +149,7 @@ DeepSpeed ZeRO Stage 3 Training: The Wan series models support DeepSpeed ZeRO St
 ## Model Inference
-Models are loaded via `WanVideoPipeline.from_pretrained`, see [Loading Models](/docs/en/Pipeline_Usage/Model_Inference.md#loading-models).
+Models are loaded via `WanVideoPipeline.from_pretrained`, see [Loading Models](../Pipeline_Usage/Model_Inference.md#loading-models).
 Input parameters for `WanVideoPipeline` inference include:
@@ -199,7 +199,7 @@ Input parameters for `WanVideoPipeline` inference include:
 * `tea_cache_model_id`: Model ID used by TeaCache.
 * `progress_bar_cmd`: Progress bar, default is `tqdm.tqdm`. Can be disabled by setting to `lambda x:x`.
-If VRAM is insufficient, please enable [VRAM Management](/docs/en/Pipeline_Usage/VRAM_management.md). We provide recommended low VRAM configurations for each model in the example code, see the table in the "Model Overview" section above.
+If VRAM is insufficient, please enable [VRAM Management](../Pipeline_Usage/VRAM_management.md). We provide recommended low VRAM configurations for each model in the example code, see the table in the "Model Overview" section above.
 ## Model Training
@@ -254,4 +254,4 @@ We have built a sample video dataset for your testing. You can download this dat
 modelscope download --dataset DiffSynth-Studio/example_video_dataset --local_dir ./data/example_video_dataset
 ```
-We have written recommended training scripts for each model, please refer to the table in the "Model Overview" section above. For how to write model training scripts, please refer to [Model Training](/docs/en/Pipeline_Usage/Model_Training.md); for more advanced training algorithms, please refer to [Training Framework Detailed Explanation](/docs/Training/).
+We have written recommended training scripts for each model, please refer to the table in the "Model Overview" section above. For how to write model training scripts, please refer to [Model Training](../Pipeline_Usage/Model_Training.md); for more advanced training algorithms, please refer to [Training Framework Detailed Explanation](/docs/Training/).


@@ -12,7 +12,7 @@ cd DiffSynth-Studio
 pip install -e .
 ```
-For more information about installation, please refer to [Install Dependencies](/docs/en/Pipeline_Usage/Setup.md).
+For more information about installation, please refer to [Install Dependencies](../Pipeline_Usage/Setup.md).
 ## Quick Start
@@ -61,12 +61,12 @@ image.save("image.jpg")
 Special Training Scripts:
-* Differential LoRA Training: [doc](/docs/en/Training/Differential_LoRA.md), [code](/examples/z_image/model_training/special/differential_training/)
+* Differential LoRA Training: [doc](../Training/Differential_LoRA.md), [code](/examples/z_image/model_training/special/differential_training/)
 * Trajectory Imitation Distillation Training (Experimental Feature): [code](/examples/z_image/model_training/special/trajectory_imitation/)
 ## Model Inference
-Models are loaded via `ZImagePipeline.from_pretrained`, see [Loading Models](/docs/en/Pipeline_Usage/Model_Inference.md#loading-models).
+Models are loaded via `ZImagePipeline.from_pretrained`, see [Loading Models](../Pipeline_Usage/Model_Inference.md#loading-models).
 Input parameters for `ZImagePipeline` inference include:
@@ -84,7 +84,7 @@ Input parameters for `ZImagePipeline` inference include:
 * `edit_image`: Edit images for image editing models, supporting multiple images.
 * `positive_only_lora`: LoRA weights used only in positive prompts.
-If VRAM is insufficient, please enable [VRAM Management](/docs/en/Pipeline_Usage/VRAM_management.md). We provide recommended low VRAM configurations for each model in the example code, see the table in the "Model Overview" section above.
+If VRAM is insufficient, please enable [VRAM Management](../Pipeline_Usage/VRAM_management.md). We provide recommended low VRAM configurations for each model in the example code, see the table in the "Model Overview" section above.
 ## Model Training
@@ -137,7 +137,7 @@ We have built a sample image dataset for your testing. You can download this dat
 modelscope download --dataset DiffSynth-Studio/example_image_dataset --local_dir ./data/example_image_dataset
 ```
-We have written recommended training scripts for each model, please refer to the table in the "Model Overview" section above. For how to write model training scripts, please refer to [Model Training](/docs/en/Pipeline_Usage/Model_Training.md); for more advanced training algorithms, please refer to [Training Framework Detailed Explanation](/docs/Training/).
+We have written recommended training scripts for each model, please refer to the table in the "Model Overview" section above. For how to write model training scripts, please refer to [Model Training](../Pipeline_Usage/Model_Training.md); for more advanced training algorithms, please refer to [Training Framework Detailed Explanation](/docs/Training/).
 Training Tips:


@@ -28,7 +28,7 @@ Model download root directory. Can be set to any local path. If `local_model_pat
 ## `DIFFSYNTH_ATTENTION_IMPLEMENTATION`
-Attention mechanism implementation method. Can be set to `flash_attention_3`, `flash_attention_2`, `sage_attention`, `xformers`, or `torch`. See [`./core/attention.md`](/docs/en/API_Reference/core/attention.md) for details.
+Attention mechanism implementation method. Can be set to `flash_attention_3`, `flash_attention_2`, `sage_attention`, `xformers`, or `torch`. See [`./core/attention.md`](../API_Reference/core/attention.md) for details.
 ## `DIFFSYNTH_DISK_MAP_BUFFER_SIZE`


@@ -2,7 +2,7 @@
 `DiffSynth-Studio` supports various GPUs and NPUs. This document explains how to run model inference and training on these devices.
-Before you begin, please follow the [Installation Guide](/docs/en/Pipeline_Usage/Setup.md) to install the required GPU/NPU dependencies.
+Before you begin, please follow the [Installation Guide](../Pipeline_Usage/Setup.md) to install the required GPU/NPU dependencies.
 ## NVIDIA GPU


@@ -22,7 +22,7 @@ pipe = QwenImagePipeline.from_pretrained(
 )
 ```
-Where `torch_dtype` and `device` are computation precision and computation device (not model precision and device). `model_configs` can be configured in multiple ways for model paths. For how models are loaded internally in this project, please refer to [`diffsynth.core.loader`](/docs/en/API_Reference/core/loader.md).
+Where `torch_dtype` and `device` are computation precision and computation device (not model precision and device). `model_configs` can be configured in multiple ways for model paths. For how models are loaded internally in this project, please refer to [`diffsynth.core.loader`](../API_Reference/core/loader.md).
 <details>
@@ -34,7 +34,7 @@ Where `torch_dtype` and `device` are computation precision and computation devic
 > ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),
 > ```
 >
-> Model files are downloaded to the `./models` path by default, which can be modified through [environment variable DIFFSYNTH_MODEL_BASE_PATH](/docs/en/Pipeline_Usage/Environment_Variables.md#diffsynth_model_base_path).
+> Model files are downloaded to the `./models` path by default, which can be modified through [environment variable DIFFSYNTH_MODEL_BASE_PATH](../Pipeline_Usage/Environment_Variables.md#diffsynth_model_base_path).
 </details>
@@ -61,7 +61,7 @@ Where `torch_dtype` and `device` are computation precision and computation devic
 </details>
-By default, even after models have been downloaded, the program will still query remotely for missing files. To completely disable remote requests, set [environment variable DIFFSYNTH_SKIP_DOWNLOAD](/docs/en/Pipeline_Usage/Environment_Variables.md#diffsynth_skip_download) to `True`.
+By default, even after models have been downloaded, the program will still query remotely for missing files. To completely disable remote requests, set [environment variable DIFFSYNTH_SKIP_DOWNLOAD](../Pipeline_Usage/Environment_Variables.md#diffsynth_skip_download) to `True`.
 ```shell
 import os
@@ -69,7 +69,7 @@ os.environ["DIFFSYNTH_SKIP_DOWNLOAD"] = "True"
 import diffsynth
 ```
-To download models from [HuggingFace](https://huggingface.co/), set [environment variable DIFFSYNTH_DOWNLOAD_SOURCE](/docs/en/Pipeline_Usage/Environment_Variables.md#diffsynth_download_source) to `huggingface`.
+To download models from [HuggingFace](https://huggingface.co/), set [environment variable DIFFSYNTH_DOWNLOAD_SOURCE](../Pipeline_Usage/Environment_Variables.md#diffsynth_download_source) to `huggingface`.
 ```shell
 import os
@@ -102,13 +102,13 @@ image.save("image.jpg")
 Each model `Pipeline` has different input parameters. Please refer to the documentation for each model.
-If the model parameters are too large, causing insufficient VRAM, please enable [VRAM management](/docs/en/Pipeline_Usage/VRAM_management.md).
+If the model parameters are too large, causing insufficient VRAM, please enable [VRAM management](../Pipeline_Usage/VRAM_management.md).
 ## Loading LoRA
 LoRA is a lightweight model training method that produces a small number of parameters to extend model capabilities. DiffSynth-Studio supports two ways to load LoRA: cold loading and hot loading.
-* Cold loading: When the base model does not have [VRAM management](/docs/en/Pipeline_Usage/VRAM_management.md) enabled, LoRA will be fused into the base model weights. In this case, inference speed remains unchanged, but LoRA cannot be unloaded after loading.
+* Cold loading: When the base model does not have [VRAM management](../Pipeline_Usage/VRAM_management.md) enabled, LoRA will be fused into the base model weights. In this case, inference speed remains unchanged, but LoRA cannot be unloaded after loading.
 ```python
 from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig
@@ -131,7 +131,7 @@ image = pipe(prompt, seed=0, num_inference_steps=40)
 image.save("image.jpg")
 ```
-* Hot loading: When the base model has [VRAM management](/docs/en/Pipeline_Usage/VRAM_management.md) enabled, LoRA will not be fused into the base model weights. In this case, inference speed will be slower, but LoRA can be unloaded through `pipe.clear_lora()` after loading.
+* Hot loading: When the base model has [VRAM management](../Pipeline_Usage/VRAM_management.md) enabled, LoRA will not be fused into the base model weights. In this case, inference speed will be slower, but LoRA can be unloaded through `pipe.clear_lora()` after loading.
 ```python
 from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig


@@ -65,7 +65,7 @@ image_1.jpg,"a dog"
 image_2.jpg,"a cat"
 ```
-We have built sample datasets for your testing. To understand how the universal dataset architecture is implemented, please refer to [`diffsynth.core.data`](/docs/en/API_Reference/core/data.md).
+We have built sample datasets for your testing. To understand how the universal dataset architecture is implemented, please refer to [`diffsynth.core.data`](../API_Reference/core/data.md).
 <details>
@@ -93,7 +93,7 @@ We have built sample datasets for your testing. To understand how the universal
 ## Loading Models
-Similar to [model loading during inference](/docs/en/Pipeline_Usage/Model_Inference.md#loading-models), we support multiple ways to configure model paths, and the two methods can be mixed.
+Similar to [model loading during inference](../Pipeline_Usage/Model_Inference.md#loading-models), we support multiple ways to configure model paths, and the two methods can be mixed.
 <details>
@@ -115,9 +115,9 @@ Similar to [model loading during inference](/docs/en/Pipeline_Usage/Model_Infere
 > --model_id_with_origin_paths "Qwen/Qwen-Image:transformer/diffusion_pytorch_model*.safetensors,Qwen/Qwen-Image:text_encoder/model*.safetensors,Qwen/Qwen-Image:vae/diffusion_pytorch_model.safetensors"
 > ```
 >
-> Model files are downloaded to the `./models` path by default, which can be modified through [environment variable DIFFSYNTH_MODEL_BASE_PATH](/docs/en/Pipeline_Usage/Environment_Variables.md#diffsynth_model_base_path).
+> Model files are downloaded to the `./models` path by default, which can be modified through [environment variable DIFFSYNTH_MODEL_BASE_PATH](../Pipeline_Usage/Environment_Variables.md#diffsynth_model_base_path).
 >
-> By default, even after models have been downloaded, the program will still query remotely for missing files. To completely disable remote requests, set [environment variable DIFFSYNTH_SKIP_DOWNLOAD](/docs/en/Pipeline_Usage/Environment_Variables.md#diffsynth_skip_download) to `True`.
+> By default, even after models have been downloaded, the program will still query remotely for missing files. To completely disable remote requests, set [environment variable DIFFSYNTH_SKIP_DOWNLOAD](../Pipeline_Usage/Environment_Variables.md#diffsynth_skip_download) to `True`.
 </details>
@@ -237,11 +237,11 @@ accelerate launch --config_file examples/qwen_image/model_training/full/accelera
 ## Training Considerations
-* In addition to the `csv` format, dataset metadata also supports `json` and `jsonl` formats. For how to choose the best metadata format, please refer to [/docs/en/API_Reference/core/data.md#metadata](/docs/en/API_Reference/core/data.md#metadata)
+* In addition to the `csv` format, dataset metadata also supports `json` and `jsonl` formats. For how to choose the best metadata format, please refer to [../API_Reference/core/data.md#metadata](../API_Reference/core/data.md#metadata)
 * Training effectiveness is usually strongly correlated with training steps and weakly correlated with epoch count. Therefore, we recommend using the `--save_steps` parameter to save model files at training step intervals.
 * When data volume * `dataset_repeat` exceeds $10^9$, we observed that the dataset speed becomes significantly slower, which seems to be a `PyTorch` bug. We are not sure if newer versions of `PyTorch` have fixed this issue.
 * For learning rate `--learning_rate`, it is recommended to set to `1e-4` in LoRA training and `1e-5` in full training.
-* The training framework does not support batch size > 1. The reasons are complex. See [Q&A: Why doesn't the training framework support batch size > 1?](/docs/en/QA.md#why-doesnt-the-training-framework-support-batch-size--1)
+* The training framework does not support batch size > 1. The reasons are complex. See [Q&A: Why doesn't the training framework support batch size > 1?](../QA.md#why-doesnt-the-training-framework-support-batch-size--1)
 * Some models contain redundant parameters. For example, the text encoding part of the last layer of Qwen-Image's DiT part. When training these models, `--find_unused_parameters` needs to be set to avoid errors in multi-GPU training. For compatibility with community models, we do not intend to remove these redundant parameters.
 * The loss function value of Diffusion models has little relationship with actual effects. Therefore, we do not record loss function values during training. We recommend setting `--num_epochs` to a sufficiently large value, testing while training, and manually closing the training program after the effect converges.
-* `--use_gradient_checkpointing` is usually enabled unless GPU VRAM is sufficient; `--use_gradient_checkpointing_offload` is enabled as needed. See [`diffsynth.core.gradient`](/docs/en/API_Reference/core/gradient.md) for details.
+* `--use_gradient_checkpointing` is usually enabled unless GPU VRAM is sufficient; `--use_gradient_checkpointing_offload` is enabled as needed. See [`diffsynth.core.gradient`](../API_Reference/core/gradient.md) for details.


@@ -41,7 +41,7 @@ pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm6
 # x86
 pip install -e .[npu]
-When using Ascend NPU, please replace `"cuda"` with `"npu"` in your Python code. For details, see [NPU Support](/docs/en/Pipeline_Usage/GPU_support.md#ascend-npu).
+When using Ascend NPU, please replace `"cuda"` with `"npu"` in your Python code. For details, see [NPU Support](../Pipeline_Usage/GPU_support.md#ascend-npu).
 ## Other Installation Issues


@@ -140,7 +140,7 @@ image.save("image.jpg")
 In more extreme cases, when memory is also insufficient to store the entire model, the Disk Offload feature allows lazy loading of model parameters, meaning each Layer of the model only reads the corresponding parameters from disk when the forward function is called. When enabling this feature, we recommend using high-speed SSD drives.
-Disk Offload is a very special VRAM management solution that only supports `.safetensors` format files, not `.bin`, `.pth`, `.ckpt`, or other binary files, and does not support [state dict converter](/docs/en/Developer_Guide/Integrating_Your_Model.md#step-2-model-file-format-conversion) with Tensor reshape.
+Disk Offload is a very special VRAM management solution that only supports `.safetensors` format files, not `.bin`, `.pth`, `.ckpt`, or other binary files, and does not support [state dict converter](../Developer_Guide/Integrating_Your_Model.md#step-2-model-file-format-conversion) with Tensor reshape.
 ```python
 from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig
@@ -196,7 +196,7 @@ Specifically, the VRAM management module divides model Layers into the following
* Preparing: Intermediate state between Onload and Computation. A temporary storage state when VRAM allows. This state is controlled by the VRAM management mechanism and enters this state if and only if [vram_limit is set to unlimited] or [vram_limit is set and there is spare VRAM] * Preparing: Intermediate state between Onload and Computation. A temporary storage state when VRAM allows. This state is controlled by the VRAM management mechanism and enters this state if and only if [vram_limit is set to unlimited] or [vram_limit is set and there is spare VRAM]
* Computation: The model is being computed. This state is controlled by the VRAM management mechanism and is temporarily entered only during `forward` * Computation: The model is being computed. This state is controlled by the VRAM management mechanism and is temporarily entered only during `forward`
If you are a model developer and want to control the VRAM management granularity of a specific model, please refer to [../Developer_Guide/Enabling_VRAM_management.md](/docs/en/Developer_Guide/Enabling_VRAM_management.md). If you are a model developer and want to control the VRAM management granularity of a specific model, please refer to [../Developer_Guide/Enabling_VRAM_management.md](../Developer_Guide/Enabling_VRAM_management.md).
## Best Practices ## Best Practices
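The Disk Offload behaviour described in this hunk can be illustrated with a toy model. Each layer reads its own parameters from disk only when its `forward` runs, and the Computation state is entered only during `forward`. This is a conceptual sketch, not DiffSynth-Studio's code. `LazyLinear` and `save_layer` are hypothetical names. The sketch stores each layer's weights in a separate JSON file so it runs with the standard library alone, whereas the real feature only accepts `.safetensors` files.

```python
import json
import os
import tempfile


class LazyLinear:
    """Hypothetical layer whose weights stay on disk until forward() runs.

    Conceptual sketch only: one JSON file per layer stands in for reading
    individual tensors out of a .safetensors file.
    """

    def __init__(self, path):
        self.path = path    # location of this layer's weights on disk
        self.weight = None  # nothing resident before the first call
        self.loads = 0      # number of times the weights were read

    def forward(self, x):
        # Read only this layer's parameters, right before they are needed.
        with open(self.path, encoding="utf-8") as f:
            self.weight = json.load(f)
        self.loads += 1
        y = [sum(w * v for w, v in zip(row, x)) for row in self.weight]
        # Release the weights so they do not stay resident between calls.
        self.weight = None
        return y


def save_layer(directory, name, weight):
    """Write one layer's weight matrix to its own file and return the path."""
    path = os.path.join(directory, f"{name}.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(weight, f)
    return path


with tempfile.TemporaryDirectory() as tmp:
    layers = [
        LazyLinear(save_layer(tmp, "layer0", [[2.0, 0.0], [0.0, 2.0]])),
        LazyLinear(save_layer(tmp, "layer1", [[1.0, 1.0], [0.0, 1.0]])),
    ]
    x = [1.0, 3.0]
    for layer in layers:
        x = layer.forward(x)
    resident = [layer.weight is not None for layer in layers]
```

Only one layer's weights are in memory at any moment. That is why the real feature benefits from a fast SSD: every forward pass pays the read cost again.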

@@ -29,7 +29,7 @@ Therefore, native FP8 precision training technology is extremely immature. We wi
## How to dynamically load LoRA models during inference? ## How to dynamically load LoRA models during inference?
We support two loading methods for LoRA models. See [LoRA Loading](/docs/en/Pipeline_Usage/Model_Inference.md#loading-lora) for details: We support two loading methods for LoRA models. See [LoRA Loading](./Pipeline_Usage/Model_Inference.md#loading-lora) for details:
* Cold Loading: When [VRAM Management](/docs/en/Pipeline_Usage/VRAM_management.md) is not enabled for the base model, LoRA will be fused into the base model weights. In this case, inference speed remains unchanged, and LoRA cannot be unloaded after loading. * Cold Loading: When [VRAM Management](./Pipeline_Usage/VRAM_management.md) is not enabled for the base model, LoRA will be fused into the base model weights. In this case, inference speed remains unchanged, and LoRA cannot be unloaded after loading.
* Hot Loading: When [VRAM Management](/docs/en/Pipeline_Usage/VRAM_management.md) is enabled for the base model, LoRA will not be fused into the base model weights. In this case, inference speed will slow down, and LoRA can be unloaded after loading via `pipe.clear_lora()`. * Hot Loading: When [VRAM Management](./Pipeline_Usage/VRAM_management.md) is enabled for the base model, LoRA will not be fused into the base model weights. In this case, inference speed will slow down, and LoRA can be unloaded after loading via `pipe.clear_lora()`.
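The cold/hot trade-off in the FAQ entry above depends on when the low-rank update is applied. It is either merged into the base weight ahead of time or applied on every call. The plain-Python sketch below shows that arithmetic. The function names are hypothetical, and this is not DiffSynth-Studio's LoRA code. Both paths produce the same output. Cold loading therefore adds no inference cost but cannot simply be undone. Hot loading adds work on every call but leaves the base weight intact.

```python
def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_delta(B, A, scale):
    """Return scale * (B @ A), the low-rank weight update a LoRA adds."""
    return [[scale * sum(B[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(B))]


def cold_forward(W, B, A, scale, x):
    # Cold loading: fuse the update into W once. Later calls cost a single
    # matvec, but the original W is gone unless the delta is subtracted again.
    delta = lora_delta(B, A, scale)
    fused = [[w + d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]
    return matvec(fused, x)


def hot_forward(W, B, A, scale, x):
    # Hot loading: leave W untouched and add the LoRA branch at call time.
    # This costs extra matvecs per call, but unloading just drops B and A.
    base = matvec(W, x)
    branch = matvec(B, matvec(A, x))
    return [b + scale * r for b, r in zip(base, branch)]


W = [[1.0, 0.0], [0.0, 1.0]]  # base weight, 2 x 2
B = [[1.0], [2.0]]            # LoRA up-projection, 2 x rank 1
A = [[3.0, 4.0]]              # LoRA down-projection, rank 1 x 2
x = [1.0, 1.0]
cold = cold_forward(W, B, A, 0.5, x)
hot = hot_forward(W, B, A, 0.5, x)
```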

@@ -26,58 +26,58 @@ graph LR;
This section introduces the basic usage of `DiffSynth-Studio`, including how to enable VRAM management for inference on GPUs with extremely low VRAM, and how to train various base models, LoRAs, ControlNets, and other models. This section introduces the basic usage of `DiffSynth-Studio`, including how to enable VRAM management for inference on GPUs with extremely low VRAM, and how to train various base models, LoRAs, ControlNets, and other models.
* [Installation Dependencies](/docs/en/Pipeline_Usage/Setup.md) * [Installation Dependencies](./Pipeline_Usage/Setup.md)
* [Model Inference](/docs/en/Pipeline_Usage/Model_Inference.md) * [Model Inference](./Pipeline_Usage/Model_Inference.md)
* [VRAM Management](/docs/en/Pipeline_Usage/VRAM_management.md) * [VRAM Management](./Pipeline_Usage/VRAM_management.md)
* [Model Training](/docs/en/Pipeline_Usage/Model_Training.md) * [Model Training](./Pipeline_Usage/Model_Training.md)
* [Environment Variables](/docs/en/Pipeline_Usage/Environment_Variables.md) * [Environment Variables](./Pipeline_Usage/Environment_Variables.md)
* [GPU/NPU Support](/docs/en/Pipeline_Usage/GPU_support.md) * [GPU/NPU Support](./Pipeline_Usage/GPU_support.md)
## Section 2: Model Details ## Section 2: Model Details
This section introduces the Diffusion models supported by `DiffSynth-Studio`. Some model pipelines feature special functionalities such as controllable generation and parallel acceleration. This section introduces the Diffusion models supported by `DiffSynth-Studio`. Some model pipelines feature special functionalities such as controllable generation and parallel acceleration.
* [FLUX.1](/docs/en/Model_Details/FLUX.md) * [FLUX.1](./Model_Details/FLUX.md)
* [Wan](/docs/en/Model_Details/Wan.md) * [Wan](./Model_Details/Wan.md)
* [Qwen-Image](/docs/en/Model_Details/Qwen-Image.md) * [Qwen-Image](./Model_Details/Qwen-Image.md)
* [FLUX.2](/docs/en/Model_Details/FLUX2.md) * [FLUX.2](./Model_Details/FLUX2.md)
* [Z-Image](/docs/en/Model_Details/Z-Image.md) * [Z-Image](./Model_Details/Z-Image.md)
## Section 3: Training Framework ## Section 3: Training Framework
This section introduces the design philosophy of the training framework in `DiffSynth-Studio`, helping developers understand the principles of Diffusion model training algorithms. This section introduces the design philosophy of the training framework in `DiffSynth-Studio`, helping developers understand the principles of Diffusion model training algorithms.
* [Basic Principles of Diffusion Models](/docs/en/Training/Understanding_Diffusion_models.md) * [Basic Principles of Diffusion Models](./Training/Understanding_Diffusion_models.md)
* [Standard Supervised Training](/docs/en/Training/Supervised_Fine_Tuning.md) * [Standard Supervised Training](./Training/Supervised_Fine_Tuning.md)
* [Enabling FP8 Precision in Training](/docs/en/Training/FP8_Precision.md) * [Enabling FP8 Precision in Training](./Training/FP8_Precision.md)
* [End-to-End Distillation Accelerated Training](/docs/en/Training/Direct_Distill.md) * [End-to-End Distillation Accelerated Training](./Training/Direct_Distill.md)
* [Two-Stage Split Training](/docs/en/Training/Split_Training.md) * [Two-Stage Split Training](./Training/Split_Training.md)
* [Differential LoRA Training](/docs/en/Training/Differential_LoRA.md) * [Differential LoRA Training](./Training/Differential_LoRA.md)
## Section 4: Model Integration ## Section 4: Model Integration
This section introduces how to integrate models into `DiffSynth-Studio` to utilize the framework's basic functions, helping developers provide support for new models in this project or perform inference and training of private models. This section introduces how to integrate models into `DiffSynth-Studio` to utilize the framework's basic functions, helping developers provide support for new models in this project or perform inference and training of private models.
* [Integrating Model Architecture](/docs/en/Developer_Guide/Integrating_Your_Model.md) * [Integrating Model Architecture](./Developer_Guide/Integrating_Your_Model.md)
* [Building a Pipeline](/docs/en/Developer_Guide/Building_a_Pipeline.md) * [Building a Pipeline](./Developer_Guide/Building_a_Pipeline.md)
* [Enabling Fine-Grained VRAM Management](/docs/en/Developer_Guide/Enabling_VRAM_management.md) * [Enabling Fine-Grained VRAM Management](./Developer_Guide/Enabling_VRAM_management.md)
* [Model Training Integration](/docs/en/Developer_Guide/Training_Diffusion_Models.md) * [Model Training Integration](./Developer_Guide/Training_Diffusion_Models.md)
## Section 5: API Reference ## Section 5: API Reference
This section introduces the independent core module `diffsynth.core` in `DiffSynth-Studio`, explaining how internal functions are designed and operate. Developers can use these functional modules in other codebase developments if needed. This section introduces the independent core module `diffsynth.core` in `DiffSynth-Studio`, explaining how internal functions are designed and operate. Developers can use these functional modules in other codebase developments if needed.
* [`diffsynth.core.attention`](/docs/en/API_Reference/core/attention.md): Attention mechanism implementation * [`diffsynth.core.attention`](./API_Reference/core/attention.md): Attention mechanism implementation
* [`diffsynth.core.data`](/docs/en/API_Reference/core/data.md): Data processing operators and general datasets * [`diffsynth.core.data`](./API_Reference/core/data.md): Data processing operators and general datasets
* [`diffsynth.core.gradient`](/docs/en/API_Reference/core/gradient.md): Gradient checkpointing * [`diffsynth.core.gradient`](./API_Reference/core/gradient.md): Gradient checkpointing
* [`diffsynth.core.loader`](/docs/en/API_Reference/core/loader.md): Model download and loading * [`diffsynth.core.loader`](./API_Reference/core/loader.md): Model download and loading
* [`diffsynth.core.vram`](/docs/en/API_Reference/core/vram.md): VRAM management * [`diffsynth.core.vram`](./API_Reference/core/vram.md): VRAM management
## Section 6: Academic Guide ## Section 6: Academic Guide
This section introduces how to use `DiffSynth-Studio` to train new models, helping researchers explore new model technologies. This section introduces how to use `DiffSynth-Studio` to train new models, helping researchers explore new model technologies.
* [Training models from scratch](/docs/en/Research_Tutorial/train_from_scratch.md) * [Training models from scratch](./Research_Tutorial/train_from_scratch.md)
* Inference improvement techniques 【coming soon】 * Inference improvement techniques 【coming soon】
* Designing controllable generation models 【coming soon】 * Designing controllable generation models 【coming soon】
* Creating new training paradigms 【coming soon】 * Creating new training paradigms 【coming soon】
@@ -86,4 +86,4 @@ This section introduces how to use `DiffSynth-Studio` to train new models, helpi
This section summarizes common developer questions. If you encounter issues during usage or development, please refer to this section. If you still cannot resolve the problem, please submit an issue on GitHub. This section summarizes common developer questions. If you encounter issues during usage or development, please refer to this section. If you still cannot resolve the problem, please submit an issue on GitHub.
* [Frequently Asked Questions](/docs/en/QA.md) * [Frequently Asked Questions](./QA.md)
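Every hunk in this commit applies the same mechanical rewrite: a site-absolute `/docs/en/...` link becomes a path relative to the file that contains it. This is presumably so that links still resolve when Sphinx treats `docs/en` as the documentation root, as the `docs/en/conf.py` setup suggests. Below is a minimal sketch of that rewrite. It assumes repo-relative file paths with POSIX separators. `to_relative` is a hypothetical helper, not part of the repository.

```python
import posixpath


def to_relative(doc_path, link, root="/docs/en/"):
    """Rewrite a site-absolute link such as '/docs/en/QA.md#anchor' so it is
    relative to the Markdown file at doc_path (a repo-relative path)."""
    if not link.startswith(root):
        return link  # external URLs and already-relative links stay unchanged
    target, sep, anchor = link.partition("#")
    rel = posixpath.relpath(target.lstrip("/"), posixpath.dirname(doc_path))
    if not rel.startswith("../"):
        rel = "./" + rel  # make same-directory targets explicit
    return rel + sep + anchor
```

For a file that links to a sibling in its own directory, the sketch yields `./Supervised_Fine_Tuning.md`, whereas this commit writes `../Training/Supervised_Fine_Tuning.md`. Both resolve to the same file.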

@@ -12,7 +12,7 @@ From UNet [[1]](https://arxiv.org/abs/1505.04597) [[2]](https://arxiv.org/abs/21
* Text tensor (`prompt_embeds`): The encoding of text, generated by the text encoder * Text tensor (`prompt_embeds`): The encoding of text, generated by the text encoder
* Timestep (`timestep`): A scalar used to mark which stage of the Diffusion process we are currently at * Timestep (`timestep`): A scalar used to mark which stage of the Diffusion process we are currently at
The model's output is a tensor with the same shape as the image tensor, representing the denoising direction predicted by the model. For details about Diffusion model theory, please refer to [Basic Principles of Diffusion Models](/docs/en/Training/Understanding_Diffusion_models.md). In this article, we build a DiT model with only 0.1B parameters: `AAADiT`. The model's output is a tensor with the same shape as the image tensor, representing the denoising direction predicted by the model. For details about Diffusion model theory, please refer to [Basic Principles of Diffusion Models](../Training/Understanding_Diffusion_models.md). In this article, we build a DiT model with only 0.1B parameters: `AAADiT`.
<details> <details>
<summary>Model Architecture Code</summary> <summary>Model Architecture Code</summary>
@@ -141,7 +141,7 @@ The architectures of these two models are already integrated in DiffSynth-Studio
## 2. Building Pipeline ## 2. Building Pipeline
We introduced how to build a model Pipeline in the document [Integrating Pipeline](/docs/en/Developer_Guide/Building_a_Pipeline.md). For the model in this article, we also need to build a Pipeline to connect the text encoder, Diffusion model, and VAE encoder-decoder. We introduced how to build a model Pipeline in the document [Integrating Pipeline](../Developer_Guide/Building_a_Pipeline.md). For the model in this article, we also need to build a Pipeline to connect the text encoder, Diffusion model, and VAE encoder-decoder.
<details> <details>
<summary>Pipeline Code</summary> <summary>Pipeline Code</summary>
@@ -328,7 +328,7 @@ def model_fn_aaa(
## 3. Preparing Dataset ## 3. Preparing Dataset
To quickly verify training effectiveness, we use the dataset [Pokemon-First Generation](https://modelscope.cn/datasets/DiffSynth-Studio/pokemon-gen1), which is reproduced from the open-source project [pokemon-dataset-zh](https://github.com/42arch/pokemon-dataset-zh), containing 151 first-generation Pokemon from Bulbasaur to Mew. If you want to use other datasets, please refer to the document [Preparing Datasets](/docs/en/Pipeline_Usage/Model_Training.md#preparing-datasets) and [`diffsynth.core.data`](/docs/en/API_Reference/core/data.md). To quickly verify training effectiveness, we use the dataset [Pokemon-First Generation](https://modelscope.cn/datasets/DiffSynth-Studio/pokemon-gen1), which is reproduced from the open-source project [pokemon-dataset-zh](https://github.com/42arch/pokemon-dataset-zh), containing 151 first-generation Pokemon from Bulbasaur to Mew. If you want to use other datasets, please refer to the document [Preparing Datasets](../Pipeline_Usage/Model_Training.md#preparing-datasets) and [`diffsynth.core.data`](../API_Reference/core/data.md).
```shell ```shell
modelscope download --dataset DiffSynth-Studio/pokemon-gen1 --local_dir ./data modelscope download --dataset DiffSynth-Studio/pokemon-gen1 --local_dir ./data
@@ -336,7 +336,7 @@ modelscope download --dataset DiffSynth-Studio/pokemon-gen1 --local_dir ./data
### 4. Start Training ### 4. Start Training
The training process can be quickly implemented using Pipeline. We have placed the complete code at [/docs/en/Research_Tutorial/train_from_scratch.py](/docs/en/Research_Tutorial/train_from_scratch.py), which can be directly started with `python docs/en/Research_Tutorial/train_from_scratch.py` for single GPU training. The training process can be quickly implemented using Pipeline. We have placed the complete code at [../Research_Tutorial/train_from_scratch.py](../Research_Tutorial/train_from_scratch.py), which can be directly started with `python docs/en/Research_Tutorial/train_from_scratch.py` for single GPU training.
To enable multi-GPU parallel training, please run `accelerate config` to set relevant parameters, then use the command `accelerate launch docs/en/Research_Tutorial/train_from_scratch.py` to start training. To enable multi-GPU parallel training, please run `accelerate config` to set relevant parameters, then use the command `accelerate launch docs/en/Research_Tutorial/train_from_scratch.py` to start training.

@@ -8,8 +8,8 @@ We were unable to identify the original proposer of differential LoRA training,
Assume we have two similar-content images: Image 1 and Image 2. For example, both images contain a car, but Image 1 has fewer details while Image 2 has more details. In differential LoRA training, we perform two-step training: Assume we have two similar-content images: Image 1 and Image 2. For example, both images contain a car, but Image 1 has fewer details while Image 2 has more details. In differential LoRA training, we perform two-step training:
* Train LoRA 1 using Image 1 as training data with [standard supervised training](/docs/en/Training/Supervised_Fine_Tuning.md) * Train LoRA 1 using Image 1 as training data with [standard supervised training](../Training/Supervised_Fine_Tuning.md)
* Train LoRA 2 using Image 2 as training data, after integrating LoRA 1 into the base model, with [standard supervised training](/docs/en/Training/Supervised_Fine_Tuning.md) * Train LoRA 2 using Image 2 as training data, after integrating LoRA 1 into the base model, with [standard supervised training](../Training/Supervised_Fine_Tuning.md)
In the first training step, since there is only one training image, the LoRA model easily overfits. Therefore, after training, LoRA 1 will cause the model to generate Image 1 without hesitation, regardless of the random seed. In the second training step, the LoRA model overfits again. Thus, after training, with the combined effect of LoRA 1 and LoRA 2, the model will generate Image 2 without hesitation. In short: In the first training step, since there is only one training image, the LoRA model easily overfits. Therefore, after training, LoRA 1 will cause the model to generate Image 1 without hesitation, regardless of the random seed. In the second training step, the LoRA model overfits again. Thus, after training, with the combined effect of LoRA 1 and LoRA 2, the model will generate Image 2 without hesitation. In short:

@@ -44,7 +44,7 @@ Click on the model links to go to the model pages and view the model effects.
## Using Distillation Accelerated Training in the Training Framework ## Using Distillation Accelerated Training in the Training Framework
First, you need to generate training data. Please refer to the [Model Inference](/docs/en/Pipeline_Usage/Model_Inference.md) section to write inference code and generate training data with a sufficient number of inference steps. First, you need to generate training data. Please refer to the [Model Inference](../Pipeline_Usage/Model_Inference.md) section to write inference code and generate training data with a sufficient number of inference steps.
Taking Qwen-Image as an example, the following code can generate an image: Taking Qwen-Image as an example, the following code can generate an image:
@@ -67,7 +67,7 @@ image = pipe(prompt, seed=0, num_inference_steps=40)
image.save("image.jpg") image.save("image.jpg")
``` ```
Then, we compile the necessary information into [metadata files](/docs/en/API_Reference/core/data.md#metadata): Then, we compile the necessary information into [metadata files](../API_Reference/core/data.md#metadata):
```csv ```csv
image,prompt,seed,rand_device,num_inference_steps,cfg_scale image,prompt,seed,rand_device,num_inference_steps,cfg_scale
@@ -86,11 +86,11 @@ Then start LoRA distillation accelerated training:
bash examples/qwen_image/model_training/lora/Qwen-Image-Distill-LoRA.sh bash examples/qwen_image/model_training/lora/Qwen-Image-Distill-LoRA.sh
``` ```
Please note that in the [training script parameters](/docs/en/Pipeline_Usage/Model_Training.md#script-parameters), the image resolution setting for the dataset should avoid triggering scaling processing. When setting `--height` and `--width` to enable fixed resolution, all training data must be generated with exactly the same width and height. When setting `--max_pixels` to enable dynamic resolution, the value of `--max_pixels` must be greater than or equal to the pixel area of any training image. Please note that in the [training script parameters](../Pipeline_Usage/Model_Training.md#script-parameters), the image resolution setting for the dataset should avoid triggering scaling processing. When setting `--height` and `--width` to enable fixed resolution, all training data must be generated with exactly the same width and height. When setting `--max_pixels` to enable dynamic resolution, the value of `--max_pixels` must be greater than or equal to the pixel area of any training image.
## Framework Design Concept ## Framework Design Concept
Compared to [Standard Supervised Training](/docs/en/Training/Supervised_Fine_Tuning.md), Direct Distillation only differs in the training loss function. The loss function for Direct Distillation is `DirectDistillLoss` in `diffsynth.diffusion.loss`. Compared to [Standard Supervised Training](../Training/Supervised_Fine_Tuning.md), Direct Distillation only differs in the training loss function. The loss function for Direct Distillation is `DirectDistillLoss` in `diffsynth.diffusion.loss`.
## Future Work ## Future Work
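The resolution rule in the paragraph on training script parameters can be checked before distillation training is launched. Here is a small sketch with a hypothetical helper name. The `(width, height)` pairs could come from, for example, Pillow's `Image.size`.

```python
def images_that_would_be_rescaled(sizes, height=None, width=None, max_pixels=None):
    """Return the (width, height) pairs that would trigger scaling.

    Fixed resolution (height and width set): every image must match exactly.
    Dynamic resolution (max_pixels set): no image may exceed max_pixels in area.
    """
    if height is not None and width is not None:
        return [(w, h) for w, h in sizes if (w, h) != (width, height)]
    if max_pixels is not None:
        return [(w, h) for w, h in sizes if w * h > max_pixels]
    raise ValueError("set either height and width, or max_pixels")
```

An empty result means that no training image would be resized under the chosen setting.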

@@ -1,8 +1,8 @@
# Enabling FP8 Precision in Training # Enabling FP8 Precision in Training
Although `DiffSynth-Studio` supports [VRAM management](/docs/en/Pipeline_Usage/VRAM_management.md) in model inference, most of the techniques for reducing VRAM usage are not suitable for training. Offloading would cause extremely slow training processes. Although `DiffSynth-Studio` supports [VRAM management](../Pipeline_Usage/VRAM_management.md) in model inference, most of the techniques for reducing VRAM usage are not suitable for training. Offloading would cause extremely slow training processes.
FP8 precision is the only VRAM management strategy that can be enabled during training. However, this framework currently does not support native FP8 precision training. For reasons, see [Q&A: Why doesn't the training framework support native FP8 precision training?](/docs/en/QA.md#why-doesnt-the-training-framework-support-native-fp8-precision-training). It only supports storing models whose parameters are not updated by gradients (models that do not require gradient backpropagation, or whose gradients only update their LoRA) in FP8 precision. FP8 precision is the only VRAM management strategy that can be enabled during training. However, this framework currently does not support native FP8 precision training. For reasons, see [Q&A: Why doesn't the training framework support native FP8 precision training?](../QA.md#why-doesnt-the-training-framework-support-native-fp8-precision-training). It only supports storing models whose parameters are not updated by gradients (models that do not require gradient backpropagation, or whose gradients only update their LoRA) in FP8 precision.
## Enabling FP8 ## Enabling FP8

@@ -8,7 +8,7 @@ This document introduces split training, which can automatically divide the trai
In the training process of most models, a large amount of computation occurs in "preprocessing," i.e., "computations unrelated to the denoising model," including VAE encoding, text encoding, etc. When the corresponding model parameters are fixed, the results of these computations are repetitive. For each data sample, the computational results are identical across multiple epochs. Therefore, we provide a "split training" feature that can automatically analyze and split the training process. In the training process of most models, a large amount of computation occurs in "preprocessing," i.e., "computations unrelated to the denoising model," including VAE encoding, text encoding, etc. When the corresponding model parameters are fixed, the results of these computations are repetitive. For each data sample, the computational results are identical across multiple epochs. Therefore, we provide a "split training" feature that can automatically analyze and split the training process.
For standard supervised training of ordinary text-to-image models, the splitting process is straightforward. It only requires splitting the computation of all [`Pipeline Units`](/docs/en/Developer_Guide/Building_a_Pipeline.md#units) into the first stage, storing the computational results to disk, and then reading these results from disk in the second stage for subsequent computations. However, if gradient backpropagation is required during preprocessing, the situation becomes extremely complex. To address this, we introduced a computational graph splitting algorithm to analyze how to split the computation. For standard supervised training of ordinary text-to-image models, the splitting process is straightforward. It only requires splitting the computation of all [`Pipeline Units`](../Developer_Guide/Building_a_Pipeline.md#units) into the first stage, storing the computational results to disk, and then reading these results from disk in the second stage for subsequent computations. However, if gradient backpropagation is required during preprocessing, the situation becomes extremely complex. To address this, we introduced a computational graph splitting algorithm to analyze how to split the computation.
## Computational Graph Splitting Algorithm ## Computational Graph Splitting Algorithm
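When the preprocessing models are frozen, the two-stage split described above is simple. Each sample's preprocessing outputs are computed once, written to disk, and replayed every epoch. The sketch below covers only that simple case. `preprocess` is a hypothetical stand-in for frozen encoders such as the VAE and text encoder. The `pickle` files stand in for whatever cache format the framework actually uses.

```python
import os
import pickle
import tempfile

calls = {"preprocess": 0}


def preprocess(sample):
    """Hypothetical stand-in for frozen encoders such as the text encoder and VAE."""
    calls["preprocess"] += 1
    return {"prompt_emb": len(sample["prompt"]), "latent": sample["pixels"] * 2}


def stage_one(dataset, cache_dir):
    # Stage 1: run preprocessing once per sample and persist the results.
    paths = []
    for i, sample in enumerate(dataset):
        path = os.path.join(cache_dir, f"{i}.pkl")
        with open(path, "wb") as f:
            pickle.dump(preprocess(sample), f)
        paths.append(path)
    return paths


def stage_two(paths, num_epochs, train_step):
    # Stage 2: every epoch reads the cached inputs instead of recomputing them.
    for _ in range(num_epochs):
        for path in paths:
            with open(path, "rb") as f:
                train_step(pickle.load(f))


dataset = [{"prompt": "a red car", "pixels": 1}, {"prompt": "a cat", "pixels": 3}]
seen = []
with tempfile.TemporaryDirectory() as tmp:
    stage_two(stage_one(dataset, tmp), num_epochs=3, train_step=seen.append)
```

Preprocessing runs once per sample rather than once per sample per epoch. If gradients must flow back into preprocessing, this naive split no longer applies. That case is what the computational graph splitting algorithm is meant to handle.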
@@ -16,7 +16,7 @@ For standard supervised training of ordinary text-to-image models, the splitting
## Using Split Training ## Using Split Training
Split training already supports [Standard Supervised Training](/docs/en/Training/Supervised_Fine_Tuning.md) and [Direct Distillation Training](/docs/en/Training/Direct_Distill.md). The `--task` parameter in the training command controls this. Taking LoRA training of the Qwen-Image model as an example, the pre-split training command is: Split training already supports [Standard Supervised Training](../Training/Supervised_Fine_Tuning.md) and [Direct Distillation Training](../Training/Direct_Distill.md). The `--task` parameter in the training command controls this. Taking LoRA training of the Qwen-Image model as an example, the pre-split training command is:
```shell ```shell
accelerate launch examples/qwen_image/model_training/train.py \ accelerate launch examples/qwen_image/model_training/train.py \

@@ -1,10 +1,10 @@
# Standard Supervised Training # Standard Supervised Training
After understanding the [Basic Principles of Diffusion Models](/docs/en/Training/Understanding_Diffusion_models.md), this document introduces how the framework implements Diffusion model training. This document explains the framework's principles to help developers write new training code. If you want to use our provided default training functions, please refer to [Model Training](/docs/en/Pipeline_Usage/Model_Training.md). After understanding the [Basic Principles of Diffusion Models](../Training/Understanding_Diffusion_models.md), this document introduces how the framework implements Diffusion model training. This document explains the framework's principles to help developers write new training code. If you want to use our provided default training functions, please refer to [Model Training](../Pipeline_Usage/Model_Training.md).
Recalling the model training pseudocode from earlier, when we actually write code, the situation becomes extremely complex. Some models require additional guidance conditions and preprocessing, such as ControlNet; some models require cross-computation with the denoising model, such as VACE; some models require Gradient Checkpointing due to excessive VRAM demands, such as Qwen-Image's DiT. Recalling the model training pseudocode from earlier, when we actually write code, the situation becomes extremely complex. Some models require additional guidance conditions and preprocessing, such as ControlNet; some models require cross-computation with the denoising model, such as VACE; some models require Gradient Checkpointing due to excessive VRAM demands, such as Qwen-Image's DiT.
To achieve strict consistency between inference and training, we abstractly encapsulate components like `Pipeline`, reusing inference code extensively during training. Please refer to [Integrating Pipeline](/docs/en/Developer_Guide/Building_a_Pipeline.md) to understand the design of `Pipeline` components. Next, we'll introduce how the training framework utilizes `Pipeline` components to build training algorithms. To achieve strict consistency between inference and training, we abstractly encapsulate components like `Pipeline`, reusing inference code extensively during training. Please refer to [Integrating Pipeline](../Developer_Guide/Building_a_Pipeline.md) to understand the design of `Pipeline` components. Next, we'll introduce how the training framework utilizes `Pipeline` components to build training algorithms.
## Framework Design Concept ## Framework Design Concept
@@ -48,13 +48,13 @@ In `__init__`, model initialization is required. First load the model, then swit
) )
``` ```
The logic for loading models is basically consistent with inference, supporting loading models from remote and local paths. See [Model Inference](/docs/en/Pipeline_Usage/Model_Inference.md) for details, but please note not to enable [VRAM Management](/docs/en/Pipeline_Usage/VRAM_management.md). The logic for loading models is basically consistent with inference, supporting loading models from remote and local paths. See [Model Inference](../Pipeline_Usage/Model_Inference.md) for details, but please note not to enable [VRAM Management](../Pipeline_Usage/VRAM_management.md).
`switch_pipe_to_training_mode` can switch the model to training mode. See `switch_pipe_to_training_mode` for details. `switch_pipe_to_training_mode` can switch the model to training mode. See `switch_pipe_to_training_mode` for details.
### `forward` ### `forward`
In `forward`, the loss function value needs to be calculated. First perform preprocessing, then compute the loss function through the `Pipeline`'s [`model_fn`](/docs/en/Developer_Guide/Building_a_Pipeline.md#model_fn). In `forward`, the loss function value needs to be calculated. First perform preprocessing, then compute the loss function through the `Pipeline`'s [`model_fn`](../Developer_Guide/Building_a_Pipeline.md#model_fn).
```python ```python
def forward(self, data): def forward(self, data):
@@ -90,7 +90,7 @@ The loss function calculation reuses `FlowMatchSFTLoss` from `diffsynth.diffusio
The training framework requires other modules, including: The training framework requires other modules, including:
* accelerator: Training launcher provided by `accelerate`, see [`accelerate`](https://huggingface.co/docs/accelerate/index) for details * accelerator: Training launcher provided by `accelerate`, see [`accelerate`](https://huggingface.co/docs/accelerate/index) for details
* dataset: Generic dataset, see [`diffsynth.core.data`](/docs/en/API_Reference/core/data.md) for details * dataset: Generic dataset, see [`diffsynth.core.data`](../API_Reference/core/data.md) for details
* model_logger: Model logger, see `diffsynth.diffusion.logger` for details * model_logger: Model logger, see `diffsynth.diffusion.logger` for details
```python ```python

@@ -138,4 +138,4 @@ The denoising model is the true essence of Diffusion models, with diverse model
## How does this project encapsulate and implement model training? ## How does this project encapsulate and implement model training?
Please read the next document: [Standard Supervised Training](/docs/en/Training/Supervised_Fine_Tuning.md) Please read the next document: [Standard Supervised Training](../Training/Supervised_Fine_Tuning.md)

docs/en/conf.py Normal file
@@ -0,0 +1,123 @@
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html
# -- Path setup --------------------------------------------------------------
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import sys
# import sphinx_book_theme
sys.path.insert(0, os.path.abspath('../../'))
# -- Project information -----------------------------------------------------
project = 'diffsynth'
copyright = '2022-2025, Alibaba ModelScope'
author = 'ModelScope Authors'
version_file = '../../diffsynth/version.py'
html_theme = 'sphinx_rtd_theme'
language = 'en'
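def get_version():
    # Execute version.py in an explicit namespace; reading locals() after exec()
    # inside a function no longer works on Python 3.13+ (PEP 667).
    namespace = {}
    with open(version_file, 'r', encoding='utf-8') as f:
        exec(compile(f.read(), version_file, 'exec'), namespace)
    return namespace['__version__']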
# The full version, including alpha/beta/rc tags
version = get_version()
release = version
# -- General configuration ---------------------------------------------------
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
'sphinx.ext.napoleon',
'sphinx.ext.autosummary',
'sphinx.ext.autodoc',
'sphinx.ext.viewcode',
'sphinx_markdown_tables',
'sphinx_copybutton',
'myst_parser',
]
# build the templated autosummary files
autosummary_generate = True
numpydoc_show_class_members = False
# Enable overriding of function signatures in the first line of the docstring.
autodoc_docstring_signature = True
# Disable docstring inheritance
autodoc_inherit_docstrings = False
# Show type hints in the description
autodoc_typehints = 'description'
# Add parameter types if the parameter is documented in the docstring
autodoc_typehints_description_target = 'documented_params'
autodoc_default_options = {
'member-order': 'bysource',
}
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
#
source_suffix = ['.rst', '.md']
# The master toctree document.
root_doc = 'index'
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ['build']
# A list of glob-style patterns [1] that are used to find source files.
# They are matched against the source file names relative to the source directory,
# using slashes as directory separators on all platforms.
# The default is **, meaning that all files are recursively included from the source directory.
# -- Options for HTML output -------------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
# html_theme = 'sphinx_book_theme'
# html_theme_path = [sphinx_book_theme.get_html_theme_path()]
# html_theme_options = {}
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
# html_css_files = ['css/readthedocs.css']
# -- Options for HTMLHelp output ---------------------------------------------
# Output file base name for HTML help builder.
# -- Extension configuration -------------------------------------------------
# Ignore >>> when copying code
copybutton_prompt_text = r'>>> |\.\.\. '
copybutton_prompt_is_regexp = True
# Example configuration for intersphinx: refer to the Python standard library.
# Only takes effect if 'sphinx.ext.intersphinx' is added to `extensions`.
intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}
myst_enable_extensions = [
'amsmath',
'dollarmath',
'colon_fence',
]
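`get_version` above reads the version by executing `version.py`. An alternative is to parse the file with the standard-library `ast` module, so no code runs at docs build time. This is a minimal sketch that assumes `__version__` is assigned a plain string literal at module level, as in `diffsynth/version.py`. `read_version` is a hypothetical helper and is not part of this commit.

```python
import ast
from pathlib import Path


def read_version(version_file: str) -> str:
    """Return the string literal assigned to __version__, without executing the file."""
    tree = ast.parse(Path(version_file).read_text(encoding="utf-8"), filename=version_file)
    for node in tree.body:
        # Only top-level assignments such as `__version__ = '2.0.0'` are considered.
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == "__version__":
                    return ast.literal_eval(node.value)
    raise ValueError(f"no __version__ assignment found in {version_file}")
```

In `conf.py` this would replace the call with `version = read_version(version_file)`.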

docs/en/index.rst Normal file
@@ -0,0 +1,77 @@
Welcome to DiffSynth-Studio's Documentation
===========================================

.. toctree::
   :maxdepth: 2
   :caption: Documentation Introduction

   README

.. toctree::
   :maxdepth: 2
   :caption: Getting Started

   Pipeline_Usage/Setup
   Pipeline_Usage/Model_Inference
   Pipeline_Usage/VRAM_management
   Pipeline_Usage/Model_Training
   Pipeline_Usage/Environment_Variables
   Pipeline_Usage/GPU_support

.. toctree::
   :maxdepth: 2
   :caption: Model Details

   Model_Details/FLUX
   Model_Details/Wan
   Model_Details/Qwen-Image
   Model_Details/FLUX2
   Model_Details/Z-Image

.. toctree::
   :maxdepth: 2
   :caption: Training Framework

   Training/Understanding_Diffusion_models
   Training/Supervised_Fine_Tuning
   Training/FP8_Precision
   Training/Direct_Distill
   Training/Split_Training
   Training/Differential_LoRA

.. toctree::
   :maxdepth: 2
   :caption: Model Integration

   Developer_Guide/Integrating_Your_Model
   Developer_Guide/Building_a_Pipeline
   Developer_Guide/Enabling_VRAM_management
   Developer_Guide/Training_Diffusion_Models

.. toctree::
   :maxdepth: 2
   :caption: API Reference

   API_Reference/core/attention
   API_Reference/core/data
   API_Reference/core/gradient
   API_Reference/core/loader
   API_Reference/core/vram

.. toctree::
   :maxdepth: 2
   :caption: Research Guide

   Research_Tutorial/train_from_scratch

.. toctree::
   :maxdepth: 2
   :caption: FAQ

   QA

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
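Each toctree entry is a document name without a suffix. Sphinx resolves it against `source_suffix` (`.rst` and `.md` in `conf.py`), relative to the source directory. The sketch below is a hypothetical pre-build check along those lines. It assumes entries are plain document names, as in this `index.rst`, with no explicit titles, globs, or external links.

```python
from pathlib import Path

SOURCE_SUFFIXES = (".rst", ".md")  # mirrors source_suffix in conf.py


def toctree_entries(index_text: str) -> list:
    """Collect document names listed under `.. toctree::` directives."""
    entries, in_toctree = [], False
    for line in index_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(".. toctree::"):
            in_toctree = True
        elif in_toctree and stripped:
            if not line[0].isspace():
                in_toctree = False  # unindented text ends the directive body
            elif not stripped.startswith(":"):
                entries.append(stripped)  # options such as :maxdepth: are skipped
    return entries


def missing_documents(source_dir: str, entries: list) -> list:
    """Return entries that have no matching .rst or .md file under source_dir."""
    root = Path(source_dir)
    return [e for e in entries
            if not any((root / f"{e}{suffix}").is_file() for suffix in SOURCE_SUFFIXES)]
```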

docs/requirements.txt Normal file
@@ -0,0 +1,9 @@
docutils>=0.16.0
myst_parser
recommonmark
sphinx>=5.3.0
sphinx-book-theme
sphinx-copybutton
sphinx-rtd-theme
sphinx_markdown_tables
sphinxcontrib-mermaid

docs/zh/.readthedocs.yaml Normal file
@@ -0,0 +1,28 @@
# .readthedocs.yaml
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Set the OS, Python version and other tools you might need
build:
  os: ubuntu-22.04
  tools:
    python: "3.10"

# Build documentation in the "docs/zh/" directory with Sphinx
sphinx:
  configuration: docs/zh/conf.py

# Optionally build your docs in additional formats such as PDF and ePub
# formats:
#   - pdf
#   - epub

# Optional but recommended, declare the Python requirements required
# to build your documentation
# See https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
python:
  install:
    - requirements: docs/requirements.txt

@@ -1,6 +1,6 @@
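# `diffsynth.core.attention`: Attention Mechanism Implementation
-`diffsynth.core.attention` provides a routing mechanism for attention implementations, automatically selecting an efficient implementation based on the packages available in the `Python` environment and on [environment variables](/docs/zh/Pipeline_Usage/Environment_Variables.md#diffsynth_attention_implementation).
+`diffsynth.core.attention` provides a routing mechanism for attention implementations, automatically selecting an efficient implementation based on the packages available in the `Python` environment and on [environment variables](../../Pipeline_Usage/Environment_Variables.md#diffsynth_attention_implementation).
## Attention Mechanism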
@@ -46,7 +46,7 @@ output_1 = attention(query, key, value)
* xFormers: [GitHub](https://github.com/facebookresearch/xformers), [documentation](https://facebookresearch.github.io/xformers/components/ops.html#module-xformers.ops)
* PyTorch: [GitHub](https://github.com/pytorch/pytorch), [documentation](https://docs.pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html)
-To use an attention implementation other than `PyTorch`, install the corresponding package by following the instructions on its GitHub page. `DiffSynth-Studio` automatically routes to the matching implementation based on the packages available in the Python environment; the choice can also be controlled through [environment variables](/docs/zh/Pipeline_Usage/Environment_Variables.md#diffsynth_attention_implementation).
+To use an attention implementation other than `PyTorch`, install the corresponding package by following the instructions on its GitHub page. `DiffSynth-Studio` automatically routes to the matching implementation based on the packages available in the Python environment; the choice can also be controlled through [environment variables](../../Pipeline_Usage/Environment_Variables.md#diffsynth_attention_implementation).
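The routing described above can be pictured as follows. This is a minimal sketch, not `DiffSynth-Studio`'s actual routing code. Two things are assumptions: the priority order, and the module names probed for each backend (`flash_attn_interface`, `flash_attn`, `sageattention`, `xformers`). The environment variable name and its accepted values come from the Environment Variables page in this commit.

```python
import importlib.util
import os

# Assumed preference order; each backend is paired with the top-level module
# whose presence is taken as a sign that the backend is installed.
CANDIDATES = [
    ("flash_attention_3", "flash_attn_interface"),
    ("flash_attention_2", "flash_attn"),
    ("sage_attention", "sageattention"),
    ("xformers", "xformers"),
]
VALID = {name for name, _ in CANDIDATES} | {"torch"}


def select_attention_implementation() -> str:
    requested = os.environ.get("DIFFSYNTH_ATTENTION_IMPLEMENTATION")
    if requested is not None:
        if requested not in VALID:
            raise ValueError(f"unknown attention implementation: {requested!r}")
        return requested  # an explicit setting overrides auto-detection
    for name, module in CANDIDATES:
        if importlib.util.find_spec(module) is not None:
            return name
    return "torch"  # fall back to PyTorch's scaled_dot_product_attention
```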
```python
from diffsynth.core.attention import attention_forward

@@ -8,9 +8,9 @@
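### Downloading and Loading a Model from a Remote Repository
-Taking the model [DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Canny](https://www.modelscope.cn/models/DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Canny) as an example, fill in `model_id` and `origin_file_pattern` in `ModelConfig` and the model is downloaded automatically. Files are downloaded to `./models` by default; this path can be changed through the [environment variable DIFFSYNTH_MODEL_BASE_PATH](/docs/zh/Pipeline_Usage/Environment_Variables.md#diffsynth_model_base_path).
-By default, even if the model has already been downloaded, the program still queries the remote repository for missing files. To disable remote requests entirely, set the [environment variable DIFFSYNTH_SKIP_DOWNLOAD](/docs/zh/Pipeline_Usage/Environment_Variables.md#diffsynth_skip_download) to `True`.
+Taking the model [DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Canny](https://www.modelscope.cn/models/DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Canny) as an example, fill in `model_id` and `origin_file_pattern` in `ModelConfig` and the model is downloaded automatically. Files are downloaded to `./models` by default; this path can be changed through the [environment variable DIFFSYNTH_MODEL_BASE_PATH](../../Pipeline_Usage/Environment_Variables.md#diffsynth_model_base_path).
+By default, even if the model has already been downloaded, the program still queries the remote repository for missing files. To disable remote requests entirely, set the [environment variable DIFFSYNTH_SKIP_DOWNLOAD](../../Pipeline_Usage/Environment_Variables.md#diffsynth_skip_download) to `True`.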
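Below is a minimal sketch of how the two environment variables described above could be read. The helper names are hypothetical. Treating `True`, `true` or `1` as "enabled" is an assumption, not `DiffSynth-Studio`'s documented parsing rule.

```python
import os
from pathlib import Path


def model_base_path() -> Path:
    # DIFFSYNTH_MODEL_BASE_PATH overrides the default ./models download directory.
    return Path(os.environ.get("DIFFSYNTH_MODEL_BASE_PATH", "./models"))


def skip_download() -> bool:
    # Per the docs, setting DIFFSYNTH_SKIP_DOWNLOAD to True disables all remote requests.
    # The accepted spellings below are an assumption made for this sketch.
    value = os.environ.get("DIFFSYNTH_SKIP_DOWNLOAD", "")
    return value.strip().lower() in {"true", "1"}
```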
```python
from diffsynth.core import ModelConfig
@@ -51,7 +51,7 @@ config = ModelConfig(path=[
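### VRAM Management Configuration
-`ModelConfig` also contains VRAM management configuration; see [VRAM Management](/docs/zh/Pipeline_Usage/VRAM_management.md#更多使用方式) for details.
+`ModelConfig` also contains VRAM management configuration; see [VRAM Management](../../Pipeline_Usage/VRAM_management.md#更多使用方式) for details.
## Model File Loading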
@@ -103,11 +103,11 @@ print(hash_model_file([
The model hash depends only on the keys and tensor shapes of the state dict in the model file, not on parameter values, the file's save time, or other metadata. For `.safetensors` files, `hash_model_file` finishes almost instantly because it does not need to read the model parameters. For binary formats such as `.bin`, `.pth`, and `.ckpt`, however, it must read all model parameters, so **we do not recommend that developers continue to use these formats.**
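Hashing `.safetensors` files is nearly instant because of the file layout. The file begins with an 8-byte little-endian header length, followed by a JSON header recording each tensor's `dtype`, `shape`, and `data_offsets`. Keys and shapes are therefore available without reading any tensor data. The sketch below illustrates only that idea. The string being hashed and the use of MD5 are assumptions, so its digests will not match `hash_model_file`.

```python
import hashlib
import json
import struct


def read_safetensors_header(path: str) -> dict:
    """Read only the JSON header of a .safetensors file."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # unsigned 64-bit, little-endian
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional string metadata, not a tensor entry
    return header


def hash_keys_and_shapes(path: str) -> str:
    """Hash tensor names and shapes; parameter values never influence the result."""
    header = read_safetensors_header(path)
    signature = ";".join(f"{name}:{header[name]['shape']}" for name in sorted(header))
    return hashlib.md5(signature.encode("utf-8")).hexdigest()
```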
-By [writing a model Config](/docs/zh/Developer_Guide/Integrating_Your_Model.md#step-3-编写模型-config) and filling the model hash and related information into `diffsynth/configs/model_configs.py`, developers can let `DiffSynth-Studio` automatically recognize the model type and load it.
+By [writing a model Config](../../Developer_Guide/Integrating_Your_Model.md#step-3-编写模型-config) and filling the model hash and related information into `diffsynth/configs/model_configs.py`, developers can let `DiffSynth-Studio` automatically recognize the model type and load it.
## Model Loading
-`load_model` is the external entry point for loading models in `diffsynth.core.loader`. It calls [skip_model_initialization](/docs/zh/API_Reference/core/vram.md#跳过模型参数初始化) to skip parameter initialization. If [Disk Offload](/docs/zh/Pipeline_Usage/VRAM_management.md#disk-offload) is enabled, it calls [DiskMap](/docs/zh/API_Reference/core/vram.md#state-dict-硬盘映射) for lazy loading; otherwise it calls [load_state_dict](#模型文件加载) to load the model parameters. If needed, it also calls the [state dict converter](/docs/zh/Developer_Guide/Integrating_Your_Model.md#step-2-模型文件格式转换) to convert the model format. Finally, it calls `model.eval()` to switch the model to inference mode.
+`load_model` is the external entry point for loading models in `diffsynth.core.loader`. It calls [skip_model_initialization](../../API_Reference/core/vram.md#跳过模型参数初始化) to skip parameter initialization. If [Disk Offload](../../Pipeline_Usage/VRAM_management.md#disk-offload) is enabled, it calls [DiskMap](../../API_Reference/core/vram.md#state-dict-硬盘映射) for lazy loading; otherwise it calls [load_state_dict](#模型文件加载) to load the model parameters. If needed, it also calls the [state dict converter](../../Developer_Guide/Integrating_Your_Model.md#step-2-模型文件格式转换) to convert the model format. Finally, it calls `model.eval()` to switch the model to inference mode.
The following is an example with Disk Offload enabled:

@@ -31,7 +31,7 @@ state_dict = load_state_dict(path, device="cpu")
model.load_state_dict(state_dict, assign=True)
```
-In `DiffSynth-Studio`, all pretrained models follow this loading logic. Once developers have finished [integrating a model](/docs/zh/Developer_Guide/Integrating_Your_Model.md), they can load it quickly in this way.
+In `DiffSynth-Studio`, all pretrained models follow this loading logic. Once developers have finished [integrating a model](../../Developer_Guide/Integrating_Your_Model.md), they can load it quickly in this way.
## State Dict Disk Mapping
@@ -57,10 +57,10 @@ state_dict = DiskMap(path, device="cpu") # Fast
print(state_dict["img_in.weight"])
```
-`DiskMap` is the basic building block of Disk Offload in `DiffSynth-Studio`. After [configuring a fine-grained VRAM management scheme](/docs/zh/Developer_Guide/Enabling_VRAM_management.md), developers can enable Disk Offload directly.
+`DiskMap` is the basic building block of Disk Offload in `DiffSynth-Studio`. After [configuring a fine-grained VRAM management scheme](../../Developer_Guide/Enabling_VRAM_management.md), developers can enable Disk Offload directly.
`DiskMap` relies on features of the `.safetensors` format, so when binary files such as `.bin`, `.pth`, and `.ckpt` are used, the model parameters are loaded in full; as a result, Disk Offload does not support these formats. **We do not recommend that developers continue to use these formats.**
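The lazy-loading idea that `DiskMap` builds on can be sketched from the `.safetensors` layout. Each tensor's `data_offsets` are relative to the end of the header, so a single tensor can be read by seeking straight to it. `LazySafetensors` is a hypothetical, dependency-free illustration that returns raw bytes. It is not `DiskMap`'s implementation, which returns tensors on the requested device.

```python
import json
import struct
from collections.abc import Mapping


class LazySafetensors(Mapping):
    """Read-only mapping that reads one tensor's raw bytes only when it is accessed."""

    def __init__(self, path: str):
        self.path = path
        with open(path, "rb") as f:
            (header_len,) = struct.unpack("<Q", f.read(8))
            self.header = json.loads(f.read(header_len))
        self.header.pop("__metadata__", None)
        self.data_start = 8 + header_len  # tensor data begins right after the header

    def __getitem__(self, key: str) -> bytes:
        begin, end = self.header[key]["data_offsets"]
        with open(self.path, "rb") as f:
            f.seek(self.data_start + begin)  # jump directly to this tensor
            return f.read(end - begin)

    def __iter__(self):
        return iter(self.header)

    def __len__(self) -> int:
        return len(self.header)
```

Opening the file reads only the header, so memory use stays small until individual keys are accessed.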
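## Replaceable Modules for VRAM Management
-Once `DiffSynth-Studio`'s VRAM management is enabled, modules inside the model are replaced with the replaceable modules in `diffsynth.core.vram.layers`; see [Fine-Grained VRAM Management Scheme](/docs/zh/Developer_Guide/Enabling_VRAM_management.md#编写细粒度显存管理方案) for usage.
+Once `DiffSynth-Studio`'s VRAM management is enabled, modules inside the model are replaced with the replaceable modules in `diffsynth.core.vram.layers`; see [Fine-Grained VRAM Management Scheme](../../Developer_Guide/Enabling_VRAM_management.md#编写细粒度显存管理方案) for usage.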

@@ -1,6 +1,6 @@
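# Integrating a Pipeline
-After [integrating the models the Pipeline needs](/docs/zh/Developer_Guide/Integrating_Your_Model.md), you still need to build a `Pipeline` for model inference. This document describes a standardized process for building a `Pipeline`; developers can also use existing `Pipeline`s as references.
+After [integrating the models the Pipeline needs](../Developer_Guide/Integrating_Your_Model.md), you still need to build a `Pipeline` for model inference. This document describes a standardized process for building a `Pipeline`; developers can also use existing `Pipeline`s as references.
`Pipeline` implementations live in `diffsynth/pipelines`. Each `Pipeline` contains the following essential components: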
@@ -79,7 +79,7 @@ class NewDiffSynthPipeline(BasePipeline):
return pipe
```
-Developers need to implement the logic for fetching models here; the corresponding model name is the `"model_name"` in the [model Config written when integrating the model](/docs/zh/Developer_Guide/Integrating_Your_Model.md#step-3-编写模型-config).
+Developers need to implement the logic for fetching models here; the corresponding model name is the `"model_name"` in the [model Config written when integrating the model](../Developer_Guide/Integrating_Your_Model.md#step-3-编写模型-config).
Some models also need to load a `tokenizer`; if required, add an extra `tokenizer_config` parameter to `from_pretrained` and implement this part after fetching the models.

@@ -1,6 +1,6 @@
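# Fine-Grained VRAM Management Scheme
-This document describes how to write a sensible fine-grained VRAM management scheme for a model, and how to use the VRAM management features of `DiffSynth-Studio` in other external codebases. Before reading this document, please read [VRAM Management](/docs/zh/Pipeline_Usage/VRAM_management.md).
+This document describes how to write a sensible fine-grained VRAM management scheme for a model, and how to use the VRAM management features of `DiffSynth-Studio` in other external codebases. Before reading this document, please read [VRAM Management](../Pipeline_Usage/VRAM_management.md).
## How Much VRAM Does a 20B Model Need?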
@@ -124,7 +124,7 @@ module_map={
}
```
-In addition, `vram_config` and `vram_limit` must be provided; both parameters are described in [VRAM Management](/docs/zh/Pipeline_Usage/VRAM_management.md#更多使用方式).
+In addition, `vram_config` and `vram_limit` must be provided; both parameters are described in [VRAM Management](../Pipeline_Usage/VRAM_management.md#更多使用方式).
Call `enable_vram_management` to enable VRAM management. Note that the model is loaded with `device` set to `cpu`, the same as `offload_device`:
@@ -171,7 +171,7 @@ with torch.no_grad():
## Disk Offload
-[Disk Offload](/docs/zh/Pipeline_Usage/VRAM_management.md#disk-offload) is a special VRAM management scheme that must be enabled during model loading rather than after the model has loaded. In general, if the code above runs successfully, Disk Offload can be enabled directly:
+[Disk Offload](../Pipeline_Usage/VRAM_management.md#disk-offload) is a special VRAM management scheme that must be enabled during model loading rather than after the model has loaded. In general, if the code above runs successfully, Disk Offload can be enabled directly:
```python
from diffsynth.core import load_model, enable_vram_management, AutoWrappedLinear, AutoWrappedModule
@@ -212,7 +212,7 @@ with torch.no_grad():
output = model(**inputs)
```
-Disk Offload is a highly specialized VRAM management scheme: it supports only `.safetensors` files, not binary files such as `.bin`, `.pth`, and `.ckpt`, and it does not support [state dict converters](/docs/zh/Developer_Guide/Integrating_Your_Model.md#step-2-模型文件格式转换) that reshape tensors.
+Disk Offload is a highly specialized VRAM management scheme: it supports only `.safetensors` files, not binary files such as `.bin`, `.pth`, and `.ckpt`, and it does not support [state dict converters](../Developer_Guide/Integrating_Your_Model.md#step-2-模型文件格式转换) that reshape tensors.
If a model runs correctly without Disk Offload but fails with it, please open an issue on GitHub.

@@ -183,4 +183,4 @@ Loaded model: {
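## Step 5: Writing the Model's VRAM Management Scheme
-`DiffSynth-Studio` supports sophisticated VRAM management; see [Enabling VRAM Management](/docs/zh/Developer_Guide/Enabling_VRAM_management.md) for details.
+`DiffSynth-Studio` supports sophisticated VRAM management; see [Enabling VRAM Management](../Developer_Guide/Enabling_VRAM_management.md) for details.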

@@ -1,6 +1,6 @@
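# Integrating Model Training
-After [integrating the model](/docs/zh/Developer_Guide/Integrating_Your_Model.md) and [implementing the Pipeline](/docs/zh/Developer_Guide/Building_a_Pipeline.md), the next step is to integrate model training.
+After [integrating the model](../Developer_Guide/Integrating_Your_Model.md) and [implementing the Pipeline](../Developer_Guide/Building_a_Pipeline.md), the next step is to integrate model training.
## Adapting the Pipeline for Consistent Training and Inference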

docs/zh/Makefile Normal file
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
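The catch-all `%: Makefile` rule forwards any target, for example `make html`, to Sphinx's make mode. The command it runs can be reproduced from Python as shown here. `sphinx_make_command` is a hypothetical helper whose defaults mirror this Makefile's variables. The `$(O)` shortcut is left out.

```python
import shlex
import subprocess


def sphinx_make_command(target: str, sphinxopts: str = "", sphinxbuild: str = "sphinx-build",
                        sourcedir: str = ".", builddir: str = "_build") -> list:
    """Argv equivalent of: $(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS)"""
    return [sphinxbuild, "-M", target, sourcedir, builddir, *shlex.split(sphinxopts)]


def run_sphinx_make(target: str, **kwargs) -> int:
    """Run in the current directory (expected to be the docs directory); return the exit code."""
    return subprocess.run(sphinx_make_command(target, **kwargs)).returncode
```

For example, `run_sphinx_make("html", sphinxopts="-W --keep-going")` corresponds to `make html SPHINXOPTS="-W --keep-going"`.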

@@ -14,7 +14,7 @@ cd DiffSynth-Studio
pip install -e .
```
-For more information about installation, see [Installing Dependencies](/docs/zh/Pipeline_Usage/Setup.md).
+For more information about installation, see [Installing Dependencies](../Pipeline_Usage/Setup.md).
## Quick Start
@@ -98,14 +98,14 @@ graph LR;
Special training scripts:
-* Differential LoRA training: [doc](/docs/zh/Training/Differential_LoRA.md), [code](/examples/flux/model_training/special/differential_training/)
-* FP8 precision training: [doc](/docs/zh/Training/FP8_Precision.md), [code](/examples/flux/model_training/special/fp8_training/)
-* Two-stage split training: [doc](/docs/zh/Training/Split_Training.md), [code](/examples/flux/model_training/special/split_training/)
-* End-to-end direct distillation: [doc](/docs/zh/Training/Direct_Distill.md), [code](/examples/flux/model_training/lora/FLUX.1-dev-Distill-LoRA.sh)
+* Differential LoRA training: [doc](../Training/Differential_LoRA.md), [code](/examples/flux/model_training/special/differential_training/)
+* FP8 precision training: [doc](../Training/FP8_Precision.md), [code](/examples/flux/model_training/special/fp8_training/)
+* Two-stage split training: [doc](../Training/Split_Training.md), [code](/examples/flux/model_training/special/split_training/)
+* End-to-end direct distillation: [doc](../Training/Direct_Distill.md), [code](/examples/flux/model_training/lora/FLUX.1-dev-Distill-LoRA.sh)
## Model Inference
-Models are loaded via `FluxImagePipeline.from_pretrained`; see [Loading Models](/docs/zh/Pipeline_Usage/Model_Inference.md#加载模型) for details.
+Models are loaded via `FluxImagePipeline.from_pretrained`; see [Loading Models](../Pipeline_Usage/Model_Inference.md#加载模型) for details.
The input parameters for `FluxImagePipeline` inference include:
@@ -143,7 +143,7 @@ graph LR;
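* `flex_control_stop`: The timestep at which control stops for Flex models.
* `nexus_gen_reference_image`: Reference image for Nexus-Gen models.
-If VRAM is insufficient, enable [VRAM Management](/docs/zh/Pipeline_Usage/VRAM_management.md). The example code provides a recommended low-VRAM configuration for each model; see the table in the "Model Overview" section above.
+If VRAM is insufficient, enable [VRAM Management](../Pipeline_Usage/VRAM_management.md). The example code provides a recommended low-VRAM configuration for each model; see the table in the "Model Overview" section above.
## Model Training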
@@ -198,4 +198,4 @@ All FLUX series models are handled uniformly through [`examples/flux/model_training/train.py`](/example
modelscope download --dataset DiffSynth-Studio/example_image_dataset --local_dir ./data/example_image_dataset
```
-We have written a recommended training script for each model; see the table in the "Model Overview" section above. For how to write a model training script, see [Model Training](/docs/zh/Pipeline_Usage/Model_Training.md); for more advanced training algorithms, see [Training Framework Details](/docs/Training/).
+We have written a recommended training script for each model; see the table in the "Model Overview" section above. For how to write a model training script, see [Model Training](../Pipeline_Usage/Model_Training.md); for more advanced training algorithms, see [Training Framework Details](/docs/Training/).

@@ -21,7 +21,7 @@ cd DiffSynth-Studio
pip install -e .
```
-For more information about installation, see [Installing Dependencies](/docs/zh/Pipeline_Usage/Setup.md).
+For more information about installation, see [Installing Dependencies](../Pipeline_Usage/Setup.md).
## Quick Start
@@ -69,14 +69,14 @@ image.save("image.jpg")
Special training scripts:
-* Differential LoRA training: [doc](/docs/zh/Training/Differential_LoRA.md)
-* FP8 precision training: [doc](/docs/zh/Training/FP8_Precision.md)
-* Two-stage split training: [doc](/docs/zh/Training/Split_Training.md)
-* End-to-end direct distillation: [doc](/docs/zh/Training/Direct_Distill.md)
+* Differential LoRA training: [doc](../Training/Differential_LoRA.md)
+* FP8 precision training: [doc](../Training/FP8_Precision.md)
+* Two-stage split training: [doc](../Training/Split_Training.md)
+* End-to-end direct distillation: [doc](../Training/Direct_Distill.md)
## Model Inference
-Models are loaded via `Flux2ImagePipeline.from_pretrained`; see [Loading Models](/docs/zh/Pipeline_Usage/Model_Inference.md#加载模型) for details.
+Models are loaded via `Flux2ImagePipeline.from_pretrained`; see [Loading Models](../Pipeline_Usage/Model_Inference.md#加载模型) for details.
The input parameters for `Flux2ImagePipeline` inference include:
@@ -95,7 +95,7 @@ image.save("image.jpg")
* `tile_stride`: Tile stride for VAE encoding and decoding, 64 by default. It takes effect only when `tiled=True` and must be less than or equal to `tile_size`.
* `progress_bar_cmd`: Progress bar, `tqdm.tqdm` by default. Set it to `lambda x:x` to disable the progress bar.
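The default `progress_bar_cmd` is `tqdm.tqdm`, so it is presumably applied to the iterable of denoising steps. The identity function `lambda x:x` then leaves that iterable unwrapped, which is why no bar is drawn. The loop below is a hypothetical, heavily simplified illustration; the real pipeline does much more per step.

```python
from typing import Callable, Iterable, List


def run_denoising_steps(timesteps: Iterable[int],
                        progress_bar_cmd: Callable[[Iterable[int]], Iterable[int]]) -> List[int]:
    """Hypothetical loop in which progress_bar_cmd wraps the iterable of timesteps."""
    completed = []
    for t in progress_bar_cmd(timesteps):  # tqdm.tqdm(...) draws a bar; lambda x: x does not
        completed.append(t)  # stand-in for one denoising step
    return completed


# Usage: run silently by passing the identity function.
silent = run_denoising_steps([999, 500, 0], progress_bar_cmd=lambda x: x)
```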
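-If VRAM is insufficient, enable [VRAM Management](/docs/zh/Pipeline_Usage/VRAM_management.md). The example code provides a recommended low-VRAM configuration for each model; see the table in the "Model Overview" section above.
+If VRAM is insufficient, enable [VRAM Management](../Pipeline_Usage/VRAM_management.md). The example code provides a recommended low-VRAM configuration for each model; see the table in the "Model Overview" section above.
## Model Training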
@@ -148,4 +148,4 @@ All FLUX.2 series models are handled uniformly through [`examples/flux2/model_training/train.py`](/exam
modelscope download --dataset DiffSynth-Studio/example_image_dataset --local_dir ./data/example_image_dataset
```
-We have written a recommended training script for each model; see the table in the "Model Overview" section above. For how to write a model training script, see [Model Training](/docs/zh/Pipeline_Usage/Model_Training.md); for more advanced training algorithms, see [Training Framework Details](/docs/Training/).
+We have written a recommended training script for each model; see the table in the "Model Overview" section above. For how to write a model training script, see [Model Training](../Pipeline_Usage/Model_Training.md); for more advanced training algorithms, see [Training Framework Details](/docs/Training/).

@@ -12,7 +12,7 @@ cd DiffSynth-Studio
pip install -e .
```
-For more information about installation, see [Installing Dependencies](/docs/zh/Pipeline_Usage/Setup.md).
+For more information about installation, see [Installing Dependencies](../Pipeline_Usage/Setup.md).
## Quick Start
@@ -83,7 +83,7 @@ write_video_audio_ltx2(
## Model Inference
-Models are loaded via `LTX2AudioVideoPipeline.from_pretrained`; see [Loading Models](/docs/zh/Pipeline_Usage/Model_Inference.md#加载模型) for details.
+Models are loaded via `LTX2AudioVideoPipeline.from_pretrained`; see [Loading Models](../Pipeline_Usage/Model_Inference.md#加载模型) for details.
The input parameters for `LTX2AudioVideoPipeline` inference include:
@@ -109,7 +109,7 @@ write_video_audio_ltx2(
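* `use_distilled_pipeline`: Whether to use the distilled pipeline, `False` by default.
* `progress_bar_cmd`: Progress bar, `tqdm.tqdm` by default. Set it to `lambda x:x` to disable the progress bar.
-If VRAM is insufficient, enable [VRAM Management](/docs/zh/Pipeline_Usage/VRAM_management.md). The example code provides a recommended low-VRAM configuration for each model; see the table in the "Supported Inference Scripts" section above.
+If VRAM is insufficient, enable [VRAM Management](../Pipeline_Usage/VRAM_management.md). The example code provides a recommended low-VRAM configuration for each model; see the table in the "Supported Inference Scripts" section above.
## Model Training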

@@ -2,7 +2,7 @@
## Qwen-Image
-Documentation: [./Qwen-Image.md](/docs/zh/Model_Details/Qwen-Image.md)
+Documentation: [./Qwen-Image.md](../Model_Details/Qwen-Image.md)
<details>
@@ -85,7 +85,7 @@ graph LR;
## FLUX Series
-Documentation: [./FLUX.md](/docs/zh/Model_Details/FLUX.md)
+Documentation: [./FLUX.md](../Model_Details/FLUX.md)
<details>
@@ -166,7 +166,7 @@ graph LR;
## Wan Series
-Documentation: [./Wan.md](/docs/zh/Model_Details/Wan.md)
+Documentation: [./Wan.md](../Model_Details/Wan.md)
<details>

@@ -14,7 +14,7 @@ cd DiffSynth-Studio
pip install -e .
```
-For more information about installation, see [Installing Dependencies](/docs/zh/Pipeline_Usage/Setup.md).
+For more information about installation, see [Installing Dependencies](../Pipeline_Usage/Setup.md).
## Quick Start
@@ -102,10 +102,10 @@ graph LR;
Special training scripts:
-* Differential LoRA training: [doc](/docs/zh/Training/Differential_LoRA.md), [code](/examples/qwen_image/model_training/special/differential_training/)
-* FP8 precision training: [doc](/docs/zh/Training/FP8_Precision.md), [code](/examples/qwen_image/model_training/special/fp8_training/)
-* Two-stage split training: [doc](/docs/zh/Training/Split_Training.md), [code](/examples/qwen_image/model_training/special/split_training/)
-* End-to-end direct distillation: [doc](/docs/zh/Training/Direct_Distill.md), [code](/examples/qwen_image/model_training/lora/Qwen-Image-Distill-LoRA.sh)
+* Differential LoRA training: [doc](../Training/Differential_LoRA.md), [code](/examples/qwen_image/model_training/special/differential_training/)
+* FP8 precision training: [doc](../Training/FP8_Precision.md), [code](/examples/qwen_image/model_training/special/fp8_training/)
+* Two-stage split training: [doc](../Training/Split_Training.md), [code](/examples/qwen_image/model_training/special/split_training/)
+* End-to-end direct distillation: [doc](../Training/Direct_Distill.md), [code](/examples/qwen_image/model_training/lora/Qwen-Image-Distill-LoRA.sh)
DeepSpeed ZeRO 3 training: Qwen-Image series models support DeepSpeed ZeRO 3 training, which shards the model across multiple GPUs. Taking full training of the Qwen-Image model as an example, the following changes are required:
@@ -114,7 +114,7 @@ DeepSpeed ZeRO 3 training: Qwen-Image series models support DeepSpeed ZeRO 3 training
## Model Inference
-Models are loaded via `QwenImagePipeline.from_pretrained`; see [Loading Models](/docs/zh/Pipeline_Usage/Model_Inference.md#加载模型) for details.
+Models are loaded via `QwenImagePipeline.from_pretrained`; see [Loading Models](../Pipeline_Usage/Model_Inference.md#加载模型) for details.
The input parameters for `QwenImagePipeline` inference include:
@@ -145,7 +145,7 @@ DeepSpeed ZeRO 3 training: Qwen-Image series models support DeepSpeed ZeRO 3 training
* `tile_stride`: Tile stride for VAE encoding and decoding, 64 by default. It takes effect only when `tiled=True` and must be less than or equal to `tile_size`.
* `progress_bar_cmd`: Progress bar, `tqdm.tqdm` by default. Set it to `lambda x:x` to disable the progress bar.
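The requirement that `tile_stride` be at most `tile_size` follows from how overlapping tiles cover an axis. If the stride were larger than the tile, positions between consecutive tiles would never be processed. Below is a hypothetical one-dimensional sketch of tile placement. The pipeline's actual VAE tiling works in two dimensions and may place tiles differently.

```python
def tile_starts(length: int, tile_size: int, tile_stride: int) -> list:
    """Start offsets of overlapping tiles that together cover [0, length)."""
    if tile_stride > tile_size:
        raise ValueError("tile_stride must be <= tile_size, otherwise gaps appear between tiles")
    if length <= tile_size:
        return [0]  # a single tile already covers the whole axis
    starts = list(range(0, length - tile_size + 1, tile_stride))
    if starts[-1] + tile_size < length:
        starts.append(length - tile_size)  # one more tile, flush with the end
    return starts
```

With `tile_size=128` and the default stride of 64, neighbouring tiles overlap by half their width.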
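-If VRAM is insufficient, enable [VRAM Management](/docs/zh/Pipeline_Usage/VRAM_management.md). The example code provides a recommended low-VRAM configuration for each model; see the table in the “Model Overview” section above.
+If VRAM is insufficient, enable [VRAM Management](../Pipeline_Usage/VRAM_management.md). The example code provides a recommended low-VRAM configuration for each model; see the table in the “Model Overview” section above.
## Model Training
@@ -199,4 +199,4 @@ All Qwen-Image series models are handled uniformly through [`examples/qwen_image/model_training/train.p
modelscope download --dataset DiffSynth-Studio/example_image_dataset --local_dir ./data/example_image_dataset
```
-We have written a recommended training script for each model; see the table in the “Model Overview” section above. For how to write a model training script, see [Model Training](/docs/zh/Pipeline_Usage/Model_Training.md); for more advanced training algorithms, see [Training Framework Details](/docs/Training/).
+We have written a recommended training script for each model; see the table in the “Model Overview” section above. For how to write a model training script, see [Model Training](../Pipeline_Usage/Model_Training.md); for more advanced training algorithms, see [Training Framework Details](/docs/Training/).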

@@ -14,7 +14,7 @@ cd DiffSynth-Studio
pip install -e .
```
-For more information about installation, see [Installing Dependencies](/docs/zh/Pipeline_Usage/Setup.md).
+For more information about installation, see [Installing Dependencies](../Pipeline_Usage/Setup.md).
## Quick Start
@@ -139,9 +139,9 @@ graph LR;
|[PAI/Wan2.2-Fun-A14B-Control](https://modelscope.cn/models/PAI/Wan2.2-Fun-A14B-Control)|`control_video`, `reference_image`|[code](/examples/wanvideo/model_inference/Wan2.2-Fun-A14B-Control.py)|[code](/examples/wanvideo/model_training/full/Wan2.2-Fun-A14B-Control.sh)|[code](/examples/wanvideo/model_training/validate_full/Wan2.2-Fun-A14B-Control.py)|[code](/examples/wanvideo/model_training/lora/Wan2.2-Fun-A14B-Control.sh)|[code](/examples/wanvideo/model_training/validate_lora/Wan2.2-Fun-A14B-Control.py)|
|[PAI/Wan2.2-Fun-A14B-Control-Camera](https://modelscope.cn/models/PAI/Wan2.2-Fun-A14B-Control-Camera)|`control_camera_video`, `input_image`|[code](/examples/wanvideo/model_inference/Wan2.2-Fun-A14B-Control-Camera.py)|[code](/examples/wanvideo/model_training/full/Wan2.2-Fun-A14B-Control-Camera.sh)|[code](/examples/wanvideo/model_training/validate_full/Wan2.2-Fun-A14B-Control-Camera.py)|[code](/examples/wanvideo/model_training/lora/Wan2.2-Fun-A14B-Control-Camera.sh)|[code](/examples/wanvideo/model_training/validate_lora/Wan2.2-Fun-A14B-Control-Camera.py)|
-* FP8 precision training: [doc](/docs/zh/Training/FP8_Precision.md), [code](/examples/wanvideo/model_training/special/fp8_training/)
-* Two-stage split training: [doc](/docs/zh/Training/Split_Training.md), [code](/examples/wanvideo/model_training/special/split_training/)
-* End-to-end direct distillation: [doc](/docs/zh/Training/Direct_Distill.md), [code](/examples/wanvideo/model_training/special/direct_distill/)
+* FP8 precision training: [doc](../Training/FP8_Precision.md), [code](/examples/wanvideo/model_training/special/fp8_training/)
+* Two-stage split training: [doc](../Training/Split_Training.md), [code](/examples/wanvideo/model_training/special/split_training/)
+* End-to-end direct distillation: [doc](../Training/Direct_Distill.md), [code](/examples/wanvideo/model_training/special/direct_distill/)
DeepSpeed ZeRO 3 training: Wan series models support DeepSpeed ZeRO 3 training, which shards the model across multiple GPUs. Taking full training of the Wan2.1-T2V-14B model as an example, the following changes are required:
@@ -150,7 +150,7 @@ DeepSpeed ZeRO 3 training: Wan series models support DeepSpeed ZeRO 3 training, which
## Model Inference
-Models are loaded via `WanVideoPipeline.from_pretrained`; see [Loading Models](/docs/zh/Pipeline_Usage/Model_Inference.md#加载模型) for details.
+Models are loaded via `WanVideoPipeline.from_pretrained`; see [Loading Models](../Pipeline_Usage/Model_Inference.md#加载模型) for details.
The input parameters for `WanVideoPipeline` inference include:
@@ -200,7 +200,7 @@ DeepSpeed ZeRO 3 training: Wan series models support DeepSpeed ZeRO 3 training, which
* `tea_cache_model_id`: Model ID used by TeaCache.
* `progress_bar_cmd`: Progress bar, `tqdm.tqdm` by default. Set it to `lambda x:x` to disable the progress bar.
-If VRAM is insufficient, enable [VRAM Management](/docs/zh/Pipeline_Usage/VRAM_management.md). The example code provides a recommended low-VRAM configuration for each model; see the table in the "Model Overview" section above.
+If VRAM is insufficient, enable [VRAM Management](../Pipeline_Usage/VRAM_management.md). The example code provides a recommended low-VRAM configuration for each model; see the table in the "Model Overview" section above.
## Model Training
@@ -255,4 +255,4 @@ All Wan series models are handled uniformly through [`examples/wanvideo/model_training/train.py`](/exam
modelscope download --dataset DiffSynth-Studio/example_video_dataset --local_dir ./data/example_video_dataset
```
-We have written a recommended training script for each model; see the table in the "Model Overview" section above. For how to write a model training script, see [Model Training](/docs/zh/Pipeline_Usage/Model_Training.md); for more advanced training algorithms, see [Training Framework Details](/docs/Training/).
+We have written a recommended training script for each model; see the table in the "Model Overview" section above. For how to write a model training script, see [Model Training](../Pipeline_Usage/Model_Training.md); for more advanced training algorithms, see [Training Framework Details](/docs/Training/).

@@ -12,7 +12,7 @@ cd DiffSynth-Studio
pip install -e .
```
-For more information about installation, see [Installing Dependencies](/docs/zh/Pipeline_Usage/Setup.md).
+For more information about installation, see [Installing Dependencies](../Pipeline_Usage/Setup.md).
## Quick Start
@@ -61,12 +61,12 @@ image.save("image.jpg")
Special training scripts:
-* Differential LoRA training: [doc](/docs/zh/Training/Differential_LoRA.md), [code](/examples/z_image/model_training/special/differential_training/)
+* Differential LoRA training: [doc](../Training/Differential_LoRA.md), [code](/examples/z_image/model_training/special/differential_training/)
* Trajectory imitation distillation training (experimental): [code](/examples/z_image/model_training/special/trajectory_imitation/)
## Model Inference
-Models are loaded via `ZImagePipeline.from_pretrained`; see [Loading Models](/docs/zh/Pipeline_Usage/Model_Inference.md#加载模型) for details.
+Models are loaded via `ZImagePipeline.from_pretrained`; see [Loading Models](../Pipeline_Usage/Model_Inference.md#加载模型) for details.
The input parameters for `ZImagePipeline` inference include:
@@ -84,7 +84,7 @@ image.save("image.jpg")
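* `edit_image`: Image(s) to be edited by editing models; multiple images are supported.
* `positive_only_lora`: LoRA weights applied only to the positive prompt.
-If VRAM is insufficient, enable [VRAM Management](/docs/zh/Pipeline_Usage/VRAM_management.md). The example code provides a recommended low-VRAM configuration for each model; see the table in the "Model Overview" section above.
+If VRAM is insufficient, enable [VRAM Management](../Pipeline_Usage/VRAM_management.md). The example code provides a recommended low-VRAM configuration for each model; see the table in the "Model Overview" section above.
## Model Training
@@ -137,7 +137,7 @@ All Z-Image series models are handled uniformly through [`examples/z_image/model_training/train.py`](/e
modelscope download --dataset DiffSynth-Studio/example_image_dataset --local_dir ./data/example_image_dataset
```
-We have written a recommended training script for each model; see the table in the "Model Overview" section above. For how to write a model training script, see [Model Training](/docs/zh/Pipeline_Usage/Model_Training.md); for more advanced training algorithms, see [Training Framework Details](/docs/Training/).
+We have written a recommended training script for each model; see the table in the "Model Overview" section above. For how to write a model training script, see [Model Training](../Pipeline_Usage/Model_Training.md); for more advanced training algorithms, see [Training Framework Details](/docs/Training/).
Training tips: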

@@ -28,7 +28,7 @@ DIFFSYNTH_MODEL_BASE_PATH="./path_to_my_models" python xxx.py
## `DIFFSYNTH_ATTENTION_IMPLEMENTATION`
- The attention implementation to use. It can be set to `flash_attention_3`, `flash_attention_2`, `sage_attention`, `xformers`, or `torch`. See [`./core/attention.md`](/docs/zh/API_Reference/core/attention.md) for details.
+ The attention implementation to use. It can be set to `flash_attention_3`, `flash_attention_2`, `sage_attention`, `xformers`, or `torch`. See [`./core/attention.md`](../API_Reference/core/attention.md) for details.
## `DIFFSYNTH_DISK_MAP_BUFFER_SIZE`

@@ -2,7 +2,7 @@
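`DiffSynth-Studio` supports a variety of GPUs and NPUs. This document describes how to run model inference and training on these devices.
- Before you begin, install the GPU/NPU-related dependencies by following [Installing Dependencies](/docs/zh/Pipeline_Usage/Setup.md).
+ Before you begin, install the GPU/NPU-related dependencies by following [Installing Dependencies](../Pipeline_Usage/Setup.md).
## NVIDIA GPU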

@@ -22,7 +22,7 @@ pipe = QwenImagePipeline.from_pretrained(
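)
```
- Here, `torch_dtype` and `device` are the computation precision and computation device (not the precision and device of the model weights). `model_configs` can specify model paths in several ways; for how this project loads models internally, see [`diffsynth.core.loader`](/docs/zh/API_Reference/core/loader.md).
+ Here, `torch_dtype` and `device` are the computation precision and computation device (not the precision and device of the model weights). `model_configs` can specify model paths in several ways; for how this project loads models internally, see [`diffsynth.core.loader`](../API_Reference/core/loader.md).
<details>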
@@ -34,7 +34,7 @@ pipe = QwenImagePipeline.from_pretrained(
> ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),
> ```
>
- > Model files are downloaded to `./models` by default; this path can be changed with the [environment variable DIFFSYNTH_MODEL_BASE_PATH](/docs/zh/Pipeline_Usage/Environment_Variables.md#diffsynth_model_base_path).
+ > Model files are downloaded to `./models` by default; this path can be changed with the [environment variable DIFFSYNTH_MODEL_BASE_PATH](../Pipeline_Usage/Environment_Variables.md#diffsynth_model_base_path).
</details>
@@ -61,7 +61,7 @@ pipe = QwenImagePipeline.from_pretrained(
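</details>
- By default, even after a model has been downloaded, the program still queries the remote source for missing files. To disable remote requests completely, set the [environment variable DIFFSYNTH_SKIP_DOWNLOAD](/docs/zh/Pipeline_Usage/Environment_Variables.md#diffsynth_skip_download) to `True`.
+ By default, even after a model has been downloaded, the program still queries the remote source for missing files. To disable remote requests completely, set the [environment variable DIFFSYNTH_SKIP_DOWNLOAD](../Pipeline_Usage/Environment_Variables.md#diffsynth_skip_download) to `True`.
```python
import os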
@@ -69,7 +69,7 @@ os.environ["DIFFSYNTH_SKIP_DOWNLOAD"] = "True"
import diffsynth
```
- To download models from [HuggingFace](https://huggingface.co/), set the [environment variable DIFFSYNTH_DOWNLOAD_SOURCE](/docs/zh/Pipeline_Usage/Environment_Variables.md#diffsynth_download_source) to `huggingface`.
+ To download models from [HuggingFace](https://huggingface.co/), set the [environment variable DIFFSYNTH_DOWNLOAD_SOURCE](../Pipeline_Usage/Environment_Variables.md#diffsynth_download_source) to `huggingface`.
```python
import os
@@ -102,13 +102,13 @@ image.save("image.jpg")
Input parameters differ between model `Pipeline`s; see the documentation for each model.
- If the model has too many parameters to fit in VRAM, enable [VRAM Management](/docs/zh/Pipeline_Usage/VRAM_management.md).
+ If the model has too many parameters to fit in VRAM, enable [VRAM Management](../Pipeline_Usage/VRAM_management.md).
## Loading LoRA
LoRA is a lightweight training method that extends a model's capabilities with a small number of additional parameters. DiffSynth-Studio supports two ways of loading LoRA: cold loading and hot loading.
- * Cold loading: when [VRAM Management](/docs/zh/Pipeline_Usage/VRAM_management.md) is not enabled for the base model, the LoRA is fused into the base model weights. Inference speed is unchanged, but the LoRA cannot be unloaded after loading.
+ * Cold loading: when [VRAM Management](../Pipeline_Usage/VRAM_management.md) is not enabled for the base model, the LoRA is fused into the base model weights. Inference speed is unchanged, but the LoRA cannot be unloaded after loading.
```python
from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig
@@ -131,7 +131,7 @@ image = pipe(prompt, seed=0, num_inference_steps=40)
image.save("image.jpg")
```
- * Hot loading: when [VRAM Management](/docs/zh/Pipeline_Usage/VRAM_management.md) is enabled for the base model, the LoRA is not fused into the base model weights. Inference is slower, but the LoRA can be unloaded with `pipe.clear_lora()`.
+ * Hot loading: when [VRAM Management](../Pipeline_Usage/VRAM_management.md) is enabled for the base model, the LoRA is not fused into the base model weights. Inference is slower, but the LoRA can be unloaded with `pipe.clear_lora()`.
```python
from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig

@@ -65,7 +65,7 @@ image_1.jpg,"a dog"
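image_2.jpg,"a cat"
```
- We have built a sample dataset to make testing easy. To learn how the general-purpose dataset architecture is implemented, see [`diffsynth.core.data`](/docs/zh/API_Reference/core/data.md).
+ We have built a sample dataset to make testing easy. To learn how the general-purpose dataset architecture is implemented, see [`diffsynth.core.data`](../API_Reference/core/data.md).
<details>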
@@ -93,7 +93,7 @@ image_2.jpg,"a cat"
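## Loading Models
- As with [model loading at inference time](/docs/zh/Pipeline_Usage/Model_Inference.md#加载模型), model paths can be configured in several ways, and the two methods can be mixed.
+ As with [model loading at inference time](../Pipeline_Usage/Model_Inference.md#加载模型), model paths can be configured in several ways, and the two methods can be mixed.
<details>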
@@ -115,9 +115,9 @@ image_2.jpg,"a cat"
> --model_id_with_origin_paths "Qwen/Qwen-Image:transformer/diffusion_pytorch_model*.safetensors,Qwen/Qwen-Image:text_encoder/model*.safetensors,Qwen/Qwen-Image:vae/diffusion_pytorch_model.safetensors"
> ```
>
- > Model files are downloaded to `./models` by default; this path can be changed with the [environment variable DIFFSYNTH_MODEL_BASE_PATH](/docs/zh/Pipeline_Usage/Environment_Variables.md#diffsynth_model_base_path).
+ > Model files are downloaded to `./models` by default; this path can be changed with the [environment variable DIFFSYNTH_MODEL_BASE_PATH](../Pipeline_Usage/Environment_Variables.md#diffsynth_model_base_path).
>
- > By default, even after a model has been downloaded, the program still queries the remote source for missing files. To disable remote requests completely, set the [environment variable DIFFSYNTH_SKIP_DOWNLOAD](/docs/zh/Pipeline_Usage/Environment_Variables.md#diffsynth_skip_download) to `True`.
+ > By default, even after a model has been downloaded, the program still queries the remote source for missing files. To disable remote requests completely, set the [environment variable DIFFSYNTH_SKIP_DOWNLOAD](../Pipeline_Usage/Environment_Variables.md#diffsynth_skip_download) to `True`.
</details>
@@ -235,11 +235,11 @@ accelerate launch --config_file examples/qwen_image/model_training/full/accelera
## Training Notes
- * Besides `csv`, dataset metadata can also be stored as `json` or `jsonl`. For how to choose the best metadata format, see [Metadata](/docs/zh/API_Reference/core/data.md#元数据).
+ * Besides `csv`, dataset metadata can also be stored as `json` or `jsonl`. For how to choose the best metadata format, see [Metadata](../API_Reference/core/data.md#元数据).
* Training results usually correlate strongly with the number of training steps and only weakly with the number of epochs, so we recommend using `--save_steps` to save model files at fixed step intervals.
* When the number of samples × `dataset_repeat` exceeds $10^9$, we have observed the dataset slowing down markedly. This appears to be a `PyTorch` bug; we are not yet sure whether newer versions of `PyTorch` have fixed it.
* We recommend a learning rate (`--learning_rate`) of `1e-4` for LoRA training and `1e-5` for full training.
- * The training framework does not support batch size > 1, for fairly involved reasons; see [Q&A: Why does the training framework not support batch size > 1?](/docs/zh/QA.md#为什么训练框架不支持-batch-size--1)
+ * The training framework does not support batch size > 1, for fairly involved reasons; see [Q&A: Why does the training framework not support batch size > 1?](../QA.md#为什么训练框架不支持-batch-size--1)
* A few models contain redundant parameters, for example the text-encoding part of the last layer of Qwen-Image's DiT. When training these models, set `--find_unused_parameters` to avoid errors in multi-GPU training. For compatibility with models in the open-source community, we do not plan to remove these redundant parameters.
* The loss value of a diffusion model says little about actual output quality, so we do not log the loss during training. We recommend setting `--num_epochs` to a sufficiently large value, testing while training, and stopping the training process manually once the results converge.
- * `--use_gradient_checkpointing` is usually enabled unless GPU memory is plentiful; enable `--use_gradient_checkpointing_offload` as needed. See [`diffsynth.core.gradient`](/docs/zh/API_Reference/core/gradient.md) for details.
+ * `--use_gradient_checkpointing` is usually enabled unless GPU memory is plentiful; enable `--use_gradient_checkpointing_offload` as needed. See [`diffsynth.core.gradient`](../API_Reference/core/gradient.md) for details.

@@ -41,7 +41,7 @@ pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm6
# x86
pip install -e .[npu]
- When using an Ascend NPU, replace `"cuda"` with `"npu"` in your Python code; see [NPU Support](/docs/zh/Pipeline_Usage/GPU_support.md#ascend-npu) for details.
+ When using an Ascend NPU, replace `"cuda"` with `"npu"` in your Python code; see [NPU Support](../Pipeline_Usage/GPU_support.md#ascend-npu) for details.
## Other Installation Issues

@@ -140,7 +140,7 @@ image.save("image.jpg")
In more extreme cases, when even system memory cannot hold the entire model, Disk Offload loads model parameters lazily: each layer reads its parameters from disk only when its `forward` is called. We recommend a fast SSD when enabling this feature.
- Disk Offload is a highly specialized VRAM management scheme. It supports only `.safetensors` files, not binary formats such as `.bin`, `.pth`, or `.ckpt`, and it does not support [state dict converters](/docs/zh/Developer_Guide/Integrating_Your_Model.md#step-2-模型文件格式转换) that reshape tensors.
+ Disk Offload is a highly specialized VRAM management scheme. It supports only `.safetensors` files, not binary formats such as `.bin`, `.pth`, or `.ckpt`, and it does not support [state dict converters](../Developer_Guide/Integrating_Your_Model.md#step-2-模型文件格式转换) that reshape tensors.
```python
from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig
@@ -196,7 +196,7 @@ vram_config = {
* Preparing: the intermediate state between Onload and Computation, a staging state used when VRAM allows. Switching into it is controlled by the VRAM management mechanism, and it is entered if and only if `vram_limit` is set to unlimited, or `vram_limit` is set and free VRAM is available.
* Computation: the model is actively computing. Switching into it is controlled by the VRAM management mechanism, and it is entered only temporarily during `forward`.
- If you are a model developer and want to control the VRAM management granularity of a model yourself, see [../Developer_Guide/Enabling_VRAM_management.md](/docs/zh/Developer_Guide/Enabling_VRAM_management.md).
+ If you are a model developer and want to control the VRAM management granularity of a model yourself, see [../Developer_Guide/Enabling_VRAM_management.md](../Developer_Guide/Enabling_VRAM_management.md).
## Best Practices

@@ -29,7 +29,7 @@
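## How do I load LoRA models dynamically at inference time?
- We support two ways of loading LoRA models; see [Loading LoRA](/docs/zh/Pipeline_Usage/Model_Inference.md#加载-lora) for details:
+ We support two ways of loading LoRA models; see [Loading LoRA](./Pipeline_Usage/Model_Inference.md#加载-lora) for details:
- * Cold loading: when [VRAM Management](/docs/zh/Pipeline_Usage/VRAM_management.md) is not enabled for the base model, the LoRA is fused into the base model weights. Inference speed is unchanged, but the LoRA cannot be unloaded after loading.
+ * Cold loading: when [VRAM Management](./Pipeline_Usage/VRAM_management.md) is not enabled for the base model, the LoRA is fused into the base model weights. Inference speed is unchanged, but the LoRA cannot be unloaded after loading.
- * Hot loading: when [VRAM Management](/docs/zh/Pipeline_Usage/VRAM_management.md) is enabled for the base model, the LoRA is not fused into the base model weights. Inference is slower, but the LoRA can be unloaded with `pipe.clear_lora()`.
+ * Hot loading: when [VRAM Management](./Pipeline_Usage/VRAM_management.md) is enabled for the base model, the LoRA is not fused into the base model weights. Inference is slower, but the LoRA can be unloaded with `pipe.clear_lora()`.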

@@ -26,58 +26,58 @@ graph LR;
This section introduces the basic usage of `DiffSynth-Studio`, including how to enable VRAM management to run inference on GPUs with very little VRAM, and how to train any base model, LoRA, ControlNet, or other model.
- * [Installing Dependencies](/docs/zh/Pipeline_Usage/Setup.md)
+ * [Installing Dependencies](./Pipeline_Usage/Setup.md)
- * [Model Inference](/docs/zh/Pipeline_Usage/Model_Inference.md)
+ * [Model Inference](./Pipeline_Usage/Model_Inference.md)
- * [VRAM Management](/docs/zh/Pipeline_Usage/VRAM_management.md)
+ * [VRAM Management](./Pipeline_Usage/VRAM_management.md)
- * [Model Training](/docs/zh/Pipeline_Usage/Model_Training.md)
+ * [Model Training](./Pipeline_Usage/Model_Training.md)
- * [Environment Variables](/docs/zh/Pipeline_Usage/Environment_Variables.md)
+ * [Environment Variables](./Pipeline_Usage/Environment_Variables.md)
- * [GPU/NPU Support](/docs/zh/Pipeline_Usage/GPU_support.md)
+ * [GPU/NPU Support](./Pipeline_Usage/GPU_support.md)
## Section 2: Model Details
This section introduces the diffusion models supported by `DiffSynth-Studio`; some model pipelines offer special features such as controllable generation and parallel acceleration.
- * [FLUX.1](/docs/zh/Model_Details/FLUX.md)
+ * [FLUX.1](./Model_Details/FLUX.md)
- * [Wan](/docs/zh/Model_Details/Wan.md)
+ * [Wan](./Model_Details/Wan.md)
- * [Qwen-Image](/docs/zh/Model_Details/Qwen-Image.md)
+ * [Qwen-Image](./Model_Details/Qwen-Image.md)
- * [FLUX.2](/docs/zh/Model_Details/FLUX2.md)
+ * [FLUX.2](./Model_Details/FLUX2.md)
- * [Z-Image](/docs/zh/Model_Details/Z-Image.md)
+ * [Z-Image](./Model_Details/Z-Image.md)
## Section 3: Training Framework
This section explains the design of the training framework in `DiffSynth-Studio`, to help developers understand the principles behind diffusion model training algorithms.
- * [Diffusion Model Fundamentals](/docs/zh/Training/Understanding_Diffusion_models.md)
+ * [Diffusion Model Fundamentals](./Training/Understanding_Diffusion_models.md)
- * [Standard Supervised Training](/docs/zh/Training/Supervised_Fine_Tuning.md)
+ * [Standard Supervised Training](./Training/Supervised_Fine_Tuning.md)
- * [Enabling FP8 Precision in Training](/docs/zh/Training/FP8_Precision.md)
+ * [Enabling FP8 Precision in Training](./Training/FP8_Precision.md)
- * [End-to-End Distillation Training for Acceleration](/docs/zh/Training/Direct_Distill.md)
+ * [End-to-End Distillation Training for Acceleration](./Training/Direct_Distill.md)
- * [Two-Stage Split Training](/docs/zh/Training/Split_Training.md)
+ * [Two-Stage Split Training](./Training/Split_Training.md)
- * [Differential LoRA Training](/docs/zh/Training/Differential_LoRA.md)
+ * [Differential LoRA Training](./Training/Differential_LoRA.md)
## Section 4: Model Integration
This section explains how to integrate a model into `DiffSynth-Studio` so that it can use the framework's core features, helping developers add support for new models to this project or run inference and training on private models.
- * [Integrating a Model Architecture](/docs/zh/Developer_Guide/Integrating_Your_Model.md)
+ * [Integrating a Model Architecture](./Developer_Guide/Integrating_Your_Model.md)
- * [Integrating a Pipeline](/docs/zh/Developer_Guide/Building_a_Pipeline.md)
+ * [Integrating a Pipeline](./Developer_Guide/Building_a_Pipeline.md)
- * [Integrating Fine-Grained VRAM Management](/docs/zh/Developer_Guide/Enabling_VRAM_management.md)
+ * [Integrating Fine-Grained VRAM Management](./Developer_Guide/Enabling_VRAM_management.md)
- * [Integrating Model Training](/docs/zh/Developer_Guide/Training_Diffusion_Models.md)
+ * [Integrating Model Training](./Developer_Guide/Training_Diffusion_Models.md)
## Section 5: API Reference
This section covers `diffsynth.core`, the standalone core module of `DiffSynth-Studio`, and explains how its internal features are designed and how they work. Developers can reuse these modules in other codebases as needed.
- * [`diffsynth.core.attention`](/docs/zh/API_Reference/core/attention.md): attention mechanism implementation
+ * [`diffsynth.core.attention`](./API_Reference/core/attention.md): attention mechanism implementation
- * [`diffsynth.core.data`](/docs/zh/API_Reference/core/data.md): data processing operators and general-purpose datasets
+ * [`diffsynth.core.data`](./API_Reference/core/data.md): data processing operators and general-purpose datasets
- * [`diffsynth.core.gradient`](/docs/zh/API_Reference/core/gradient.md): gradient checkpointing
+ * [`diffsynth.core.gradient`](./API_Reference/core/gradient.md): gradient checkpointing
- * [`diffsynth.core.loader`](/docs/zh/API_Reference/core/loader.md): model downloading and loading
+ * [`diffsynth.core.loader`](./API_Reference/core/loader.md): model downloading and loading
- * [`diffsynth.core.vram`](/docs/zh/API_Reference/core/vram.md): VRAM management
+ * [`diffsynth.core.vram`](./API_Reference/core/vram.md): VRAM management
## Section 6: Research Guide
This section explains how to use `DiffSynth-Studio` to train new models, helping researchers explore new modeling techniques.
- * [Training a Model from Scratch](/docs/zh/Research_Tutorial/train_from_scratch.md)
+ * [Training a Model from Scratch](./Research_Tutorial/train_from_scratch.md)
* Inference improvement and optimization techniques (coming soon)
* Designing controllable generation models (coming soon)
* Creating new training paradigms (coming soon)
@@ -86,4 +86,4 @@ graph LR;
This section collects common questions from developers. If you run into a problem while using or developing with the project, consult this section; if that does not resolve it, please open an issue on GitHub.
- * [FAQ](/docs/zh/QA.md)
+ * [FAQ](./QA.md)

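@@ -12,7 +12,7 @@ DiffSynth-Studio's training engine supports training base models from scratch; this document
* Text tensor (`prompt_embeds`): the text encoding, produced by the text encoder
* Timestep (`timestep`): a scalar indicating which stage of the diffusion process the model is currently at
- The model outputs a tensor with the same shape as the image tensor, representing the denoising direction it predicts. For details on diffusion model theory, see [Diffusion Model Fundamentals](/docs/zh/Training/Understanding_Diffusion_models.md). In this document, we build a DiT model with only 0.1B parameters: `AAADiT`.
+ The model outputs a tensor with the same shape as the image tensor, representing the denoising direction it predicts. For details on diffusion model theory, see [Diffusion Model Fundamentals](../Training/Understanding_Diffusion_models.md). In this document, we build a DiT model with only 0.1B parameters: `AAADiT`.
<details>
<summary>Model architecture code</summary>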
@@ -141,7 +141,7 @@ class AAADiT(torch.nn.Module):
## 2. Building the Pipeline
- The document [Integrating a Pipeline](/docs/zh/Developer_Guide/Building_a_Pipeline.md) explains how to build a model pipeline. The model in this document also needs a pipeline, one that connects the text encoder, the diffusion model, and the VAE encoder/decoder.
+ The document [Integrating a Pipeline](../Developer_Guide/Building_a_Pipeline.md) explains how to build a model pipeline. The model in this document also needs a pipeline, one that connects the text encoder, the diffusion model, and the VAE encoder/decoder.
<details>
<summary>Pipeline code</summary>
@@ -328,7 +328,7 @@ def model_fn_aaa(
## 3. Preparing the Dataset
- To verify training results quickly, we use the dataset [Pokémon Generation 1](https://modelscope.cn/datasets/DiffSynth-Studio/pokemon-gen1), which is reproduced from the open-source project [pokemon-dataset-zh](https://github.com/42arch/pokemon-dataset-zh) and contains the 151 first-generation Pokémon, from Bulbasaur to Mew. To use another dataset, see [Preparing the Dataset](/docs/zh/Pipeline_Usage/Model_Training.md#准备数据集) and [`diffsynth.core.data`](/docs/zh/API_Reference/core/data.md).
+ To verify training results quickly, we use the dataset [Pokémon Generation 1](https://modelscope.cn/datasets/DiffSynth-Studio/pokemon-gen1), which is reproduced from the open-source project [pokemon-dataset-zh](https://github.com/42arch/pokemon-dataset-zh) and contains the 151 first-generation Pokémon, from Bulbasaur to Mew. To use another dataset, see [Preparing the Dataset](../Pipeline_Usage/Model_Training.md#准备数据集) and [`diffsynth.core.data`](../API_Reference/core/data.md).
```shell
modelscope download --dataset DiffSynth-Studio/pokemon-gen1 --local_dir ./data
@@ -336,7 +336,7 @@ modelscope download --dataset DiffSynth-Studio/pokemon-gen1 --local_dir ./data
### 4. Starting Training
- The training process can be implemented quickly with the Pipeline. The complete code is in [/docs/zh/Research_Tutorial/train_from_scratch.py](/docs/zh/Research_Tutorial/train_from_scratch.py); you can start single-GPU training directly with `python docs/zh/Research_Tutorial/train_from_scratch.py`.
+ The training process can be implemented quickly with the Pipeline. The complete code is in [../Research_Tutorial/train_from_scratch.py](../Research_Tutorial/train_from_scratch.py); you can start single-GPU training directly with `python docs/zh/Research_Tutorial/train_from_scratch.py`.
To enable multi-GPU parallel training, run `accelerate config` to set the relevant parameters, then start training with `accelerate launch docs/zh/Research_Tutorial/train_from_scratch.py`.

@@ -8,8 +8,8 @@
Suppose we have two images with similar content, Image 1 and Image 2. For example, each shows a car, but Image 1 has less detail and Image 2 has more. Differential LoRA training proceeds in two steps:
- * Using Image 1 as training data, train LoRA 1 with [standard supervised training](/docs/zh/Training/Supervised_Fine_Tuning.md)
+ * Using Image 1 as training data, train LoRA 1 with [standard supervised training](../Training/Supervised_Fine_Tuning.md)
- * Using Image 2 as training data, fuse LoRA 1 into the base model, then train LoRA 2 with [standard supervised training](/docs/zh/Training/Supervised_Fine_Tuning.md)
+ * Using Image 2 as training data, fuse LoRA 1 into the base model, then train LoRA 2 with [standard supervised training](../Training/Supervised_Fine_Tuning.md)
In the first step, because the training data is a single image, the LoRA overfits easily, so after training, LoRA 1 makes the model generate Image 1 every time, regardless of the random seed. In the second step, the LoRA overfits again, so after training, LoRA 1 and LoRA 2 together make the model generate Image 2 every time. In short:

@@ -44,7 +44,7 @@ loss = torch.nn.functional.mse_loss(image_1, image_2)
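## Using Distillation Training for Acceleration in the Training Framework
- First, generate the training data: write inference code following [Model Inference](/docs/zh/Pipeline_Usage/Model_Inference.md), and generate the data with a sufficiently large number of inference steps.
+ First, generate the training data: write inference code following [Model Inference](../Pipeline_Usage/Model_Inference.md), and generate the data with a sufficiently large number of inference steps.
Taking Qwen-Image as an example, the following code generates an image: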
@@ -67,7 +67,7 @@ image = pipe(prompt, seed=0, num_inference_steps=40)
image.save("image.jpg")
```
- Then we write the necessary information into a [metadata file](/docs/zh/API_Reference/core/data.md#元数据):
+ Then we write the necessary information into a [metadata file](../API_Reference/core/data.md#元数据):
```csv
image,prompt,seed,rand_device,num_inference_steps,cfg_scale
@@ -86,11 +86,11 @@ modelscope download --dataset DiffSynth-Studio/example_image_dataset --local_dir
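bash examples/qwen_image/model_training/lora/Qwen-Image-Distill-LoRA.sh
```
- Note that the image-resolution settings in the [training script parameters](/docs/zh/Pipeline_Usage/Model_Training.md#脚本参数) must not trigger resizing of the dataset. When `--height` and `--width` are set for a fixed resolution, all training data must have been generated at exactly that width and height; when `--max_pixels` is set for dynamic resolution, its value must be greater than or equal to the pixel area of every training image.
+ Note that the image-resolution settings in the [training script parameters](../Pipeline_Usage/Model_Training.md#脚本参数) must not trigger resizing of the dataset. When `--height` and `--width` are set for a fixed resolution, all training data must have been generated at exactly that width and height; when `--max_pixels` is set for dynamic resolution, its value must be greater than or equal to the pixel area of every training image.
## Training Framework Design
- Direct distillation differs from [standard supervised training](/docs/zh/Training/Supervised_Fine_Tuning.md) only in its loss function: it uses `DirectDistillLoss` from `diffsynth.diffusion.loss`.
+ Direct distillation differs from [standard supervised training](../Training/Supervised_Fine_Tuning.md) only in its loss function: it uses `DirectDistillLoss` from `diffsynth.diffusion.loss`.
## Future Work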

@@ -1,8 +1,8 @@
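# Enabling FP8 Precision in Training
- Although `DiffSynth-Studio` supports [VRAM Management](/docs/zh/Pipeline_Usage/VRAM_management.md) for model inference, most of its VRAM-saving techniques are unsuitable for training; offloading, for example, makes training extremely slow.
+ Although `DiffSynth-Studio` supports [VRAM Management](../Pipeline_Usage/VRAM_management.md) for model inference, most of its VRAM-saving techniques are unsuitable for training; offloading, for example, makes training extremely slow.
- FP8 precision is the only VRAM management strategy that can be enabled during training. However, this framework does not currently support native FP8 training (for the reasons, see [Q&A: Why does the training framework not support native FP8 training?](/docs/zh/QA.md#为什么训练框架不支持原生-fp8-精度训练)); it only supports storing in FP8 precision those models whose parameters are not updated by gradients (models that need no backpropagation, or whose gradients update only their LoRA).
+ FP8 precision is the only VRAM management strategy that can be enabled during training. However, this framework does not currently support native FP8 training (for the reasons, see [Q&A: Why does the training framework not support native FP8 training?](../QA.md#为什么训练框架不支持原生-fp8-精度训练)); it only supports storing in FP8 precision those models whose parameters are not updated by gradients (models that need no backpropagation, or whose gradients update only their LoRA).
## Enabling FP8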

@@ -8,7 +8,7 @@
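When training most models, much of the computation happens in "preprocessing", that is, computation unrelated to the denoising model, such as VAE encoding and text encoding. When the corresponding model parameters are frozen, these results are redundant: every data sample yields exactly the same result in every epoch. We therefore provide a "split training" feature that automatically analyzes and splits the training process.
- For standard supervised training of ordinary text-to-image models, splitting is straightforward: move the computation of all [`Pipeline Units`](/docs/zh/Developer_Guide/Building_a_Pipeline.md#units) into the first stage and store the results on disk, then in the second stage read them back from disk and run the remaining computation. If preprocessing requires backpropagation, however, the situation becomes far more complex, so we introduce a computation graph splitting algorithm to determine how to split the computation.
+ For standard supervised training of ordinary text-to-image models, splitting is straightforward: move the computation of all [`Pipeline Units`](../Developer_Guide/Building_a_Pipeline.md#units) into the first stage and store the results on disk, then in the second stage read them back from disk and run the remaining computation. If preprocessing requires backpropagation, however, the situation becomes far more complex, so we introduce a computation graph splitting algorithm to determine how to split the computation.
## Computation Graph Splitting Algorithm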
@@ -16,7 +16,7 @@
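## Using Split Training
- Split training supports [standard supervised training](/docs/zh/Training/Supervised_Fine_Tuning.md) and [direct distillation training](/docs/zh/Training/Direct_Distill.md), controlled by the `--task` parameter of the training command. Taking LoRA training of Qwen-Image as an example, the training command before splitting is:
+ Split training supports [standard supervised training](../Training/Supervised_Fine_Tuning.md) and [direct distillation training](../Training/Direct_Distill.md), controlled by the `--task` parameter of the training command. Taking LoRA training of Qwen-Image as an example, the training command before splitting is:
```shell
accelerate launch examples/qwen_image/model_training/train.py \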

@@ -1,10 +1,10 @@
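# Standard Supervised Training
- Building on [Diffusion Model Fundamentals](/docs/zh/Training/Understanding_Diffusion_models.md), this document explains how the framework implements diffusion model training. It covers the framework's principles to help developers write new training code; to use the default training features we provide, see [Model Training](/docs/zh/Pipeline_Usage/Model_Training.md).
+ Building on [Diffusion Model Fundamentals](../Training/Understanding_Diffusion_models.md), this document explains how the framework implements diffusion model training. It covers the framework's principles to help developers write new training code; to use the default training features we provide, see [Model Training](../Pipeline_Usage/Model_Training.md).
Recall the model training pseudocode from the previous document: once we write actual code, things become far more complex. Some models take additional guidance conditions that require preprocessing (e.g. ControlNet); some require computation interleaved with the denoising model (e.g. VACE); and some need Gradient Checkpointing because of their large VRAM requirements (e.g. the DiT of Qwen-Image).
- To keep inference and training strictly consistent, we abstract and encapsulate components such as `Pipeline`, and reuse inference code extensively during training. See [Integrating a Pipeline](/docs/zh/Developer_Guide/Building_a_Pipeline.md) for the design of the `Pipeline` component. Next, we describe how the training framework builds training algorithms on top of the `Pipeline` component.
+ To keep inference and training strictly consistent, we abstract and encapsulate components such as `Pipeline`, and reuse inference code extensively during training. See [Integrating a Pipeline](../Developer_Guide/Building_a_Pipeline.md) for the design of the `Pipeline` component. Next, we describe how the training framework builds training algorithms on top of the `Pipeline` component.
## Framework Design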
@@ -48,13 +48,13 @@ class QwenImageTrainingModule(DiffusionTrainingModule):
)
```
- The model-loading logic is essentially the same as at inference time and supports loading from remote and local paths (see [Model Inference](/docs/zh/Pipeline_Usage/Model_Inference.md)), but make sure not to enable [VRAM Management](/docs/zh/Pipeline_Usage/VRAM_management.md).
+ The model-loading logic is essentially the same as at inference time and supports loading from remote and local paths (see [Model Inference](../Pipeline_Usage/Model_Inference.md)), but make sure not to enable [VRAM Management](../Pipeline_Usage/VRAM_management.md).
`switch_pipe_to_training_mode` switches the model to training mode; see `switch_pipe_to_training_mode` for details.
### `forward`
- `forward` computes the loss value: it first runs preprocessing, then computes the loss through the `Pipeline`'s [`model_fn`](/docs/zh/Developer_Guide/Building_a_Pipeline.md#model_fn).
+ `forward` computes the loss value: it first runs preprocessing, then computes the loss through the `Pipeline`'s [`model_fn`](../Developer_Guide/Building_a_Pipeline.md#model_fn).
```python
def forward(self, data):
@@ -90,7 +90,7 @@ class QwenImageTrainingModule(DiffusionTrainingModule):
The training framework also requires other modules, including:
* accelerator: the training launcher provided by `accelerate`; see [`accelerate`](https://huggingface.co/docs/accelerate/index)
- * dataset: the general-purpose dataset; see [`diffsynth.core.data`](/docs/zh/API_Reference/core/data.md)
+ * dataset: the general-purpose dataset; see [`diffsynth.core.data`](../API_Reference/core/data.md)
* model_logger: the model logger; see `diffsynth.diffusion.logger`
```python

@@ -136,4 +136,4 @@ $$
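## How does this project encapsulate and implement model training?
- Please read the next document: [Standard Supervised Training](/docs/zh/Training/Supervised_Fine_Tuning.md)
+ Please read the next document: [Standard Supervised Training](../Training/Supervised_Fine_Tuning.md)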
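
The Markdown hunks above all apply one mechanical rewrite: absolute `/docs/zh/...` link targets become paths relative to the file that contains the link, presumably so that the links resolve once Sphinx builds `docs/zh` as its own documentation root. The following Python sketch shows one way to perform that rewrite; the helper name `relativize_links` and its regex are our own assumptions, not necessarily how this commit was produced. It rewrites only link targets (not link text), leaves links outside `/docs/zh/` such as `/docs/Training/` or `/examples/...` untouched, and may emit a shorter but equivalent path than a hand edit (for example `./train_from_scratch.py` instead of `../Research_Tutorial/train_from_scratch.py`).

```python
import posixpath
import re

# Matches "](/docs/zh/...)" link targets, capturing the path and an optional #anchor.
LINK_TARGET = re.compile(r"\]\((/docs/zh/[^)#\s]*)(#[^)\s]*)?\)")


def relativize_links(markdown: str, source_path: str) -> str:
    """Rewrite absolute /docs/zh/... link targets relative to source_path.

    source_path is the repository-relative path of the Markdown file that
    contains the links, e.g. "docs/zh/Model_Details/Z-Image.md".
    """
    source_dir = posixpath.dirname("/" + source_path.lstrip("/"))

    def rewrite(match):
        target, anchor = match.group(1), match.group(2) or ""
        relative = posixpath.relpath(target, source_dir)
        # Targets at the same level or below get an explicit "./" prefix,
        # matching the style used in the hunks above.
        if not relative.startswith(".."):
            relative = "./" + relative
        return "](" + relative + anchor + ")"

    return LINK_TARGET.sub(rewrite, markdown)
```

For example, applied to `详见[加载模型](/docs/zh/Pipeline_Usage/Model_Inference.md#加载模型)。` with source path `docs/zh/Model_Details/Z-Image.md`, it returns `详见[加载模型](../Pipeline_Usage/Model_Inference.md#加载模型)。`; anchors, including non-ASCII ones, are carried over unchanged.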

123
docs/zh/conf.py Normal file

@@ -0,0 +1,123 @@
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html
# -- Path setup --------------------------------------------------------------
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import sys
# import sphinx_book_theme
sys.path.insert(0, os.path.abspath('../../'))
# -- Project information -----------------------------------------------------
project = 'diffsynth'
copyright = '2022-2025, Alibaba ModelScope'
author = 'ModelScope Authors'
version_file = '../../diffsynth/version.py'
html_theme = 'sphinx_rtd_theme'
language = 'zh_CN'
def get_version():
    with open(version_file, 'r', encoding='utf-8') as f:
        exec(compile(f.read(), version_file, 'exec'))
    return locals()['__version__']
# The full version, including alpha/beta/rc tags
version = get_version()
release = version
# -- General configuration ---------------------------------------------------
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
    'sphinx.ext.napoleon',
    'sphinx.ext.autosummary',
    'sphinx.ext.autodoc',
    'sphinx.ext.viewcode',
    'sphinx_markdown_tables',
    'sphinx_copybutton',
    'myst_parser',
]
# build the templated autosummary files
autosummary_generate = True
numpydoc_show_class_members = False
# Enable overriding of function signatures in the first line of the docstring.
autodoc_docstring_signature = True
# Disable docstring inheritance
autodoc_inherit_docstrings = False
# Show type hints in the description
autodoc_typehints = 'description'
# Add parameter types if the parameter is documented in the docstring
autodoc_typehints_description_target = 'documented_params'
autodoc_default_options = {
    'member-order': 'bysource',
}
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
#
source_suffix = ['.rst', '.md']
# The master toctree document.
root_doc = 'index'
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ['build']
# A list of glob-style patterns [1] that are used to find source files.
# They are matched against the source file names relative to the source directory,
# using slashes as directory separators on all platforms.
# The default is **, meaning that all files are recursively included from the source directory.
# -- Options for HTML output -------------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
# html_theme = 'sphinx_book_theme'
# html_theme_path = [sphinx_book_theme.get_html_theme_path()]
# html_theme_options = {}
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
# html_css_files = ['css/readthedocs.css']
# -- Options for HTMLHelp output ---------------------------------------------
# Output file base name for HTML help builder.
# -- Extension configuration -------------------------------------------------
# Ignore >>> when copying code
copybutton_prompt_text = r'>>> |\.\.\. '
copybutton_prompt_is_regexp = True
# Example configuration for intersphinx: refer to the Python standard library.
intersphinx_mapping = {'https://docs.python.org/': None}
myst_enable_extensions = [
    'amsmath',
    'dollarmath',
    'colon_fence',
]
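
`get_version()` in the `conf.py` above executes `version.py` with `exec()` and then reads `__version__` back out of `locals()`. This should work on the Python 3.10 build configured in `docs/en/.readthedocs.yaml`, where `exec()` without an explicit namespace writes into the same locals dict that a later `locals()` call returns. Python 3.13 (PEP 667) changed `locals()` in functions to return an independent snapshot, so that lookup can fail there. A minimal sketch that sidesteps the issue by passing an explicit namespace dict follows; the helper name `version_from_source` is our own and not part of DiffSynth-Studio.

```python
def version_from_source(source: str, filename: str = "<version.py>") -> str:
    """Execute version.py-style source and return the __version__ it defines."""
    # An explicit dict receives every module-level name the file assigns,
    # regardless of how locals() behaves in the calling function.
    namespace: dict = {}
    exec(compile(source, filename, "exec"), namespace)
    return namespace["__version__"]


def get_version(version_file: str = "../../diffsynth/version.py") -> str:
    """Read version_file (relative to the Sphinx conf.py directory) and return __version__."""
    with open(version_file, "r", encoding="utf-8") as f:
        return version_from_source(f.read(), version_file)
```

In `conf.py` this would be used as `version = get_version()`. The `exec()` call still runs the whole file, so `version.py` should stay limited to simple assignments.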

77
docs/zh/index.rst Normal file

@@ -0,0 +1,77 @@
Welcome to the DiffSynth-Studio Documentation
=============================================

.. toctree::
   :maxdepth: 2
   :caption: Introduction

   README

.. toctree::
   :maxdepth: 2
   :caption: Getting Started

   Pipeline_Usage/Setup
   Pipeline_Usage/Model_Inference
   Pipeline_Usage/VRAM_management
   Pipeline_Usage/Model_Training
   Pipeline_Usage/Environment_Variables
   Pipeline_Usage/GPU_support

.. toctree::
   :maxdepth: 2
   :caption: Model Details

   Model_Details/FLUX
   Model_Details/Wan
   Model_Details/Qwen-Image
   Model_Details/FLUX2
   Model_Details/Z-Image

.. toctree::
   :maxdepth: 2
   :caption: Training Framework

   Training/Understanding_Diffusion_models
   Training/Supervised_Fine_Tuning
   Training/FP8_Precision
   Training/Direct_Distill
   Training/Split_Training
   Training/Differential_LoRA

.. toctree::
   :maxdepth: 2
   :caption: Model Integration

   Developer_Guide/Integrating_Your_Model
   Developer_Guide/Building_a_Pipeline
   Developer_Guide/Enabling_VRAM_management
   Developer_Guide/Training_Diffusion_Models

.. toctree::
   :maxdepth: 2
   :caption: API Reference

   API_Reference/core/attention
   API_Reference/core/data
   API_Reference/core/gradient
   API_Reference/core/loader
   API_Reference/core/vram

.. toctree::
   :maxdepth: 2
   :caption: Research Guide

   Research_Tutorial/train_from_scratch

.. toctree::
   :maxdepth: 2
   :caption: FAQ

   QA

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
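
reStructuredText requires a section title's underline to be at least as long as the title, and docutils measures the title in display columns, counting East Asian wide and fullwidth characters as two columns each. A Chinese title such as `欢迎来到 DiffSynth-Studio 的文档` therefore measures 32 columns, and a shorter `=` line triggers a "Title underline too short" warning. The sketch below computes a sufficient underline; the helper names are hypothetical, and it ignores combining characters (which docutils counts as zero columns), so treat it as an approximation.

```python
import unicodedata


def display_width(text: str) -> int:
    """Column width: East Asian wide ("W") and fullwidth ("F") characters count as 2."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1 for ch in text)


def rst_title(title: str, char: str = "=") -> str:
    """Return the title followed by an underline wide enough for docutils."""
    return f"{title}\n{char * display_width(title)}"
```

For an ASCII title such as `Indices and tables`, the width equals the character count (18), so the existing underline is already correct.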