Add files via upload

再改一次
2026-03-22 00:38:11 +00:00 · 2024-10-22 09:56:03 +08:00
parent 157ba2e426
commit f6e676cdf9
46 changed files with 2525 additions and 0 deletions
--- a/docs/source_en/finetune/overview.md
+++ b/docs/source_en/finetune/overview.md
@@ -0,0 +1,98 @@
+# Training Framework
+
+We have implemented a training framework for text-to-image diffusion models, allowing users to effortlessly train LoRA models with our framework. Our provided scripts come with the following features:
+
+* **Comprehensive Functionality**: Our training framework supports multi-GPU and multi-node configurations, is optimized for acceleration with DeepSpeed, and includes gradient checkpointing to accommodate models with higher memory requirements.
+* **Succinct Code**: We have avoided large, complex code blocks. The general module is implemented in `diffsynth/trainers/text_to_image.py`, while model-specific training scripts contain only the minimal code necessary for the model architecture, facilitating ease of use for academic researchers.
+* **Modular Design**: Built on the versatile PyTorch-Lightning framework, our training framework is decoupled in functionality, enabling developers to easily incorporate additional training techniques by modifying our scripts to suit their specific needs.
+
+Examples of images fine-tuned with LoRA. Prompts are "一只小狗蹦蹦跳跳，周围是姹紫嫣红的鲜花，远处是山脉" (for Chinese models) or "a dog is jumping, flowers around the dog, the background is mountains and clouds" (for English models).
+
+||FLUX.1-dev|Kolors|Stable Diffusion 3|Hunyuan-DiT|
+|-|-|-|-|-|
+|Without LoRA|![image_without_lora](https://github.com/user-attachments/assets/df62cef6-d54f-4e3d-a602-5dd290079d49)|![image_without_lora](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/9d79ed7a-e8cf-4d98-800a-f182809db318)|![image_without_lora](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/ddb834a5-6366-412b-93dc-6d957230d66e)|![image_without_lora](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/1aa21de5-a992-4b66-b14f-caa44e08876e)|
+|With LoRA|![image_with_lora](https://github.com/user-attachments/assets/4fd39890-0291-4d19-8a88-d70d0ae18533)|![image_with_lora](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/02f62323-6ee5-4788-97a1-549732dbe4f0)|![image_with_lora](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/8e7b2888-d874-4da4-a75b-11b6b214b9bf)|![image_with_lora](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/83a0a41a-691f-4610-8e7b-d8e17c50a282)|
+
+## Install Additional Packages
+
+```bash
+pip install peft lightning
+```
+
+## Prepare the Dataset
+
+We provide an [example dataset](https://modelscope.cn/datasets/buptwq/lora-stable-diffusion-finetune/files). You need to organize your training dataset in the following structure:
+
+```
+data/dog/
+└── train
+    ├── 00.jpg
+    ├── 01.jpg
+    ├── 02.jpg
+    ├── 03.jpg
+    ├── 04.jpg
+    └── metadata.csv
+```
+
+`metadata.csv`:
+
+```
+file_name,text
+00.jpg,a dog
+01.jpg,a dog
+02.jpg,a dog
+03.jpg,a dog
+04.jpg,a dog
+```
+
+Please note that if the model is a Chinese model (e.g., Hunyuan-DiT and Kolors), we recommend using Chinese text in the dataset. For example:
+
+```
+file_name,text
+00.jpg,一只小狗
+01.jpg,一只小狗
+02.jpg,一只小狗
+03.jpg,一只小狗
+04.jpg,一只小狗
+```
+
+## Train LoRA Model
+
+General parameter options:
+
+```
+  --lora_target_modules LORA_TARGET_MODULES
+                        Layers where the LoRA modules are located.
+  --dataset_path DATASET_PATH
+                        Path to the dataset.
+  --output_path OUTPUT_PATH
+                        Path where the model will be saved.
+  --steps_per_epoch STEPS_PER_EPOCH
+                        Number of steps per epoch.
+  --height HEIGHT        The height of the image.
+  --width WIDTH          The width of the image.
+  --center_crop         Whether to center crop the input image to the specified resolution. If not set, the image will be randomly cropped. The image will be resized to the specified resolution before cropping.
+  --random_flip         Whether to randomly horizontally flip the image.
+  --batch_size BATCH_SIZE
+                        Batch size for the training data loader (per device).
+  --dataloader_num_workers DATALOADER_NUM_WORKERS
+                        The number of subprocesses used for data loading. A value of 0 means the data will be loaded in the main process.
+  --precision {32,16,16-mixed}
+                        The precision for training.
+  --learning_rate LEARNING_RATE
+                        The learning rate.
+  --lora_rank LORA_RANK
+                        The dimension of the LoRA update matrix.
+  --lora_alpha LORA_ALPHA
+                        The weight of the LoRA update matrix.
+  --use_gradient_checkpointing
+                        Whether to use gradient checkpointing.
+  --accumulate_grad_batches ACCUMULATE_GRAD_BATCHES
+                        The number of batches for gradient accumulation.
+  --training_strategy {auto,deepspeed_stage_1,deepspeed_stage_2,deepspeed_stage_3}
+                        The training strategy.
+  --max_epochs MAX_EPOCHS
+                        The number of training epochs.
+  --modelscope_model_id MODELSCOPE_MODEL_ID
+                        The model ID on ModelScope (https://www.modelscope.cn/). If the model ID is provided, the model will be automatically uploaded to ModelScope.
+```
--- a/docs/source_en/finetune/train_flux_lora.md
+++ b/docs/source_en/finetune/train_flux_lora.md
@@ -0,0 +1,70 @@
+# Training FLUX LoRA
+
+The following files will be used to build the FLUX model. You can download them from [huggingface](https://huggingface.co/black-forest-labs/FLUX.1-dev)或[modelscope](https://www.modelscope.cn/models/ai-modelscope/flux.1-dev), or you can use the following code to download these files:
+```python
+from diffsynth import download_models
+
+download_models(["FLUX.1-dev"])
+```
+
+```
+models/FLUX/
+└── FLUX.1-dev
+    ├── ae.safetensors
+    ├── flux1-dev.safetensors
+    ├── text_encoder
+    │   └── model.safetensors
+    └── text_encoder_2
+        ├── config.json
+        ├── model-00001-of-00002.safetensors
+        ├── model-00002-of-00002.safetensors
+        └── model.safetensors.index.json
+```
+
+Start the training task with the following command:
+
+```
+CUDA_VISIBLE_DEVICES="0" python examples/train/flux/train_flux_lora.py \
+  --pretrained_text_encoder_path models/FLUX/FLUX.1-dev/text_encoder/model.safetensors \
+  --pretrained_text_encoder_2_path models/FLUX/FLUX.1-dev/text_encoder_2 \
+  --pretrained_dit_path models/FLUX/FLUX.1-dev/flux1-dev.safetensors \
+  --pretrained_vae_path models/FLUX/FLUX.1-dev/ae.safetensors \
+  --dataset_path data/dog \
+  --output_path ./models \
+  --max_epochs 1 \
+  --steps_per_epoch 500 \
+  --height 1024 \
+  --width 1024 \
+  --center_crop \
+  --precision "bf16" \
+  --learning_rate 1e-4 \
+  --lora_rank 4 \
+  --lora_alpha 4 \
+  --use_gradient_checkpointing
+```
+
+For more information on the parameters, please use `python examples/train/flux/train_flux_lora.py -h` to view detailed information.
+
+After the training is complete, use `model_manager.load_lora` to load the LoRA for inference.
+
+```python
+from diffsynth import ModelManager, FluxImagePipeline
+import torch
+
+model_manager = ModelManager(torch_dtype=torch.float16, device="cuda",
+                             file_path_list=[
+                                 "models/FLUX/FLUX.1-dev/text_encoder/model.safetensors",
+                                 "models/FLUX/FLUX.1-dev/text_encoder_2",
+                                 "models/FLUX/FLUX.1-dev/ae.safetensors",
+                                 "models/FLUX/FLUX.1-dev/flux1-dev.safetensors"
+                             ])
+model_manager.load_lora("models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt", lora_alpha=1.0)
+pipe = SDXLImagePipeline.from_model_manager(model_manager)
+
+torch.manual_seed(0)
+image = pipe(
+    prompt=prompt,
+    num_inference_steps=30, embedded_guidance=3.5
+)
+image.save("image_with_lora.jpg")
+```
--- a/docs/source_en/finetune/train_hunyuan_dit_lora.md
+++ b/docs/source_en/finetune/train_hunyuan_dit_lora.md
@@ -0,0 +1,72 @@
+# Training Hunyuan-DiT LoRA
+
+Building the Hunyuan DiT model requires four files. You can download these files from [HuggingFace](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT) or [ModelScope](https://www.modelscope.cn/models/modelscope/HunyuanDiT/summary). You can use the following code to download these files:
+
+```python
+from diffsynth import download_models
+
+download_models(["HunyuanDiT"])
+```
+
+```
+models/HunyuanDiT/
+├── Put Hunyuan DiT checkpoints here.txt
+└── t2i
+    ├── clip_text_encoder
+    │   └── pytorch_model.bin
+    ├── model
+    │   └── pytorch_model_ema.pt
+    ├── mt5
+    │   └── pytorch_model.bin
+    └── sdxl-vae-fp16-fix
+        └── diffusion_pytorch_model.bin
+```
+
+Use the following command to start the training task:
+
+```
+CUDA_VISIBLE_DEVICES="0" python examples/train/hunyuan_dit/train_hunyuan_dit_lora.py \
+  --pretrained_path models/HunyuanDiT/t2i \
+  --dataset_path data/dog \
+  --output_path ./models \
+  --max_epochs 1 \
+  --steps_per_epoch 500 \
+  --height 1024 \
+  --width 1024 \
+  --center_crop \
+  --precision "16-mixed" \
+  --learning_rate 1e-4 \
+  --lora_rank 4 \
+  --lora_alpha 4 \
+  --use_gradient_checkpointing
+```
+
+For more information about the parameters, please use `python examples/train/hunyuan_dit/train_hunyuan_dit_lora.py -h` to view detailed information.
+
+After the training is complete, use `model_manager.load_lora` to load the LoRA for inference.
+
+
+```python
+from diffsynth import ModelManager, HunyuanDiTImagePipeline
+import torch
+
+model_manager = ModelManager(torch_dtype=torch.float16, device="cuda",
+                             file_path_list=[
+                                 "models/HunyuanDiT/t2i/clip_text_encoder/pytorch_model.bin",
+                                 "models/HunyuanDiT/t2i/model/pytorch_model_ema.pt",
+                                 "models/HunyuanDiT/t2i/mt5/pytorch_model.bin",
+                                 "models/HunyuanDiT/t2i/sdxl-vae-fp16-fix/diffusion_pytorch_model.bin"
+                             ])
+model_manager.load_lora("models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt", lora_alpha=1.0)
+pipe = HunyuanDiTImagePipeline.from_model_manager(model_manager)
+
+torch.manual_seed(0)
+image = pipe(
+    prompt="A little puppy hops and jumps playfully, surrounded by a profusion of colorful flowers, with a mountain range visible in the distance.
+", 
+    negative_prompt="",
+    cfg_scale=7.5,
+    num_inference_steps=100, width=1024, height=1024,
+)
+image.save("image_with_lora.jpg")
+```
--- a/docs/source_en/finetune/train_kolors_lora.md
+++ b/docs/source_en/finetune/train_kolors_lora.md
@@ -0,0 +1,77 @@
+# Training Kolors LoRA
+The following files will be used to build Kolors. You can download Kolors from [HuggingFace](https://huggingface.co/Kwai-Kolors/Kolors) or [ModelScope](https://modelscope.cn/models/Kwai-Kolors/Kolors). Due to precision overflow issues, we need to download an additional VAE model （from [HuggingFace](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix) or [ModelScope](https://modelscope.cn/models/AI-ModelScope/sdxl-vae-fp16-fix). You can use the following code to download these files:
+
+```python
+from diffsynth import download_models
+
+download_models(["Kolors", "SDXL-vae-fp16-fix"])
+```
+
+```
+models
+├── kolors
+│   └── Kolors
+│       ├── text_encoder
+│       │   ├── config.json
+│       │   ├── pytorch_model-00001-of-00007.bin
+│       │   ├── pytorch_model-00002-of-00007.bin
+│       │   ├── pytorch_model-00003-of-00007.bin
+│       │   ├── pytorch_model-00004-of-00007.bin
+│       │   ├── pytorch_model-00005-of-00007.bin
+│       │   ├── pytorch_model-00006-of-00007.bin
+│       │   ├── pytorch_model-00007-of-00007.bin
+│       │   └── pytorch_model.bin.index.json
+│       ├── unet
+│       │   └── diffusion_pytorch_model.safetensors
+│       └── vae
+│           └── diffusion_pytorch_model.safetensors
+└── sdxl-vae-fp16-fix
+    └── diffusion_pytorch_model.safetensors
+```
+
+Use the following command to start the training task:
+
+```
+CUDA_VISIBLE_DEVICES="0" python examples/train/kolors/train_kolors_lora.py \
+  --pretrained_unet_path models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors \
+  --pretrained_text_encoder_path models/kolors/Kolors/text_encoder \
+  --pretrained_fp16_vae_path models/sdxl-vae-fp16-fix/diffusion_pytorch_model.safetensors \
+  --dataset_path data/dog \
+  --output_path ./models \
+  --max_epochs 1 \
+  --steps_per_epoch 500 \
+  --height 1024 \
+  --width 1024 \
+  --center_crop \
+  --precision "16-mixed" \
+  --learning_rate 1e-4 \
+  --lora_rank 4 \
+  --lora_alpha 4 \
+  --use_gradient_checkpointing
+```
+
+For more information on the parameters, please use `python examples/train/kolors/train_kolors_lora.py -h` to view detailed information.
+
+After the training is complete, use `model_manager.load_lora` to load the LoRA for inference.
+
+
+
+
+```python
+from diffsynth import ModelManager, SD3ImagePipeline
+import torch
+
+model_manager = ModelManager(torch_dtype=torch.float16, device="cuda",
+                             file_path_list=["models/stable_diffusion_3/sd3_medium_incl_clips.safetensors"])
+model_manager.load_lora("models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt", lora_alpha=1.0)
+pipe = SD3ImagePipeline.from_model_manager(model_manager)
+
+torch.manual_seed(0)
+image = pipe(
+    prompt="a dog is jumping, flowers around the dog, the background is mountains and clouds", 
+    negative_prompt="bad quality, poor quality, doll, disfigured, jpg, toy, bad anatomy, missing limbs, missing fingers, 3d, cgi, extra tails",
+    cfg_scale=7.5,
+    num_inference_steps=100, width=1024, height=1024,
+)
+image.save("image_with_lora.jpg")
+```
--- a/docs/source_en/finetune/train_sd3_lora.md
+++ b/docs/source_en/finetune/train_sd3_lora.md
@@ -0,0 +1,57 @@
+# Training Stable Diffusion 3 LoRA
+
+The training script only requires one file. You can use [`sd3_medium_incl_clips.safetensors`](https://huggingface.co/stabilityai/stable-diffusion-3-medium/resolve/main/sd3_medium_incl_clips.safetensors)（without T5 Encoder）或 [`sd3_medium_incl_clips_t5xxlfp16.safetensors`](https://huggingface.co/stabilityai/stable-diffusion-3-medium/resolve/main/sd3_medium_incl_clips_t5xxlfp16.safetensors)（with T5 Encoder）. Please use the following code to download these files:
+```python
+from diffsynth import download_models
+
+download_models(["StableDiffusion3", "StableDiffusion3_without_T5"])
+```
+
+```
+models/stable_diffusion_3/
+├── Put Stable Diffusion 3 checkpoints here.txt
+├── sd3_medium_incl_clips.safetensors
+└── sd3_medium_incl_clips_t5xxlfp16.safetensors
+```
+
+Use the following command to start the training task:
+
+```
+CUDA_VISIBLE_DEVICES="0" python examples/train/stable_diffusion_3/train_sd3_lora.py \
+  --pretrained_path models/stable_diffusion_3/sd3_medium_incl_clips.safetensors \
+  --dataset_path data/dog \
+  --output_path ./models \
+  --max_epochs 1 \
+  --steps_per_epoch 500 \
+  --height 1024 \
+  --width 1024 \
+  --center_crop \
+  --precision "16-mixed" \
+  --learning_rate 1e-4 \
+  --lora_rank 4 \
+  --lora_alpha 4 \
+  --use_gradient_checkpointing
+```
+
+For more information on the parameters, please use `python examples/train/stable_diffusion_3/train_sd3_lora.py -h` to view detailed information.
+
+After training is completed, use `model_manager.load_lora` to load LoRA for inference.
+
+```python
+from diffsynth import ModelManager, SD3ImagePipeline
+import torch
+
+model_manager = ModelManager(torch_dtype=torch.float16, device="cuda",
+                             file_path_list=["models/stable_diffusion_3/sd3_medium_incl_clips.safetensors"])
+model_manager.load_lora("models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt", lora_alpha=1.0)
+pipe = SD3ImagePipeline.from_model_manager(model_manager)
+
+torch.manual_seed(0)
+image = pipe(
+    prompt="a dog is jumping, flowers around the dog, the background is mountains and clouds", 
+    negative_prompt="bad quality, poor quality, doll, disfigured, jpg, toy, bad anatomy, missing limbs, missing fingers, 3d, cgi, extra tails",
+    cfg_scale=7.5,
+    num_inference_steps=100, width=1024, height=1024,
+)
+image.save("image_with_lora.jpg")
+```
--- a/docs/source_en/finetune/train_sd_lora.md
+++ b/docs/source_en/finetune/train_sd_lora.md
@@ -0,0 +1,58 @@
+# Training Stable Diffusion LoRA
+
+The training script only requires one file. We support mainstream checkpoints on [CivitAI](https://civitai.com/). By default, we use the basic Stable Diffusion v1.5. You can download it from [HuggingFace](https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors) or [ModelScope](https://www.modelscope.cn/models/AI-ModelScope/stable-diffusion-v1-5/resolve/master/v1-5-pruned-emaonly.safetensors). You can use the following code to download this file:
+
+```python
+from diffsynth import download_models
+
+download_models(["StableDiffusion_v15"])
+```
+
+```
+models/stable_diffusion
+├── Put Stable Diffusion checkpoints here.txt
+└── v1-5-pruned-emaonly.safetensors
+```
+
+Start the training task with the following command:
+
+```
+CUDA_VISIBLE_DEVICES="0" python examples/train/stable_diffusion/train_sd_lora.py \
+  --pretrained_path models/stable_diffusion/v1-5-pruned-emaonly.safetensors \
+  --dataset_path data/dog \
+  --output_path ./models \
+  --max_epochs 1 \
+  --steps_per_epoch 500 \
+  --height 512 \
+  --width 512 \
+  --center_crop \
+  --precision "16-mixed" \
+  --learning_rate 1e-4 \
+  --lora_rank 4 \
+  --lora_alpha 4 \
+  --use_gradient_checkpointing
+```
+
+For more information about the parameters, please use `python examples/train/stable_diffusion/train_sd_lora.py -h` to view detailed information.
+
+After training is complete, use `model_manager.load_lora` to load LoRA for inference.
+
+
+```python
+from diffsynth import ModelManager, SDImagePipeline
+import torch
+
+model_manager = ModelManager(torch_dtype=torch.float16, device="cuda",
+                             file_path_list=["models/stable_diffusion/v1-5-pruned-emaonly.safetensors"])
+model_manager.load_lora("models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt", lora_alpha=1.0)
+pipe = SDImagePipeline.from_model_manager(model_manager)
+
+torch.manual_seed(0)
+image = pipe(
+    prompt="a dog is jumping, flowers around the dog, the background is mountains and clouds", 
+    negative_prompt="bad quality, poor quality, doll, disfigured, jpg, toy, bad anatomy, missing limbs, missing fingers, 3d, cgi, extra tails",
+    cfg_scale=7.5,
+    num_inference_steps=100, width=512, height=512,
+)
+image.save("image_with_lora.jpg")
+```
--- a/docs/source_en/finetune/train_sdxl_lora.md
+++ b/docs/source_en/finetune/train_sdxl_lora.md
@@ -0,0 +1,57 @@
+# Training Stable Diffusion XL LoRA
+
+
+The training script only requires one file. We support mainstream checkpoints on [CivitAI](https://civitai.com/). By default, we use the basic Stable Diffusion XL. You can download it from [HuggingFace](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors) 或 [ModelScope](https://www.modelscope.cn/models/AI-ModelScope/stable-diffusion-xl-base-1.0/resolve/master/sd_xl_base_1.0.safetensors). You can also use the following code to download this file:
+```python
+from diffsynth import download_models
+
+download_models(["StableDiffusionXL_v1"])
+```
+
+```
+models/stable_diffusion_xl
+├── Put Stable Diffusion XL checkpoints here.txt
+└── sd_xl_base_1.0.safetensors
+```
+
+We have observed that Stable Diffusion XL may experience numerical precision overflows when using float16 precision, so we recommend that users train with float32 precision. To start the training task, use the following command:
+```
+CUDA_VISIBLE_DEVICES="0" python examples/train/stable_diffusion_xl/train_sdxl_lora.py \
+  --pretrained_path models/stable_diffusion_xl/sd_xl_base_1.0.safetensors \
+  --dataset_path data/dog \
+  --output_path ./models \
+  --max_epochs 1 \
+  --steps_per_epoch 500 \
+  --height 1024 \
+  --width 1024 \
+  --center_crop \
+  --precision "32" \
+  --learning_rate 1e-4 \
+  --lora_rank 4 \
+  --lora_alpha 4 \
+  --use_gradient_checkpointing
+```
+
+For more information about the parameters, please use `python examples/train/stable_diffusion_xl/train_sdxl_lora.py -h` to view detailed information.
+
+After training is complete, use `model_manager.load_lora` to load LoRA for inference.
+
+
+```python
+from diffsynth import ModelManager, SDXLImagePipeline
+import torch
+
+model_manager = ModelManager(torch_dtype=torch.float16, device="cuda",
+                             file_path_list=["models/stable_diffusion_xl/sd_xl_base_1.0.safetensors"])
+model_manager.load_lora("models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt", lora_alpha=1.0)
+pipe = SDXLImagePipeline.from_model_manager(model_manager)
+
+torch.manual_seed(0)
+image = pipe(
+    prompt="a dog is jumping, flowers around the dog, the background is mountains and clouds", 
+    negative_prompt="bad quality, poor quality, doll, disfigured, jpg, toy, bad anatomy, missing limbs, missing fingers, 3d, cgi, extra tails",
+    cfg_scale=7.5,
+    num_inference_steps=100, width=1024, height=1024,
+)
+image.save("image_with_lora.jpg")
+```