Add files via upload

Switched computers; progress is up to D:\translate\DiffSynth-Studio\docs\source_en\finetune; the fourth document is next.
This commit is contained in:
yrk111222
2024-10-18 11:36:48 +08:00
committed by GitHub
parent 793062e141
commit 24b78148b8
46 changed files with 2863 additions and 1995 deletions


@@ -0,0 +1,102 @@
# Training Framework
We have implemented a training framework for text-to-image diffusion models, allowing users to effortlessly train LoRA models with our framework. Our provided scripts come with the following features:
* **Comprehensive Functionality**: Our training framework supports multi-GPU and multi-node configurations, is optimized for acceleration with DeepSpeed, and includes gradient checkpointing to accommodate models with higher memory requirements.
* **Succinct Code**: We have avoided large, complex code blocks. The general module is implemented in `diffsynth/trainers/text_to_image.py`, while model-specific training scripts contain only the minimal code necessary for the model architecture, facilitating ease of use for academic researchers.
* **Modular Design**: Built on the versatile PyTorch Lightning framework, our training framework is decoupled in functionality, enabling developers to easily incorporate additional training techniques by modifying our scripts to suit their specific needs.
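To make the modular design concrete, here is a minimal sketch of the shape a Lightning-based LoRA training module takes. It is an illustration under assumptions, not the actual code in `diffsynth/trainers/text_to_image.py`: the class name, the batch format, the denoiser call signature, and the linear noise schedule are all hypothetical.
```python
import lightning as L
import torch
import torch.nn.functional as F

class LoRATrainingSketch(L.LightningModule):
    """Hypothetical sketch of a LoRA fine-tuning module; for illustration only."""

    def __init__(self, denoiser: torch.nn.Module, learning_rate: float = 1e-4):
        super().__init__()
        self.denoiser = denoiser  # base model with LoRA layers injected, base weights frozen
        self.learning_rate = learning_rate
        # A DDPM-style linear noise schedule (an assumption made for this sketch).
        betas = torch.linspace(1e-4, 0.02, 1000)
        self.register_buffer("alphas_cumprod", torch.cumprod(1.0 - betas, dim=0))

    def training_step(self, batch, batch_idx):
        latents, text_emb = batch  # assumes pre-encoded image latents and text embeddings
        noise = torch.randn_like(latents)
        t = torch.randint(0, 1000, (latents.shape[0],), device=latents.device)
        alpha = self.alphas_cumprod[t].view(-1, 1, 1, 1)
        noisy_latents = alpha.sqrt() * latents + (1 - alpha).sqrt() * noise
        noise_pred = self.denoiser(noisy_latents, t, text_emb)
        loss = F.mse_loss(noise_pred, noise)  # standard noise-prediction objective
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        # Only the injected LoRA parameters remain trainable, so only they are optimized.
        trainable = [p for p in self.denoiser.parameters() if p.requires_grad]
        return torch.optim.AdamW(trainable, lr=self.learning_rate)
```
Because data handling, the loss, the optimizer, and the distributed strategy each live in their own hook, incorporating a new training technique usually amounts to overriding a single method.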
Below are example images generated by models fine-tuned with LoRA. The prompts are "A little dog jumping around with colorful flowers around and mountains in the background" (for the Chinese models) and "a dog is jumping, flowers around the dog, the background is mountains and clouds" (for the English models).
||FLUX.1-dev|Kolors|Stable Diffusion 3|Hunyuan-DiT|
|-|-|-|-|-|
|Without LoRA|![image_without_lora](https://github.com/user-attachments/assets/df62cef6-d54f-4e3d-a602-5dd290079d49)|![image_without_lora](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/9d79ed7a-e8cf-4d98-800a-f182809db318)|![image_without_lora](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/ddb834a5-6366-412b-93dc-6d957230d66e)|![image_without_lora](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/1aa21de5-a992-4b66-b14f-caa44e08876e)|
|With LoRA|![image_with_lora](https://github.com/user-attachments/assets/4fd39890-0291-4d19-8a88-d70d0ae18533)|![image_with_lora](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/02f62323-6ee5-4788-97a1-549732dbe4f0)|![image_with_lora](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/8e7b2888-d874-4da4-a75b-11b6b214b9bf)|![image_with_lora](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/83a0a41a-691f-4610-8e7b-d8e17c50a282)|
## Install Additional Packages
```bash
pip install peft lightning
```
## Prepare the Dataset
We provide an [example dataset](https://modelscope.cn/datasets/buptwq/lora-stable-diffusion-finetune/files). You need to organize your training dataset in the following structure:
```
data/dog/
└── train
├── 00.jpg
├── 01.jpg
├── 02.jpg
├── 03.jpg
├── 04.jpg
└── metadata.csv
```
`metadata.csv`:
```
file_name,text
00.jpg,a dog
01.jpg,a dog
02.jpg,a dog
03.jpg,a dog
04.jpg,a dog
```
Please note that if the model is a Chinese model (e.g., Hunyuan-DiT or Kolors), we recommend using Chinese captions in the dataset. For example:
```
file_name,text
00.jpg,一只小狗
01.jpg,一只小狗
02.jpg,一只小狗
03.jpg,一只小狗
04.jpg,一只小狗
```
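Before launching a training run, it can help to verify that the dataset matches the layout above. The snippet below is an optional sanity check, not part of the training framework; the `data/dog` path is just the example directory from this section.
```python
import csv
from pathlib import Path

def check_dataset(root: str = "data/dog") -> None:
    train_dir = Path(root) / "train"
    with open(train_dir / "metadata.csv", newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    # Every row needs a file_name that exists on disk and a non-empty caption.
    for row in rows:
        image_path = train_dir / row["file_name"]
        assert image_path.exists(), f"missing image: {image_path}"
        assert row["text"].strip(), f"empty caption for {row['file_name']}"
    print(f"OK: {len(rows)} image-caption pairs found.")

check_dataset()
```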
## Train LoRA Model
General parameter options:
```
--lora_target_modules LORA_TARGET_MODULES
Layers where the LoRA modules are located.
--dataset_path DATASET_PATH
Path to the dataset.
--output_path OUTPUT_PATH
Path where the model will be saved.
--steps_per_epoch STEPS_PER_EPOCH
Number of steps per epoch.
--height HEIGHT The height of the image.
--width WIDTH The width of the image.
--center_crop Whether to center crop the input image to the specified resolution. If not set, the image will be randomly cropped. The image will be resized to the specified resolution before cropping.
--random_flip Whether to randomly horizontally flip the image.
--batch_size BATCH_SIZE
Batch size for the training data loader (per device).
--dataloader_num_workers DATALOADER_NUM_WORKERS
The number of subprocesses used for data loading. A value of 0 means the data will be loaded in the main process.
--precision {32,16,16-mixed}
The precision for training.
--learning_rate LEARNING_RATE
The learning rate.
--lora_rank LORA_RANK
The rank of the LoRA update matrices.
--lora_alpha LORA_ALPHA
The scaling factor for the LoRA update (with `peft`, the update is scaled by `lora_alpha / lora_rank`).
--use_gradient_checkpointing
Whether to use gradient checkpointing.
--accumulate_grad_batches ACCUMULATE_GRAD_BATCHES
The number of batches for gradient accumulation.
--training_strategy {auto,deepspeed_stage_1,deepspeed_stage_2,deepspeed_stage_3}
The training strategy.
--max_epochs MAX_EPOCHS
The number of training epochs.
--modelscope_model_id MODELSCOPE_MODEL_ID
The model ID on ModelScope (https://www.modelscope.cn/). If the model ID is provided, the model will be automatically uploaded to ModelScope.
```
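For example, these options combine into a hypothetical multi-GPU DeepSpeed run of the FLUX script documented below. Every flag comes from the list above or from the FLUX section; the device-selection and launch details may differ in your environment.
```
CUDA_VISIBLE_DEVICES="0,1,2,3" python examples/train/flux/train_flux_lora.py \
  --pretrained_text_encoder_path models/FLUX/FLUX.1-dev/text_encoder/model.safetensors \
  --pretrained_text_encoder_2_path models/FLUX/FLUX.1-dev/text_encoder_2 \
  --pretrained_dit_path models/FLUX/FLUX.1-dev/flux1-dev.safetensors \
  --pretrained_vae_path models/FLUX/FLUX.1-dev/ae.safetensors \
  --dataset_path data/dog \
  --output_path ./models \
  --max_epochs 1 \
  --steps_per_epoch 500 \
  --height 1024 \
  --width 1024 \
  --center_crop \
  --precision "bf16" \
  --learning_rate 1e-4 \
  --lora_rank 4 \
  --lora_alpha 4 \
  --batch_size 1 \
  --accumulate_grad_batches 4 \
  --training_strategy deepspeed_stage_2 \
  --use_gradient_checkpointing
```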


@@ -0,0 +1,70 @@
# Training FLUX LoRA
The following files will be used to build the FLUX model. You can download them from [HuggingFace](https://huggingface.co/black-forest-labs/FLUX.1-dev) or [ModelScope](https://www.modelscope.cn/models/ai-modelscope/flux.1-dev), or use the following code to download them:
```python
from diffsynth import download_models
download_models(["FLUX.1-dev"])
```
```
models/FLUX/
└── FLUX.1-dev
├── ae.safetensors
├── flux1-dev.safetensors
├── text_encoder
│ └── model.safetensors
└── text_encoder_2
├── config.json
├── model-00001-of-00002.safetensors
├── model-00002-of-00002.safetensors
└── model.safetensors.index.json
```
Start the training task with the following command:
```
CUDA_VISIBLE_DEVICES="0" python examples/train/flux/train_flux_lora.py \
--pretrained_text_encoder_path models/FLUX/FLUX.1-dev/text_encoder/model.safetensors \
--pretrained_text_encoder_2_path models/FLUX/FLUX.1-dev/text_encoder_2 \
--pretrained_dit_path models/FLUX/FLUX.1-dev/flux1-dev.safetensors \
--pretrained_vae_path models/FLUX/FLUX.1-dev/ae.safetensors \
--dataset_path data/dog \
--output_path ./models \
--max_epochs 1 \
--steps_per_epoch 500 \
--height 1024 \
--width 1024 \
--center_crop \
--precision "bf16" \
--learning_rate 1e-4 \
--lora_rank 4 \
--lora_alpha 4 \
--use_gradient_checkpointing
```
For more information on the parameters, please use `python examples/train/flux/train_flux_lora.py -h` to view detailed information.
After the training is complete, use `model_manager.load_lora` to load the LoRA for inference.
```python
from diffsynth import ModelManager, FluxImagePipeline
import torch
model_manager = ModelManager(torch_dtype=torch.bfloat16, device="cuda",
file_path_list=[
"models/FLUX/FLUX.1-dev/text_encoder/model.safetensors",
"models/FLUX/FLUX.1-dev/text_encoder_2",
"models/FLUX/FLUX.1-dev/ae.safetensors",
"models/FLUX/FLUX.1-dev/flux1-dev.safetensors"
])
model_manager.load_lora("models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt", lora_alpha=1.0)
pipe = FluxImagePipeline.from_model_manager(model_manager)
torch.manual_seed(0)
image = pipe(
    prompt="a dog is jumping, flowers around the dog, the background is mountains and clouds",
num_inference_steps=30, embedded_guidance=3.5
)
image.save("image_with_lora.jpg")
```


@@ -0,0 +1,72 @@
# Training Hunyuan-DiT LoRA
Building the Hunyuan DiT model requires four files. You can download these files from [HuggingFace](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT) or [ModelScope](https://www.modelscope.cn/models/modelscope/HunyuanDiT/summary). You can use the following code to download these files:
```python
from diffsynth import download_models
download_models(["HunyuanDiT"])
```
```
models/HunyuanDiT/
├── Put Hunyuan DiT checkpoints here.txt
└── t2i
├── clip_text_encoder
│ └── pytorch_model.bin
├── model
│ └── pytorch_model_ema.pt
├── mt5
│ └── pytorch_model.bin
└── sdxl-vae-fp16-fix
└── diffusion_pytorch_model.bin
```
Use the following command to start the training task:
```
CUDA_VISIBLE_DEVICES="0" python examples/train/hunyuan_dit/train_hunyuan_dit_lora.py \
--pretrained_path models/HunyuanDiT/t2i \
--dataset_path data/dog \
--output_path ./models \
--max_epochs 1 \
--steps_per_epoch 500 \
--height 1024 \
--width 1024 \
--center_crop \
--precision "16-mixed" \
--learning_rate 1e-4 \
--lora_rank 4 \
--lora_alpha 4 \
--use_gradient_checkpointing
```
For more information about the parameters, please use `python examples/train/hunyuan_dit/train_hunyuan_dit_lora.py -h` to view detailed information.
After the training is complete, use `model_manager.load_lora` to load the LoRA for inference.
```python
from diffsynth import ModelManager, HunyuanDiTImagePipeline
import torch
model_manager = ModelManager(torch_dtype=torch.float16, device="cuda",
file_path_list=[
"models/HunyuanDiT/t2i/clip_text_encoder/pytorch_model.bin",
"models/HunyuanDiT/t2i/model/pytorch_model_ema.pt",
"models/HunyuanDiT/t2i/mt5/pytorch_model.bin",
"models/HunyuanDiT/t2i/sdxl-vae-fp16-fix/diffusion_pytorch_model.bin"
])
model_manager.load_lora("models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt", lora_alpha=1.0)
pipe = HunyuanDiTImagePipeline.from_model_manager(model_manager)
torch.manual_seed(0)
image = pipe(
prompt="A little puppy hops and jumps playfully, surrounded by a profusion of colorful flowers, with a mountain range visible in the distance.
",
negative_prompt="",
cfg_scale=7.5,
num_inference_steps=100, width=1024, height=1024,
)
image.save("image_with_lora.jpg")
```


@@ -0,0 +1,78 @@
# Training Kolors LoRA
The following files will be used to build Kolors. You can download Kolors from [HuggingFace](https://huggingface.co/Kwai-Kolors/Kolors) or [ModelScope](https://modelscope.cn/models/Kwai-Kolors/Kolors). Due to numerical precision overflow issues, we also need an additional VAE model (from [HuggingFace](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix) or [ModelScope](https://modelscope.cn/models/AI-ModelScope/sdxl-vae-fp16-fix)). You can use the following code to download these files:
```python
from diffsynth import download_models
download_models(["Kolors", "SDXL-vae-fp16-fix"])
```
```
models
├── kolors
│ └── Kolors
│ ├── text_encoder
│ │ ├── config.json
│ │ ├── pytorch_model-00001-of-00007.bin
│ │ ├── pytorch_model-00002-of-00007.bin
│ │ ├── pytorch_model-00003-of-00007.bin
│ │ ├── pytorch_model-00004-of-00007.bin
│ │ ├── pytorch_model-00005-of-00007.bin
│ │ ├── pytorch_model-00006-of-00007.bin
│ │ ├── pytorch_model-00007-of-00007.bin
│ │ └── pytorch_model.bin.index.json
│ ├── unet
│ │ └── diffusion_pytorch_model.safetensors
│ └── vae
│ └── diffusion_pytorch_model.safetensors
└── sdxl-vae-fp16-fix
└── diffusion_pytorch_model.safetensors
```
Start the training task with the following command:
```
CUDA_VISIBLE_DEVICES="0" python examples/train/kolors/train_kolors_lora.py \
--pretrained_unet_path models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors \
--pretrained_text_encoder_path models/kolors/Kolors/text_encoder \
--pretrained_fp16_vae_path models/sdxl-vae-fp16-fix/diffusion_pytorch_model.safetensors \
--dataset_path data/dog \
--output_path ./models \
--max_epochs 1 \
--steps_per_epoch 500 \
--height 1024 \
--width 1024 \
--center_crop \
--precision "16-mixed" \
--learning_rate 1e-4 \
--lora_rank 4 \
--lora_alpha 4 \
--use_gradient_checkpointing
```
For more information about the parameters, please use `python examples/train/kolors/train_kolors_lora.py -h` to view detailed information.
After the training is complete, use `model_manager.load_lora` to load the LoRA for inference.
```python
from diffsynth import ModelManager, KolorsImagePipeline
import torch
model_manager = ModelManager(torch_dtype=torch.float16, device="cuda",
                             file_path_list=[
                                 "models/kolors/Kolors/text_encoder",
                                 "models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors",
                                 "models/kolors/Kolors/vae/diffusion_pytorch_model.safetensors"
                             ])
model_manager.load_lora("models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt", lora_alpha=1.0)
pipe = KolorsImagePipeline.from_model_manager(model_manager)
torch.manual_seed(0)
image = pipe(
prompt="a dog is jumping, flowers around the dog, the background is mountains and clouds",
negative_prompt="bad quality, poor quality, doll, disfigured, jpg, toy, bad anatomy, missing limbs, missing fingers, 3d, cgi, extra tails",
cfg_scale=7.5,
num_inference_steps=100, width=1024, height=1024,
)
image.save("image_with_lora.jpg")
```


@@ -0,0 +1,59 @@
# Training Stable Diffusion 3 LoRA
The training script requires only one file. You can use [`sd3_medium_incl_clips.safetensors`](https://huggingface.co/stabilityai/stable-diffusion-3-medium/resolve/main/sd3_medium_incl_clips.safetensors) (without the T5 encoder) or [`sd3_medium_incl_clips_t5xxlfp16.safetensors`](https://huggingface.co/stabilityai/stable-diffusion-3-medium/resolve/main/sd3_medium_incl_clips_t5xxlfp16.safetensors) (with the T5 encoder). Please use the following code to download these files:
```python
from diffsynth import download_models
download_models(["StableDiffusion3", "StableDiffusion3_without_T5"])
```
```
models/stable_diffusion_3/
├── Put Stable Diffusion 3 checkpoints here.txt
├── sd3_medium_incl_clips.safetensors
└── sd3_medium_incl_clips_t5xxlfp16.safetensors
```
Start the training task with the following command:
```
CUDA_VISIBLE_DEVICES="0" python examples/train/stable_diffusion_3/train_sd3_lora.py \
--pretrained_path models/stable_diffusion_3/sd3_medium_incl_clips.safetensors \
--dataset_path data/dog \
--output_path ./models \
--max_epochs 1 \
--steps_per_epoch 500 \
--height 1024 \
--width 1024 \
--center_crop \
--precision "16-mixed" \
--learning_rate 1e-4 \
--lora_rank 4 \
--lora_alpha 4 \
--use_gradient_checkpointing
```
For more information about the parameters, please use `python examples/train/stable_diffusion_3/train_sd3_lora.py -h` to view detailed information.
After the training is complete, use `model_manager.load_lora` to load the LoRA for inference.
```python
from diffsynth import ModelManager, SD3ImagePipeline
import torch
model_manager = ModelManager(torch_dtype=torch.float16, device="cuda",
file_path_list=["models/stable_diffusion_3/sd3_medium_incl_clips.safetensors"])
model_manager.load_lora("models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt", lora_alpha=1.0)
pipe = SD3ImagePipeline.from_model_manager(model_manager)
torch.manual_seed(0)
image = pipe(
prompt="a dog is jumping, flowers around the dog, the background is mountains and clouds",
negative_prompt="bad quality, poor quality, doll, disfigured, jpg, toy, bad anatomy, missing limbs, missing fingers, 3d, cgi, extra tails",
cfg_scale=7.5,
num_inference_steps=100, width=1024, height=1024,
)
image.save("image_with_lora.jpg")
```


@@ -0,0 +1,59 @@
# Training Stable Diffusion LoRA
The training script requires only one file. We support mainstream checkpoints from [CivitAI](https://civitai.com/). By default, we use the base Stable Diffusion v1.5, which you can download from [HuggingFace](https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors) or [ModelScope](https://www.modelscope.cn/models/AI-ModelScope/stable-diffusion-v1-5/resolve/master/v1-5-pruned-emaonly.safetensors). You can also use the following code to download this file:
```python
from diffsynth import download_models
download_models(["StableDiffusion_v15"])
```
```
models/stable_diffusion
├── Put Stable Diffusion checkpoints here.txt
└── v1-5-pruned-emaonly.safetensors
```
Start the training task with the following command:
```
CUDA_VISIBLE_DEVICES="0" python examples/train/stable_diffusion/train_sd_lora.py \
--pretrained_path models/stable_diffusion/v1-5-pruned-emaonly.safetensors \
--dataset_path data/dog \
--output_path ./models \
--max_epochs 1 \
--steps_per_epoch 500 \
--height 512 \
--width 512 \
--center_crop \
--precision "16-mixed" \
--learning_rate 1e-4 \
--lora_rank 4 \
--lora_alpha 4 \
--use_gradient_checkpointing
```
For more information about the parameters, please use `python examples/train/stable_diffusion/train_sd_lora.py -h` to view detailed information.
After the training is complete, use `model_manager.load_lora` to load the LoRA for inference.
```python
from diffsynth import ModelManager, SDImagePipeline
import torch
model_manager = ModelManager(torch_dtype=torch.float16, device="cuda",
file_path_list=["models/stable_diffusion/v1-5-pruned-emaonly.safetensors"])
model_manager.load_lora("models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt", lora_alpha=1.0)
pipe = SDImagePipeline.from_model_manager(model_manager)
torch.manual_seed(0)
image = pipe(
prompt="a dog is jumping, flowers around the dog, the background is mountains and clouds",
negative_prompt="bad quality, poor quality, doll, disfigured, jpg, toy, bad anatomy, missing limbs, missing fingers, 3d, cgi, extra tails",
cfg_scale=7.5,
num_inference_steps=100, width=512, height=512,
)
image.save("image_with_lora.jpg")
```


@@ -0,0 +1,57 @@
# Training Stable Diffusion XL LoRA
The training script requires only one file. We support mainstream checkpoints from [CivitAI](https://civitai.com/). By default, we use the base Stable Diffusion XL, which you can download from [HuggingFace](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors) or [ModelScope](https://www.modelscope.cn/models/AI-ModelScope/stable-diffusion-xl-base-1.0/resolve/master/sd_xl_base_1.0.safetensors). You can also use the following code to download this file:
```python
from diffsynth import download_models
download_models(["StableDiffusionXL_v1"])
```
```
models/stable_diffusion_xl
├── Put Stable Diffusion XL checkpoints here.txt
└── sd_xl_base_1.0.safetensors
```
We have observed numerical precision overflow when running Stable Diffusion XL in float16, so we recommend training in float32. Start the training task with the following command:
```
CUDA_VISIBLE_DEVICES="0" python examples/train/stable_diffusion_xl/train_sdxl_lora.py \
--pretrained_path models/stable_diffusion_xl/sd_xl_base_1.0.safetensors \
--dataset_path data/dog \
--output_path ./models \
--max_epochs 1 \
--steps_per_epoch 500 \
--height 1024 \
--width 1024 \
--center_crop \
--precision "32" \
--learning_rate 1e-4 \
--lora_rank 4 \
--lora_alpha 4 \
--use_gradient_checkpointing
```
For more information about the parameters, please use `python examples/train/stable_diffusion_xl/train_sdxl_lora.py -h` to view detailed information.
After the training is complete, use `model_manager.load_lora` to load the LoRA for inference.
```python
from diffsynth import ModelManager, SDXLImagePipeline
import torch
model_manager = ModelManager(torch_dtype=torch.float16, device="cuda",
file_path_list=["models/stable_diffusion_xl/sd_xl_base_1.0.safetensors"])
model_manager.load_lora("models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt", lora_alpha=1.0)
pipe = SDXLImagePipeline.from_model_manager(model_manager)
torch.manual_seed(0)
image = pipe(
prompt="a dog is jumping, flowers around the dog, the background is mountains and clouds",
negative_prompt="bad quality, poor quality, doll, disfigured, jpg, toy, bad anatomy, missing limbs, missing fingers, 3d, cgi, extra tails",
cfg_scale=7.5,
num_inference_steps=100, width=1024, height=1024,
)
image.save("image_with_lora.jpg")
```