Add files via upload

Switched computers; progress is up to D:\translate\DiffSynth-Studio\docs\source_en\finetune; the fourth document is next.
This commit is contained in:
yrk111222
2024-10-18 11:36:48 +08:00
committed by GitHub
parent 793062e141
commit 24b78148b8
46 changed files with 2863 additions and 1995 deletions

View File

@@ -1,82 +1,82 @@
# A Simple Example: Text-to-Image Synthesis with Flux
The following example shows how to use the FLUX.1 model for text-to-image tasks. The script provides a simple setup for generating images from text descriptions. It covers downloading the necessary models, configuring the pipeline, and generating images with and without classifier-free guidance.
For other models supported by DiffSynth, see [Models.md](Models.md).
## Setup
First, ensure you have the necessary models downloaded and configured:
```python
import torch
from diffsynth import ModelManager, FluxImagePipeline, download_models
# Download the FLUX.1-dev model files
download_models(["FLUX.1-dev"])
```
For instructions on downloading models, see [Download_models.md](Download_models.md).
## Loading Models
Initialize the model manager with your device and data type:
```python
model_manager = ModelManager(torch_dtype=torch.bfloat16, device="cuda")
model_manager.load_models([
"models/FLUX/FLUX.1-dev/text_encoder/model.safetensors",
"models/FLUX/FLUX.1-dev/text_encoder_2",
"models/FLUX/FLUX.1-dev/ae.safetensors",
"models/FLUX/FLUX.1-dev/flux1-dev.safetensors"
])
```
For instructions on loading models, see [ModelManager.md](ModelManager.md).
## Creating the Pipeline
Create an instance of the FluxImagePipeline from the loaded model manager:
```python
pipe = FluxImagePipeline.from_model_manager(model_manager)
```
For instructions on using the Pipeline, see [Pipeline.md](Pipeline.md).
## Text-to-Image Synthesis
Generate an image using a short prompt. Below are examples of generating images with and without classifier-free guidance.
### Basic Generation
```python
prompt = "A cute little turtle"
negative_prompt = ""
torch.manual_seed(6)
image = pipe(
prompt=prompt,
num_inference_steps=30, embedded_guidance=3.5
)
image.save("image_1024.jpg")
```
### Generation with Classifier-Free Guidance
```python
torch.manual_seed(6)
image = pipe(
prompt=prompt, negative_prompt=negative_prompt,
num_inference_steps=30, cfg_scale=2.0, embedded_guidance=3.5
)
image.save("image_1024_cfg.jpg")
```
### High-Resolution Fix
The high-resolution pass reuses the image generated above as `input_image`, resizes it to 2048×2048, and re-denoises it with `denoising_strength=0.6`; `tiled=True` processes the image in tiles to keep peak memory usage manageable.
```python
torch.manual_seed(7)
image = pipe(
prompt=prompt,
num_inference_steps=30, embedded_guidance=3.5,
input_image=image.resize((2048, 2048)), height=2048, width=2048, denoising_strength=0.6, tiled=True
)
image.save("image_2048_highres.jpg")
```

View File

@@ -1,20 +1,20 @@
# Download Models
Download the preset models. Model IDs can be found in the [config file](/diffsynth/configs/model_config.py).
```python
from diffsynth import download_models
download_models(["FLUX.1-dev", "Kolors"])
```
To download models that are not preset, you can fetch them from either [ModelScope](https://modelscope.cn/models) or [HuggingFace](https://huggingface.co/models).
```python
from diffsynth.models.downloader import download_from_huggingface, download_from_modelscope
# From Modelscope (recommended)
download_from_modelscope("Kwai-Kolors/Kolors", "vae/diffusion_pytorch_model.fp16.bin", "models/kolors/Kolors/vae")
# From Huggingface
download_from_huggingface("Kwai-Kolors/Kolors", "vae/diffusion_pytorch_model.fp16.safetensors", "models/kolors/Kolors/vae")
```
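After downloading, the file can be loaded like any other local checkpoint. Below is a minimal sketch of the loading pattern; the destination path is an assumption based on the target folder passed above, and details are in [ModelManager.md](ModelManager.md).
```python
# A minimal sketch: pass the downloaded file to ModelManager via file_path_list.
# The path below assumes the download_from_modelscope() call above placed the
# file under "models/kolors/Kolors/vae"; adjust it if your layout differs.
import torch
from diffsynth import ModelManager

model_manager = ModelManager(
    torch_dtype=torch.float16, device="cuda",
    file_path_list=["models/kolors/Kolors/vae/diffusion_pytorch_model.fp16.bin"]
)
```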

View File

@@ -1,10 +1,10 @@
# Extensions
This document introduces some relevant techniques beyond the diffusion models implemented in DiffSynth, which have significant application potential in image and video processing.
- **[RIFE](https://github.com/hzwer/ECCV2022-RIFE)**: RIFE (Real-time Intermediate Flow Estimation) is a video frame interpolation (VFI) method based on real-time intermediate flow estimation. It includes an end-to-end efficient intermediate flow estimation network called IFNet, as well as an optical-flow supervision framework based on privileged distillation. RIFE supports inserting frames at any time point between two frames, achieving state-of-the-art performance across multiple datasets without relying on any pre-trained models.
- **[ESRGAN](https://github.com/xinntao/ESRGAN)**: ESRGAN (Enhanced Super Resolution Generative Adversarial Network) is an improved method based on SRGAN, aimed at enhancing the visual quality of single image super-resolution. This approach significantly improves the realism of generated images by optimizing three key components of SRGAN: network architecture, adversarial loss, and perceptual loss.
- **[FastBlend](https://arxiv.org/abs/2311.09265)**: FastBlend is a model-free toolkit designed for smoothing videos, integrated with Diffusion models to create a powerful video processing workflow. This tool effectively eliminates flickering in videos, performs interpolation on keyframe sequences, and can process complete videos based on a single image.

View File

@@ -1,426 +1,426 @@
# Fine-Tuning
We provide a training framework for text-to-image diffusion models that makes it easy to train LoRA models. Our scripts offer the following advantages:
* **Comprehensive Functionality & User-Friendliness**: Our training framework supports multi-GPU and multi-machine setups, facilitates the use of DeepSpeed for acceleration, and includes gradient checkpointing optimizations for models with excessive memory demands.
* **Code Conciseness & Researcher Accessibility**: We avoid large blocks of complicated code. General-purpose modules are implemented in `diffsynth/trainers/text_to_image.py`, while model-specific training scripts contain only minimal code pertinent to the model architecture, making it researcher-friendly.
* **Modular Design & Developer Flexibility**: Built on the widely used PyTorch Lightning framework, our training code is functionally decoupled, allowing developers to easily introduce additional training techniques by modifying our scripts to suit their needs.
Example images from fine-tuned LoRA models. The prompt is "一只小狗蹦蹦跳跳,周围是姹紫嫣红的鲜花,远处是山脉" (for Chinese models) or "a dog is jumping, flowers around the dog, the background is mountains and clouds" (for English models).
||Kolors|Stable Diffusion 3|Hunyuan-DiT|
|-|-|-|-|
|Without LoRA|![image_without_lora](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/9d79ed7a-e8cf-4d98-800a-f182809db318)|![image_without_lora](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/ddb834a5-6366-412b-93dc-6d957230d66e)|![image_without_lora](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/1aa21de5-a992-4b66-b14f-caa44e08876e)|
|With LoRA|![image_with_lora](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/02f62323-6ee5-4788-97a1-549732dbe4f0)|![image_with_lora](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/8e7b2888-d874-4da4-a75b-11b6b214b9bf)|![image_with_lora](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/83a0a41a-691f-4610-8e7b-d8e17c50a282)|
## Install additional packages
```bash
pip install peft lightning
```
## Prepare your dataset
We provide an example dataset [here](https://modelscope.cn/datasets/buptwq/lora-stable-diffusion-finetune/files). Organize your training images as follows:
```
data/dog/
└── train
├── 00.jpg
├── 01.jpg
├── 02.jpg
├── 03.jpg
├── 04.jpg
└── metadata.csv
```
`metadata.csv`:
```
file_name,text
00.jpg,a dog
01.jpg,a dog
02.jpg,a dog
03.jpg,a dog
04.jpg,a dog
```
Note that if the model is a Chinese model (for example, Hunyuan-DiT or Kolors), we recommend using Chinese captions in the dataset. For example:
```
file_name,text
00.jpg,一只小狗
01.jpg,一只小狗
02.jpg,一只小狗
03.jpg,一只小狗
04.jpg,一只小狗
```
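If you have more than a handful of images, you can generate `metadata.csv` programmatically. The sketch below is a small helper for this workflow (it is not part of the training scripts) and writes one row per image with a fixed caption:
```python
# Minimal helper to build metadata.csv for a folder of training images.
# Not part of DiffSynth; adjust the folder and caption to your dataset.
import csv
from pathlib import Path

train_dir = Path("data/dog/train")
caption = "a dog"  # use a Chinese caption for Chinese models such as Kolors

with open(train_dir / "metadata.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["file_name", "text"])
    for image_path in sorted(train_dir.glob("*.jpg")):
        writer.writerow([image_path.name, caption])
```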
## Train a LoRA model
General options (a small sizing example follows this list):
```
--lora_target_modules LORA_TARGET_MODULES
Layers with LoRA modules.
--dataset_path DATASET_PATH
The path of the Dataset.
--output_path OUTPUT_PATH
Path to save the model.
--steps_per_epoch STEPS_PER_EPOCH
Number of steps per epoch.
--height HEIGHT Image height.
--width WIDTH Image width.
--center_crop Whether to center crop the input images to the resolution. If not set, the images will be randomly cropped. The images will be resized to the resolution first before cropping.
--random_flip Whether to randomly flip images horizontally
--batch_size BATCH_SIZE
Batch size (per device) for the training dataloader.
--dataloader_num_workers DATALOADER_NUM_WORKERS
Number of subprocesses to use for data loading. 0 means that the data will be loaded in the main process.
--precision {32,16,16-mixed}
Training precision
--learning_rate LEARNING_RATE
Learning rate.
--lora_rank LORA_RANK
The dimension of the LoRA update matrices.
--lora_alpha LORA_ALPHA
The weight of the LoRA update matrices.
--use_gradient_checkpointing
Whether to use gradient checkpointing.
--accumulate_grad_batches ACCUMULATE_GRAD_BATCHES
The number of batches in gradient accumulation.
--training_strategy {auto,deepspeed_stage_1,deepspeed_stage_2,deepspeed_stage_3}
Training strategy
--max_epochs MAX_EPOCHS
Number of epochs.
--modelscope_model_id MODELSCOPE_MODEL_ID
Model ID on ModelScope (https://www.modelscope.cn/). The model will be uploaded to ModelScope automatically if you provide a Model ID.
--modelscope_access_token MODELSCOPE_ACCESS_TOKEN
Access key on ModelScope (https://www.modelscope.cn/). Required if you want to upload the model to ModelScope.
```
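As a rough aid for sizing a run, the options above relate as follows (a back-of-the-envelope sketch, not the trainer's exact accounting):
```python
# Back-of-the-envelope arithmetic for the options above.
# Rough estimates only; the trainer's internal bookkeeping may differ.
batch_size = 1                 # --batch_size (per device)
accumulate_grad_batches = 4    # --accumulate_grad_batches
num_devices = 2                # number of visible GPUs
steps_per_epoch = 500          # --steps_per_epoch
max_epochs = 1                 # --max_epochs

effective_batch_size = batch_size * num_devices * accumulate_grad_batches
total_steps = steps_per_epoch * max_epochs
print(f"effective batch size: {effective_batch_size}, total steps: {total_steps}")
```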
### Kolors
The following files will be used for constructing Kolors. You can download Kolors from [huggingface](https://huggingface.co/Kwai-Kolors/Kolors) or [modelscope](https://modelscope.cn/models/Kwai-Kolors/Kolors). Due to precision overflow issues, we need to download an additional VAE model (from [huggingface](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix) or [modelscope](https://modelscope.cn/models/AI-ModelScope/sdxl-vae-fp16-fix)). You can use the following code to download these files:
```python
from diffsynth import download_models
download_models(["Kolors", "SDXL-vae-fp16-fix"])
```
```
models
├── kolors
│ └── Kolors
│ ├── text_encoder
│ │ ├── config.json
│ │ ├── pytorch_model-00001-of-00007.bin
│ │ ├── pytorch_model-00002-of-00007.bin
│ │ ├── pytorch_model-00003-of-00007.bin
│ │ ├── pytorch_model-00004-of-00007.bin
│ │ ├── pytorch_model-00005-of-00007.bin
│ │ ├── pytorch_model-00006-of-00007.bin
│ │ ├── pytorch_model-00007-of-00007.bin
│ │ └── pytorch_model.bin.index.json
│ ├── unet
│ │ └── diffusion_pytorch_model.safetensors
│ └── vae
│ └── diffusion_pytorch_model.safetensors
└── sdxl-vae-fp16-fix
└── diffusion_pytorch_model.safetensors
```
Launch the training task using the following command:
```
CUDA_VISIBLE_DEVICES="0" python examples/train/kolors/train_kolors_lora.py \
--pretrained_unet_path models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors \
--pretrained_text_encoder_path models/kolors/Kolors/text_encoder \
--pretrained_fp16_vae_path models/sdxl-vae-fp16-fix/diffusion_pytorch_model.safetensors \
--dataset_path data/dog \
--output_path ./models \
--max_epochs 1 \
--steps_per_epoch 500 \
--height 1024 \
--width 1024 \
--center_crop \
--precision "16-mixed" \
--learning_rate 1e-4 \
--lora_rank 4 \
--lora_alpha 4 \
--use_gradient_checkpointing
```
For more information about the parameters, please use `python examples/train/kolors/train_kolors_lora.py -h` to see the details.
After training, use `model_manager.load_lora` to load the LoRA for inference.
```python
from diffsynth import ModelManager, SDXLImagePipeline
import torch
model_manager = ModelManager(torch_dtype=torch.float16, device="cuda",
file_path_list=[
"models/kolors/Kolors/text_encoder",
"models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors",
"models/sdxl-vae-fp16-fix/diffusion_pytorch_model.safetensors"
])
model_manager.load_lora("models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt", lora_alpha=1.0)
pipe = SDXLImagePipeline.from_model_manager(model_manager)
torch.manual_seed(0)
image = pipe(
prompt="一只小狗蹦蹦跳跳,周围是姹紫嫣红的鲜花,远处是山脉",
negative_prompt="",
cfg_scale=7.5,
num_inference_steps=100, width=1024, height=1024,
)
image.save("image_with_lora.jpg")
```
### Stable Diffusion 3
Only one file is required in the training script. You can use [`sd3_medium_incl_clips.safetensors`](https://huggingface.co/stabilityai/stable-diffusion-3-medium/resolve/main/sd3_medium_incl_clips.safetensors) (without T5 encoder) or [`sd3_medium_incl_clips_t5xxlfp16.safetensors`](https://huggingface.co/stabilityai/stable-diffusion-3-medium/resolve/main/sd3_medium_incl_clips_t5xxlfp16.safetensors) (with T5 encoder). Please use the following code to download these files:
```python
from diffsynth import download_models
download_models(["StableDiffusion3", "StableDiffusion3_without_T5"])
```
```
models/stable_diffusion_3/
├── Put Stable Diffusion 3 checkpoints here.txt
├── sd3_medium_incl_clips.safetensors
└── sd3_medium_incl_clips_t5xxlfp16.safetensors
```
Launch the training task using the following command:
```
CUDA_VISIBLE_DEVICES="0" python examples/train/stable_diffusion_3/train_sd3_lora.py \
--pretrained_path models/stable_diffusion_3/sd3_medium_incl_clips.safetensors \
--dataset_path data/dog \
--output_path ./models \
--max_epochs 1 \
--steps_per_epoch 500 \
--height 1024 \
--width 1024 \
--center_crop \
--precision "16-mixed" \
--learning_rate 1e-4 \
--lora_rank 4 \
--lora_alpha 4 \
--use_gradient_checkpointing
```
For more information about the parameters, please use `python examples/train/stable_diffusion_3/train_sd3_lora.py -h` to see the details.
After training, use `model_manager.load_lora` to load the LoRA for inference.
```python
from diffsynth import ModelManager, SD3ImagePipeline
import torch
model_manager = ModelManager(torch_dtype=torch.float16, device="cuda",
file_path_list=["models/stable_diffusion_3/sd3_medium_incl_clips.safetensors"])
model_manager.load_lora("models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt", lora_alpha=1.0)
pipe = SD3ImagePipeline.from_model_manager(model_manager)
torch.manual_seed(0)
image = pipe(
prompt="a dog is jumping, flowers around the dog, the background is mountains and clouds",
negative_prompt="bad quality, poor quality, doll, disfigured, jpg, toy, bad anatomy, missing limbs, missing fingers, 3d, cgi, extra tails",
cfg_scale=7.5,
num_inference_steps=100, width=1024, height=1024,
)
image.save("image_with_lora.jpg")
```
### Hunyuan-DiT
Four files will be used for constructing Hunyuan DiT. You can download them from [huggingface](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT) or [modelscope](https://www.modelscope.cn/models/modelscope/HunyuanDiT/summary). You can use the following code to download these files:
```python
from diffsynth import download_models
download_models(["HunyuanDiT"])
```
```
models/HunyuanDiT/
├── Put Hunyuan DiT checkpoints here.txt
└── t2i
├── clip_text_encoder
│ └── pytorch_model.bin
├── model
│ └── pytorch_model_ema.pt
├── mt5
│ └── pytorch_model.bin
└── sdxl-vae-fp16-fix
└── diffusion_pytorch_model.bin
```
Launch the training task using the following command:
```
CUDA_VISIBLE_DEVICES="0" python examples/train/hunyuan_dit/train_hunyuan_dit_lora.py \
--pretrained_path models/HunyuanDiT/t2i \
--dataset_path data/dog \
--output_path ./models \
--max_epochs 1 \
--steps_per_epoch 500 \
--height 1024 \
--width 1024 \
--center_crop \
--precision "16-mixed" \
--learning_rate 1e-4 \
--lora_rank 4 \
--lora_alpha 4 \
--use_gradient_checkpointing
```
For more information about the parameters, please use `python examples/train/hunyuan_dit/train_hunyuan_dit_lora.py -h` to see the details.
After training, use `model_manager.load_lora` to load the LoRA for inference.
```python
from diffsynth import ModelManager, HunyuanDiTImagePipeline
import torch
model_manager = ModelManager(torch_dtype=torch.float16, device="cuda",
file_path_list=[
"models/HunyuanDiT/t2i/clip_text_encoder/pytorch_model.bin",
"models/HunyuanDiT/t2i/model/pytorch_model_ema.pt",
"models/HunyuanDiT/t2i/mt5/pytorch_model.bin",
"models/HunyuanDiT/t2i/sdxl-vae-fp16-fix/diffusion_pytorch_model.bin"
])
model_manager.load_lora("models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt", lora_alpha=1.0)
pipe = HunyuanDiTImagePipeline.from_model_manager(model_manager)
torch.manual_seed(0)
image = pipe(
prompt="一只小狗蹦蹦跳跳,周围是姹紫嫣红的鲜花,远处是山脉",
negative_prompt="",
cfg_scale=7.5,
num_inference_steps=100, width=1024, height=1024,
)
image.save("image_with_lora.jpg")
```
### Stable Diffusion
Only one file is required in the training script. We support the mainstream checkpoints in [CivitAI](https://civitai.com/). By default, we use the base Stable Diffusion v1.5. You can download it from [huggingface](https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors) or [modelscope](https://www.modelscope.cn/models/AI-ModelScope/stable-diffusion-v1-5/resolve/master/v1-5-pruned-emaonly.safetensors). You can use the following code to download this file:
```python
from diffsynth import download_models
download_models(["StableDiffusion_v15"])
```
```
models/stable_diffusion
├── Put Stable Diffusion checkpoints here.txt
└── v1-5-pruned-emaonly.safetensors
```
Launch the training task using the following command:
```
CUDA_VISIBLE_DEVICES="0" python examples/train/stable_diffusion/train_sd_lora.py \
--pretrained_path models/stable_diffusion/v1-5-pruned-emaonly.safetensors \
--dataset_path data/dog \
--output_path ./models \
--max_epochs 1 \
--steps_per_epoch 500 \
--height 512 \
--width 512 \
--center_crop \
--precision "16-mixed" \
--learning_rate 1e-4 \
--lora_rank 4 \
--lora_alpha 4 \
--use_gradient_checkpointing
```
For more information about the parameters, please use `python examples/train/stable_diffusion/train_sd_lora.py -h` to see the details.
After training, use `model_manager.load_lora` to load the LoRA for inference.
```python
from diffsynth import ModelManager, SDImagePipeline
import torch
model_manager = ModelManager(torch_dtype=torch.float16, device="cuda",
file_path_list=["models/stable_diffusion/v1-5-pruned-emaonly.safetensors"])
model_manager.load_lora("models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt", lora_alpha=1.0)
pipe = SDImagePipeline.from_model_manager(model_manager)
torch.manual_seed(0)
image = pipe(
prompt="a dog is jumping, flowers around the dog, the background is mountains and clouds",
negative_prompt="bad quality, poor quality, doll, disfigured, jpg, toy, bad anatomy, missing limbs, missing fingers, 3d, cgi, extra tails",
cfg_scale=7.5,
num_inference_steps=100, width=512, height=512,
)
image.save("image_with_lora.jpg")
```
### Stable Diffusion XL
Only one file is required in the training script. We support the mainstream checkpoints in [CivitAI](https://civitai.com/). By default, we use the base Stable Diffusion XL. You can download it from [huggingface](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors) or [modelscope](https://www.modelscope.cn/models/AI-ModelScope/stable-diffusion-xl-base-1.0/resolve/master/sd_xl_base_1.0.safetensors). You can use the following code to download this file:
```python
from diffsynth import download_models
download_models(["StableDiffusionXL_v1"])
```
```
models/stable_diffusion_xl
├── Put Stable Diffusion XL checkpoints here.txt
└── sd_xl_base_1.0.safetensors
```
We observed that Stable Diffusion XL is not float16-safe, so we recommend training in float32.
```
CUDA_VISIBLE_DEVICES="0" python examples/train/stable_diffusion_xl/train_sdxl_lora.py \
--pretrained_path models/stable_diffusion_xl/sd_xl_base_1.0.safetensors \
--dataset_path data/dog \
--output_path ./models \
--max_epochs 1 \
--steps_per_epoch 500 \
--height 1024 \
--width 1024 \
--center_crop \
--precision "32" \
--learning_rate 1e-4 \
--lora_rank 4 \
--lora_alpha 4 \
--use_gradient_checkpointing
```
For more information about the parameters, please use `python examples/train/stable_diffusion_xl/train_sdxl_lora.py -h` to see the details.
After training, use `model_manager.load_lora` to load the LoRA for inference.
```python
from diffsynth import ModelManager, SDXLImagePipeline
import torch
model_manager = ModelManager(torch_dtype=torch.float16, device="cuda",
file_path_list=["models/stable_diffusion_xl/sd_xl_base_1.0.safetensors"])
model_manager.load_lora("models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt", lora_alpha=1.0)
pipe = SDXLImagePipeline.from_model_manager(model_manager)
torch.manual_seed(0)
image = pipe(
prompt="a dog is jumping, flowers around the dog, the background is mountains and clouds",
negative_prompt="bad quality, poor quality, doll, disfigured, jpg, toy, bad anatomy, missing limbs, missing fingers, 3d, cgi, extra tails",
cfg_scale=7.5,
num_inference_steps=100, width=1024, height=1024,
)
image.save("image_with_lora.jpg")
```

View File

@@ -1,24 +1,24 @@
# Installation
## From Source
1. Clone the source repository:
```bash
git clone https://github.com/modelscope/DiffSynth-Studio.git
```
2. Navigate to the project directory and install:
```bash
cd DiffSynth-Studio
pip install -e .
```
## From PyPI
Install directly via PyPI:
```bash
pip install diffsynth
```
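To verify the installation, a quick import check is enough (a minimal sketch that only confirms the package is importable):
```python
# Quick sanity check that DiffSynth is importable after installation.
from diffsynth import ModelManager, download_models

print(ModelManager.__name__, download_models.__name__)
```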

View File

@@ -1,17 +1,17 @@
# Models
To date, DiffSynth Studio supports the following models:
* [FLUX](https://huggingface.co/black-forest-labs/FLUX.1-dev)
* [ExVideo](https://huggingface.co/ECNU-CILab/ExVideo-SVD-128f-v1)
* [Kolors](https://huggingface.co/Kwai-Kolors/Kolors)
* [Stable Diffusion 3](https://huggingface.co/stabilityai/stable-diffusion-3-medium)
* [Stable Video Diffusion](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt)
* [Hunyuan-DiT](https://github.com/Tencent/HunyuanDiT)
* [RIFE](https://github.com/hzwer/ECCV2022-RIFE)
* [ESRGAN](https://github.com/xinntao/ESRGAN)
* [IP-Adapter](https://github.com/tencent-ailab/IP-Adapter)
* [AnimateDiff](https://github.com/guoyww/animatediff/)
* [ControlNet](https://github.com/lllyasviel/ControlNet)
* [Stable Diffusion XL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)
* [Stable Diffusion](https://huggingface.co/runwayml/stable-diffusion-v1-5)

View File

@@ -1,27 +1,27 @@
# Pipelines
The following tables list our pipelines and the models each pipeline supports.
## Image Pipelines
Pipelines for generating images from text descriptions. Each pipeline relies on specific encoder and decoder models.
| Pipeline | Models |
|----------------------------|----------------------------------------------------------------|
| HunyuanDiTImagePipeline | text_encoder: HunyuanDiTCLIPTextEncoder<br>text_encoder_t5: HunyuanDiTT5TextEncoder<br>dit: HunyuanDiT<br>vae_decoder: SDVAEDecoder<br>vae_encoder: SDVAEEncoder |
| SDImagePipeline | text_encoder: SDTextEncoder<br>unet: SDUNet<br>vae_decoder: SDVAEDecoder<br>vae_encoder: SDVAEEncoder<br>controlnet: MultiControlNetManager<br>ipadapter_image_encoder: IpAdapterCLIPImageEmbedder<br>ipadapter: SDIpAdapter |
| SD3ImagePipeline | text_encoder_1: SD3TextEncoder1<br>text_encoder_2: SD3TextEncoder2<br>text_encoder_3: SD3TextEncoder3<br>dit: SD3DiT<br>vae_decoder: SD3VAEDecoder<br>vae_encoder: SD3VAEEncoder |
| SDXLImagePipeline | text_encoder: SDXLTextEncoder<br>text_encoder_2: SDXLTextEncoder2<br>text_encoder_kolors: ChatGLMModel<br>unet: SDXLUNet<br>vae_decoder: SDXLVAEDecoder<br>vae_encoder: SDXLVAEEncoder<br>controlnet: MultiControlNetManager<br>ipadapter_image_encoder: IpAdapterXLCLIPImageEmbedder<br>ipadapter: SDXLIpAdapter |
## Video Pipelines
Pipelines for generating videos from text descriptions. In addition to the models required for image generation, they include models for handling motion modules.
| Pipeline | Models |
|----------------------------|----------------------------------------------------------------|
| SDVideoPipeline | text_encoder: SDTextEncoder<br>unet: SDUNet<br>vae_decoder: SDVAEDecoder<br>vae_encoder: SDVAEEncoder<br>controlnet: MultiControlNetManager<br>ipadapter_image_encoder: IpAdapterCLIPImageEmbedder<br>ipadapter: SDIpAdapter<br>motion_modules: SDMotionModel |
| SDXLVideoPipeline | text_encoder: SDXLTextEncoder<br>text_encoder_2: SDXLTextEncoder2<br>text_encoder_kolors: ChatGLMModel<br>unet: SDXLUNet<br>vae_decoder: SDXLVAEDecoder<br>vae_encoder: SDXLVAEEncoder<br>ipadapter_image_encoder: IpAdapterXLCLIPImageEmbedder<br>ipadapter: SDXLIpAdapter<br>motion_modules: SDXLMotionModel |
| SVDVideoPipeline | image_encoder: SVDImageEncoder<br>unet: SVDUNet<br>vae_encoder: SVDVAEEncoder<br>vae_decoder: SVDVAEDecoder |
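All of these pipelines follow the same construction pattern used throughout the docs: load the required model files with `ModelManager`, then build the pipeline with `from_model_manager`. A minimal sketch for `SDImagePipeline` (the checkpoint path assumes the layout used in the fine-tuning guide):
```python
# Minimal sketch of the common pipeline-construction pattern.
# The checkpoint path assumes the layout used in the fine-tuning guide.
import torch
from diffsynth import ModelManager, SDImagePipeline

model_manager = ModelManager(torch_dtype=torch.float16, device="cuda")
model_manager.load_models(["models/stable_diffusion/v1-5-pruned-emaonly.safetensors"])
pipe = SDImagePipeline.from_model_manager(model_manager)

torch.manual_seed(0)
image = pipe(
    prompt="a photo of an orange cat",
    negative_prompt="",
    cfg_scale=7.5, num_inference_steps=50, width=512, height=512,
)
image.save("cat.jpg")
```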

View File

@@ -1,35 +1,35 @@
# Prompt Processing
DiffSynth includes prompt processing functionality, which is divided into:
- **Prompt Refiners (`prompt_refiner_classes`)**: Includes prompt refinement, prompt translation from Chinese to English, and both refinement and translation of prompts. Available parameters are as follows:
- **English Prompt Refinement**: 'BeautifulPrompt', using the model [pai-bloom-1b1-text2prompt-sd](https://modelscope.cn/models/AI-ModelScope/pai-bloom-1b1-text2prompt-sd).
- **Prompt Translation from Chinese to English**: 'Translator', using the model [opus-mt-zh-en](https://modelscope.cn/models/moxying/opus-mt-zh-en).
- **Prompt Translation and Refinement**: 'QwenPrompt', using the model [Qwen2-1.5B-Instruct](https://modelscope.cn/models/qwen/Qwen2-1.5B-Instruct).
- **Prompt Extenders (`prompt_extender_classes`)**: Prompt expansion with partitioned (regional) control, based on Omost. The available parameter is:
- **Prompt Partition Expansion**: 'OmostPromter'.
## Usage Instructions
### Prompt Refiners
When loading the model pipeline, you can specify the desired prompt refiner functionality using the `prompt_refiner_classes` parameter. For example code, refer to [sd_prompt_refining.py](examples/image_synthesis/sd_prompt_refining.py).
Available `prompt_refiner_classes` parameters include: Translator, BeautifulPrompt, QwenPrompt.
```python
pipe = SDXLImagePipeline.from_model_manager(model_manager, prompt_refiner_classes=[Translator, BeautifulPrompt])
```
### Prompt Extenders
When loading the model pipeline, you can specify the desired prompt extender using the `prompt_extender_classes` parameter. For example code, refer to [omost_flux_text_to_image.py](examples/image_synthesis/omost_flux_text_to_image.py).
```python
pipe = FluxImagePipeline.from_model_manager(model_manager, prompt_extender_classes=[OmostPromter])
```

View File

@@ -1,11 +1,11 @@
# Schedulers
Schedulers control the entire denoising (or sampling) process of the model. When loading a pipeline, DiffSynth automatically selects the most suitable scheduler for that pipeline, so no additional configuration is required.
The supported schedulers are:
- **EnhancedDDIMScheduler**: Extends the denoising process introduced in the Denoising Diffusion Probabilistic Models (DDPM) with non-Markovian guidance.
- **FlowMatchScheduler**: Implements the flow matching sampling method introduced in Stable Diffusion 3.
- **ContinuousODEScheduler**: A scheduler based on Ordinary Differential Equations (ODE).

View File

@@ -1,50 +1,50 @@
# Configuration file for the Sphinx documentation builder.
#
# For the full list of built-in configuration values, see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html
# -- Project information -----------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information
import os
import sys
sys.path.insert(0, os.path.abspath('../../diffsynth'))
project = 'DiffSynth-Studio'
copyright = '2024, ModelScope'
author = 'ModelScope'
release = '0.1.0'
# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
extensions = [
'sphinx.ext.autodoc',
'sphinx.ext.napoleon',
'sphinx.ext.doctest',
'sphinx.ext.intersphinx',
'sphinx.ext.todo',
'sphinx.ext.coverage',
'sphinx.ext.imgmath',
'sphinx.ext.viewcode',
'recommonmark',
'sphinx_markdown_tables'
]
templates_path = ['_templates']
exclude_patterns = []
source_suffix = ['.rst', '.md']
# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
html_theme = 'sphinx_rtd_theme'
html_static_path = ['_static']
# multi-language docs
language = 'en'
locale_dirs = ['../locales/'] # path is example but recommended.
gettext_compact = False # optional.
gettext_uuid = True # optional.

View File

@@ -0,0 +1,135 @@
# ControlNet, LoRA, and IP-Adapter: Precision Control Techniques
Building on a base text-to-image model, various adapter-style models can be used to control the generation process.
Let's download the models we'll be using in the upcoming examples:
* A highly praised Stable Diffusion XL architecture anime-style model
* A ControlNet model that supports multiple control modes
* A LoRA model for the Stable Diffusion XL model
* An IP-Adapter model and its corresponding image encoder
```python
from diffsynth import download_models
download_models([
"BluePencilXL_v200",
"ControlNet_union_sdxl_promax",
"SDXL_lora_zyd23ble_diffusion_xl/bluePencilXL_v200.safetensors"])
pipe = SDXLImagePipeline.from_model_ma2_ChineseInkStyle_SDXL_v1_0",
"IP-Adapter-SDXL"
])
```
Let's use the basic text-to-image functionality to generate a picture.
```python
from diffsynth import ModelManager, SDXLImagePipeline
import torch
model_manager = ModelManager(torch_dtype=torch.float16, device="cuda")
model_manager.load_models(["models/stanager(model_manager)
torch.manual_seed(1)
image = pipe(
prompt="masterpiece, best quality, solo, long hair, wavy hair, silver hair, blue eyes, blue dress, medium breasts, dress, underwater, air bubble, floating hair, refraction, portrait,",
negative_prompt="worst quality, low quality, monochrome, zombie, interlocked fingers, Aissist, cleavage, nsfw,",
cfg_scale=6, num_inference_steps=60,
)
image.save("image.jpg")
```
![image](https://github.com/user-attachments/assets/cc094e8f-ff6a-4f9e-ba05-7a5c2e0e609f)
Next, let's transform this graceful underwater dancer into a fire mage! We'll activate the ControlNet to maintain the structure of the image while modifying the prompt.
```python
from diffsynth import ModelManager, SDXLImagePipeline, ControlNetConfigUnit
import torch
from PIL import Image
model_manager = ModelManager(torch_dtype=torch.float16, device="cuda")
model_manager.load_models([
"models/stable_diffusion_xl/bluePencilXL_v200.safetensors",
"models/ControlNet/controlnet_union/diffusion_pytorch_model_promax.safetensors"
])
pipe = SDXLImagePipeline.from_model_manager(model_manager, controlnet_config_units=[
ControlNetConfigUnit("depth", "models/ControlNet/controlnet_union/diffusion_pytorch_model_promax.safetensors", scale=1)
])
torch.manual_seed(2)
image = pipe(
prompt="masterpiece, best quality, solo, long hair, wavy hair, pink hair, red eyes, red dress, medium breasts, dress, fire ball, fire background, floating hair, refraction, portrait,",
negative_prompt="worst quality, low quality, monochrome, zombie, interlocked fingers, Aissist, cleavage, nsfw, white background",
cfg_scale=6, num_inference_steps=60,
controlnet_image=Image.open("image.jpg")
)
image.save("image_controlnet.jpg")
```
![image_controlnet](https://github.com/user-attachments/assets/d50d173e-e81a-4d7e-93e3-b2787d69953e)
Isn't that cool? There's more! Add a LoRA to bring the image closer to the flat style of hand-drawn comics. This LoRA requires certain trigger words to take effect, as noted on the original author's model page. Remember to add the trigger words at the beginning of the prompt.
```python
from diffsynth import ModelManager, SDXLImagePipeline, ControlNetConfigUnit
import torch
from PIL import Image
model_manager = ModelManager(torch_dtype=torch.float16, device="cuda")
model_manager.load_models([
"models/stable_diffusion_xl/bluePencilXL_v200.safetensors",
"models/ControlNet/controlnet_union/diffusion_pytorch_model_promax.safetensors"
])
model_manager.load_lora("models/lora/zyd232_ChineseInkStyle_SDXL_v1_0.safetensors", lora_alpha=1.0)
pipe = SDXLImagePipeline.from_model_manager(model_manager, controlnet_config_units=[
ControlNetConfigUnit("depth", "models/ControlNet/controlnet_union/diffusion_pytorch_model_promax.safetensors", scale=1.0)
])
torch.manual_seed(3)
image = pipe(
prompt="zydink, ink sketch, flat anime, masterpiece, best quality, solo, long hair, wavy hair, pink hair, red eyes, red dress, medium breasts, dress, fire ball, fire background, floating hair, refraction, portrait,",
negative_prompt="worst quality, low quality, monochrome, zombie, interlocked fingers, Aissist, cleavage, nsfw, white background",
cfg_scale=6, num_inference_steps=60,
controlnet_image=Image.open("image.jpg")
)
image.save("image_lora.jpg")
```
![image_lora](https://github.com/user-attachments/assets/c599b2f8-8351-4be5-a6ae-8380889cb9d8)
Not done yet! Find a Chinese ink-wash painting to use as a style reference, activate the IP-Adapter, and let classical art collide with modern aesthetics!
| Let's use this image as a style guide. |![ink_style](https://github.com/user-attachments/assets/e47c5a03-9c7b-402b-b260-d8bfd56abbc5)|
|-|-|
```python
from diffsynth import ModelManager, SDXLImagePipeline, ControlNetConfigUnit
import torch
from PIL import Image
model_manager = ModelManager(torch_dtype=torch.float16, device="cuda")
model_manager.load_models([
"models/stable_diffusion_xl/bluePencilXL_v200.safetensors",
"models/ControlNet/controlnet_union/diffusion_pytorch_model_promax.safetensors",
"models/IpAdapter/stable_diffusion_xl/ip-adapter_sdxl.bin",
"models/IpAdapter/stable_diffusion_xl/image_encoder/model.safetensors",
])
model_manager.load_lora("models/lora/zyd232_ChineseInkStyle_SDXL_v1_0.safetensors", lora_alpha=1.0)
pipe = SDXLImagePipeline.from_model_manager(model_manager, controlnet_config_units=[
ControlNetConfigUnit("depth", "models/ControlNet/controlnet_union/diffusion_pytorch_model_promax.safetensors", scale=1.0)
])
torch.manual_seed(2)
image = pipe(
prompt="zydink, ink sketch, flat anime, masterpiece, best quality, solo, long hair, wavy hair, pink hair, red eyes, red dress, medium breasts, dress, fire ball, fire background, floating hair, refraction, portrait,",
negative_prompt="worst quality, low quality, monochrome, zombie, interlocked fingers, Aissist, cleavage, nsfw, white background",
cfg_scale=6, num_inference_steps=60,
controlnet_image=Image.open("image.jpg"),
ipadapter_images=[Image.open("ink_style.jpg")],
ipadapter_use_instant_style=True, ipadapter_scale=0.5
)
image.save("image_ipadapter.jpg")
```
![image_ipadapter](https://github.com/user-attachments/assets/e5924aef-03b0-4462-811f-a60e2523fd7f)
The joy of generating images with Diffusion lies in the combination of various ecosystem models, which can realize all kinds of creative ideas.

View File

@@ -0,0 +1,64 @@
# Text-to-Image, Image-to-Image, and High-Resolution Restoration - A First Encounter with the Dazzling World of Diffusion
Load the text-to-image model; here we use an anime-style model from Civitai as an example.
```python
import torch
from diffsynth import ModelManager, SDImagePipeline, download_models
download_models(["AingDiffusion_v12"])
model_manager = ModelManager(torch_dtype=torch.float16, device="cuda")
model_manager.load_models(["models/stable_diffusion/aingdiffusion_v12.safetensors"])
pipe = SDImagePipeline.from_model_manager(model_manager)
```
Generate a picture to give it a try.
```python
torch.manual_seed(0)
image = pipe(
prompt="masterpiece, best quality, a girl with long silver hair",
negative_prompt="worst quality, low quality, monochrome, zombie, interlocked fingers, Aissist, cleavage, nsfw,",
height=512, width=512, num_inference_steps=80,
)
image.save("image.jpg")
```
Ah, a lovely young lady.
![image](https://github.com/user-attachments/assets/999100d2-1c39-4f18-b37e-aa9d5b4e519c)
Use the image-to-image feature to turn her hair red by simply adding the `input_image` and `denoising_strength` parameters. `denoising_strength` controls how much noise is added: at 0 the generated image is identical to the input image, and at 1 the image is generated completely from random noise.
```python
torch.manual_seed(1)
image_edited = pipe(
prompt="masterpiece, best quality, a girl with long red hair",
negative_prompt="worst quality, low quality, monochrome, zombie, interlocked fingers, Aissist, cleavage, nsfw,",
height=512, width=512, num_inference_steps=80,
input_image=image, denoising_strength=0.6,
)
image_edited.save("image_edited.jpg")
```
Ah, a cute girl with red hair.
![image_edited](https://github.com/user-attachments/assets/e3de8bc1-037f-4d4d-aacf-8919143c2375)
Since the model itself was trained at a resolution of 512×512, the image appears a bit blurry. However, we can use the model's own capabilities to refine the image and add detail: increase the resolution, then run image-to-image generation again.
```python
torch.manual_seed(2)
image_highres = pipe(
prompt="masterpiece, best quality, a girl with long red hair",
negative_prompt="worst quality, low quality, monochrome, zombie, interlocked fingers, Aissist, cleavage, nsfw,",
height=1024, width=1024, num_inference_steps=80,
input_image=image_edited.resize((1024, 1024)), denoising_strength=0.6,
)
image_highres.save("image_highres.jpg")
```
Ah, a clear and lovely girl with red hair.
![image_highres](https://github.com/user-attachments/assets/4466353e-662c-49f5-9211-b11bb0bb7fb7)
It's worth noting that image-to-image and high-resolution restoration are supported globally: all of our image generation pipelines can currently be used in this way.
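For example, the same `input_image` and `denoising_strength` arguments can be passed to the SDXL pipeline introduced in the ControlNet chapter. The following is only a sketch under that assumption, reusing the BluePencilXL checkpoint downloaded there; it is not part of the original tutorial.
```python
import torch
from PIL import Image
from diffsynth import ModelManager, SDXLImagePipeline

model_manager = ModelManager(torch_dtype=torch.float16, device="cuda")
model_manager.load_models(["models/stable_diffusion_xl/bluePencilXL_v200.safetensors"])
pipe = SDXLImagePipeline.from_model_manager(model_manager)

# Image-to-image refinement with the SDXL pipeline, same keyword arguments as above.
torch.manual_seed(3)
image = pipe(
    prompt="masterpiece, best quality, a girl with long red hair",
    negative_prompt="worst quality, low quality",
    height=1024, width=1024, num_inference_steps=60,
    input_image=Image.open("image_highres.jpg").resize((1024, 1024)),
    denoising_strength=0.5,
)
image.save("image_highres_sdxl.jpg")
```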

View File

@@ -0,0 +1,77 @@
# Translation and Polishing — The Magic of Prompt Words
When generating images, we need to write prompt words to describe the content of the image. Prompt words directly affect the outcome of the generation, but crafting them is also an art. Good prompt words can produce images with a high degree of aesthetic appeal. We offer a range of models to help users handle prompt words effectively.
## Translation
Most text-to-image models currently support only English prompts, which can be challenging for users who are not native English speakers. To address this, we can use open-source translation models to translate the prompts into English. In the following example, we take "一个女孩" (a girl) as the prompt and use the opus-mt-zh-en model for translation (it can be downloaded from [HuggingFace](https://huggingface.co/Helsinki-NLP/opus-mt-zh-en) or [ModelScope](https://modelscope.cn/models/moxying/opus-mt-zh-en)).
```python
from diffsynth import ModelManager, SDXLImagePipeline, Translator
import torch
model_manager = ModelManager(
torch_dtype=torch.float16, device="cuda",
model_id_list=["BluePencilXL_v200", "opus-mt-zh-en"]
)
pipe = SDXLImagePipeline.from_model_manager(model_manager, prompt_refiner_classes=[Translator])
torch.manual_seed(0)
prompt = "一个女孩"
image = pipe(
prompt=prompt, negative_prompt="",
height=1024, width=1024, num_inference_steps=30
)
image.save("image_1.jpg")
```
![image_1](https://github.com/user-attachments/assets/c8070a6b-3d2f-4faf-a806-c403b91f1a94)
## Polishing
Detailed prompts produce images with richer details. We can use a prompt-polishing model such as BeautifulPrompt to embellish simple prompts; this model makes the overall picture style more elaborate.
This module can be activated simultaneously with the translation module, but please pay attention to the order: translate first, then polish.
```python
from diffsynth import ModelManager, SDXLImagePipeline, Translator, BeautifulPrompt
import torch
model_manager = ModelManager(
torch_dtype=torch.float16, device="cuda",
model_id_list=["BluePencilXL_v200", "opus-mt-zh-en", "BeautifulPrompt"]
)
pipe = SDXLImagePipeline.from_model_manager(model_manager, prompt_refiner_classes=[Translator, BeautifulPrompt])
torch.manual_seed(0)
prompt = "一个女孩"
image = pipe(
prompt=prompt, negative_prompt="",
height=1024, width=1024, num_inference_steps=30
)
image.save("image_2.jpg")
```
![image_2](https://github.com/user-attachments/assets/94f64a7d-b14a-41e2-a013-c9a74635a84d)
We have also integrated a Tongyi Qwen model that can seamlessly complete the translation and polishing of prompt words in one step.
```python
from diffsynth import ModelManager, SDXLImagePipeline, QwenPrompt
import torch
model_manager = ModelManager(
torch_dtype=torch.float16, device="cuda",
model_id_list=["BluePencilXL_v200", "QwenPrompt"]
)
pipe = SDXLImagePipeline.from_model_manager(model_manager, prompt_refiner_classes=[QwenPrompt])
torch.manual_seed(0)
prompt = "一个女孩"
image = pipe(
prompt=prompt, negative_prompt="",
height=1024, width=1024, num_inference_steps=30
)
image.save("image_3.jpg")
```
![image_3](https://github.com/user-attachments/assets/fc1a201d-aef1-4e6a-81d6-2e2249ffa230)

View File

@@ -0,0 +1,95 @@
# When Image Models Meet AnimateDiff—Model Combination Technology
We have already witnessed the powerful image generation capabilities of the Stable Diffusion model and its ecosystem models. Now, we introduce a new module: AnimateDiff, which allows us to transfer the capabilities of image models to videos. In this article, we showcase an anime-style video rendering solution built on DiffSynth-Studio: Diffutoon.
## Download Models
The following examples will use many models, so let's download them first.
* An anime-style Stable Diffusion architecture model
* Two ControlNet models
* A Textual Inversion model
* An AnimateDiff model
```python
from diffsynth import download_models
download_models([
"AingDiffusion_v12",
"AnimateDiff_v2",
"ControlNet_v11p_sd15_lineart",
"ControlNet_v11f1e_sd15_tile",
"TextualInversion_VeryBadImageNegative_v1.3"
])
```
## Download Video
You can choose any video you like. We use [this video](https://www.bilibili.com/video/BV1iG411a7sQ) as a demonstration. You can download the video file with the following command, but please note that you must not use it for commercial purposes without obtaining permission from the original video creator.
```
modelscope download --dataset Artiprocher/examples_in_diffsynth data/examples/diffutoon/input_video.mp4 --local_dir ./
```
## Generate Anime
```python
from diffsynth import ModelManager, SDVideoPipeline, ControlNetConfigUnit, VideoData, save_video
import torch
# Load models
model_manager = ModelManager(torch_dtype=torch.float16, device="cuda")
model_manager.load_models([
"models/stable_diffusion/aingdiffusion_v12.safetensors",
"models/AnimateDiff/mm_sd_v15_v2.ckpt",
"models/ControlNet/control_v11p_sd15_lineart.pth",
"models/ControlNet/control_v11f1e_sd15_tile.pth",
])
# Build pipeline
pipe = SDVideoPipeline.from_model_manager(
model_manager,
[
ControlNetConfigUnit(
processor_id="tile",
model_path="models/ControlNet/control_v11f1e_sd15_tile.pth",
scale=0.5
),
ControlNetConfigUnit(
processor_id="lineart",
model_path="models/ControlNet/control_v11p_sd15_lineart.pth",
scale=0.5
)
]
)
pipe.prompter.load_textual_inversions(["models/textual_inversion/verybadimagenegative_v1.3.pt"])
# Load video
video = VideoData(
video_file="data/examples/diffutoon/input_video.mp4",
height=1536, width=1536
)
input_video = [video[i] for i in range(30)]
# Generate
torch.manual_seed(0)
output_video = pipe(
prompt="best quality, perfect anime illustration, light, a girl is dancing, smile, solo",
negative_prompt="verybadimagenegative_v1.3",
cfg_scale=7, clip_skip=2,
input_frames=input_video, denoising_strength=1.0,
controlnet_frames=input_video, num_frames=len(input_video),
num_inference_steps=10, height=1536, width=1536,
animatediff_batch_size=16, animatediff_stride=8,
)
# Save video
save_video(output_video, "output_video.mp4", fps=30)
```
## Effect Display
<video width="512" height="256" controls>
<source src="https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/b54c05c5-d747-4709-be5e-b39af82404dd" type="video/mp4">
Your browser does not support the Video tag.
</video>

View File

@@ -0,0 +1,102 @@
# Training Framework
We have implemented a training framework for text-to-image diffusion models, allowing users to effortlessly train LoRA models with our framework. Our provided scripts come with the following features:
* **Comprehensive Functionality**: Our training framework supports multi-GPU and multi-node configurations, is optimized for acceleration with DeepSpeed, and includes gradient checkpointing to accommodate models with higher memory requirements.
* **Succinct Code**: We have avoided large, complex code blocks. The general module is implemented in `diffsynth/trainers/text_to_image.py`, while model-specific training scripts contain only the minimal code necessary for the model architecture, facilitating ease of use for academic researchers.
* **Modular Design**: Built on the versatile PyTorch Lightning framework, our training framework is decoupled in functionality, enabling developers to easily incorporate additional training techniques by modifying our scripts to suit their specific needs.
Example images generated with LoRA fine-tuning. The prompts are "A little dog jumping around with colorful flowers around and mountains in the background" (for Chinese models) and "a dog is jumping, flowers around the dog, the background is mountains and clouds" (for English models).
||FLUX.1-dev|Kolors|Stable Diffusion 3|Hunyuan-DiT|
|-|-|-|-|-|
|Without LoRA|![image_without_lora](https://github.com/user-attachments/assets/df62cef6-d54f-4e3d-a602-5dd290079d49)|![image_without_lora](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/9d79ed7a-e8cf-4d98-800a-f182809db318)|![image_without_lora](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/ddb834a5-6366-412b-93dc-6d957230d66e)|![image_without_lora](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/1aa21de5-a992-4b66-b14f-caa44e08876e)|
|With LoRA|![image_with_lora](https://github.com/user-attachments/assets/4fd39890-0291-4d19-8a88-d70d0ae18533)|![image_with_lora](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/02f62323-6ee5-4788-97a1-549732dbe4f0)|![image_with_lora](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/8e7b2888-d874-4da4-a75b-11b6b214b9bf)|![image_with_lora](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/83a0a41a-691f-4610-8e7b-d8e17c50a282)|
## Install Additional Packages
```bash
pip install peft lightning
```
## Prepare the Dataset
We provide an [example dataset](https://modelscope.cn/datasets/buptwq/lora-stable-diffusion-finetune/files). You need to organize your training dataset in the following structure:
```
data/dog/
└── train
├── 00.jpg
├── 01.jpg
├── 02.jpg
├── 03.jpg
├── 04.jpg
└── metadata.csv
```
`metadata.csv`:
```
file_name,text
00.jpg,a dog
01.jpg,a dog
02.jpg,a dog
03.jpg,a dog
04.jpg,a dog
```
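If your captions are uniform, `metadata.csv` can be generated with a few lines of Python. This is a small helper sketch for the example dataset above, not part of the training scripts; adjust the directory and caption to your own data.
```python
import csv
from pathlib import Path

train_dir = Path("data/dog/train")
rows = [{"file_name": p.name, "text": "a dog"} for p in sorted(train_dir.glob("*.jpg"))]

# Write the header and one row per image, matching the format shown above.
with open(train_dir / "metadata.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["file_name", "text"])
    writer.writeheader()
    writer.writerows(rows)
```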
Please note that if the model is a Chinese model (e.g., Hunyuan-DiT and Kolors), we recommend using Chinese text in the dataset. For example:
```
file_name,text
00.jpg,一只小狗
01.jpg,一只小狗
02.jpg,一只小狗
03.jpg,一只小狗
04.jpg,一只小狗
```
## Train LoRA Model
General parameter options:
```
--lora_target_modules LORA_TARGET_MODULES
Layers where the LoRA modules are located.
--dataset_path DATASET_PATH
Path to the dataset.
--output_path OUTPUT_PATH
Path where the model will be saved.
--steps_per_epoch STEPS_PER_EPOCH
Number of steps per epoch.
--height HEIGHT The height of the image.
--width WIDTH The width of the image.
--center_crop Whether to center crop the input image to the specified resolution. If not set, the image will be randomly cropped. The image will be resized to the specified resolution before cropping.
--random_flip Whether to randomly horizontally flip the image.
--batch_size BATCH_SIZE
Batch size for the training data loader (per device).
--dataloader_num_workers DATALOADER_NUM_WORKERS
The number of subprocesses used for data loading. A value of 0 means the data will be loaded in the main process.
--precision {32,16,16-mixed}
The precision for training.
--learning_rate LEARNING_RATE
The learning rate.
--lora_rank LORA_RANK
The dimension of the LoRA update matrix.
--lora_alpha LORA_ALPHA
The weight of the LoRA update matrix.
--use_gradient_checkpointing
Whether to use gradient checkpointing.
--accumulate_grad_batches ACCUMULATE_GRAD_BATCHES
The number of batches for gradient accumulation.
--training_strategy {auto,deepspeed_stage_1,deepspeed_stage_2,deepspeed_stage_3}
The training strategy.
--max_epochs MAX_EPOCHS
The number of training epochs.
--modelscope_model_id MODELSCOPE_MODEL_ID
The model ID on ModelScope (https://www.modelscope.cn/). If the model ID is provided, the model will be automatically uploaded to ModelScope.
```
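As a point of reference for `--lora_rank` and `--lora_alpha`: with the standard PEFT-style LoRA parameterization (the scripts rely on the `peft` package), the update added to a frozen weight matrix is scaled by `lora_alpha / lora_rank`. The snippet below illustrates that relationship only; it is not code taken from the trainers.
```python
import torch

d, k = 768, 768                 # shape of the frozen weight matrix W
lora_rank, lora_alpha = 4, 4

A = torch.randn(lora_rank, k) * 0.01   # low-rank factor A
B = torch.zeros(d, lora_rank)          # low-rank factor B, initialized to zero

# Effective update applied on top of W during training and inference.
delta_W = (lora_alpha / lora_rank) * (B @ A)
```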

View File

@@ -0,0 +1,70 @@
# Training FLUX LoRA
The following files will be used to build the FLUX model. You can download them from [HuggingFace](https://huggingface.co/black-forest-labs/FLUX.1-dev) or [ModelScope](https://www.modelscope.cn/models/ai-modelscope/flux.1-dev), or use the following code to download these files:
```python
from diffsynth import download_models
download_models(["FLUX.1-dev"])
```
```
models/FLUX/
└── FLUX.1-dev
├── ae.safetensors
├── flux1-dev.safetensors
├── text_encoder
│ └── model.safetensors
└── text_encoder_2
├── config.json
├── model-00001-of-00002.safetensors
├── model-00002-of-00002.safetensors
└── model.safetensors.index.json
```
Start the training task with the following command:
```
CUDA_VISIBLE_DEVICES="0" python examples/train/flux/train_flux_lora.py \
--pretrained_text_encoder_path models/FLUX/FLUX.1-dev/text_encoder/model.safetensors \
--pretrained_text_encoder_2_path models/FLUX/FLUX.1-dev/text_encoder_2 \
--pretrained_dit_path models/FLUX/FLUX.1-dev/flux1-dev.safetensors \
--pretrained_vae_path models/FLUX/FLUX.1-dev/ae.safetensors \
--dataset_path data/dog \
--output_path ./models \
--max_epochs 1 \
--steps_per_epoch 500 \
--height 1024 \
--width 1024 \
--center_crop \
--precision "bf16" \
--learning_rate 1e-4 \
--lora_rank 4 \
--lora_alpha 4 \
--use_gradient_checkpointing
```
For more information on the parameters, please use `python examples/train/flux/train_flux_lora.py -h` to view detailed information.
After the training is complete, use `model_manager.load_lora` to load the LoRA for inference.
```python
from diffsynth import ModelManager, FluxImagePipeline
import torch
model_manager = ModelManager(torch_dtype=torch.bfloat16, device="cuda",
file_path_list=[
"models/FLUX/FLUX.1-dev/text_encoder/model.safetensors",
"models/FLUX/FLUX.1-dev/text_encoder_2",
"models/FLUX/FLUX.1-dev/ae.safetensors",
"models/FLUX/FLUX.1-dev/flux1-dev.safetensors"
])
model_manager.load_lora("models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt", lora_alpha=1.0)
pipe = FluxImagePipeline.from_model_manager(model_manager)
torch.manual_seed(0)
image = pipe(
prompt="a dog is jumping, flowers around the dog, the background is mountains and clouds",
num_inference_steps=30, embedded_guidance=3.5
)
image.save("image_with_lora.jpg")
```

View File

@@ -0,0 +1,72 @@
# Training Hunyuan-DiT LoRA
Building the Hunyuan DiT model requires four files. You can download these files from [HuggingFace](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT) or [ModelScope](https://www.modelscope.cn/models/modelscope/HunyuanDiT/summary). You can use the following code to download these files:
```python
from diffsynth import download_models
download_models(["HunyuanDiT"])
```
```
models/HunyuanDiT/
├── Put Hunyuan DiT checkpoints here.txt
└── t2i
├── clip_text_encoder
│ └── pytorch_model.bin
├── model
│ └── pytorch_model_ema.pt
├── mt5
│ └── pytorch_model.bin
└── sdxl-vae-fp16-fix
└── diffusion_pytorch_model.bin
```
Use the following command to start the training task:
```
CUDA_VISIBLE_DEVICES="0" python examples/train/hunyuan_dit/train_hunyuan_dit_lora.py \
--pretrained_path models/HunyuanDiT/t2i \
--dataset_path data/dog \
--output_path ./models \
--max_epochs 1 \
--steps_per_epoch 500 \
--height 1024 \
--width 1024 \
--center_crop \
--precision "16-mixed" \
--learning_rate 1e-4 \
--lora_rank 4 \
--lora_alpha 4 \
--use_gradient_checkpointing
```
For more information about the parameters, please use `python examples/train/hunyuan_dit/train_hunyuan_dit_lora.py -h` to view detailed information.
After the training is complete, use `model_manager.load_lora` to load the LoRA for inference.
```python
from diffsynth import ModelManager, HunyuanDiTImagePipeline
import torch
model_manager = ModelManager(torch_dtype=torch.float16, device="cuda",
file_path_list=[
"models/HunyuanDiT/t2i/clip_text_encoder/pytorch_model.bin",
"models/HunyuanDiT/t2i/model/pytorch_model_ema.pt",
"models/HunyuanDiT/t2i/mt5/pytorch_model.bin",
"models/HunyuanDiT/t2i/sdxl-vae-fp16-fix/diffusion_pytorch_model.bin"
])
model_manager.load_lora("models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt", lora_alpha=1.0)
pipe = HunyuanDiTImagePipeline.from_model_manager(model_manager)
torch.manual_seed(0)
image = pipe(
prompt="A little puppy hops and jumps playfully, surrounded by a profusion of colorful flowers, with a mountain range visible in the distance.
",
negative_prompt="",
cfg_scale=7.5,
num_inference_steps=100, width=1024, height=1024,
)
image.save("image_with_lora.jpg")
```

View File

@@ -0,0 +1,78 @@
# Training Kolors LoRA
The following files will be used to build Kolors. You can download Kolors from [HuggingFace](https://huggingface.co/Kwai-Kolors/Kolors) or [ModelScope](https://modelscope.cn/models/Kwai-Kolors/Kolors). Due to a precision overflow issue, we also need an additional VAE model (from [HuggingFace](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix) or [ModelScope](https://modelscope.cn/models/AI-ModelScope/sdxl-vae-fp16-fix)). You can use the following code to download these files:
```python
from diffsynth import download_models
download_models(["Kolors", "SDXL-vae-fp16-fix"])
```
```
models
├── kolors
│ └── Kolors
│ ├── text_encoder
│ │ ├── config.json
│ │ ├── pytorch_model-00001-of-00007.bin
│ │ ├── pytorch_model-00002-of-00007.bin
│ │ ├── pytorch_model-00003-of-00007.bin
│ │ ├── pytorch_model-00004-of-00007.bin
│ │ ├── pytorch_model-00005-of-00007.bin
│ │ ├── pytorch_model-00006-of-00007.bin
│ │ ├── pytorch_model-00007-of-00007.bin
│ │ └── pytorch_model.bin.index.json
│ ├── unet
│ │ └── diffusion_pytorch_model.safetensors
│ └── vae
│ └── diffusion_pytorch_model.safetensors
└── sdxl-vae-fp16-fix
└── diffusion_pytorch_model.safetensors
```
Start the training task with the following command:
```
CUDA_VISIBLE_DEVICES="0" python examples/train/kolors/train_kolors_lora.py \
--pretrained_unet_path models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors \
--pretrained_text_encoder_path models/kolors/Kolors/text_encoder \
--pretrained_fp16_vae_path models/sdxl-vae-fp16-fix/diffusion_pytorch_model.safetensors \
--dataset_path data/dog \
--output_path ./models \
--max_epochs 1 \
--steps_per_epoch 500 \
--height 1024 \
--width 1024 \
--center_crop \
--precision "16-mixed" \
--learning_rate 1e-4 \
--lora_rank 4 \
--lora_alpha 4 \
--use_gradient_checkpointing
```
For more information about the parameters, please use `python examples/train/kolors/train_kolors_lora.py -h` to view detailed information.
After training is complete, use `model_manager.load_lora` to load the LoRA for inference.
```python
from diffsynth import ModelManager, SDXLImagePipeline
import torch
model_manager = ModelManager(torch_dtype=torch.float16, device="cuda",
file_path_list=[
"models/kolors/Kolors/text_encoder",
"models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors",
"models/sdxl-vae-fp16-fix/diffusion_pytorch_model.safetensors"
])
model_manager.load_lora("models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt", lora_alpha=1.0)
pipe = SDXLImagePipeline.from_model_manager(model_manager)
torch.manual_seed(0)
image = pipe(
prompt="a dog is jumping, flowers around the dog, the background is mountains and clouds",
negative_prompt="bad quality, poor quality, doll, disfigured, jpg, toy, bad anatomy, missing limbs, missing fingers, 3d, cgi, extra tails",
cfg_scale=7.5,
num_inference_steps=100, width=1024, height=1024,
)
image.save("image_with_lora.jpg")
```

View File

@@ -0,0 +1,59 @@
# Training Stable Diffusion 3 LoRA
The training script requires only one file. You can use [`sd3_medium_incl_clips.safetensors`](https://huggingface.co/stabilityai/stable-diffusion-3-medium/resolve/main/sd3_medium_incl_clips.safetensors) (without the T5 encoder) or [`sd3_medium_incl_clips_t5xxlfp16.safetensors`](https://huggingface.co/stabilityai/stable-diffusion-3-medium/resolve/main/sd3_medium_incl_clips_t5xxlfp16.safetensors) (with the T5 encoder). Please use the following code to download these files:
```python
from diffsynth import download_models
download_models(["StableDiffusion3", "StableDiffusion3_without_T5"])
```
```
models/stable_diffusion_3/
├── Put Stable Diffusion 3 checkpoints here.txt
├── sd3_medium_incl_clips.safetensors
└── sd3_medium_incl_clips_t5xxlfp16.safetensors
```
Start the training task with the following command:
```
CUDA_VISIBLE_DEVICES="0" python examples/train/stable_diffusion_3/train_sd3_lora.py \
--pretrained_path models/stable_diffusion_3/sd3_medium_incl_clips.safetensors \
--dataset_path data/dog \
--output_path ./models \
--max_epochs 1 \
--steps_per_epoch 500 \
--height 1024 \
--width 1024 \
--center_crop \
--precision "16-mixed" \
--learning_rate 1e-4 \
--lora_rank 4 \
--lora_alpha 4 \
--use_gradient_checkpointing
```
For more information about the parameters, please use `python examples/train/stable_diffusion_3/train_sd3_lora.py -h` to view detailed information.
After training is complete, use `model_manager.load_lora` to load the LoRA for inference.
```python
from diffsynth import ModelManager, SD3ImagePipeline
import torch
model_manager = ModelManager(torch_dtype=torch.float16, device="cuda",
file_path_list=["models/stable_diffusion_3/sd3_medium_incl_clips.safetensors"])
model_manager.load_lora("models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt", lora_alpha=1.0)
pipe = SD3ImagePipeline.from_model_manager(model_manager)
torch.manual_seed(0)
image = pipe(
prompt="a dog is jumping, flowers around the dog, the background is mountains and clouds",
negative_prompt="bad quality, poor quality, doll, disfigured, jpg, toy, bad anatomy, missing limbs, missing fingers, 3d, cgi, extra tails",
cfg_scale=7.5,
num_inference_steps=100, width=1024, height=1024,
)
image.save("image_with_lora.jpg")
```

View File

@@ -0,0 +1,59 @@
# Training Stable Diffusion LoRA
The training script requires only one file. We support mainstream checkpoints from [CivitAI](https://civitai.com/). By default, we use the base Stable Diffusion v1.5, which can be downloaded from [HuggingFace](https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors) or [ModelScope](https://www.modelscope.cn/models/AI-ModelScope/stable-diffusion-v1-5/resolve/master/v1-5-pruned-emaonly.safetensors). You can use the following code to download this file:
```python
from diffsynth import download_models
download_models(["StableDiffusion_v15"])
```
```
models/stable_diffusion
├── Put Stable Diffusion checkpoints here.txt
└── v1-5-pruned-emaonly.safetensors
```
Start the training task with the following command:
```
CUDA_VISIBLE_DEVICES="0" python examples/train/stable_diffusion/train_sd_lora.py \
--pretrained_path models/stable_diffusion/v1-5-pruned-emaonly.safetensors \
--dataset_path data/dog \
--output_path ./models \
--max_epochs 1 \
--steps_per_epoch 500 \
--height 512 \
--width 512 \
--center_crop \
--precision "16-mixed" \
--learning_rate 1e-4 \
--lora_rank 4 \
--lora_alpha 4 \
--use_gradient_checkpointing
```
For more information about the parameters, please use `python examples/train/stable_diffusion/train_sd_lora.py -h` to view detailed information.
After training is complete, use `model_manager.load_lora` to load the LoRA for inference.
```python
from diffsynth import ModelManager, SDImagePipeline
import torch
model_manager = ModelManager(torch_dtype=torch.float16, device="cuda",
file_path_list=["models/stable_diffusion/v1-5-pruned-emaonly.safetensors"])
model_manager.load_lora("models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt", lora_alpha=1.0)
pipe = SDImagePipeline.from_model_manager(model_manager)
torch.manual_seed(0)
image = pipe(
prompt="a dog is jumping, flowers around the dog, the background is mountains and clouds",
negative_prompt="bad quality, poor quality, doll, disfigured, jpg, toy, bad anatomy, missing limbs, missing fingers, 3d, cgi, extra tails",
cfg_scale=7.5,
num_inference_steps=100, width=512, height=512,
)
image.save("image_with_lora.jpg")
```

View File

@@ -0,0 +1,57 @@
# Training Stable Diffusion XL LoRA
The training script requires only one file. We support mainstream checkpoints from [CivitAI](https://civitai.com/). By default, we use the base Stable Diffusion XL, which can be downloaded from [HuggingFace](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors) or [ModelScope](https://www.modelscope.cn/models/AI-ModelScope/stable-diffusion-xl-base-1.0/resolve/master/sd_xl_base_1.0.safetensors). You can also use the following code to download this file:
```python
from diffsynth import download_models
download_models(["StableDiffusionXL_v1"])
```
```
models/stable_diffusion_xl
├── Put Stable Diffusion XL checkpoints here.txt
└── sd_xl_base_1.0.safetensors
```
We have observed numerical precision overflow when running Stable Diffusion XL in float16, so we recommend training in float32. Start the training task with the following command:
```
CUDA_VISIBLE_DEVICES="0" python examples/train/stable_diffusion_xl/train_sdxl_lora.py \
--pretrained_path models/stable_diffusion_xl/sd_xl_base_1.0.safetensors \
--dataset_path data/dog \
--output_path ./models \
--max_epochs 1 \
--steps_per_epoch 500 \
--height 1024 \
--width 1024 \
--center_crop \
--precision "32" \
--learning_rate 1e-4 \
--lora_rank 4 \
--lora_alpha 4 \
--use_gradient_checkpointing
```
For more information about the parameters, please use `python examples/train/stable_diffusion_xl/train_sdxl_lora.py -h` to view detailed information.
After training is complete, use `model_manager.load_lora` to load the LoRA for inference.
```python
from diffsynth import ModelManager, SDXLImagePipeline
import torch
model_manager = ModelManager(torch_dtype=torch.float16, device="cuda",
file_path_list=["models/stable_diffusion_xl/sd_xl_base_1.0.safetensors"])
model_manager.load_lora("models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt", lora_alpha=1.0)
pipe = SDXLImagePipeline.from_model_manager(model_manager)
torch.manual_seed(0)
image = pipe(
prompt="a dog is jumping, flowers around the dog, the background is mountains and clouds",
negative_prompt="bad quality, poor quality, doll, disfigured, jpg, toy, bad anatomy, missing limbs, missing fingers, 3d, cgi, extra tails",
cfg_scale=7.5,
num_inference_steps=100, width=1024, height=1024,
)
image.save("image_with_lora.jpg")
```

View File

@@ -1,32 +1,32 @@
.. DiffSynth-Studio documentation master file, created by
sphinx-quickstart on Thu Sep 5 16:39:24 2024.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
DiffSynth-Studio documentation
==============================
Add your content using ``reStructuredText`` syntax. See the
`reStructuredText <https://www.sphinx-doc.org/en/master/usage/restructuredtext/index.html>`_
documentation for details.
.. toctree::
:maxdepth: 1
:caption: Contents:
GetStarted/A_simple_example.md
GetStarted/Download_models.md
GetStarted/ModelManager.md
GetStarted/Models.md
GetStarted/Pipelines.md
GetStarted/PromptProcessing.md
GetStarted/Schedulers.md
GetStarted/Fine-tuning.md
GetStarted/Extensions.md
GetStarted/WebUI.md
.. toctree::
:maxdepth: 1
:caption: API Docs

View File

@@ -1,4 +1,4 @@
recommonmark
sphinx_rtd_theme
myst-parser
sphinx-markdown-tables