From 883d26abb4b0006e5ba68d5dff4859fad87a15fc Mon Sep 17 00:00:00 2001
From: yrk111222 <2493404415@qq.com>
Date: Fri, 18 Oct 2024 18:02:52 +0800
Subject: [PATCH] Add files via upload
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
First-pass translation complete; the GetStarted directory is kept for now, and some terms still need to be rechecked.
---
docs/source_en/GetStarted/Models.md | 1 +
docs/source_en/GetStarted/Pipelines.md | 15 ++--
docs/source_en/GetStarted/Schedulers.md | 4 +-
docs/source_en/finetune/train_kolors_lora.md | 13 ++-
docs/source_en/finetune/train_sd3_lora.md | 12 ++-
docs/source_en/finetune/train_sd_lora.md | 11 ++-
docs/source_en/finetune/train_sdxl_lora.md | 12 +--
docs/source_en/tutorial/ASimpleExample.md | 85 ++++++++++++++++++++
docs/source_en/tutorial/DownloadModels.md | 34 ++++++++
docs/source_en/tutorial/Extensions.md | 49 +++++++++++
docs/source_en/tutorial/Installation.md | 24 ++++++
docs/source_en/tutorial/Models.md | 18 +++++
docs/source_en/tutorial/Pipelines.md | 22 +++++
docs/source_en/tutorial/PromptProcessing.md | 35 ++++++++
docs/source_en/tutorial/Schedulers.md | 11 +++
15 files changed, 308 insertions(+), 38 deletions(-)
create mode 100644 docs/source_en/tutorial/ASimpleExample.md
create mode 100644 docs/source_en/tutorial/DownloadModels.md
create mode 100644 docs/source_en/tutorial/Extensions.md
create mode 100644 docs/source_en/tutorial/Installation.md
create mode 100644 docs/source_en/tutorial/Models.md
create mode 100644 docs/source_en/tutorial/Pipelines.md
create mode 100644 docs/source_en/tutorial/PromptProcessing.md
create mode 100644 docs/source_en/tutorial/Schedulers.md
diff --git a/docs/source_en/GetStarted/Models.md b/docs/source_en/GetStarted/Models.md
index b7127db..1c42a41 100644
--- a/docs/source_en/GetStarted/Models.md
+++ b/docs/source_en/GetStarted/Models.md
@@ -2,6 +2,7 @@
Until now, DiffSynth Studio has supported the following models:
+* [CogVideoX](https://huggingface.co/THUDM/CogVideoX-5b)
* [FLUX](https://huggingface.co/black-forest-labs/FLUX.1-dev)
* [ExVideo](https://huggingface.co/ECNU-CILab/ExVideo-SVD-128f-v1)
* [Kolors](https://huggingface.co/Kwai-Kolors/Kolors)
diff --git a/docs/source_en/GetStarted/Pipelines.md b/docs/source_en/GetStarted/Pipelines.md
index 9ca2e73..67e6c1c 100644
--- a/docs/source_en/GetStarted/Pipelines.md
+++ b/docs/source_en/GetStarted/Pipelines.md
@@ -1,27 +1,22 @@
# Pipelines
-So far, the following table lists our pipelines and the models supported by each pipeline.
+DiffSynth-Studio includes multiple pipelines, categorized into two types: image generation and video generation.
## Image Pipelines
-Pipelines for generating images from text descriptions. Each pipeline relies on specific encoder and decoder models.
-
| Pipeline | Models |
|----------------------------|----------------------------------------------------------------|
-| HunyuanDiTImagePipeline | text_encoder: HunyuanDiTCLIPTextEncoder
text_encoder_t5: HunyuanDiTT5TextEncoder
dit: HunyuanDiT
vae_decoder: SDVAEDecoder
vae_encoder: SDVAEEncoder |
| SDImagePipeline | text_encoder: SDTextEncoder
unet: SDUNet
vae_decoder: SDVAEDecoder
vae_encoder: SDVAEEncoder
controlnet: MultiControlNetManager
ipadapter_image_encoder: IpAdapterCLIPImageEmbedder
ipadapter: SDIpAdapter |
-| SD3ImagePipeline | text_encoder_1: SD3TextEncoder1
text_encoder_2: SD3TextEncoder2
text_encoder_3: SD3TextEncoder3
dit: SD3DiT
vae_decoder: SD3VAEDecoder
vae_encoder: SD3VAEEncoder |
| SDXLImagePipeline | text_encoder: SDXLTextEncoder
text_encoder_2: SDXLTextEncoder2
text_encoder_kolors: ChatGLMModel
unet: SDXLUNet
vae_decoder: SDXLVAEDecoder
vae_encoder: SDXLVAEEncoder
controlnet: MultiControlNetManager
ipadapter_image_encoder: IpAdapterXLCLIPImageEmbedder
ipadapter: SDXLIpAdapter |
+| SD3ImagePipeline | text_encoder_1: SD3TextEncoder1
text_encoder_2: SD3TextEncoder2
text_encoder_3: SD3TextEncoder3
dit: SD3DiT
vae_decoder: SD3VAEDecoder
vae_encoder: SD3VAEEncoder |
+| HunyuanDiTImagePipeline | text_encoder: HunyuanDiTCLIPTextEncoder
text_encoder_t5: HunyuanDiTT5TextEncoder
dit: HunyuanDiT
vae_decoder: SDVAEDecoder
vae_encoder: SDVAEEncoder |
+| FluxImagePipeline | text_encoder_1: FluxTextEncoder1
text_encoder_2: FluxTextEncoder2
dit: FluxDiT
vae_decoder: FluxVAEDecoder
vae_encoder: FluxVAEEncoder |
## Video Pipelines
-Pipelines for generating videos from text descriptions. In addition to the models required for image generation, they include models for handling motion modules.
-
| Pipeline | Models |
|----------------------------|----------------------------------------------------------------|
| SDVideoPipeline | text_encoder: SDTextEncoder
unet: SDUNet
vae_decoder: SDVAEDecoder
vae_encoder: SDVAEEncoder
controlnet: MultiControlNetManager
ipadapter_image_encoder: IpAdapterCLIPImageEmbedder
ipadapter: SDIpAdapter
motion_modules: SDMotionModel |
| SDXLVideoPipeline | text_encoder: SDXLTextEncoder
text_encoder_2: SDXLTextEncoder2
text_encoder_kolors: ChatGLMModel
unet: SDXLUNet
vae_decoder: SDXLVAEDecoder
vae_encoder: SDXLVAEEncoder
ipadapter_image_encoder: IpAdapterXLCLIPImageEmbedder
ipadapter: SDXLIpAdapter
motion_modules: SDXLMotionModel |
| SVDVideoPipeline | image_encoder: SVDImageEncoder
unet: SVDUNet
vae_encoder: SVDVAEEncoder
vae_decoder: SVDVAEDecoder |
-
-
-
+| CogVideoPipeline | text_encoder: FluxTextEncoder2
dit: CogDiT
vae_encoder: CogVAEEncoder
vae_decoder: CogVAEDecoder |
diff --git a/docs/source_en/GetStarted/Schedulers.md b/docs/source_en/GetStarted/Schedulers.md
index 495293f..757d9ba 100644
--- a/docs/source_en/GetStarted/Schedulers.md
+++ b/docs/source_en/GetStarted/Schedulers.md
@@ -1,11 +1,11 @@
# Schedulers
-Schedulers control the entire denoising (or sampling) process of the model. When loading the Pipeline, DiffSynth automatically selects the most suitable schedulers for the current Pipeline, requiring no additional configuration.
+Schedulers control the entire denoising (or sampling) process of the model. When loading the Pipeline, DiffSynth automatically selects the most suitable schedulers for the current Pipeline, **requiring no additional configuration**.
The supported schedulers are:
- **EnhancedDDIMScheduler**: Extends the denoising process introduced in the Denoising Diffusion Probabilistic Models (DDPM) with non-Markovian guidance.
-- **FlowMatchScheduler**: Implements the flow matching sampling method introduced in Stable Diffusion 3.
+- **FlowMatchScheduler**: Implements the flow matching sampling method introduced in [Stable Diffusion 3](https://arxiv.org/abs/2403.03206).
- **ContinuousODEScheduler**: A scheduler based on Ordinary Differential Equations (ODE).
\ No newline at end of file
diff --git a/docs/source_en/finetune/train_kolors_lora.md b/docs/source_en/finetune/train_kolors_lora.md
index dae9d5c..c9a102c 100644
--- a/docs/source_en/finetune/train_kolors_lora.md
+++ b/docs/source_en/finetune/train_kolors_lora.md
@@ -1,7 +1,5 @@
-# 训练 Kolors LoRA
-
-以下文件将用于构建 Kolors。你可以从 [HuggingFace](https://huggingface.co/Kwai-Kolors/Kolors) 或 [ModelScope](https://modelscope.cn/models/Kwai-Kolors/Kolors) 下载 Kolors。由于精度溢出问题,我们需要下载额外的 VAE 模型(从 [HuggingFace](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix) 或 [ModelScope](https://modelscope.cn/models/AI-ModelScope/sdxl-vae-fp16-fix))。你可以使用以下代码下载这些文件:
-
+# Training Kolors LoRA
+The following files will be used to build Kolors. You can download Kolors from [HuggingFace](https://huggingface.co/Kwai-Kolors/Kolors) or [ModelScope](https://modelscope.cn/models/Kwai-Kolors/Kolors). Due to precision overflow issues, we need to download an additional VAE model (from [HuggingFace](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix) or [ModelScope](https://modelscope.cn/models/AI-ModelScope/sdxl-vae-fp16-fix)). You can use the following code to download these files:
```python
from diffsynth import download_models
@@ -31,7 +29,7 @@ models
└── diffusion_pytorch_model.safetensors
```
-使用下面的命令启动训练任务:
+Use the following command to start the training task:
```
CUDA_VISIBLE_DEVICES="0" python examples/train/kolors/train_kolors_lora.py \
@@ -52,9 +50,10 @@ CUDA_VISIBLE_DEVICES="0" python examples/train/kolors/train_kolors_lora.py \
--use_gradient_checkpointing
```
-有关参数的更多信息,请使用 `python examples/train/kolors/train_kolors_lora.py -h` 查看详细信息。
+For more information on the parameters, please use `python examples/train/kolors/train_kolors_lora.py -h` to view detailed information.
+
+After the training is complete, use `model_manager.load_lora` to load the LoRA for inference.
-训练完成后,使用 `model_manager.load_lora` 加载 LoRA 以进行推理。
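+
+A minimal inference sketch follows. The LoRA path, `lora_alpha` value, and weight file paths below are placeholders based on the download layout described above; Kolors runs on `SDXLImagePipeline`:
+
+```python
+import torch
+from diffsynth import ModelManager, SDXLImagePipeline
+
+# Load the Kolors base weights together with the fp16-fix VAE.
+model_manager = ModelManager(torch_dtype=torch.float16, device="cuda")
+model_manager.load_models([
+    "models/kolors/Kolors/text_encoder",
+    "models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors",
+    "models/sdxl-vae-fp16-fix/diffusion_pytorch_model.safetensors",
+])
+# Attach the trained LoRA before building the pipeline.
+model_manager.load_lora("models/lora/kolors_lora.safetensors", lora_alpha=2.0)
+pipe = SDXLImagePipeline.from_model_manager(model_manager)
+
+torch.manual_seed(0)
+image = pipe(prompt="a red flower", num_inference_steps=50, cfg_scale=7.5)
+image.save("image_with_lora.jpg")
+```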
diff --git a/docs/source_en/finetune/train_sd3_lora.md b/docs/source_en/finetune/train_sd3_lora.md
index e370175..fef67ab 100644
--- a/docs/source_en/finetune/train_sd3_lora.md
+++ b/docs/source_en/finetune/train_sd3_lora.md
@@ -1,8 +1,6 @@
-# 训练 Stable Diffusion 3 LoRA
-
-训练脚本只需要一个文件。你可以使用 [`sd3_medium_incl_clips.safetensors`](https://huggingface.co/stabilityai/stable-diffusion-3-medium/resolve/main/sd3_medium_incl_clips.safetensors)(没有 T5 Encoder)或 [`sd3_medium_incl_clips_t5xxlfp16.safetensors`](https://huggingface.co/stabilityai/stable-diffusion-3-medium/resolve/main/sd3_medium_incl_clips_t5xxlfp16.safetensors)(有 T5 Encoder)。请使用以下代码下载这些文件:
-
+# Training Stable Diffusion 3 LoRA
+The training script only requires one file. You can use [`sd3_medium_incl_clips.safetensors`](https://huggingface.co/stabilityai/stable-diffusion-3-medium/resolve/main/sd3_medium_incl_clips.safetensors) (without T5 Encoder) or [`sd3_medium_incl_clips_t5xxlfp16.safetensors`](https://huggingface.co/stabilityai/stable-diffusion-3-medium/resolve/main/sd3_medium_incl_clips_t5xxlfp16.safetensors) (with T5 Encoder). Please use the following code to download these files:
```python
from diffsynth import download_models
@@ -16,7 +14,7 @@ models/stable_diffusion_3/
└── sd3_medium_incl_clips_t5xxlfp16.safetensors
```
-使用下面的命令启动训练任务:
+Use the following command to start the training task:
```
CUDA_VISIBLE_DEVICES="0" python examples/train/stable_diffusion_3/train_sd3_lora.py \
@@ -35,9 +33,9 @@ CUDA_VISIBLE_DEVICES="0" python examples/train/stable_diffusion_3/train_sd3_lora
--use_gradient_checkpointing
```
-有关参数的更多信息,请使用 `python examples/train/stable_diffusion_3/train_sd3_lora.py -h` 查看详细信息。
+For more information on the parameters, please use `python examples/train/stable_diffusion_3/train_sd3_lora.py -h` to view detailed information.
-训练完成后,使用 `model_manager.load_lora` 加载 LoRA 以进行推理。
+After training is completed, use `model_manager.load_lora` to load LoRA for inference.
```python
from diffsynth import ModelManager, SD3ImagePipeline
diff --git a/docs/source_en/finetune/train_sd_lora.md b/docs/source_en/finetune/train_sd_lora.md
index e3d1abb..63a9c66 100644
--- a/docs/source_en/finetune/train_sd_lora.md
+++ b/docs/source_en/finetune/train_sd_lora.md
@@ -1,6 +1,6 @@
-# 训练 Stable Diffusion LoRA
+# Training Stable Diffusion LoRA
-训练脚本只需要一个文件。我们支持 [CivitAI](https://civitai.com/) 中的主流检查点。默认情况下,我们使用基础的 Stable Diffusion v1.5。你可以从 [HuggingFace](https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors) 或 [ModelScope](https://www.modelscope.cn/models/AI-ModelScope/stable-diffusion-v1-5/resolve/master/v1-5-pruned-emaonly.safetensors) 下载。你可以使用以下代码下载这个文件:
+The training script only requires one file. We support mainstream checkpoints on [CivitAI](https://civitai.com/). By default, we use the base Stable Diffusion v1.5. You can download it from [HuggingFace](https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors) or [ModelScope](https://www.modelscope.cn/models/AI-ModelScope/stable-diffusion-v1-5/resolve/master/v1-5-pruned-emaonly.safetensors). You can use the following code to download this file:
```python
from diffsynth import download_models
@@ -14,7 +14,7 @@ models/stable_diffusion
└── v1-5-pruned-emaonly.safetensors
```
-使用以下命令启动训练任务:
+Use the following command to start the training task:
```
CUDA_VISIBLE_DEVICES="0" python examples/train/stable_diffusion/train_sd_lora.py \
@@ -33,10 +33,9 @@ CUDA_VISIBLE_DEVICES="0" python examples/train/stable_diffusion/train_sd_lora.py
--use_gradient_checkpointing
```
-有关参数的更多信息,请使用 `python examples/train/stable_diffusion/train_sd_lora.py -h` 查看详细信息。
-
-训练完成后,使用 `model_manager.load_lora` 加载 LoRA 以进行推理。
+For more information about the parameters, please use `python examples/train/stable_diffusion/train_sd_lora.py -h` to view detailed information.
+After training is complete, use `model_manager.load_lora` to load LoRA for inference.
```python
diff --git a/docs/source_en/finetune/train_sdxl_lora.md b/docs/source_en/finetune/train_sdxl_lora.md
index 0b0b746..2029d7e 100644
--- a/docs/source_en/finetune/train_sdxl_lora.md
+++ b/docs/source_en/finetune/train_sdxl_lora.md
@@ -1,7 +1,7 @@
-# 训练 Stable Diffusion XL LoRA
+# Training Stable Diffusion XL LoRA
-训练脚本只需要一个文件。我们支持 [CivitAI](https://civitai.com/) 中的主流检查点。默认情况下,我们使用基础的 Stable Diffusion XL。你可以从 [HuggingFace](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors) 或 [ModelScope](https://www.modelscope.cn/models/AI-ModelScope/stable-diffusion-xl-base-1.0/resolve/master/sd_xl_base_1.0.safetensors) 下载。也可以使用以下代码下载这个文件:
+The training script only requires one file. We support mainstream checkpoints on [CivitAI](https://civitai.com/). By default, we use the base Stable Diffusion XL. You can download it from [HuggingFace](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors) or [ModelScope](https://www.modelscope.cn/models/AI-ModelScope/stable-diffusion-xl-base-1.0/resolve/master/sd_xl_base_1.0.safetensors). You can also use the following code to download this file:
```python
from diffsynth import download_models
@@ -14,8 +14,7 @@ models/stable_diffusion_xl
└── sd_xl_base_1.0.safetensors
```
-我们观察到 Stable Diffusion XL 在 float16 精度下会出现数值精度溢出,因此我们建议用户使用 float32 精度训练,使用以下命令启动训练任务:
-
+We have observed that Stable Diffusion XL may experience numerical precision overflows when using float16 precision, so we recommend that users train with float32 precision. To start the training task, use the following command:
```
CUDA_VISIBLE_DEVICES="0" python examples/train/stable_diffusion_xl/train_sdxl_lora.py \
--pretrained_path models/stable_diffusion_xl/sd_xl_base_1.0.safetensors \
@@ -33,9 +32,10 @@ CUDA_VISIBLE_DEVICES="0" python examples/train/stable_diffusion_xl/train_sdxl_lo
--use_gradient_checkpointing
```
-有关参数的更多信息,请使用 `python examples/train/stable_diffusion_xl/train_sdxl_lora.py -h` 查看详细信息。
+For more information about the parameters, please use `python examples/train/stable_diffusion_xl/train_sdxl_lora.py -h` to view detailed information.
+
+After training is complete, use `model_manager.load_lora` to load LoRA for inference.
-训练完成后,使用 `model_manager.load_lora` 加载 LoRA 以进行推理。
```python
from diffsynth import ModelManager, SDXLImagePipeline
diff --git a/docs/source_en/tutorial/ASimpleExample.md b/docs/source_en/tutorial/ASimpleExample.md
new file mode 100644
index 0000000..8a9da10
--- /dev/null
+++ b/docs/source_en/tutorial/ASimpleExample.md
@@ -0,0 +1,85 @@
+# Quick Start
+
+This document shows how to quickly get started creating with DiffSynth-Studio through a short piece of code.
+
+## Installation
+
+Use the following command to clone and install DiffSynth-Studio from GitHub. For more information, please refer to [Installation](./Installation.md).
+
+```shell
+git clone https://github.com/modelscope/DiffSynth-Studio.git
+cd DiffSynth-Studio
+pip install -e .
+```
+
+## One-click Run!
+
+By running the following code, we will download the model, load the model, and generate an image.
+
+```python
+import torch
+from diffsynth import ModelManager, FluxImagePipeline
+
+model_manager = ModelManager(
+ torch_dtype=torch.bfloat16,
+ device="cuda",
+ model_id_list=["FLUX.1-dev"]
+)
+pipe = FluxImagePipeline.from_model_manager(model_manager)
+
+torch.manual_seed(0)
+image = pipe(
+ prompt="In a forest, a wooden plank sign reading DiffSynth",
+ height=576, width=1024,
+)
+image.save("image.jpg")
+```
+
+
+
+From this example, we can see that there are two key modules in DiffSynth: `ModelManager` and `Pipeline`. We will introduce them in detail next.
+
+## Downloading and Loading Models
+
+`ModelManager` is responsible for downloading and loading models, which can be done in one step with the following code.
+
+```python
+import torch
+from diffsynth import ModelManager
+
+model_manager = ModelManager(
+ torch_dtype=torch.bfloat16,
+ device="cuda",
+ model_id_list=["FLUX.1-dev"]
+)
+```
+
+We also support doing this step by step; the following code is equivalent to the one above.
+
+```python
+import torch
+from diffsynth import download_models, ModelManager
+
+download_models(["FLUX.1-dev"])
+model_manager = ModelManager(torch_dtype=torch.bfloat16, device="cuda")
+model_manager.load_models([
+ "models/FLUX/FLUX.1-dev/text_encoder/model.safetensors",
+ "models/FLUX/FLUX.1-dev/text_encoder_2",
+ "models/FLUX/FLUX.1-dev/ae.safetensors",
+ "models/FLUX/FLUX.1-dev/flux1-dev.safetensors"
+])
+```
+
+When downloading models, both [ModelScope](https://www.modelscope.cn/) and [HuggingFace](https://huggingface.co/) are supported as sources, and non-preset models can be downloaded as well. For more information about model downloading, please refer to [Model Download](./DownloadModels.md).
+
+When loading models, you can pass in all the model paths you want to load. For model weight files in formats such as `.safetensors`, `ModelManager` automatically determines the model type after loading; for models stored as folders, `ModelManager` parses the `config.json` file inside and calls the corresponding module from third-party libraries such as `transformers`. For the models supported by DiffSynth-Studio, please refer to [Supported Models](./Models.md).
+
+## Building Pipeline
+
+DiffSynth-Studio provides multiple inference `Pipeline`s, each of which can be initialized directly from a `ModelManager`, fetching the models it needs. For example, the text-to-image `Pipeline` for the FLUX.1-dev model can be constructed as follows:
+
+```python
+pipe = FluxImagePipeline.from_model_manager(model_manager)
+```
+
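+The `Pipeline` call accepts further generation parameters. A minimal sketch continuing the quick-start example above (`num_inference_steps`, `cfg_scale`, and `negative_prompt` are assumed here to follow DiffSynth's common pipeline interface; check the pipeline signature for the exact list):
+
+```python
+# Continues the example above: `pipe` and `torch` are already available.
+torch.manual_seed(42)
+image = pipe(
+    prompt="In a forest, a wooden plank sign reading DiffSynth",
+    negative_prompt="blurry, low quality",
+    height=576, width=1024,
+    num_inference_steps=30,
+    cfg_scale=2.0,
+)
+image.save("image_tuned.jpg")
+```
+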
+For more `Pipeline`s used for image generation and video generation, see [Inference Pipelines](./Pipelines.md).
diff --git a/docs/source_en/tutorial/DownloadModels.md b/docs/source_en/tutorial/DownloadModels.md
new file mode 100644
index 0000000..ad4769f
--- /dev/null
+++ b/docs/source_en/tutorial/DownloadModels.md
@@ -0,0 +1,34 @@
+# Download Models
+
+DiffSynth-Studio comes with preset download links for some mainstream Diffusion models, which you can download and use directly.
+
+## Download Preset Models
+
+You can directly use the `download_models` function to download preset model files; the available model IDs are listed in the [config file](/diffsynth/configs/model_config.py).
+
+```python
+from diffsynth import download_models
+
+download_models(["FLUX.1-dev"])
+```
+
+For VSCode users, after activating Pylance or other Python language services, typing `""` in the code will display all supported model IDs.
+
+
+
+## Download Non-Preset Models
+
+You can select models from two download sources: [ModelScope](https://modelscope.cn/models) and [HuggingFace](https://huggingface.co/models). Of course, you can also manually download the models you need through browsers or other tools.
+
+```python
+from diffsynth import download_customized_models
+
+download_customized_models(
+ model_id="Kwai-Kolors/Kolors",
+ origin_file_path="vae/diffusion_pytorch_model.fp16.bin",
+ local_dir="models/kolors/Kolors/vae",
+ downloading_priority=["ModelScope", "HuggingFace"]
+)
+```
+
+In this code snippet, the download sources are tried in priority order, starting with `ModelScope`: the file `vae/diffusion_pytorch_model.fp16.bin` is downloaded from the repository with ID `Kwai-Kolors/Kolors` in the [model library](https://modelscope.cn/models/Kwai-Kolors/Kolors) to the local path `models/kolors/Kolors/vae`.
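+
+After the download, the file can be loaded like any other local weight file, for example (a sketch; on its own the VAE is only one piece of the Kolors pipeline):
+
+```python
+import torch
+from diffsynth import ModelManager
+
+# Load the manually downloaded VAE weight file from its local path.
+model_manager = ModelManager(torch_dtype=torch.float16, device="cuda")
+model_manager.load_models([
+    "models/kolors/Kolors/vae/diffusion_pytorch_model.fp16.bin",
+])
+```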
diff --git a/docs/source_en/tutorial/Extensions.md b/docs/source_en/tutorial/Extensions.md
new file mode 100644
index 0000000..41511cc
--- /dev/null
+++ b/docs/source_en/tutorial/Extensions.md
@@ -0,0 +1,49 @@
+# Extension Features
+
+This document introduces some Diffusion-related technologies implemented in DiffSynth, which have significant application potential in image and video processing.
+
+- **[RIFE](https://github.com/hzwer/ECCV2022-RIFE)**: RIFE is a frame interpolation method based on real-time intermediate flow estimation. It uses a model with an IFNet structure that can quickly estimate intermediate flows end-to-end. RIFE does not rely on pre-trained optical flow models and supports frame interpolation at arbitrary time steps, processing through time-encoded inputs.
+
+ In this code snippet, we use the RIFE model to double the frame rate of a video.
+
+ ```python
+ from diffsynth import VideoData, ModelManager, save_video
+ from diffsynth.extensions.RIFE import RIFEInterpolater
+
+ model_manager = ModelManager(model_id_list=["RIFE"])
+ rife = RIFEInterpolater.from_model_manager(model_manager)
+ video = VideoData("input_video.mp4", height=512, width=768).raw_data()
+ video = rife.interpolate(video)
+ save_video(video, "output_video.mp4", fps=60)
+ ```
+
+- **[ESRGAN](https://github.com/xinntao/ESRGAN)**: ESRGAN is an image super-resolution model that can achieve a fourfold increase in resolution. This method significantly enhances the realism of generated images by optimizing network architecture, adversarial loss, and perceptual loss.
+
+ In this code snippet, we use the ESRGAN model to quadruple the resolution of an image.
+
+ ```python
+ from PIL import Image
+ from diffsynth import ModelManager
+ from diffsynth.extensions.ESRGAN import ESRGAN
+
+ model_manager = ModelManager(model_id_list=["ESRGAN_x4"])
+ esrgan = ESRGAN.from_model_manager(model_manager)
+ image = Image.open("input_image.jpg")
+ image = esrgan.upscale(image)
+ image.save("output_image.jpg")
+ ```
+
+- **[FastBlend](https://arxiv.org/abs/2311.09265)**: FastBlend is a model-free video de-flickering algorithm. Flicker often occurs in style videos processed frame by frame using image generation models. FastBlend can eliminate flicker in style videos based on the motion features in the original video (guide video).
+
+ In this code snippet, we use FastBlend to remove the flicker effect from a style video.
+
+ ```python
+ from diffsynth import VideoData, save_video
+ from diffsynth.extensions.FastBlend import FastBlendSmoother
+
+ fastblend = FastBlendSmoother()
+ guide_video = VideoData("guide_video.mp4", height=512, width=768).raw_data()
+ style_video = VideoData("style_video.mp4", height=512, width=768).raw_data()
+ output_video = fastblend(style_video, original_frames=guide_video)
+ save_video(output_video, "output_video.mp4", fps=30)
+ ```
diff --git a/docs/source_en/tutorial/Installation.md b/docs/source_en/tutorial/Installation.md
new file mode 100644
index 0000000..6bfb809
--- /dev/null
+++ b/docs/source_en/tutorial/Installation.md
@@ -0,0 +1,24 @@
+# Installation
+
+## From Source
+
+1. Clone the source repository:
+
+ ```bash
+ git clone https://github.com/modelscope/DiffSynth-Studio.git
+ ```
+
+2. Navigate to the project directory and install:
+
+ ```bash
+ cd DiffSynth-Studio
+ pip install -e .
+ ```
+
+## From PyPI
+
+Install directly via PyPI:
+
+```bash
+pip install diffsynth
+```
\ No newline at end of file
diff --git a/docs/source_en/tutorial/Models.md b/docs/source_en/tutorial/Models.md
new file mode 100644
index 0000000..d1a7ed0
--- /dev/null
+++ b/docs/source_en/tutorial/Models.md
@@ -0,0 +1,18 @@
+# Models
+
+Until now, DiffSynth Studio has supported the following models:
+
+* [CogVideoX](https://huggingface.co/THUDM/CogVideoX-5b)
+* [FLUX](https://huggingface.co/black-forest-labs/FLUX.1-dev)
+* [ExVideo](https://huggingface.co/ECNU-CILab/ExVideo-SVD-128f-v1)
+* [Kolors](https://huggingface.co/Kwai-Kolors/Kolors)
+* [Stable Diffusion 3](https://huggingface.co/stabilityai/stable-diffusion-3-medium)
+* [Stable Video Diffusion](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt)
+* [Hunyuan-DiT](https://github.com/Tencent/HunyuanDiT)
+* [RIFE](https://github.com/hzwer/ECCV2022-RIFE)
+* [ESRGAN](https://github.com/xinntao/ESRGAN)
+* [Ip-Adapter](https://github.com/tencent-ailab/IP-Adapter)
+* [AnimateDiff](https://github.com/guoyww/animatediff/)
+* [ControlNet](https://github.com/lllyasviel/ControlNet)
+* [Stable Diffusion XL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)
+* [Stable Diffusion](https://huggingface.co/runwayml/stable-diffusion-v1-5)
diff --git a/docs/source_en/tutorial/Pipelines.md b/docs/source_en/tutorial/Pipelines.md
new file mode 100644
index 0000000..67e6c1c
--- /dev/null
+++ b/docs/source_en/tutorial/Pipelines.md
@@ -0,0 +1,22 @@
+# Pipelines
+
+DiffSynth-Studio includes multiple pipelines, categorized into two types: image generation and video generation.
+
+## Image Pipelines
+
+| Pipeline | Models |
+|----------------------------|----------------------------------------------------------------|
+| SDImagePipeline | text_encoder: SDTextEncoder
unet: SDUNet
vae_decoder: SDVAEDecoder
vae_encoder: SDVAEEncoder
controlnet: MultiControlNetManager
ipadapter_image_encoder: IpAdapterCLIPImageEmbedder
ipadapter: SDIpAdapter |
+| SDXLImagePipeline | text_encoder: SDXLTextEncoder
text_encoder_2: SDXLTextEncoder2
text_encoder_kolors: ChatGLMModel
unet: SDXLUNet
vae_decoder: SDXLVAEDecoder
vae_encoder: SDXLVAEEncoder
controlnet: MultiControlNetManager
ipadapter_image_encoder: IpAdapterXLCLIPImageEmbedder
ipadapter: SDXLIpAdapter |
+| SD3ImagePipeline | text_encoder_1: SD3TextEncoder1
text_encoder_2: SD3TextEncoder2
text_encoder_3: SD3TextEncoder3
dit: SD3DiT
vae_decoder: SD3VAEDecoder
vae_encoder: SD3VAEEncoder |
+| HunyuanDiTImagePipeline | text_encoder: HunyuanDiTCLIPTextEncoder
text_encoder_t5: HunyuanDiTT5TextEncoder
dit: HunyuanDiT
vae_decoder: SDVAEDecoder
vae_encoder: SDVAEEncoder |
+| FluxImagePipeline | text_encoder_1: FluxTextEncoder1
text_encoder_2: FluxTextEncoder2
dit: FluxDiT
vae_decoder: FluxVAEDecoder
vae_encoder: FluxVAEEncoder |
+
+## Video Pipelines
+
+| Pipeline | Models |
+|----------------------------|----------------------------------------------------------------|
+| SDVideoPipeline | text_encoder: SDTextEncoder
unet: SDUNet
vae_decoder: SDVAEDecoder
vae_encoder: SDVAEEncoder
controlnet: MultiControlNetManager
ipadapter_image_encoder: IpAdapterCLIPImageEmbedder
ipadapter: SDIpAdapter
motion_modules: SDMotionModel |
+| SDXLVideoPipeline | text_encoder: SDXLTextEncoder
text_encoder_2: SDXLTextEncoder2
text_encoder_kolors: ChatGLMModel
unet: SDXLUNet
vae_decoder: SDXLVAEDecoder
vae_encoder: SDXLVAEEncoder
ipadapter_image_encoder: IpAdapterXLCLIPImageEmbedder
ipadapter: SDXLIpAdapter
motion_modules: SDXLMotionModel |
+| SVDVideoPipeline | image_encoder: SVDImageEncoder
unet: SVDUNet
vae_encoder: SVDVAEEncoder
vae_decoder: SVDVAEDecoder |
+| CogVideoPipeline | text_encoder: FluxTextEncoder2
dit: CogDiT
vae_encoder: CogVAEEncoder
vae_decoder: CogVAEDecoder |
diff --git a/docs/source_en/tutorial/PromptProcessing.md b/docs/source_en/tutorial/PromptProcessing.md
new file mode 100644
index 0000000..a2043b0
--- /dev/null
+++ b/docs/source_en/tutorial/PromptProcessing.md
@@ -0,0 +1,35 @@
+# Prompt Processing
+
+DiffSynth includes prompt processing functionality, which is divided into:
+
+- **Prompt Refiners (`prompt_refiner_classes`)**: Include prompt refinement, Chinese-to-English prompt translation, and combined translation and refinement. The available options are as follows:
+
+ - **English Prompt Refinement**: 'BeautifulPrompt', using the model [pai-bloom-1b1-text2prompt-sd](https://modelscope.cn/models/AI-ModelScope/pai-bloom-1b1-text2prompt-sd).
+
+  - **Prompt Translation from Chinese to English**: 'Translator', using the model [opus-mt-zh-en](https://modelscope.cn/models/moxying/opus-mt-zh-en).
+
+ - **Prompt Translation and Refinement**: 'QwenPrompt', using the model [Qwen2-1.5B-Instruct](https://modelscope.cn/models/qwen/Qwen2-1.5B-Instruct).
+
+- **Prompt Extenders (`prompt_extender_classes`)**: Prompt expansion with partition (regional) control, based on Omost. The available option is:
+
+ - **Prompt Partition Expansion**: 'OmostPromter'.
+
+## Usage Instructions
+
+### Prompt Refiners
+
+When loading the model pipeline, you can specify the desired prompt refiner functionality using the `prompt_refiner_classes` parameter. For example code, refer to [sd_prompt_refining.py](examples/image_synthesis/sd_prompt_refining.py).
+
+Available `prompt_refiner_classes` parameters include: Translator, BeautifulPrompt, QwenPrompt.
+
+```python
+pipe = SDXLImagePipeline.from_model_manager(model_manager, prompt_refiner_classes=[Translator, BeautifulPrompt])
+```
+
+### Prompt Extenders
+
+When loading the model pipeline, you can specify the desired prompt extender using the `prompt_extender_classes` parameter. For example code, refer to [omost_flux_text_to_image.py](examples/image_synthesis/omost_flux_text_to_image.py).
+
+```python
+pipe = FluxImagePipeline.from_model_manager(model_manager, prompt_extender_classes=[OmostPromter])
+```
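+
+With refiners or extenders attached, the pipeline is then called as usual. A sketch assuming a `Translator`-equipped pipeline, so a Chinese prompt is translated to English before generation (the prompt and parameters below are illustrative placeholders):
+
+```python
+# `pipe` is a pipeline built with prompt_refiner_classes=[Translator].
+# The Chinese prompt ("a cute cat") is translated internally before generation.
+image = pipe(prompt="一只可爱的猫", num_inference_steps=30)
+image.save("image.jpg")
+```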
diff --git a/docs/source_en/tutorial/Schedulers.md b/docs/source_en/tutorial/Schedulers.md
new file mode 100644
index 0000000..757d9ba
--- /dev/null
+++ b/docs/source_en/tutorial/Schedulers.md
@@ -0,0 +1,11 @@
+# Schedulers
+
+Schedulers control the entire denoising (or sampling) process of the model. When loading the Pipeline, DiffSynth automatically selects the most suitable schedulers for the current Pipeline, **requiring no additional configuration**.
+
+The supported schedulers are:
+
+- **EnhancedDDIMScheduler**: Extends the denoising process introduced in the Denoising Diffusion Probabilistic Models (DDPM) with non-Markovian guidance.
+
+- **FlowMatchScheduler**: Implements the flow matching sampling method introduced in [Stable Diffusion 3](https://arxiv.org/abs/2403.03206).
+
+- **ContinuousODEScheduler**: A scheduler based on Ordinary Differential Equations (ODE).
\ No newline at end of file