Add files via upload

First version of the translation is complete; the getStart directory has been kept, and some terms still need to be rechecked.
2026-04-08 08:58:20 +00:00 · 2024-10-18 18:02:52 +08:00
parent 24b78148b8
commit 883d26abb4
15 changed files with 308 additions and 38 deletions
--- a/docs/source_en/GetStarted/Models.md
+++ b/docs/source_en/GetStarted/Models.md
@@ -2,6 +2,7 @@

 So far, DiffSynth Studio supports the following models:

+* [CogVideoX](https://huggingface.co/THUDM/CogVideoX-5b)
 * [FLUX](https://huggingface.co/black-forest-labs/FLUX.1-dev)
 * [ExVideo](https://huggingface.co/ECNU-CILab/ExVideo-SVD-128f-v1)
 * [Kolors](https://huggingface.co/Kwai-Kolors/Kolors)
--- a/docs/source_en/GetStarted/Pipelines.md
+++ b/docs/source_en/GetStarted/Pipelines.md
@@ -1,27 +1,22 @@
 # Pipelines

-So far, the following table lists our pipelines and the models supported by each pipeline.
+DiffSynth-Studio includes multiple pipelines, categorized into two types: image generation and video generation.

 ## Image Pipelines

-Pipelines for generating images from text descriptions. Each pipeline relies on specific encoder and decoder models.
-
 | Pipeline                   | Models                                                     |
 |----------------------------|----------------------------------------------------------------|
-| HunyuanDiTImagePipeline     | text_encoder: HunyuanDiTCLIPTextEncoder<br>text_encoder_t5: HunyuanDiTT5TextEncoder<br>dit: HunyuanDiT<br>vae_decoder: SDVAEDecoder<br>vae_encoder: SDVAEEncoder |
 | SDImagePipeline             | text_encoder: SDTextEncoder<br>unet: SDUNet<br>vae_decoder: SDVAEDecoder<br>vae_encoder: SDVAEEncoder<br>controlnet: MultiControlNetManager<br>ipadapter_image_encoder: IpAdapterCLIPImageEmbedder<br>ipadapter: SDIpAdapter |
-| SD3ImagePipeline            | text_encoder_1: SD3TextEncoder1<br>text_encoder_2: SD3TextEncoder2<br>text_encoder_3: SD3TextEncoder3<br>dit: SD3DiT<br>vae_decoder: SD3VAEDecoder<br>vae_encoder: SD3VAEEncoder |
 | SDXLImagePipeline           | text_encoder: SDXLTextEncoder<br>text_encoder_2: SDXLTextEncoder2<br>text_encoder_kolors: ChatGLMModel<br>unet: SDXLUNet<br>vae_decoder: SDXLVAEDecoder<br>vae_encoder: SDXLVAEEncoder<br>controlnet: MultiControlNetManager<br>ipadapter_image_encoder: IpAdapterXLCLIPImageEmbedder<br>ipadapter: SDXLIpAdapter |
+| SD3ImagePipeline            | text_encoder_1: SD3TextEncoder1<br>text_encoder_2: SD3TextEncoder2<br>text_encoder_3: SD3TextEncoder3<br>dit: SD3DiT<br>vae_decoder: SD3VAEDecoder<br>vae_encoder: SD3VAEEncoder |
+| HunyuanDiTImagePipeline     | text_encoder: HunyuanDiTCLIPTextEncoder<br>text_encoder_t5: HunyuanDiTT5TextEncoder<br>dit: HunyuanDiT<br>vae_decoder: SDVAEDecoder<br>vae_encoder: SDVAEEncoder |
+| FluxImagePipeline     | text_encoder_1: FluxTextEncoder1<br>text_encoder_2: FluxTextEncoder2<br>dit: FluxDiT<br>vae_decoder: FluxVAEDecoder<br>vae_encoder: FluxVAEEncoder |

 ## Video Pipelines

-Pipelines for generating videos from text descriptions. In addition to the models required for image generation, they include models for handling motion modules.
-
 | Pipeline                   | Models                                                     |
 |----------------------------|----------------------------------------------------------------|
 | SDVideoPipeline            | text_encoder: SDTextEncoder<br>unet: SDUNet<br>vae_decoder: SDVAEDecoder<br>vae_encoder: SDVAEEncoder<br>controlnet: MultiControlNetManager<br>ipadapter_image_encoder: IpAdapterCLIPImageEmbedder<br>ipadapter: SDIpAdapter<br>motion_modules: SDMotionModel |
 | SDXLVideoPipeline          | text_encoder: SDXLTextEncoder<br>text_encoder_2: SDXLTextEncoder2<br>text_encoder_kolors: ChatGLMModel<br>unet: SDXLUNet<br>vae_decoder: SDXLVAEDecoder<br>vae_encoder: SDXLVAEEncoder<br>ipadapter_image_encoder: IpAdapterXLCLIPImageEmbedder<br>ipadapter: SDXLIpAdapter<br>motion_modules: SDXLMotionModel |
 | SVDVideoPipeline           | image_encoder: SVDImageEncoder<br>unet: SVDUNet<br>vae_encoder: SVDVAEEncoder<br>vae_decoder: SVDVAEDecoder |
-
-
-
+| CogVideoPipeline           | text_encoder: FluxTextEncoder2<br>dit: CogDiT<br>vae_encoder: CogVAEEncoder<br>vae_decoder: CogVAEDecoder |
--- a/docs/source_en/GetStarted/Schedulers.md
+++ b/docs/source_en/GetStarted/Schedulers.md
@@ -1,11 +1,11 @@
 # Schedulers

-Schedulers control the entire denoising (or sampling) process of the model. When loading the Pipeline, DiffSynth automatically selects the most suitable schedulers for the current Pipeline, requiring no additional configuration.
+Schedulers control the entire denoising (or sampling) process of the model. When loading the Pipeline, DiffSynth automatically selects the most suitable schedulers for the current Pipeline, **requiring no additional configuration**.

 The supported schedulers are:

 - **EnhancedDDIMScheduler**: Implements DDIM sampling, which generalizes the denoising process of Denoising Diffusion Probabilistic Models (DDPM) to non-Markovian diffusion processes.

- **FlowMatchScheduler**: Implements the flow matching sampling method introduced in Stable Diffusion 3.
+- **FlowMatchScheduler**: Implements the flow matching sampling method used in [Stable Diffusion 3](https://arxiv.org/abs/2403.03206).

 - **ContinuousODEScheduler**: A scheduler based on Ordinary Differential Equations (ODE).
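The non-Markovian update behind DDIM-style sampling can be sketched for a single scalar latent. This is a minimal illustration of the deterministic DDIM step (eta = 0), assuming the model predicts noise and that `alpha_bar` denotes the cumulative signal coefficient; it is not the `EnhancedDDIMScheduler` source, and `ddim_step` is a hypothetical name.

```python
import math


def ddim_step(x_t: float, eps: float, alpha_bar_t: float, alpha_bar_prev: float) -> float:
    """One deterministic DDIM update (eta = 0) for a scalar latent.

    x_t            -- current noisy sample
    eps            -- noise predicted by the model at this timestep
    alpha_bar_t    -- cumulative signal coefficient at the current timestep
    alpha_bar_prev -- cumulative signal coefficient at the previous (less noisy) timestep
    """
    # Estimate the clean sample x_0 from x_t and the predicted noise.
    x0_pred = (x_t - math.sqrt(1.0 - alpha_bar_t) * eps) / math.sqrt(alpha_bar_t)
    # Move x_0 back to the previous noise level along the same predicted noise direction.
    return math.sqrt(alpha_bar_prev) * x0_pred + math.sqrt(1.0 - alpha_bar_prev) * eps
```

No fresh noise is injected, so the step is deterministic; with `alpha_bar_prev = 1.0` it returns the predicted clean sample directly.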
--- a/docs/source_en/finetune/train_kolors_lora.md
+++ b/docs/source_en/finetune/train_kolors_lora.md
@@ -1,7 +1,5 @@
-# 训练 Kolors LoRA
-
-以下文件将用于构建 Kolors。你可以从 [HuggingFace](https://huggingface.co/Kwai-Kolors/Kolors) 或 [ModelScope](https://modelscope.cn/models/Kwai-Kolors/Kolors) 下载 Kolors。由于精度溢出问题，我们需要下载额外的 VAE 模型（从 [HuggingFace](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix) 或 [ModelScope](https://modelscope.cn/models/AI-ModelScope/sdxl-vae-fp16-fix)）。你可以使用以下代码下载这些文件：
-
+# Training Kolors LoRA
+The following files are used to build Kolors. You can download Kolors from [HuggingFace](https://huggingface.co/Kwai-Kolors/Kolors) or [ModelScope](https://modelscope.cn/models/Kwai-Kolors/Kolors). Because of a numerical precision overflow issue, an additional VAE model is also required (from [HuggingFace](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix) or [ModelScope](https://modelscope.cn/models/AI-ModelScope/sdxl-vae-fp16-fix)). You can use the following code to download these files:

 ```python
 from diffsynth import download_models
@@ -31,7 +29,7 @@ models
    └── diffusion_pytorch_model.safetensors
 ```

-使用下面的命令启动训练任务：
+Use the following command to start the training task:

 ```
 CUDA_VISIBLE_DEVICES="0" python examples/train/kolors/train_kolors_lora.py \
@@ -52,9 +50,10 @@ CUDA_VISIBLE_DEVICES="0" python examples/train/kolors/train_kolors_lora.py \
  --use_gradient_checkpointing
 ```

-有关参数的更多信息，请使用 `python examples/train/kolors/train_kolors_lora.py -h` 查看详细信息。
+For details on the parameters, run `python examples/train/kolors/train_kolors_lora.py -h`.
+
+After the training is complete, use `model_manager.load_lora` to load the LoRA for inference.
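LoRA stores a low-rank update instead of a full weight matrix. The sketch below shows the standard merge rule from the LoRA paper, W' = W + (alpha / r) * B A, in pure Python; it illustrates the technique only and is not taken from `model_manager.load_lora`, whose internals (including how it scales the update) are not covered here. `merge_lora` and `matmul` are hypothetical names.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (m x k) and a (k x n) matrix."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_lora(w: Matrix, lora_b: Matrix, lora_a: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * B @ A, where r is the LoRA rank.

    w      -- base weight, shape (out x in)
    lora_b -- up projection, shape (out x r)
    lora_a -- down projection, shape (r x in)
    """
    rank = len(lora_a)
    delta = matmul(lora_b, lora_a)  # full-size update reconstructed from the two small factors
    scale = alpha / rank
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))] for i in range(len(w))]
```

Because the merged matrix has the same shape as `W`, inference with a merged LoRA costs the same as inference with the base model.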

-训练完成后，使用 `model_manager.load_lora` 加载 LoRA 以进行推理。



--- a/docs/source_en/finetune/train_sd3_lora.md
+++ b/docs/source_en/finetune/train_sd3_lora.md
@@ -1,8 +1,6 @@
-# 训练 Stable Diffusion 3 LoRA
-
-训练脚本只需要一个文件。你可以使用 [`sd3_medium_incl_clips.safetensors`](https://huggingface.co/stabilityai/stable-diffusion-3-medium/resolve/main/sd3_medium_incl_clips.safetensors)（没有 T5 Encoder）或 [`sd3_medium_incl_clips_t5xxlfp16.safetensors`](https://huggingface.co/stabilityai/stable-diffusion-3-medium/resolve/main/sd3_medium_incl_clips_t5xxlfp16.safetensors)（有 T5 Encoder）。请使用以下代码下载这些文件：
-
+# Training Stable Diffusion 3 LoRA

+The training script only requires one file. You can use [`sd3_medium_incl_clips.safetensors`](https://huggingface.co/stabilityai/stable-diffusion-3-medium/resolve/main/sd3_medium_incl_clips.safetensors) (without the T5 encoder) or [`sd3_medium_incl_clips_t5xxlfp16.safetensors`](https://huggingface.co/stabilityai/stable-diffusion-3-medium/resolve/main/sd3_medium_incl_clips_t5xxlfp16.safetensors) (with the T5 encoder). Please use the following code to download these files:
 ```python
 from diffsynth import download_models

@@ -16,7 +14,7 @@ models/stable_diffusion_3/
 └── sd3_medium_incl_clips_t5xxlfp16.safetensors
 ```

-使用下面的命令启动训练任务：
+Use the following command to start the training task:

 ```
 CUDA_VISIBLE_DEVICES="0" python examples/train/stable_diffusion_3/train_sd3_lora.py \
@@ -35,9 +33,9 @@ CUDA_VISIBLE_DEVICES="0" python examples/train/stable_diffusion_3/train_sd3_lora
  --use_gradient_checkpointing
 ```

-有关参数的更多信息，请使用 `python examples/train/stable_diffusion_3/train_sd3_lora.py -h` 查看详细信息。
+For details on the parameters, run `python examples/train/stable_diffusion_3/train_sd3_lora.py -h`.

-训练完成后，使用 `model_manager.load_lora` 加载 LoRA 以进行推理。
+After training is complete, use `model_manager.load_lora` to load the LoRA for inference.

 ```python
 from diffsynth import ModelManager, SD3ImagePipeline
--- a/docs/source_en/finetune/train_sd_lora.md
+++ b/docs/source_en/finetune/train_sd_lora.md
@@ -1,6 +1,6 @@
-# 训练 Stable Diffusion LoRA
+# Training Stable Diffusion LoRA

-训练脚本只需要一个文件。我们支持 [CivitAI](https://civitai.com/) 中的主流检查点。默认情况下，我们使用基础的 Stable Diffusion v1.5。你可以从 [HuggingFace](https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors) 或 [ModelScope](https://www.modelscope.cn/models/AI-ModelScope/stable-diffusion-v1-5/resolve/master/v1-5-pruned-emaonly.safetensors) 下载。你可以使用以下代码下载这个文件：
+The training script only requires one file. We support mainstream checkpoints on [CivitAI](https://civitai.com/). By default, we use the base Stable Diffusion v1.5 model. You can download it from [HuggingFace](https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors) or [ModelScope](https://www.modelscope.cn/models/AI-ModelScope/stable-diffusion-v1-5/resolve/master/v1-5-pruned-emaonly.safetensors). You can use the following code to download this file:

 ```python
 from diffsynth import download_models
@@ -14,7 +14,7 @@ models/stable_diffusion
 └── v1-5-pruned-emaonly.safetensors
 ```

-使用以下命令启动训练任务：
+To initiate the training process, please use the following command:

 ```
 CUDA_VISIBLE_DEVICES="0" python examples/train/stable_diffusion/train_sd_lora.py \
@@ -33,10 +33,9 @@ CUDA_VISIBLE_DEVICES="0" python examples/train/stable_diffusion/train_sd_lora.py
  --use_gradient_checkpointing
 ```

-有关参数的更多信息，请使用 `python examples/train/stable_diffusion/train_sd_lora.py -h` 查看详细信息。
-
-训练完成后，使用 `model_manager.load_lora` 加载 LoRA 以进行推理。
+For details on the parameters, run `python examples/train/stable_diffusion/train_sd_lora.py -h`.

+After training is complete, use `model_manager.load_lora` to load the LoRA for inference.


 ```python
--- a/docs/source_en/finetune/train_sdxl_lora.md
+++ b/docs/source_en/finetune/train_sdxl_lora.md
@@ -1,7 +1,7 @@
-# 训练 Stable Diffusion XL LoRA
+# Training Stable Diffusion XL LoRA

-训练脚本只需要一个文件。我们支持 [CivitAI](https://civitai.com/) 中的主流检查点。默认情况下，我们使用基础的 Stable Diffusion XL。你可以从 [HuggingFace](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors) 或 [ModelScope](https://www.modelscope.cn/models/AI-ModelScope/stable-diffusion-xl-base-1.0/resolve/master/sd_xl_base_1.0.safetensors) 下载。也可以使用以下代码下载这个文件：

+The training script only requires one file. We support mainstream checkpoints on [CivitAI](https://civitai.com/). By default, we use the base Stable Diffusion XL model. You can download it from [HuggingFace](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors) or [ModelScope](https://www.modelscope.cn/models/AI-ModelScope/stable-diffusion-xl-base-1.0/resolve/master/sd_xl_base_1.0.safetensors). You can also use the following code to download this file:
 ```python
 from diffsynth import download_models

@@ -14,8 +14,7 @@ models/stable_diffusion_xl
 └── sd_xl_base_1.0.safetensors
 ```

-我们观察到 Stable Diffusion XL 在 float16 精度下会出现数值精度溢出，因此我们建议用户使用 float32 精度训练，使用以下命令启动训练任务：
-
+We have observed numerical overflow when training Stable Diffusion XL in float16 precision, so we recommend training in float32 precision. To start the training task, use the following command:
 ```
 CUDA_VISIBLE_DEVICES="0" python examples/train/stable_diffusion_xl/train_sdxl_lora.py \
  --pretrained_path models/stable_diffusion_xl/sd_xl_base_1.0.safetensors \
@@ -33,9 +32,10 @@ CUDA_VISIBLE_DEVICES="0" python examples/train/stable_diffusion_xl/train_sdxl_lo
  --use_gradient_checkpointing
 ```
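The float16 overflow mentioned above comes from the format's narrow range: the largest finite IEEE half-precision value is 65504. A minimal stdlib sketch that round-trips values through half precision with Python's `struct` module (format character `'e'`) makes this concrete; `to_float16` and `fits_in_float16` are hypothetical helpers, not part of DiffSynth.

```python
import struct

FLOAT16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_float16(x: float) -> float:
    """Round x to the nearest half-precision value (errors if |x| is too large)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fits_in_float16(values) -> bool:
    """True if every value can be stored in float16 without overflowing."""
    return all(abs(v) <= FLOAT16_MAX for v in values)
```

In float16 tensors, values beyond this bound become `inf`, whereas float32 can represent magnitudes up to about 3.4e38.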

-有关参数的更多信息，请使用 `python examples/train/stable_diffusion_xl/train_sdxl_lora.py -h` 查看详细信息。
+For details on the parameters, run `python examples/train/stable_diffusion_xl/train_sdxl_lora.py -h`.
+
+After training is complete, use `model_manager.load_lora` to load the LoRA for inference.

-训练完成后，使用 `model_manager.load_lora` 加载 LoRA 以进行推理。

 ```python
 from diffsynth import ModelManager, SDXLImagePipeline
--- a/docs/source_en/tutorial/ASimpleExample.md
+++ b/docs/source_en/tutorial/ASimpleExample.md
@@ -0,0 +1,85 @@
+# Quick Start
+
+This document shows how to get started with DiffSynth-Studio quickly, using a short piece of code.
+
+## Installation
+
+Use the following command to clone and install DiffSynth-Studio from GitHub. For more information, please refer to [Installation](./Installation.md).
+
+```shell
+git clone https://github.com/modelscope/DiffSynth-Studio.git
+cd DiffSynth-Studio
+pip install -e .
+```
+
+## One-click Run!
+
+Running the following code downloads the model, loads it, and generates an image.
+
+```python
+import torch
+from diffsynth import ModelManager, FluxImagePipeline
+
+model_manager = ModelManager(
+    torch_dtype=torch.bfloat16,
+    device="cuda",
+    model_id_list=["FLUX.1-dev"]
+)
+pipe = FluxImagePipeline.from_model_manager(model_manager)
+
+torch.manual_seed(0)
+image = pipe(
+    prompt="In a forest, a wooden plank sign reading DiffSynth",
+    height=576, width=1024,
+)
+image.save("image.jpg")
+```
+
+![image](https://github.com/user-attachments/assets/15a52a2b-2f18-46fe-810c-cb3ad2853919)
+
+This example shows the two key modules in DiffSynth, `ModelManager` and `Pipeline`, which are introduced in detail below.
+
+## Downloading and Loading Models
+
+`ModelManager` is responsible for downloading and loading models, which can be done in one step with the following code.
+
+```python
+import torch
+from diffsynth import ModelManager
+
+model_manager = ModelManager(
+    torch_dtype=torch.bfloat16,
+    device="cuda",
+    model_id_list=["FLUX.1-dev"]
+)
+```
+
+Alternatively, you can perform these steps separately; the following code is equivalent to the code above.
+
+```python
+import torch
+from diffsynth import download_models, ModelManager
+
+download_models(["FLUX.1-dev"])
+model_manager = ModelManager(torch_dtype=torch.bfloat16, device="cuda")
+model_manager.load_models([
+    "models/FLUX/FLUX.1-dev/text_encoder/model.safetensors",
+    "models/FLUX/FLUX.1-dev/text_encoder_2",
+    "models/FLUX/FLUX.1-dev/ae.safetensors",
+    "models/FLUX/FLUX.1-dev/flux1-dev.safetensors"
+])
+```
+
+When downloading models, we support downloading from [ModelScope](https://www.modelscope.cn/) and [HuggingFace](https://huggingface.co/), and we also support downloading non-preset models. For more information about model downloading, please refer to [Model Download](./DownloadModels.md).
+
+When loading models, pass in the paths of all the models you want to load. For weight files in formats such as `.safetensors`, `ModelManager` determines the model type automatically after loading them; for models stored as folders, `ModelManager` tries to parse the `config.json` file inside and load the corresponding module from third-party libraries such as `transformers`. For the models supported by DiffSynth-Studio, see [Supported Models](./Models.md).
+
+## Building Pipeline
+
+DiffSynth-Studio provides multiple inference `Pipeline`s, each of which can be initialized directly from a `ModelManager` that supplies the models it requires. For example, the text-to-image `Pipeline` for FLUX.1-dev can be constructed as follows:
+
+```python
+pipe = FluxImagePipeline.from_model_manager(model_manager)
+```
+
+For the other `Pipeline`s for image and video generation, see [Inference Pipelines](./Pipelines.md).
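The loading behavior described in this tutorial (weight files are identified after loading, while folders are resolved through their `config.json`) can be pictured as a simple dispatcher. This is a hypothetical sketch, not `ModelManager`'s actual code: `classify_model_path` is an invented name, and reading the `architectures` field follows the convention of `transformers` configs.

```python
import json
import os


def classify_model_path(path: str) -> str:
    """Return a rough label for how a loader might handle `path` (illustrative only)."""
    if os.path.isdir(path):
        config_file = os.path.join(path, "config.json")
        if not os.path.isfile(config_file):
            raise ValueError(f"folder without config.json: {path}")
        with open(config_file, encoding="utf-8") as f:
            config = json.load(f)
        # transformers-style configs name the model class under "architectures".
        architectures = config.get("architectures", [])
        return f"folder:{architectures[0]}" if architectures else "folder:unknown"
    if path.endswith((".safetensors", ".bin", ".ckpt", ".pth")):
        # Single weight file: the model type would be inferred from its contents.
        return "weights-file"
    raise ValueError(f"unsupported model path: {path}")
```

For example, a folder such as `models/FLUX/FLUX.1-dev/text_encoder_2` would take the `config.json` branch, while `ae.safetensors` would take the weight-file branch.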
--- a/docs/source_en/tutorial/DownloadModels.md
+++ b/docs/source_en/tutorial/DownloadModels.md
@@ -0,0 +1,34 @@
+# Download Models
+
+DiffSynth-Studio includes preset download links for a number of mainstream diffusion models, so you can download and use them directly.
+
+## Download Preset Models
+
+You can use the `download_models` function directly to download preset model files; the available model IDs are listed in the [config file](/diffsynth/configs/model_config.py).
+
+```python
+from diffsynth import download_models
+
+download_models(["FLUX.1-dev"])
+```
+
+For VSCode users, once Pylance or another Python language server is active, typing `""` inside the list passed to `download_models` displays all supported model IDs.
+
+![image](https://github.com/user-attachments/assets/2bbfec32-e015-45a7-98d9-57af13200b7c)
+
+## Download Non-Preset Models
+
+Models can be downloaded from two sources: [ModelScope](https://modelscope.cn/models) and [HuggingFace](https://huggingface.co/models). You can also download the models you need manually with a browser or other tools.
+
+```python
+from diffsynth import download_customized_models
+
+download_customized_models(
+    model_id="Kwai-Kolors/Kolors",
+    origin_file_path="vae/diffusion_pytorch_model.fp16.bin",
+    local_dir="models/kolors/Kolors/vae",
+    downloading_priority=["ModelScope", "HuggingFace"]
+)
+```
+
+In this snippet, `ModelScope` is tried first according to `downloading_priority`, and the file `vae/diffusion_pytorch_model.fp16.bin` is downloaded from the `Kwai-Kolors/Kolors` repository in the [model library](https://modelscope.cn/models/Kwai-Kolors/Kolors) to the local directory `models/kolors/Kolors/vae`.
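A download priority can be read as an ordered fallback list. The sketch below illustrates that idea, assuming each source is a callable that either returns a local path or raises an exception; `download_with_priority` and the source callables are hypothetical and do not describe how `download_customized_models` is implemented.

```python
from typing import Callable, Dict, List

# (model_id, origin_file_path, local_dir) -> local path of the downloaded file
Downloader = Callable[[str, str, str], str]


def download_with_priority(model_id: str, origin_file_path: str, local_dir: str,
                           sources: Dict[str, Downloader], priority: List[str]) -> str:
    """Try each source in priority order and return the path from the first success."""
    errors = []
    for name in priority:
        try:
            return sources[name](model_id, origin_file_path, local_dir)
        except Exception as exc:  # record the failure and fall through to the next source
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all sources failed: " + "; ".join(errors))
```

Collecting each source's error before giving up keeps the final message informative when every source fails.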
--- a/docs/source_en/tutorial/Extensions.md
+++ b/docs/source_en/tutorial/Extensions.md
@@ -0,0 +1,49 @@
+# Extension Features
+
+This document introduces several diffusion-related techniques implemented in DiffSynth that are useful for image and video processing.
+
+- **[RIFE](https://github.com/hzwer/ECCV2022-RIFE)**: RIFE is a frame interpolation method based on real-time intermediate flow estimation. Its IFNet model estimates intermediate flows quickly and end to end. RIFE does not rely on pre-trained optical flow models, and it supports interpolation at arbitrary timesteps by taking the timestep as an encoded input.
+
+    In this code snippet, we use the RIFE model to double the frame rate of a video.
+
+    ```python
+    from diffsynth import VideoData, ModelManager, save_video
+    from diffsynth.extensions.RIFE import RIFEInterpolater
+
+    model_manager = ModelManager(model_id_list=["RIFE"])
+    rife = RIFEInterpolater.from_model_manager(model_manager)
+    video = VideoData("input_video.mp4", height=512, width=768).raw_data()
+    video = rife.interpolate(video)
+    save_video(video, "output_video.mp4", fps=60)
+    ```
+
+- **[ESRGAN](https://github.com/xinntao/ESRGAN)**: ESRGAN is an image super-resolution model that upscales images by a factor of four. It significantly improves the realism of the upscaled images through improvements to the network architecture, the adversarial loss, and the perceptual loss.
+
+    In this code snippet, we use the ESRGAN model to quadruple the resolution of an image.
+
+    ```python
+    from PIL import Image
+    from diffsynth import ModelManager
+    from diffsynth.extensions.ESRGAN import ESRGAN
+
+    model_manager = ModelManager(model_id_list=["ESRGAN_x4"])
+    esrgan = ESRGAN.from_model_manager(model_manager)
+    image = Image.open("input_image.jpg")
+    image = esrgan.upscale(image)
+    image.save("output_image.jpg")
+    ```
+
+- **[FastBlend](https://arxiv.org/abs/2311.09265)**: FastBlend is a model-free video deflickering algorithm. Style videos produced by processing each frame independently with an image generation model often flicker; FastBlend removes this flicker by following the motion in the original video (the guide video).
+
+    In this code snippet, we use FastBlend to remove the flicker effect from a style video.
+
+    ```python
+    from diffsynth import VideoData, save_video
+    from diffsynth.extensions.FastBlend import FastBlendSmoother
+
+    fastblend = FastBlendSmoother()
+    guide_video = VideoData("guide_video.mp4", height=512, width=768).raw_data()
+    style_video = VideoData("style_video.mp4", height=512, width=768).raw_data()
+    output_video = fastblend(style_video, original_frames=guide_video)
+    save_video(output_video, "output_video.mp4", fps=30)
+    ```
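To make the frame-rate arithmetic in the RIFE example concrete: inserting one frame between every pair of neighbours turns N frames into 2N - 1, so a clip saved at twice the original fps keeps roughly the same duration. The sketch below uses naive linear blending as a stand-in for RIFE's flow-based estimate (blending ghosts on fast motion, which is exactly what RIFE avoids); frames are flat lists of pixel values, and `double_frame_rate` is a hypothetical name.

```python
from typing import List

Frame = List[float]  # a frame flattened to pixel intensities


def blend(a: Frame, b: Frame, t: float = 0.5) -> Frame:
    """Linearly interpolate two frames at time t in [0, 1]."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def double_frame_rate(frames: List[Frame]) -> List[Frame]:
    """Insert a blended frame between each pair of consecutive frames."""
    if len(frames) < 2:
        return list(frames)
    output: List[Frame] = []
    for current, following in zip(frames, frames[1:]):
        output.append(current)
        output.append(blend(current, following))  # naive midpoint frame
    output.append(frames[-1])
    return output
```

A 30-frame input yields 59 frames; a flow-based interpolator like RIFE would replace `blend` with a motion-compensated estimate.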
--- a/docs/source_en/tutorial/Installation.md
+++ b/docs/source_en/tutorial/Installation.md
@@ -0,0 +1,24 @@
+# Installation
+
+## From Source
+
+1. Clone the source repository:
+
+    ```bash
+    git clone https://github.com/modelscope/DiffSynth-Studio.git
+    ```
+
+2. Navigate to the project directory and install:
+
+    ```bash
+    cd DiffSynth-Studio
+    pip install -e .
+    ```
+
+## From PyPI
+
+Install directly via PyPI:
+
+```bash
+pip install diffsynth
+```
--- a/docs/source_en/tutorial/Models.md
+++ b/docs/source_en/tutorial/Models.md
@@ -0,0 +1,18 @@
+# Models
+
+So far, DiffSynth Studio supports the following models:
+
+* [CogVideoX](https://huggingface.co/THUDM/CogVideoX-5b)
+* [FLUX](https://huggingface.co/black-forest-labs/FLUX.1-dev)
+* [ExVideo](https://huggingface.co/ECNU-CILab/ExVideo-SVD-128f-v1)
+* [Kolors](https://huggingface.co/Kwai-Kolors/Kolors)
+* [Stable Diffusion 3](https://huggingface.co/stabilityai/stable-diffusion-3-medium)
+* [Stable Video Diffusion](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt)
+* [Hunyuan-DiT](https://github.com/Tencent/HunyuanDiT)
+* [RIFE](https://github.com/hzwer/ECCV2022-RIFE)
+* [ESRGAN](https://github.com/xinntao/ESRGAN)
+* [Ip-Adapter](https://github.com/tencent-ailab/IP-Adapter)
+* [AnimateDiff](https://github.com/guoyww/animatediff/)
+* [ControlNet](https://github.com/lllyasviel/ControlNet)
+* [Stable Diffusion XL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)
+* [Stable Diffusion](https://huggingface.co/runwayml/stable-diffusion-v1-5)
--- a/docs/source_en/tutorial/Pipelines.md
+++ b/docs/source_en/tutorial/Pipelines.md
@@ -0,0 +1,22 @@
+# Pipelines
+
+DiffSynth-Studio includes multiple pipelines, categorized into two types: image generation and video generation.
+
+## Image Pipelines
+
+| Pipeline                   | Models                                                     |
+|----------------------------|----------------------------------------------------------------|
+| SDImagePipeline             | text_encoder: SDTextEncoder<br>unet: SDUNet<br>vae_decoder: SDVAEDecoder<br>vae_encoder: SDVAEEncoder<br>controlnet: MultiControlNetManager<br>ipadapter_image_encoder: IpAdapterCLIPImageEmbedder<br>ipadapter: SDIpAdapter |
+| SDXLImagePipeline           | text_encoder: SDXLTextEncoder<br>text_encoder_2: SDXLTextEncoder2<br>text_encoder_kolors: ChatGLMModel<br>unet: SDXLUNet<br>vae_decoder: SDXLVAEDecoder<br>vae_encoder: SDXLVAEEncoder<br>controlnet: MultiControlNetManager<br>ipadapter_image_encoder: IpAdapterXLCLIPImageEmbedder<br>ipadapter: SDXLIpAdapter |
+| SD3ImagePipeline            | text_encoder_1: SD3TextEncoder1<br>text_encoder_2: SD3TextEncoder2<br>text_encoder_3: SD3TextEncoder3<br>dit: SD3DiT<br>vae_decoder: SD3VAEDecoder<br>vae_encoder: SD3VAEEncoder |
+| HunyuanDiTImagePipeline     | text_encoder: HunyuanDiTCLIPTextEncoder<br>text_encoder_t5: HunyuanDiTT5TextEncoder<br>dit: HunyuanDiT<br>vae_decoder: SDVAEDecoder<br>vae_encoder: SDVAEEncoder |
+| FluxImagePipeline     | text_encoder_1: FluxTextEncoder1<br>text_encoder_2: FluxTextEncoder2<br>dit: FluxDiT<br>vae_decoder: FluxVAEDecoder<br>vae_encoder: FluxVAEEncoder |
+
+## Video Pipelines
+
+| Pipeline                   | Models                                                     |
+|----------------------------|----------------------------------------------------------------|
+| SDVideoPipeline            | text_encoder: SDTextEncoder<br>unet: SDUNet<br>vae_decoder: SDVAEDecoder<br>vae_encoder: SDVAEEncoder<br>controlnet: MultiControlNetManager<br>ipadapter_image_encoder: IpAdapterCLIPImageEmbedder<br>ipadapter: SDIpAdapter<br>motion_modules: SDMotionModel |
+| SDXLVideoPipeline          | text_encoder: SDXLTextEncoder<br>text_encoder_2: SDXLTextEncoder2<br>text_encoder_kolors: ChatGLMModel<br>unet: SDXLUNet<br>vae_decoder: SDXLVAEDecoder<br>vae_encoder: SDXLVAEEncoder<br>ipadapter_image_encoder: IpAdapterXLCLIPImageEmbedder<br>ipadapter: SDXLIpAdapter<br>motion_modules: SDXLMotionModel |
+| SVDVideoPipeline           | image_encoder: SVDImageEncoder<br>unet: SVDUNet<br>vae_encoder: SVDVAEEncoder<br>vae_decoder: SVDVAEDecoder |
+| CogVideoPipeline           | text_encoder: FluxTextEncoder2<br>dit: CogDiT<br>vae_encoder: CogVAEEncoder<br>vae_decoder: CogVAEDecoder |
--- a/docs/source_en/tutorial/PromptProcessing.md
+++ b/docs/source_en/tutorial/PromptProcessing.md
@@ -0,0 +1,35 @@
+# Prompt Processing
+
+DiffSynth provides two kinds of prompt processing:
+
+- **Prompt Refiners (`prompt_refiner_classes`)**: Refine prompts, translate prompts from Chinese to English, or do both. The available classes are:
+
+    - **English Prompt Refinement**: 'BeautifulPrompt', using the model [pai-bloom-1b1-text2prompt-sd](https://modelscope.cn/models/AI-ModelScope/pai-bloom-1b1-text2prompt-sd).
+
+    - **Prompt Translation from Chinese to English**: 'Translator', using the model [opus-mt-zh-en](https://modelscope.cn/models/moxying/opus-mt-zh-en).
+
+    - **Prompt Translation and Refinement**: 'QwenPrompt', using the model [Qwen2-1.5B-Instruct](https://modelscope.cn/models/qwen/Qwen2-1.5B-Instruct).
+
+- **Prompt Extenders (`prompt_extender_classes`)**: Expand a prompt into partitioned, region-level prompts for layout control, based on Omost. The available class is:
+
+    - **Prompt Partition Expansion**: 'OmostPromter'.
+
+## Usage Instructions
+
+### Prompt Refiners
+
+When building the pipeline, specify the prompt refiners to use with the `prompt_refiner_classes` parameter. For example code, refer to [sd_prompt_refining.py](examples/image_synthesis/sd_prompt_refining.py).
+
+The classes available for `prompt_refiner_classes` are `Translator`, `BeautifulPrompt`, and `QwenPrompt`.
+
+```python
+pipe = SDXLImagePipeline.from_model_manager(model_manager, prompt_refiner_classes=[Translator, BeautifulPrompt])
+```
+
+### Prompt Extenders
+
+When building the pipeline, specify the prompt extender to use with the `prompt_extender_classes` parameter. For example code, refer to [omost_flux_text_to_image.py](examples/image_synthesis/omost_flux_text_to_image.py).
+
+```python
+pipe = FluxImagePipeline.from_model_manager(model_manager, prompt_extender_classes=[OmostPromter])
+```
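Passing several classes in `prompt_refiner_classes` suggests that the refiners run one after another, in which case order matters: with `[Translator, BeautifulPrompt]`, translation would happen before refinement. The sketch below illustrates that assumption; `PromptProcessor`, `toy_translator`, and `toy_beautifier` are hypothetical stand-ins, not DiffSynth classes, and the toy functions replace the real models with trivial string operations.

```python
from typing import Callable, List

Refiner = Callable[[str], str]


class PromptProcessor:
    """Applies refiners in the order they were given (illustrative only)."""

    def __init__(self, refiners: List[Refiner]):
        self.refiners = refiners

    def __call__(self, prompt: str) -> str:
        for refine in self.refiners:
            prompt = refine(prompt)  # each refiner receives the previous one's output
        return prompt


def toy_translator(prompt: str) -> str:
    """Stand-in for a zh->en translation model: a tiny lookup table."""
    table = {"森林": "forest"}
    return table.get(prompt, prompt)


def toy_beautifier(prompt: str) -> str:
    """Stand-in for a prompt-refinement model: appends quality keywords."""
    return prompt + ", highly detailed, soft lighting"
```

With the translator first, `"森林"` becomes `"forest, highly detailed, soft lighting"`; reversing the order leaves the Chinese word untranslated because the translator no longer sees the bare term.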
--- a/docs/source_en/tutorial/Schedulers.md
+++ b/docs/source_en/tutorial/Schedulers.md
@@ -0,0 +1,11 @@
+# Schedulers
+
+Schedulers control the entire denoising (or sampling) process of the model. When loading the Pipeline, DiffSynth automatically selects the most suitable schedulers for the current Pipeline, **requiring no additional configuration**.
+
+The supported schedulers are:
+
+- **EnhancedDDIMScheduler**: Implements DDIM sampling, which generalizes the denoising process of Denoising Diffusion Probabilistic Models (DDPM) to non-Markovian diffusion processes.
+
+- **FlowMatchScheduler**: Implements the flow matching sampling method used in [Stable Diffusion 3](https://arxiv.org/abs/2403.03206).
+
+- **ContinuousODEScheduler**: A scheduler based on Ordinary Differential Equations (ODE).
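Flow matching samplers integrate a learned velocity field from pure noise (sigma = 1) to data (sigma = 0). A minimal Euler-integration sketch on a scalar follows. It assumes the rectified-flow parameterization from the Stable Diffusion 3 paper, x_sigma = (1 - sigma) * x0 + sigma * noise, whose velocity is dx/dsigma = noise - x0, together with a linear sigma schedule. `euler_flow_sample` and `linear_sigmas` are hypothetical names, and this is not the `FlowMatchScheduler` source.

```python
from typing import Callable, List

Velocity = Callable[[float, float], float]  # (x, sigma) -> predicted velocity dx/dsigma


def linear_sigmas(num_steps: int) -> List[float]:
    """Evenly spaced noise levels from 1.0 (pure noise) down to 0.0 (clean)."""
    return [1.0 - i / num_steps for i in range(num_steps + 1)]


def euler_flow_sample(x: float, velocity: Velocity, num_steps: int) -> float:
    """Integrate dx/dsigma = v(x, sigma) from sigma = 1 to sigma = 0 with Euler steps."""
    sigmas = linear_sigmas(num_steps)
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        # sigma_next < sigma, so each step moves x from noise toward data.
        x = x + (sigma_next - sigma) * velocity(x, sigma)
    return x
```

With an oracle velocity `noise - x0`, which is constant along the straight path, Euler integration recovers `x0` exactly; a learned velocity varies with `x` and `sigma`, so more steps generally give a closer approximation.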