Add files via upload

First version of the translation is complete; the getStart directory is kept, and some terms still need to be rechecked
This commit is contained in:
yrk111222
2024-10-18 18:02:52 +08:00
committed by GitHub
parent 24b78148b8
commit 883d26abb4
15 changed files with 308 additions and 38 deletions

View File

@@ -2,6 +2,7 @@
So far, DiffSynth Studio supports the following models:
* [CogVideoX](https://huggingface.co/THUDM/CogVideoX-5b)
* [FLUX](https://huggingface.co/black-forest-labs/FLUX.1-dev)
* [ExVideo](https://huggingface.co/ECNU-CILab/ExVideo-SVD-128f-v1)
* [Kolors](https://huggingface.co/Kwai-Kolors/Kolors)

View File

@@ -1,27 +1,22 @@
# Pipelines
So far, the following table lists our pipelines and the models supported by each pipeline.
DiffSynth-Studio includes multiple pipelines, categorized into two types: image generation and video generation.
## Image Pipelines
Pipelines for generating images from text descriptions. Each pipeline relies on specific encoder and decoder models.
| Pipeline | Models |
|----------------------------|----------------------------------------------------------------|
| SDImagePipeline | text_encoder: SDTextEncoder<br>unet: SDUNet<br>vae_decoder: SDVAEDecoder<br>vae_encoder: SDVAEEncoder<br>controlnet: MultiControlNetManager<br>ipadapter_image_encoder: IpAdapterCLIPImageEmbedder<br>ipadapter: SDIpAdapter |
| SDXLImagePipeline | text_encoder: SDXLTextEncoder<br>text_encoder_2: SDXLTextEncoder2<br>text_encoder_kolors: ChatGLMModel<br>unet: SDXLUNet<br>vae_decoder: SDXLVAEDecoder<br>vae_encoder: SDXLVAEEncoder<br>controlnet: MultiControlNetManager<br>ipadapter_image_encoder: IpAdapterXLCLIPImageEmbedder<br>ipadapter: SDXLIpAdapter |
| SD3ImagePipeline | text_encoder_1: SD3TextEncoder1<br>text_encoder_2: SD3TextEncoder2<br>text_encoder_3: SD3TextEncoder3<br>dit: SD3DiT<br>vae_decoder: SD3VAEDecoder<br>vae_encoder: SD3VAEEncoder |
| HunyuanDiTImagePipeline | text_encoder: HunyuanDiTCLIPTextEncoder<br>text_encoder_t5: HunyuanDiTT5TextEncoder<br>dit: HunyuanDiT<br>vae_decoder: SDVAEDecoder<br>vae_encoder: SDVAEEncoder |
| FluxImagePipeline | text_encoder_1: FluxTextEncoder1<br>text_encoder_2: FluxTextEncoder2<br>dit: FluxDiT<br>vae_decoder: FluxVAEDecoder<br>vae_encoder: FluxVAEEncoder |
## Video Pipelines
Pipelines for generating videos from text descriptions. In addition to the models required for image generation, they include models for handling motion modules.
| Pipeline | Models |
|----------------------------|----------------------------------------------------------------|
| SDVideoPipeline | text_encoder: SDTextEncoder<br>unet: SDUNet<br>vae_decoder: SDVAEDecoder<br>vae_encoder: SDVAEEncoder<br>controlnet: MultiControlNetManager<br>ipadapter_image_encoder: IpAdapterCLIPImageEmbedder<br>ipadapter: SDIpAdapter<br>motion_modules: SDMotionModel |
| SDXLVideoPipeline | text_encoder: SDXLTextEncoder<br>text_encoder_2: SDXLTextEncoder2<br>text_encoder_kolors: ChatGLMModel<br>unet: SDXLUNet<br>vae_decoder: SDXLVAEDecoder<br>vae_encoder: SDXLVAEEncoder<br>ipadapter_image_encoder: IpAdapterXLCLIPImageEmbedder<br>ipadapter: SDXLIpAdapter<br>motion_modules: SDXLMotionModel |
| SVDVideoPipeline | image_encoder: SVDImageEncoder<br>unet: SVDUNet<br>vae_encoder: SVDVAEEncoder<br>vae_decoder: SVDVAEDecoder |
| CogVideoPipeline | text_encoder: FluxTextEncoder2<br>dit: CogDiT<br>vae_encoder: CogVAEEncoder<br>vae_decoder: CogVAEDecoder |

View File

@@ -1,11 +1,11 @@
# Schedulers
Schedulers control the entire denoising (or sampling) process of the model. When loading the Pipeline, DiffSynth automatically selects the most suitable schedulers for the current Pipeline, **requiring no additional configuration**.
The supported schedulers are:
- **EnhancedDDIMScheduler**: Extends the denoising process introduced in the Denoising Diffusion Probabilistic Models (DDPM) with non-Markovian guidance.
- **FlowMatchScheduler**: Implements the flow matching sampling method introduced in [Stable Diffusion 3](https://arxiv.org/abs/2403.03206).
- **ContinuousODEScheduler**: A scheduler based on Ordinary Differential Equations (ODE).
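To make the flow matching idea concrete, here is a minimal pure-Python sketch of an Euler sampler over a decreasing noise schedule. It is not the `FlowMatchScheduler` implementation: the function names, the time-shift formula, and the default `shift=3.0` are assumptions chosen for illustration.

```python
def make_sigmas(num_steps, shift=3.0):
    """Return num_steps + 1 noise levels going from 1.0 down to 0.0.

    The shift formula and its default value are illustrative assumptions.
    """
    sigmas = []
    for i in range(num_steps + 1):
        s = 1.0 - i / num_steps  # linear schedule from 1 to 0
        sigmas.append(shift * s / (1.0 + (shift - 1.0) * s))
    return sigmas


def euler_step(x, velocity, sigma, sigma_next):
    """One Euler step of the flow ODE dx/dsigma = velocity."""
    return [xi + (sigma_next - sigma) * vi for xi, vi in zip(x, velocity)]


def sample(noise, predict_velocity, num_steps=10):
    """Integrate from pure noise (sigma = 1) to a clean sample (sigma = 0)."""
    x = list(noise)
    sigmas = make_sigmas(num_steps)
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        x = euler_step(x, predict_velocity(x, sigma), sigma, sigma_next)
    return x


# Toy usage: an oracle velocity that points along the straight line
# between a known target and the current point.
target = [0.5, -1.0]
oracle = lambda x, sigma: [(xi - ti) / sigma for xi, ti in zip(x, target)]
result = sample([1.0, 2.0], oracle, num_steps=8)
```

With the oracle velocity the straight-line path is integrated exactly, so `result` lands on `target` up to floating-point error. In a real pipeline the velocity would come from the denoising model.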

View File

@@ -1,7 +1,5 @@
# Training Kolors LoRA
The following files are needed to build Kolors. You can download Kolors from [HuggingFace](https://huggingface.co/Kwai-Kolors/Kolors) or [ModelScope](https://modelscope.cn/models/Kwai-Kolors/Kolors). Because of precision overflow issues, an additional VAE model is also required, available from [HuggingFace](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix) or [ModelScope](https://modelscope.cn/models/AI-ModelScope/sdxl-vae-fp16-fix). You can download these files with the following code:
```python
from diffsynth import download_models
@@ -31,7 +29,7 @@ models
└── diffusion_pytorch_model.safetensors
```
Use the following command to start the training task:
```
CUDA_VISIBLE_DEVICES="0" python examples/train/kolors/train_kolors_lora.py \
@@ -52,9 +50,10 @@ CUDA_VISIBLE_DEVICES="0" python examples/train/kolors/train_kolors_lora.py \
--use_gradient_checkpointing
```
For more information on the parameters, run `python examples/train/kolors/train_kolors_lora.py -h`.
After training is complete, use `model_manager.load_lora` to load the LoRA for inference.

View File

@@ -1,8 +1,6 @@
# Training Stable Diffusion 3 LoRA
The training script only requires one file. You can use either [`sd3_medium_incl_clips.safetensors`](https://huggingface.co/stabilityai/stable-diffusion-3-medium/resolve/main/sd3_medium_incl_clips.safetensors) (without the T5 encoder) or [`sd3_medium_incl_clips_t5xxlfp16.safetensors`](https://huggingface.co/stabilityai/stable-diffusion-3-medium/resolve/main/sd3_medium_incl_clips_t5xxlfp16.safetensors) (with the T5 encoder). Use the following code to download these files:
```python
from diffsynth import download_models
@@ -16,7 +14,7 @@ models/stable_diffusion_3/
└── sd3_medium_incl_clips_t5xxlfp16.safetensors
```
Use the following command to start the training task:
```
CUDA_VISIBLE_DEVICES="0" python examples/train/stable_diffusion_3/train_sd3_lora.py \
@@ -35,9 +33,9 @@ CUDA_VISIBLE_DEVICES="0" python examples/train/stable_diffusion_3/train_sd3_lora
--use_gradient_checkpointing
```
For more information on the parameters, run `python examples/train/stable_diffusion_3/train_sd3_lora.py -h`.
After training is complete, use `model_manager.load_lora` to load the LoRA for inference.
```python
from diffsynth import ModelManager, SD3ImagePipeline

View File

@@ -1,6 +1,6 @@
# Training Stable Diffusion LoRA
The training script only requires one file. We support mainstream checkpoints from [CivitAI](https://civitai.com/). By default, we use the base Stable Diffusion v1.5, which you can download from [HuggingFace](https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors) or [ModelScope](https://www.modelscope.cn/models/AI-ModelScope/stable-diffusion-v1-5/resolve/master/v1-5-pruned-emaonly.safetensors). You can use the following code to download this file:
```python
from diffsynth import download_models
@@ -14,7 +14,7 @@ models/stable_diffusion
└── v1-5-pruned-emaonly.safetensors
```
Use the following command to start the training task:
```
CUDA_VISIBLE_DEVICES="0" python examples/train/stable_diffusion/train_sd_lora.py \
@@ -33,10 +33,9 @@ CUDA_VISIBLE_DEVICES="0" python examples/train/stable_diffusion/train_sd_lora.py
--use_gradient_checkpointing
```
For more information on the parameters, run `python examples/train/stable_diffusion/train_sd_lora.py -h`.
After training is complete, use `model_manager.load_lora` to load the LoRA for inference.
```python

View File

@@ -1,7 +1,7 @@
# Training Stable Diffusion XL LoRA
The training script only requires one file. We support mainstream checkpoints from [CivitAI](https://civitai.com/). By default, we use the base Stable Diffusion XL, which you can download from [HuggingFace](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors) or [ModelScope](https://www.modelscope.cn/models/AI-ModelScope/stable-diffusion-xl-base-1.0/resolve/master/sd_xl_base_1.0.safetensors). You can also use the following code to download this file:
```python
from diffsynth import download_models
@@ -14,8 +14,7 @@ models/stable_diffusion_xl
└── sd_xl_base_1.0.safetensors
```
We have observed numerical overflow in Stable Diffusion XL under float16 precision, so we recommend training with float32 precision. To start the training task, use the following command:
```
CUDA_VISIBLE_DEVICES="0" python examples/train/stable_diffusion_xl/train_sdxl_lora.py \
--pretrained_path models/stable_diffusion_xl/sd_xl_base_1.0.safetensors \
@@ -33,9 +32,10 @@ CUDA_VISIBLE_DEVICES="0" python examples/train/stable_diffusion_xl/train_sdxl_lo
--use_gradient_checkpointing
```
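As background for the float16 recommendation, the sketch below shows the kind of overflow involved using only the standard library. It is independent of DiffSynth: `fits_in_float16` is a hypothetical helper, and the values are arbitrary examples rather than numbers taken from Stable Diffusion XL.

```python
import struct

# Largest finite value representable in IEEE 754 half precision.
FLOAT16_MAX = 65504.0


def fits_in_float16(value):
    """Return True if value can be stored as a half-precision float."""
    try:
        struct.pack("<e", value)  # "e" is the half-precision format code
        return True
    except OverflowError:
        return False


print(fits_in_float16(FLOAT16_MAX))  # still representable
print(fits_in_float16(70000.0))      # exceeds the float16 range
```

Intermediate values beyond this range cannot be represented in float16, which is why a wider format such as float32 avoids the problem.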
For more information on the parameters, run `python examples/train/stable_diffusion_xl/train_sdxl_lora.py -h`.
After training is complete, use `model_manager.load_lora` to load the LoRA for inference.
```python
from diffsynth import ModelManager, SDXLImagePipeline

View File

@@ -0,0 +1,85 @@
# Quick Start
This document shows how to get started quickly with DiffSynth-Studio through a short code example.
## Installation
Use the following command to clone and install DiffSynth-Studio from GitHub. For more information, please refer to [Installation](./Installation.md).
```shell
git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .
```
## One-click Run!
Running the following code downloads the model, loads it, and generates an image.
```python
import torch
from diffsynth import ModelManager, FluxImagePipeline

# Download (if needed) and load the preset FLUX.1-dev model files.
model_manager = ModelManager(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_id_list=["FLUX.1-dev"]
)
pipe = FluxImagePipeline.from_model_manager(model_manager)

# Fix the random seed so the result is reproducible.
torch.manual_seed(0)
image = pipe(
    prompt="In a forest, a wooden plank sign reading DiffSynth",
    height=576, width=1024,
)
image.save("image.jpg")
```
![image](https://github.com/user-attachments/assets/15a52a2b-2f18-46fe-810c-cb3ad2853919)
This example uses the two key modules in DiffSynth: `ModelManager` and `Pipeline`. The following sections describe each in detail.
## Downloading and Loading Models
`ModelManager` is responsible for downloading and loading models, which can be done in one step with the following code.
```python
import torch
from diffsynth import ModelManager

model_manager = ModelManager(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_id_list=["FLUX.1-dev"]
)
```
Downloading and loading can also be done as separate steps; the following code is equivalent to the one-step version above.
```python
import torch
from diffsynth import download_models, ModelManager

# Step 1: download the preset model files.
download_models(["FLUX.1-dev"])
# Step 2: create the manager, then load each file or folder by path.
model_manager = ModelManager(torch_dtype=torch.bfloat16, device="cuda")
model_manager.load_models([
    "models/FLUX/FLUX.1-dev/text_encoder/model.safetensors",
    "models/FLUX/FLUX.1-dev/text_encoder_2",
    "models/FLUX/FLUX.1-dev/ae.safetensors",
    "models/FLUX/FLUX.1-dev/flux1-dev.safetensors"
])
```
Models can be downloaded from [ModelScope](https://www.modelscope.cn/) and [HuggingFace](https://huggingface.co/), and non-preset models can be downloaded as well. For more information about model downloading, please refer to [Model Download](./DownloadModels.md).
When loading models, pass every model path you want to load to `load_models`. For model weight files in formats such as `.safetensors`, `ModelManager` automatically determines the model type after loading. For models stored as a folder, `ModelManager` tries to parse the `config.json` file inside it and to call the corresponding module in a third-party library such as `transformers`. For the models supported by DiffSynth-Studio, please refer to [Supported Models](./Models.md).
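The distinction between weight files and folder-format models can be pictured with a small sketch. This is not `ModelManager`'s actual detection logic: `classify_model_path` and the extension list are hypothetical and only illustrate dispatching on the kind of path.

```python
import os

# Assumed set of weight-file extensions, for illustration only.
WEIGHT_EXTENSIONS = (".safetensors", ".bin", ".ckpt", ".pth", ".pt")


def classify_model_path(path):
    """Return 'folder', 'weights', or 'unknown' for a model path."""
    if os.path.isdir(path):
        # Folder-format models are expected to carry a config.json.
        has_config = os.path.isfile(os.path.join(path, "config.json"))
        return "folder" if has_config else "unknown"
    if path.endswith(WEIGHT_EXTENSIONS):
        return "weights"
    return "unknown"


print(classify_model_path("models/FLUX/FLUX.1-dev/flux1-dev.safetensors"))
```

A loader built this way would inspect the tensors of a `weights` path to identify the architecture, and hand a `folder` path to the library named in its `config.json`.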
## Building Pipeline
DiffSynth-Studio provides multiple inference `Pipeline`s. Each can be initialized directly from a `ModelManager`, which supplies the models the pipeline requires. For example, the text-to-image `Pipeline` for the FLUX.1-dev model can be constructed as follows:
```python
pipe = FluxImagePipeline.from_model_manager(model_manager)
```
For more `Pipeline`s for image and video generation, see [Inference Pipelines](./Pipelines.md).

View File

@@ -0,0 +1,34 @@
# Download Models
DiffSynth-Studio ships with preset download links for a number of mainstream diffusion models, which you can download and use directly.
## Download Preset Models
You can download preset model files directly with the `download_models` function. The available model IDs are listed in the [config file](/diffsynth/configs/model_config.py).
```python
from diffsynth import download_models
download_models(["FLUX.1-dev"])
```
For VSCode users, after activating Pylance or other Python language services, typing `""` in the code will display all supported model IDs.
![image](https://github.com/user-attachments/assets/2bbfec32-e015-45a7-98d9-57af13200b7c)
## Download Non-Preset Models
You can select models from two download sources: [ModelScope](https://modelscope.cn/models) and [HuggingFace](https://huggingface.co/models). Of course, you can also manually download the models you need through browsers or other tools.
```python
from diffsynth import download_customized_models

download_customized_models(
    model_id="Kwai-Kolors/Kolors",
    origin_file_path="vae/diffusion_pytorch_model.fp16.bin",
    local_dir="models/kolors/Kolors/vae",
    downloading_priority=["ModelScope", "HuggingFace"]
)
```
This snippet follows the download priority, so it tries `ModelScope` first. It downloads the file `vae/diffusion_pytorch_model.fp16.bin` from the [model repository](https://modelscope.cn/models/Kwai-Kolors/Kolors) with ID `Kwai-Kolors/Kolors` to the local path `models/kolors/Kolors/vae`.
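The priority mechanism can be thought of as a fallback loop over sources. The sketch below is an assumption about the general pattern, not the code behind `download_customized_models`: `download_with_priority` and the stand-in downloader callables are hypothetical.

```python
def download_with_priority(downloaders, priority):
    """Try each source in priority order.

    Returns (source, result) for the first source that succeeds and
    raises RuntimeError if every source fails.
    """
    errors = {}
    for source in priority:
        try:
            return source, downloaders[source]()
        except Exception as exc:  # real code would catch narrower errors
            errors[source] = exc
    raise RuntimeError(f"all sources failed: {errors}")


def failing_download():
    raise ConnectionError("source unreachable")


# Toy usage with stand-in downloaders instead of real network calls.
downloaders = {
    "ModelScope": failing_download,
    "HuggingFace": lambda: "models/kolors/Kolors/vae",
}
source, path = download_with_priority(downloaders, ["ModelScope", "HuggingFace"])
```

Here the first source fails, so the loop falls back to the second one and reports which source supplied the file.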

View File

@@ -0,0 +1,49 @@
# Extension Features
This document introduces several techniques implemented in DiffSynth that complement diffusion models in image and video processing.
- **[RIFE](https://github.com/hzwer/ECCV2022-RIFE)**: RIFE is a frame interpolation method based on real-time intermediate flow estimation. It uses a model with an IFNet structure that quickly estimates intermediate flows end-to-end. RIFE does not rely on pre-trained optical flow models, and it supports interpolation at arbitrary time steps by taking a time-encoded input.
In this code snippet, we use the RIFE model to double the frame rate of a video.
```python
from diffsynth import VideoData, ModelManager, save_video
from diffsynth.extensions.RIFE import RIFEInterpolater
model_manager = ModelManager(model_id_list=["RIFE"])
rife = RIFEInterpolater.from_model_manager(model_manager)
video = VideoData("input_video.mp4", height=512, width=768).raw_data()
video = rife.interpolate(video)
save_video(video, "output_video.mp4", fps=60)
```
- **[ESRGAN](https://github.com/xinntao/ESRGAN)**: ESRGAN is an image super-resolution model that can achieve a fourfold increase in resolution. This method significantly enhances the realism of generated images by optimizing network architecture, adversarial loss, and perceptual loss.
In this code snippet, we use the ESRGAN model to quadruple the resolution of an image.
```python
from PIL import Image
from diffsynth import ModelManager
from diffsynth.extensions.ESRGAN import ESRGAN
model_manager = ModelManager(model_id_list=["ESRGAN_x4"])
esrgan = ESRGAN.from_model_manager(model_manager)
image = Image.open("input_image.jpg")
image = esrgan.upscale(image)
image.save("output_image.jpg")
```
- **[FastBlend](https://arxiv.org/abs/2311.09265)**: FastBlend is a model-free video de-flickering algorithm. Flicker often occurs in style videos processed frame by frame using image generation models. FastBlend can eliminate flicker in style videos based on the motion features in the original video (guide video).
In this code snippet, we use FastBlend to remove the flicker effect from a style video.
```python
from diffsynth import VideoData, save_video
from diffsynth.extensions.FastBlend import FastBlendSmoother
fastblend = FastBlendSmoother()
guide_video = VideoData("guide_video.mp4", height=512, width=768).raw_data()
style_video = VideoData("style_video.mp4", height=512, width=768).raw_data()
output_video = fastblend(style_video, original_frames=guide_video)
save_video(output_video, "output_video.mp4", fps=30)
```
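To illustrate what doubling the frame rate means structurally, the sketch below inserts one synthesized frame between every pair of neighbouring frames. It does not reproduce RIFE: the `midpoint` function is a plain element-wise average standing in for a learned flow-based interpolator, and both function names are hypothetical.

```python
def midpoint(frame_a, frame_b):
    """Stand-in interpolator: element-wise average of two frames."""
    return [(a + b) / 2 for a, b in zip(frame_a, frame_b)]


def interpolate_once(frames, interpolate=midpoint):
    """Insert one synthesized frame between every pair of neighbours."""
    if not frames:
        return []
    out = [frames[0]]
    for prev, nxt in zip(frames, frames[1:]):
        out.append(interpolate(prev, nxt))  # new in-between frame
        out.append(nxt)                     # original frame
    return out


# Three one-pixel "frames" become five.
frames = [[0.0], [2.0], [4.0]]
doubled = interpolate_once(frames)
```

A clip of `n` frames becomes `2n - 1` frames under this scheme, so saving it at twice the original frame rate keeps its duration roughly unchanged.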

View File

@@ -0,0 +1,24 @@
# Installation
## From Source
1. Clone the source repository:
```bash
git clone https://github.com/modelscope/DiffSynth-Studio.git
```
2. Navigate to the project directory and install:
```bash
cd DiffSynth-Studio
pip install -e .
```
## From PyPI
Install directly via PyPI:
```bash
pip install diffsynth
```
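After either installation method, a quick check that the package is importable can be done from Python. The helper below uses only the standard library; `is_installed` is a hypothetical name, and the check confirms that the module can be found, not that every dependency works.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found by the import system."""
    return importlib.util.find_spec(package_name) is not None


print("diffsynth installed:", is_installed("diffsynth"))
```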

View File

@@ -0,0 +1,18 @@
# Models
So far, DiffSynth Studio supports the following models:
* [CogVideoX](https://huggingface.co/THUDM/CogVideoX-5b)
* [FLUX](https://huggingface.co/black-forest-labs/FLUX.1-dev)
* [ExVideo](https://huggingface.co/ECNU-CILab/ExVideo-SVD-128f-v1)
* [Kolors](https://huggingface.co/Kwai-Kolors/Kolors)
* [Stable Diffusion 3](https://huggingface.co/stabilityai/stable-diffusion-3-medium)
* [Stable Video Diffusion](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt)
* [Hunyuan-DiT](https://github.com/Tencent/HunyuanDiT)
* [RIFE](https://github.com/hzwer/ECCV2022-RIFE)
* [ESRGAN](https://github.com/xinntao/ESRGAN)
* [Ip-Adapter](https://github.com/tencent-ailab/IP-Adapter)
* [AnimateDiff](https://github.com/guoyww/animatediff/)
* [ControlNet](https://github.com/lllyasviel/ControlNet)
* [Stable Diffusion XL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)
* [Stable Diffusion](https://huggingface.co/runwayml/stable-diffusion-v1-5)

View File

@@ -0,0 +1,22 @@
# Pipelines
DiffSynth-Studio includes multiple pipelines, categorized into two types: image generation and video generation.
## Image Pipelines
| Pipeline | Models |
|----------------------------|----------------------------------------------------------------|
| SDImagePipeline | text_encoder: SDTextEncoder<br>unet: SDUNet<br>vae_decoder: SDVAEDecoder<br>vae_encoder: SDVAEEncoder<br>controlnet: MultiControlNetManager<br>ipadapter_image_encoder: IpAdapterCLIPImageEmbedder<br>ipadapter: SDIpAdapter |
| SDXLImagePipeline | text_encoder: SDXLTextEncoder<br>text_encoder_2: SDXLTextEncoder2<br>text_encoder_kolors: ChatGLMModel<br>unet: SDXLUNet<br>vae_decoder: SDXLVAEDecoder<br>vae_encoder: SDXLVAEEncoder<br>controlnet: MultiControlNetManager<br>ipadapter_image_encoder: IpAdapterXLCLIPImageEmbedder<br>ipadapter: SDXLIpAdapter |
| SD3ImagePipeline | text_encoder_1: SD3TextEncoder1<br>text_encoder_2: SD3TextEncoder2<br>text_encoder_3: SD3TextEncoder3<br>dit: SD3DiT<br>vae_decoder: SD3VAEDecoder<br>vae_encoder: SD3VAEEncoder |
| HunyuanDiTImagePipeline | text_encoder: HunyuanDiTCLIPTextEncoder<br>text_encoder_t5: HunyuanDiTT5TextEncoder<br>dit: HunyuanDiT<br>vae_decoder: SDVAEDecoder<br>vae_encoder: SDVAEEncoder |
| FluxImagePipeline | text_encoder_1: FluxTextEncoder1<br>text_encoder_2: FluxTextEncoder2<br>dit: FluxDiT<br>vae_decoder: FluxVAEDecoder<br>vae_encoder: FluxVAEEncoder |
## Video Pipelines
| Pipeline | Models |
|----------------------------|----------------------------------------------------------------|
| SDVideoPipeline | text_encoder: SDTextEncoder<br>unet: SDUNet<br>vae_decoder: SDVAEDecoder<br>vae_encoder: SDVAEEncoder<br>controlnet: MultiControlNetManager<br>ipadapter_image_encoder: IpAdapterCLIPImageEmbedder<br>ipadapter: SDIpAdapter<br>motion_modules: SDMotionModel |
| SDXLVideoPipeline | text_encoder: SDXLTextEncoder<br>text_encoder_2: SDXLTextEncoder2<br>text_encoder_kolors: ChatGLMModel<br>unet: SDXLUNet<br>vae_decoder: SDXLVAEDecoder<br>vae_encoder: SDXLVAEEncoder<br>ipadapter_image_encoder: IpAdapterXLCLIPImageEmbedder<br>ipadapter: SDXLIpAdapter<br>motion_modules: SDXLMotionModel |
| SVDVideoPipeline | image_encoder: SVDImageEncoder<br>unet: SVDUNet<br>vae_encoder: SVDVAEEncoder<br>vae_decoder: SVDVAEDecoder |
| CogVideoPipeline | text_encoder: FluxTextEncoder2<br>dit: CogDiT<br>vae_encoder: CogVAEEncoder<br>vae_decoder: CogVAEDecoder |

View File

@@ -0,0 +1,35 @@
# Prompt Processing
DiffSynth includes prompt processing functionality, which is divided into:
- **Prompt Refiners (`prompt_refiner_classes`)**: Cover prompt refinement, prompt translation from Chinese to English, and combined translation and refinement. The available options are:
  - **English Prompt Refinement**: `BeautifulPrompt`, using the model [pai-bloom-1b1-text2prompt-sd](https://modelscope.cn/models/AI-ModelScope/pai-bloom-1b1-text2prompt-sd).
  - **Prompt Translation from Chinese to English**: `Translator`, using the model [opus-mt-zh-en](https://modelscope.cn/models/moxying/opus-mt-zh-en).
  - **Prompt Translation and Refinement**: `QwenPrompt`, using the model [Qwen2-1.5B-Instruct](https://modelscope.cn/models/qwen/Qwen2-1.5B-Instruct).
- **Prompt Extenders (`prompt_extender_classes`)**: Prompt expansion with partition control, based on Omost. The available option is:
  - **Prompt Partition Expansion**: `OmostPromter`.
## Usage Instructions
### Prompt Refiners
When loading the model pipeline, you can select the prompt refiners to apply with the `prompt_refiner_classes` parameter. For example code, refer to [sd_prompt_refining.py](examples/image_synthesis/sd_prompt_refining.py).
The available values for `prompt_refiner_classes` are `Translator`, `BeautifulPrompt`, and `QwenPrompt`.
```python
pipe = SDXLImagePipeline.from_model_manager(model_manager, prompt_refiner_classes=[Translator, BeautifulPrompt])
```
### Prompt Extenders
When loading the model pipeline, you can select the prompt extender to apply with the `prompt_extender_classes` parameter. For example code, refer to [omost_flux_text_to_image.py](examples/image_synthesis/omost_flux_text_to_image.py).
```python
pipe = FluxImagePipeline.from_model_manager(model_manager, prompt_extender_classes=[OmostPromter])
```
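Passing a list of classes suggests that refiners are applied one after another, each receiving the output of the previous one. The sketch below illustrates that chaining idea only. The two refiner classes, the `refine` helper, and the `positive` argument are hypothetical and are not part of DiffSynth's API.

```python
class StripWhitespace:
    """Hypothetical refiner: collapse repeated whitespace."""

    def __call__(self, prompt, positive=True):
        return " ".join(prompt.split())


class AppendQualityTags:
    """Hypothetical refiner: add a quality hint to positive prompts only."""

    def __call__(self, prompt, positive=True):
        if positive:
            return prompt + ", highly detailed"
        return prompt


def refine(prompt, refiners, positive=True):
    """Apply each refiner in order, feeding its output to the next."""
    for refiner in refiners:
        prompt = refiner(prompt, positive=positive)
    return prompt


refined = refine("  a cat   on a mat ", [StripWhitespace(), AppendQualityTags()])
```

Order matters in such a chain. With the real refiners, for instance, translating before refining means the refinement model sees English text.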

View File

@@ -0,0 +1,11 @@
# Schedulers
Schedulers control the entire denoising (or sampling) process of the model. When loading the Pipeline, DiffSynth automatically selects the most suitable schedulers for the current Pipeline, **requiring no additional configuration**.
The supported schedulers are:
- **EnhancedDDIMScheduler**: Extends the denoising process introduced in the Denoising Diffusion Probabilistic Models (DDPM) with non-Markovian guidance.
- **FlowMatchScheduler**: Implements the flow matching sampling method introduced in [Stable Diffusion 3](https://arxiv.org/abs/2403.03206).
- **ContinuousODEScheduler**: A scheduler based on Ordinary Differential Equations (ODE).
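For the DDIM family, the core update can be written in a few lines. The sketch below is the standard deterministic DDIM step (no added noise) on a scalar sample. It is not the `EnhancedDDIMScheduler` code, and the function and argument names are chosen for illustration.

```python
import math


def ddim_step(x_t, eps, alpha_bar_t, alpha_bar_prev):
    """Deterministic DDIM update for one scalar sample.

    x_t            current noisy sample
    eps            noise predicted by the model at this step
    alpha_bar_t    cumulative signal level at the current step
    alpha_bar_prev cumulative signal level at the next (less noisy) step
    """
    # Estimate the clean sample implied by the predicted noise.
    x0_pred = (x_t - math.sqrt(1.0 - alpha_bar_t) * eps) / math.sqrt(alpha_bar_t)
    # Re-noise that estimate to the next, lower noise level.
    return math.sqrt(alpha_bar_prev) * x0_pred + math.sqrt(1.0 - alpha_bar_prev) * eps


# Toy usage: with a perfect noise prediction, stepping to alpha_bar = 1
# recovers the clean sample.
x0, noise, alpha_bar = 0.8, -0.3, 0.5
x_t = math.sqrt(alpha_bar) * x0 + math.sqrt(1.0 - alpha_bar) * noise
recovered = ddim_step(x_t, noise, alpha_bar, 1.0)
```

Because the update is deterministic, the same starting noise always yields the same output, and the step count can be reduced without retraining the model.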