Add files via upload

Revise once more
2026-03-19 14:58:12 +00:00 · 2024-10-22 09:56:03 +08:00
parent 157ba2e426
commit f6e676cdf9
46 changed files with 2525 additions and 0 deletions
--- a/docs/source/tutorial/ASimpleExample.md
+++ b/docs/source/tutorial/ASimpleExample.md
@@ -0,0 +1,85 @@
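+# Quick Start
+
+In this document, we walk through a short piece of code to show you how to get started creating with DiffSynth-Studio.
+
+## Installation
+
+Clone and install DiffSynth-Studio from GitHub with the following commands. For more information, see [Installation](./Installation.md).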
+
+```shell
+git clone https://github.com/modelscope/DiffSynth-Studio.git
+cd DiffSynth-Studio
+pip install -e .
+```
+
+## Run It in One Go!
+
+Running the following code downloads the model, loads it, and generates an image.
+
+```python
+import torch
+from diffsynth import ModelManager, FluxImagePipeline
+
+model_manager = ModelManager(
+    torch_dtype=torch.bfloat16,
+    device="cuda",
+    model_id_list=["FLUX.1-dev"]
+)
+pipe = FluxImagePipeline.from_model_manager(model_manager)
+
+torch.manual_seed(0)
+image = pipe(
+    prompt="In a forest, a wooden plank sign reading DiffSynth",
+    height=576, width=1024,
+)
+image.save("image.jpg")
+```
+
+![image](https://github.com/user-attachments/assets/15a52a2b-2f18-46fe-810c-cb3ad2853919)
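
The `height=576, width=1024` values in this example are both multiples of 16. This is a back-of-the-envelope sketch that rests on assumptions about the model architecture, not on a documented DiffSynth API. It assumes FLUX.1's VAE downsamples each side by 8× into 16 latent channels, and that the DiT then groups the latent into 2×2 patches. Under those assumptions, you can compute the latent shape and the number of image tokens the DiT processes. The helper `flux_latent_layout` is hypothetical.

```python
from typing import Tuple

def flux_latent_layout(height: int, width: int, vae_factor: int = 8,
                       patch: int = 2, channels: int = 16) -> Tuple[Tuple[int, int, int], int]:
    """Return (latent_shape, num_image_tokens) for an image of the given size.

    Assumes an 8x VAE downsampling factor, 16 latent channels and 2x2 patchification,
    so height and width must be multiples of vae_factor * patch (16 by default).
    """
    step = vae_factor * patch
    if height % step or width % step:
        raise ValueError(f"height and width should be multiples of {step}, got {height}x{width}")
    latent_h, latent_w = height // vae_factor, width // vae_factor
    tokens = (latent_h // patch) * (latent_w // patch)
    return (channels, latent_h, latent_w), tokens

shape, tokens = flux_latent_layout(576, 1024)
print(shape, tokens)  # (16, 72, 128) 2304
```

Token count grows with the product of height and width, so doubling both dimensions quadruples the number of image tokens.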
+
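+This example shows that DiffSynth has two key modules, `ModelManager` and `Pipeline`, which we describe in detail below.
+
+## Downloading and Loading Models
+
+`ModelManager` is responsible for downloading and loading models. The following code does both in a single step.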
+
+```python
+import torch
+from diffsynth import ModelManager
+
+model_manager = ModelManager(
+    torch_dtype=torch.bfloat16,
+    device="cuda",
+    model_id_list=["FLUX.1-dev"]
+)
+```
+
+You can also perform the two steps separately. The following code behaves identically to the code above.
+
+```python
+import torch
+from diffsynth import download_models, ModelManager
+
+download_models(["FLUX.1-dev"])
+model_manager = ModelManager(torch_dtype=torch.bfloat16, device="cuda")
+model_manager.load_models([
+    "models/FLUX/FLUX.1-dev/text_encoder/model.safetensors",
+    "models/FLUX/FLUX.1-dev/text_encoder_2",
+    "models/FLUX/FLUX.1-dev/ae.safetensors",
+    "models/FLUX/FLUX.1-dev/flux1-dev.safetensors"
+])
+```
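
`load_models` takes explicit paths, so a missing or misplaced file after `download_models` is easy to overlook. The hypothetical helper below is not part of DiffSynth. It reports which of the given paths do not exist yet, and it covers both single weight files and folder-format models such as `text_encoder_2`.

```python
from pathlib import Path
from typing import Iterable, List

def missing_model_paths(paths: Iterable[str]) -> List[str]:
    """Return the subset of `paths` that do not exist on disk (files or folders)."""
    return [p for p in paths if not Path(p).exists()]

flux_paths = [
    "models/FLUX/FLUX.1-dev/text_encoder/model.safetensors",
    "models/FLUX/FLUX.1-dev/text_encoder_2",  # folder-format model
    "models/FLUX/FLUX.1-dev/ae.safetensors",
    "models/FLUX/FLUX.1-dev/flux1-dev.safetensors",
]
missing = missing_model_paths(flux_paths)
if missing:
    print("Not found; re-run download_models or check the paths:", missing)
```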
+
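+For downloading, we support fetching models from [ModelScope](https://www.modelscope.cn/) and [HuggingFace](https://huggingface.co/), as well as downloading models that are not preset. For more information, see [Downloading Models](./DownloadModels.md).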
+
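+When loading, pass in the paths of all the models you want to load. For model weight files in formats such as `.safetensors`, `ModelManager` automatically identifies the model type after loading. For models stored as folders, `ModelManager` tries to parse the `config.json` file inside and call the corresponding module from third-party libraries such as `transformers`. For the models DiffSynth-Studio supports, see [Supported Models](./Models.md).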
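
A `.safetensors` file starts with an 8-byte little-endian header length followed by a JSON header. That header maps each tensor name to its dtype, shape and byte offsets, which shows what information is available for this kind of type detection. The sketch below reads only that header with the standard library. It illustrates the file format and is not necessarily how `ModelManager` is implemented. The function names are hypothetical.

```python
import json
import struct

def parse_safetensors_header(buf: bytes) -> dict:
    """Parse the JSON header from the first bytes of a .safetensors file."""
    (header_len,) = struct.unpack_from("<Q", buf, 0)  # u64, little-endian
    header = json.loads(buf[8:8 + header_len])
    header.pop("__metadata__", None)  # optional free-form metadata, not a tensor
    return header

def read_safetensors_header(path: str) -> dict:
    """Read only the header of a .safetensors file, without loading tensor data."""
    with open(path, "rb") as f:
        prefix = f.read(8)
        (header_len,) = struct.unpack("<Q", prefix)
        return parse_safetensors_header(prefix + f.read(header_len))
```

For example, `sorted(read_safetensors_header("models/FLUX/FLUX.1-dev/flux1-dev.safetensors"))[:5]` lists the first few tensor names. Their prefixes are the kind of signal a loader can use to guess the architecture.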
+
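+## Building a Pipeline
+
+DiffSynth-Studio provides several inference `Pipeline`s, which obtain the models they need directly from a `ModelManager` during initialization. For example, the text-to-image `Pipeline` for FLUX.1-dev is built like this: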
+
+```python
+pipe = FluxImagePipeline.from_model_manager(model_manager)
+```
+
+For more `Pipeline`s for image and video generation, see [Pipelines](./Pipelines.md).
--- a/docs/source/tutorial/DownloadModels.md
+++ b/docs/source/tutorial/DownloadModels.md
@@ -0,0 +1,34 @@
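+# Downloading Models
+
+DiffSynth-Studio comes with preset download links for a number of mainstream diffusion models, which you can download and use directly.
+
+## Downloading Preset Models
+
+You can download preset model files directly with the `download_models` function. For the available model IDs, see the [config file](/diffsynth/configs/model_config.py).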
+
+```python
+from diffsynth import download_models
+
+download_models(["FLUX.1-dev"])
+```
+
+For VSCode users: with Pylance or another Python language server enabled, typing `""` where a model ID is expected lists all supported model IDs.
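
Editor completion like this typically works when a parameter is annotated with a `typing.Literal` of allowed strings. We assume DiffSynth declares its preset IDs in a similar way. The sketch below uses a hypothetical `PresetModelId` alias containing only three IDs that appear in this documentation. It also shows how the same alias can be checked at runtime with `typing.get_args`.

```python
from typing import List, Literal, get_args

# Hypothetical stand-in for the preset ID list; the real list lives in the config file.
PresetModelId = Literal["FLUX.1-dev", "RIFE", "ESRGAN_x4"]

def unknown_model_ids(model_ids: List[PresetModelId]) -> List[str]:
    """Return the IDs that are not in the preset list.

    Editors use the Literal annotation to offer completions; this function applies
    the same list at runtime, which catches typos that slipped past the editor.
    """
    allowed = set(get_args(PresetModelId))
    return [m for m in model_ids if m not in allowed]

print(unknown_model_ids(["FLUX.1-dev", "FLUX-1-dev"]))  # ['FLUX-1-dev']
```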
+
+![image](https://github.com/user-attachments/assets/2bbfec32-e015-45a7-98d9-57af13200b7c)
+
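+## Downloading Non-Preset Models
+
+You can download models from either of two sources, [ModelScope](https://modelscope.cn/models) and [HuggingFace](https://huggingface.co/models). You can also download the models you need manually, for example with a web browser.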
+
+```python
+from diffsynth import download_customized_models
+
+download_customized_models(
+    model_id="Kwai-Kolors/Kolors",
+    origin_file_path="vae/diffusion_pytorch_model.fp16.bin",
+    local_dir="models/kolors/Kolors/vae",
+    downloading_priority=["ModelScope", "HuggingFace"]
+)
+```
+
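+This code follows the download priority: it tries `ModelScope` first (with `HuggingFace` next in the list). It downloads the file `vae/diffusion_pytorch_model.fp16.bin` from the [model repository](https://modelscope.cn/models/Kwai-Kolors/Kolors) with ID `Kwai-Kolors/Kolors` to the local directory `models/kolors/Kolors/vae`.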
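
The `downloading_priority` list reads as an ordered fallback: try each source in turn and stop at the first one that succeeds. The sketch below shows that pattern and is not DiffSynth's implementation. The per-source downloader callables, including `fake_modelscope` and `fake_huggingface`, are hypothetical stand-ins for the real ModelScope and HuggingFace clients.

```python
from typing import Callable, Dict, List

Downloader = Callable[[str, str, str], None]  # (model_id, origin_file_path, local_dir)

def download_with_priority(model_id: str, origin_file_path: str, local_dir: str,
                           downloaders: Dict[str, Downloader], priority: List[str]) -> str:
    """Try each source in `priority` order and return the name of the first that succeeds."""
    errors = {}
    for source in priority:
        try:
            downloaders[source](model_id, origin_file_path, local_dir)
            return source
        except Exception as exc:  # network error, missing file, unknown source, ...
            errors[source] = exc
    raise RuntimeError(f"All sources failed for {model_id}/{origin_file_path}: {errors}")

def fake_modelscope(model_id: str, origin_file_path: str, local_dir: str) -> None:
    raise ConnectionError("ModelScope unreachable")  # simulate a failed first choice

def fake_huggingface(model_id: str, origin_file_path: str, local_dir: str) -> None:
    print(f"HuggingFace: {model_id}/{origin_file_path} -> {local_dir}")

source = download_with_priority(
    "Kwai-Kolors/Kolors", "vae/diffusion_pytorch_model.fp16.bin", "models/kolors/Kolors/vae",
    downloaders={"ModelScope": fake_modelscope, "HuggingFace": fake_huggingface},
    priority=["ModelScope", "HuggingFace"],
)
print(source)  # HuggingFace
```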
--- a/docs/source/tutorial/Extensions.md
+++ b/docs/source/tutorial/Extensions.md
@@ -0,0 +1,49 @@
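+# Extensions
+
+This document introduces several techniques beyond the diffusion models implemented in DiffSynth. They have significant potential for applications in image and video processing.
+
+- **[RIFE](https://github.com/hzwer/ECCV2022-RIFE)**: RIFE is a frame interpolation method based on real-time intermediate flow estimation. Its IFNet-based model estimates intermediate flows end to end at high speed. RIFE does not depend on pretrained optical flow models. It supports interpolation at arbitrary time steps by taking the time step as an encoded input.
+
+    In this code, we use the RIFE model to double the number of frames in a video.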
+
+    ```python
+    from diffsynth import VideoData, ModelManager, save_video
+    from diffsynth.extensions.RIFE import RIFEInterpolater
+
+    model_manager = ModelManager(model_id_list=["RIFE"])
+    rife = RIFEInterpolater.from_model_manager(model_manager)
+    video = VideoData("input_video.mp4", height=512, width=768).raw_data()
+    video = rife.interpolate(video)
+    save_video(video, "output_video.mp4", fps=60)
+    ```
+
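+- **[ESRGAN](https://github.com/xinntao/ESRGAN)**: ESRGAN is an image super-resolution model that increases resolution by a factor of four. It improves the network architecture, the adversarial loss and the perceptual loss, which markedly increases the realism of the generated images.
+
+    In this code, we use the ESRGAN model to upscale an image to four times its original resolution.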
+
+    ```python
+    from PIL import Image
+    from diffsynth import ModelManager
+    from diffsynth.extensions.ESRGAN import ESRGAN
+
+    model_manager = ModelManager(model_id_list=["ESRGAN_x4"])
+    esrgan = ESRGAN.from_model_manager(model_manager)
+    image = Image.open("input_image.jpg")
+    image = esrgan.upscale(image)  # 4x larger in each dimension
+    image.save("output_image.jpg")
+    ```
+
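+- **[FastBlend](https://arxiv.org/abs/2311.09265)**: FastBlend is a model-free video deflickering algorithm. Videos processed frame by frame with an image generation model (style videos) usually flicker. FastBlend removes this flicker using the motion features of the original video (the guide video).
+
+    In this code, we use FastBlend to remove flicker from a style video.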
+
+    ```python
+    from diffsynth import VideoData, save_video
+    from diffsynth.extensions.FastBlend import FastBlendSmoother
+
+    fastblend = FastBlendSmoother()
+    guide_video = VideoData("guide_video.mp4", height=512, width=768).raw_data()
+    style_video = VideoData("style_video.mp4", height=512, width=768).raw_data()
+    output_video = fastblend(style_video, original_frames=guide_video)
+    save_video(output_video, "output_video.mp4", fps=30)
+    ```
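
For intuition about what RIFE and FastBlend improve on, here are the two simplest baselines. They are explicitly *not* those methods.

- **Frame-rate doubling by averaging neighbouring frames.** RIFE instead estimates intermediate flow.
- **Deflickering by averaging each frame with its temporal neighbours.** FastBlend instead follows the motion of the guide video, whereas plain averaging blurs moving content.

Frames are plain lists of pixel values here to keep the sketch dependency-free. The frame-count arithmetic also relates to the `fps=60` in the RIFE example. Inserting one frame between every pair of N frames gives 2N - 1 frames. Assuming a 30 fps input (the example does not state it), saving at 60 fps therefore keeps roughly the original duration.

```python
from typing import List

Frame = List[float]  # a flattened frame of pixel intensities

def blend(a: Frame, b: Frame, t: float = 0.5) -> Frame:
    """Linear blend between two frames; t=0 gives a, t=1 gives b."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]

def naive_double_frames(frames: List[Frame]) -> List[Frame]:
    """Insert the midpoint blend between each pair of frames: N frames -> 2N - 1 frames."""
    if not frames:
        return []
    out: List[Frame] = []
    for prev, nxt in zip(frames, frames[1:]):
        out.extend([prev, blend(prev, nxt)])
    out.append(frames[-1])
    return out

def naive_deflicker(frames: List[Frame], radius: int = 1) -> List[Frame]:
    """Replace each frame by the mean of the frames within `radius` (window clipped at the ends)."""
    smoothed: List[Frame] = []
    for i in range(len(frames)):
        window = frames[max(0, i - radius): i + radius + 1]
        smoothed.append([sum(px) / len(window) for px in zip(*window)])
    return smoothed
```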
--- a/docs/source/tutorial/Installation.md
+++ b/docs/source/tutorial/Installation.md
@@ -0,0 +1,26 @@
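+# Installation
+
+DiffSynth-Studio can currently be installed either by cloning it from GitHub or with pip. We recommend cloning from GitHub so that you get the latest features.
+
+## Installing from Source
+
+1. Clone the source repository: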
+
+    ```bash
+    git clone https://github.com/modelscope/DiffSynth-Studio.git
+    ```
+
+2. Enter the project directory and install:
+
+    ```bash
+    cd DiffSynth-Studio
+    pip install -e .
+    ```
+
+## Installing from PyPI
+
+Install directly from PyPI (new features reach PyPI with some delay):
+
+```bash
+pip install diffsynth
+```
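
After either installation method, you can check which version of the package your Python environment can see. The minimal standard-library sketch below assumes the distribution name is `diffsynth`, as in the `pip install diffsynth` command above. We also assume that an editable `pip install -e .` install registers the same name, in which case it is reported too.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

def installed_version(dist_name: str = "diffsynth") -> Optional[str]:
    """Return the installed version string of `dist_name`, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

print(installed_version() or "diffsynth is not installed in this environment")
```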
--- a/docs/source/tutorial/Models.md
+++ b/docs/source/tutorial/Models.md
@@ -0,0 +1,18 @@
+# Models
+
+DiffSynth-Studio currently supports the following models:
+
+* [CogVideoX](https://huggingface.co/THUDM/CogVideoX-5b)
+* [FLUX](https://huggingface.co/black-forest-labs/FLUX.1-dev)
+* [ExVideo](https://huggingface.co/ECNU-CILab/ExVideo-SVD-128f-v1)
+* [Kolors](https://huggingface.co/Kwai-Kolors/Kolors)
+* [Stable Diffusion 3](https://huggingface.co/stabilityai/stable-diffusion-3-medium)
+* [Stable Video Diffusion](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt)
+* [Hunyuan-DiT](https://github.com/Tencent/HunyuanDiT)
+* [RIFE](https://github.com/hzwer/ECCV2022-RIFE)
+* [ESRGAN](https://github.com/xinntao/ESRGAN)
+* [IP-Adapter](https://github.com/tencent-ailab/IP-Adapter)
+* [AnimateDiff](https://github.com/guoyww/animatediff/)
+* [ControlNet](https://github.com/lllyasviel/ControlNet)
+* [Stable Diffusion XL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)
+* [Stable Diffusion](https://huggingface.co/runwayml/stable-diffusion-v1-5)
--- a/docs/source/tutorial/Pipelines.md
+++ b/docs/source/tutorial/Pipelines.md
@@ -0,0 +1,22 @@
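+# Pipelines
+
+DiffSynth-Studio includes several pipelines, divided into two categories: image generation and video generation.
+
+## Image Generation Pipelines
+
+| Pipeline                   | Models (attribute: class)                                  |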
+|----------------------------|----------------------------------------------------------------|
+| SDImagePipeline             | text_encoder: SDTextEncoder<br>unet: SDUNet<br>vae_decoder: SDVAEDecoder<br>vae_encoder: SDVAEEncoder<br>controlnet: MultiControlNetManager<br>ipadapter_image_encoder: IpAdapterCLIPImageEmbedder<br>ipadapter: SDIpAdapter |
+| SDXLImagePipeline           | text_encoder: SDXLTextEncoder<br>text_encoder_2: SDXLTextEncoder2<br>text_encoder_kolors: ChatGLMModel<br>unet: SDXLUNet<br>vae_decoder: SDXLVAEDecoder<br>vae_encoder: SDXLVAEEncoder<br>controlnet: MultiControlNetManager<br>ipadapter_image_encoder: IpAdapterXLCLIPImageEmbedder<br>ipadapter: SDXLIpAdapter |
+| SD3ImagePipeline            | text_encoder_1: SD3TextEncoder1<br>text_encoder_2: SD3TextEncoder2<br>text_encoder_3: SD3TextEncoder3<br>dit: SD3DiT<br>vae_decoder: SD3VAEDecoder<br>vae_encoder: SD3VAEEncoder |
+| HunyuanDiTImagePipeline     | text_encoder: HunyuanDiTCLIPTextEncoder<br>text_encoder_t5: HunyuanDiTT5TextEncoder<br>dit: HunyuanDiT<br>vae_decoder: SDVAEDecoder<br>vae_encoder: SDVAEEncoder |
+| FluxImagePipeline     | text_encoder_1: FluxTextEncoder1<br>text_encoder_2: FluxTextEncoder2<br>dit: FluxDiT<br>vae_decoder: FluxVAEDecoder<br>vae_encoder: FluxVAEEncoder |
+
+## Video Generation Pipelines
+
+| Pipeline                   | Models (attribute: class)                                  |
+|----------------------------|----------------------------------------------------------------|
+| SDVideoPipeline            | text_encoder: SDTextEncoder<br>unet: SDUNet<br>vae_decoder: SDVAEDecoder<br>vae_encoder: SDVAEEncoder<br>controlnet: MultiControlNetManager<br>ipadapter_image_encoder: IpAdapterCLIPImageEmbedder<br>ipadapter: SDIpAdapter<br>motion_modules: SDMotionModel |
+| SDXLVideoPipeline          | text_encoder: SDXLTextEncoder<br>text_encoder_2: SDXLTextEncoder2<br>text_encoder_kolors: ChatGLMModel<br>unet: SDXLUNet<br>vae_decoder: SDXLVAEDecoder<br>vae_encoder: SDXLVAEEncoder<br>ipadapter_image_encoder: IpAdapterXLCLIPImageEmbedder<br>ipadapter: SDXLIpAdapter<br>motion_modules: SDXLMotionModel |
+| SVDVideoPipeline           | image_encoder: SVDImageEncoder<br>unet: SVDUNet<br>vae_encoder: SVDVAEEncoder<br>vae_decoder: SVDVAEDecoder |
+| CogVideoPipeline           | text_encoder: FluxTextEncoder2<br>dit: CogDiT<br>vae_encoder: CogVAEEncoder<br>vae_decoder: CogVAEDecoder |
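
Each pipeline in these tables takes the listed components from the models a `ModelManager` has loaded. To check a setup before building a pipeline, you can write the component mapping for `FluxImagePipeline` down as data. The dictionary and helper below are hypothetical: they were copied from the table, not read from DiffSynth. Whether a missing component matters depends on the task; for instance, a VAE encoder is typically only needed when an input image must be encoded.

```python
from typing import Dict, Iterable, List

# Copied by hand from the FluxImagePipeline row of the image generation table.
FLUX_IMAGE_PIPELINE: Dict[str, str] = {
    "text_encoder_1": "FluxTextEncoder1",
    "text_encoder_2": "FluxTextEncoder2",
    "dit": "FluxDiT",
    "vae_decoder": "FluxVAEDecoder",
    "vae_encoder": "FluxVAEEncoder",
}

def missing_components(pipeline: Dict[str, str], loaded_model_classes: Iterable[str]) -> List[str]:
    """Return the pipeline attributes whose model class is not among the loaded classes."""
    loaded = set(loaded_model_classes)
    return [attr for attr, cls in pipeline.items() if cls not in loaded]

loaded = ["FluxTextEncoder1", "FluxTextEncoder2", "FluxDiT", "FluxVAEDecoder"]
print(missing_components(FLUX_IMAGE_PIPELINE, loaded))  # ['vae_encoder']
```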
--- a/docs/source/tutorial/PromptProcessing.md
+++ b/docs/source/tutorial/PromptProcessing.md
@@ -0,0 +1,37 @@
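+# Prompt Processing
+
+DiffSynth has built-in prompt processing, which falls into two categories:
+
+- **Prompt refiners (`prompt_refiner_classes`)**: prompt refinement, Chinese-to-English prompt translation, and combined refinement plus translation. Available options:
+
+    - **English prompt refinement**: `BeautifulPrompt`, which uses [pai-bloom-1b1-text2prompt-sd](https://modelscope.cn/models/AI-ModelScope/pai-bloom-1b1-text2prompt-sd).
+
+    - **Chinese-to-English prompt translation**: `Translator`, which uses [opus-mt-zh-en](https://modelscope.cn/models/moxying/opus-mt-zh-en).
+
+    - **Chinese-to-English translation with refinement**: `QwenPrompt`, which uses [Qwen2-1.5B-Instruct](https://modelscope.cn/models/qwen/Qwen2-1.5B-Instruct).
+
+- **Prompt extenders (`prompt_extender_classes`)**: region-based prompt expansion built on Omost. Available options:
+
+    - **Region-based prompt expansion**: `OmostPromter`.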
+
+
+## Usage
+
+### Prompt Refiners
+
+When loading a model Pipeline, specify the prompt refiners you need with the `prompt_refiner_classes` parameter. For example code, see [sd_prompt_refining.py](examples/image_synthesis/sd_prompt_refining.py).
+
+Available values for `prompt_refiner_classes` are `Translator`, `BeautifulPrompt`, and `QwenPrompt`.
+
+```python
+pipe = SDXLImagePipeline.from_model_manager(model_manager, prompt_refiner_classes=[Translator, BeautifulPrompt])
+```
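
In that example, `Translator` is listed before `BeautifulPrompt`, which refines English prompts. That ordering fits a reading of the list as a chain in which each refiner receives the previous refiner's output. We have not confirmed this behaviour against the DiffSynth source, so the sketch below only illustrates the chaining idea. `toy_translator` and `toy_beautifier` are hypothetical string functions, not the model-backed classes.

```python
from functools import reduce
from typing import Callable, List

Refiner = Callable[[str], str]

def toy_translator(prompt: str) -> str:
    """Hypothetical stand-in for Translator: maps a few Chinese words to English."""
    glossary = {"森林": "forest", "木牌": "wooden sign"}
    for zh, en in glossary.items():
        prompt = prompt.replace(zh, en)
    return prompt

def toy_beautifier(prompt: str) -> str:
    """Hypothetical stand-in for BeautifulPrompt: appends quality tags to an English prompt."""
    return prompt + ", highly detailed, cinematic lighting"

def refine(prompt: str, refiners: List[Refiner]) -> str:
    """Apply refiners left to right, each receiving the previous refiner's output."""
    return reduce(lambda p, refine_fn: refine_fn(p), refiners, prompt)

print(refine("森林, 木牌", [toy_translator, toy_beautifier]))
# forest, wooden sign, highly detailed, cinematic lighting
```

Reversing the list would apply the English-only beautifier to untranslated text, which is why the order of the list matters under this reading.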
+
+### Prompt Extenders
+
+When loading a model Pipeline, specify the prompt extenders you need with the `prompt_extender_classes` parameter. For example code, see [omost_flux_text_to_image.py](examples/image_synthesis/omost_flux_text_to_image.py).
+
+```python
+pipe = FluxImagePipeline.from_model_manager(model_manager, prompt_extender_classes=[OmostPromter])
+```
+
--- a/docs/source/tutorial/Schedulers.md
+++ b/docs/source/tutorial/Schedulers.md
@@ -0,0 +1,11 @@
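+# Schedulers
+
+A scheduler controls a model's entire denoising (or sampling) process. When a Pipeline is loaded, DiffSynth automatically selects the scheduler best suited to it, **so no extra configuration is needed**.
+
+Supported schedulers:
+
+- **EnhancedDDIMScheduler**: extends the denoising process of denoising diffusion probabilistic models (DDPM) with the non-Markovian sampling process introduced by DDIM.
+
+- **FlowMatchScheduler**: implements the flow matching sampling method used in [Stable Diffusion 3](https://arxiv.org/abs/2403.03206).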
+
+- **ContinuousODEScheduler**: a scheduler based on ordinary differential equations (ODEs).
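
As a rough illustration of what two of these schedulers compute at each step, here is a scalar sketch with two parts:

- **DDIM:** the deterministic DDIM update (eta = 0).
- **Flow matching:** an Euler step in the rectified-flow parameterization used by Stable Diffusion 3. There, x_t = (1 - t)·x_0 + t·ε and the model predicts the velocity v = ε - x_0.

These are textbook forms written for this page, not DiffSynth's `EnhancedDDIMScheduler` or `FlowMatchScheduler` code, which also handle timestep schedules and batching. An oracle model that knows x_0 replaces the network so the result can be checked: both samplers should land on x_0.

```python
import math
from typing import Callable, Sequence

def ddim_step(x_t: float, eps_pred: float, alpha_bar_t: float, alpha_bar_prev: float) -> float:
    """One deterministic DDIM update (eta = 0) from alpha_bar_t to alpha_bar_prev."""
    # Estimate the clean sample from the noisy sample and the predicted noise.
    x0_pred = (x_t - math.sqrt(1 - alpha_bar_t) * eps_pred) / math.sqrt(alpha_bar_t)
    # Re-noise that estimate to the previous (less noisy) level with the same noise.
    return math.sqrt(alpha_bar_prev) * x0_pred + math.sqrt(1 - alpha_bar_prev) * eps_pred

def flow_match_euler(x: float, sigmas: Sequence[float],
                     velocity: Callable[[float, float], float]) -> float:
    """Euler integration of dx/dt = velocity(x, t) along decreasing times, e.g. 1.0 -> 0.0."""
    for t, t_next in zip(sigmas, sigmas[1:]):
        x = x + (t_next - t) * velocity(x, t)
    return x

# Toy check with a single data point x0 and a fixed noise sample eps.
x0, eps = 2.0, -0.7

# Flow matching: for a single data point the exact velocity is (x - x0) / t; start from pure noise.
sigmas = [1.0, 0.75, 0.5, 0.25, 0.0]
x_fm = flow_match_euler(eps, sigmas, lambda x, t: (x - x0) / t)

# DDIM: cumulative alphas increase as noise is removed (1.0 means clean data).
alpha_bars = [0.1, 0.4, 0.8, 1.0]
x_ddim = math.sqrt(alpha_bars[0]) * x0 + math.sqrt(1 - alpha_bars[0]) * eps
for a_t, a_prev in zip(alpha_bars, alpha_bars[1:]):
    x_ddim = ddim_step(x_ddim, eps, a_t, a_prev)

print(round(x_fm, 6), round(x_ddim, 6))  # both print as 2.0
```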