update docs

2026-03-19 14:58:12 +00:00 · 2024-10-08 21:18:35 +08:00
parent 677ecbf1d2
commit 55f1a10255
7 changed files with 185 additions and 47 deletions
--- a/docs/source/tutorial/ASimpleExample.md
+++ b/docs/source/tutorial/ASimpleExample.md
@@ -12,26 +12,55 @@ cd DiffSynth-Studio
 pip install -e .
 ```

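-## Download Models
+## One-Click Run!

-DiffSynth-Studio ships with preset download links for several mainstream Diffusion models; you can download the preset model files directly with the `download_models` function.
+Running the following code downloads the model, loads it, and generates an image.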

 ```python
-from diffsynth import download_models
+import torch
+from diffsynth import ModelManager, FluxImagePipeline

-download_models(["FLUX.1-dev"])
+model_manager = ModelManager(
+    torch_dtype=torch.bfloat16,
+    device="cuda",
+    model_id_list=["FLUX.1-dev"]
+)
+pipe = FluxImagePipeline.from_model_manager(model_manager)
+
+torch.manual_seed(0)
+image = pipe(
+    prompt="In a forest, a wooden plank sign reading DiffSynth",
+    height=576, width=1024,
+)
+image.save("image.jpg")
 ```

-We support downloading models from [ModelScope](https://www.modelscope.cn/) and [HuggingFace](https://huggingface.co/), as well as models that are not preset; see [Download Models](./DownloadModels.md).
+![image](https://github.com/user-attachments/assets/15a52a2b-2f18-46fe-810c-cb3ad2853919)

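-## Load Models
+As this example shows, DiffSynth has two key modules, `ModelManager` and `Pipeline`, which we describe in detail below.

-In DiffSynth-Studio, models are managed by a unified `ModelManager`. Taking FLUX.1-dev as an example, the model consists of two text encoders, one DiT, and one VAE, and is used as follows:
+## Download and Load Models
+
+`ModelManager` is responsible for downloading and loading models; the following code does both in a single step.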

 ```python
 import torch
 from diffsynth import ModelManager

+model_manager = ModelManager(
+    torch_dtype=torch.bfloat16,
+    device="cuda",
+    model_id_list=["FLUX.1-dev"]
+)
+```
+
+You can also do this step by step; the following code behaves the same as the code above.
+
+```python
+import torch
+from diffsynth import download_models, ModelManager
+
+download_models(["FLUX.1-dev"])
 model_manager = ModelManager(torch_dtype=torch.bfloat16, device="cuda")
 model_manager.load_models([
    "models/FLUX/FLUX.1-dev/text_encoder/model.safetensors",
@@ -41,7 +70,9 @@ model_manager.load_models([
 ])
 ```

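-You can put the paths of all the models you want to load into this list. For model weight files in formats such as `.safetensors`, `ModelManager` automatically determines the model type after loading; for models stored as folders, `ModelManager` tries to parse the `config.json` file inside and call the corresponding module from third-party libraries such as `transformers`. For the models supported by DiffSynth-Studio, see [Supported Models](./Models.md).
+When downloading models, we support [ModelScope](https://www.modelscope.cn/) and [HuggingFace](https://huggingface.co/) as sources, as well as models that are not preset; for more on model downloads, see [Download Models](./DownloadModels.md).
+
+When loading models, you can put the paths of all the models you want to load into this list. For model weight files in formats such as `.safetensors`, `ModelManager` automatically determines the model type after loading; for models stored as folders, `ModelManager` tries to parse the `config.json` file inside and call the corresponding module from third-party libraries such as `transformers`. For the models supported by DiffSynth-Studio, see [Supported Models](./Models.md).

 ## Build the Pipeline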

@@ -52,30 +83,3 @@ pipe = FluxImagePipeline.from_model_manager(model_manager)
 ```

 For more `Pipeline` classes for image and video generation, see [Inference Pipelines](./Pipelines.md).
-
-## Generate!
-
-Write your prompt, hand it to DiffSynth-Studio, and start generating!
-
-```python
-import torch
-from diffsynth import ModelManager, FluxImagePipeline
-
-model_manager = ModelManager(torch_dtype=torch.bfloat16, device="cuda")
-model_manager.load_models([
-    "models/FLUX/FLUX.1-dev/text_encoder/model.safetensors",
-    "models/FLUX/FLUX.1-dev/text_encoder_2",
-    "models/FLUX/FLUX.1-dev/ae.safetensors",
-    "models/FLUX/FLUX.1-dev/flux1-dev.safetensors"
-])
-pipe = FluxImagePipeline.from_model_manager(model_manager)
-
-torch.manual_seed(0)
-image = pipe(
-    prompt="In a forest, a wooden plank sign reading DiffSynth",
-    height=576, width=1024
-)
-image.save("image.jpg")
-```
-
-![image](https://github.com/user-attachments/assets/15a52a2b-2f18-46fe-810c-cb3ad2853919)
--- a/docs/source/tutorial/DownloadModels.md
+++ b/docs/source/tutorial/DownloadModels.md
@@ -1,6 +1,6 @@
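 # Download Models

-DiffSynth-Studio ships with preset download links for several mainstream Diffusion models, so you can easily download and use these models.
+DiffSynth-Studio ships with preset download links for several mainstream Diffusion models, so you can download and use these models.

 ## Download Preset Models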

@@ -21,10 +21,14 @@ download_models(["FLUX.1-dev"])
 You can choose models from two download sources, [ModelScope](https://modelscope.cn/models) and [HuggingFace](https://huggingface.co/models). You can also download the models you need manually, for example with a browser.

 ```python
-from diffsynth.models.downloader import download_from_huggingface, download_from_modelscope
+from diffsynth import download_customized_models

-# From Modelscope (recommended)
-download_from_modelscope("Kwai-Kolors/Kolors", "vae/diffusion_pytorch_model.fp16.bin", "models/kolors/Kolors/vae")
-# From Huggingface
-download_from_huggingface("Kwai-Kolors/Kolors", "vae/diffusion_pytorch_model.fp16.safetensors", "models/kolors/Kolors/vae")
+download_customized_models(
+    model_id="Kwai-Kolors/Kolors",
+    origin_file_path="vae/diffusion_pytorch_model.fp16.bin",
+    local_dir="models/kolors/Kolors/vae",
+    downloading_priority=["ModelScope", "HuggingFace"]
+)
 ```
+
+This code follows the download priority and tries `ModelScope` first: from the [model repository](https://modelscope.cn/models/Kwai-Kolors/Kolors) with ID `Kwai-Kolors/Kolors`, it downloads the file `vae/diffusion_pytorch_model.fp16.bin` to the local path `models/kolors/Kolors/vae`.
--- a/docs/source/tutorial/Extensions.md
+++ b/docs/source/tutorial/Extensions.md
@@ -2,10 +2,48 @@

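 This document introduces several related techniques implemented in DiffSynth beyond Diffusion models; these models have significant potential for image and video processing.

-- **[RIFE](https://github.com/hzwer/ECCV2022-RIFE)**: FIRE (Real-Time Intermediate Flow Estimation) is a video frame interpolation (VFI) method based on real-time intermediate flow estimation. FIRE uses a neural network called IFNet that estimates intermediate flow end-to-end at higher speed. To keep IFNet training stable and improve overall performance, a privileged distillation scheme is designed. FIRE does not depend on a pretrained optical flow model and supports frame interpolation at arbitrary timesteps by taking a temporal encoding as input.
+- **[RIFE](https://github.com/hzwer/ECCV2022-RIFE)**: RIFE is a frame interpolation method based on real-time intermediate flow estimation. Using a model with the IFNet architecture, it estimates intermediate flow end-to-end at high speed. RIFE does not depend on a pretrained optical flow model and supports frame interpolation at arbitrary timesteps by taking a temporal encoding as input.

-- **[ESRGAN](https://github.com/xinntao/ESRGAN)**: ESRGAN (Enhanced Super-Resolution Generative Adversarial Network) is an improvement on SRGAN that aims to raise the visual quality of single-image super-resolution. By optimizing three key components of SRGAN (network architecture, adversarial loss, and perceptual loss), it significantly improves the realism of generated images.
+    In this code, we use the RIFE model to double the number of frames in the video.

-- **[FastBlend](https://arxiv.org/abs/2311.09265)**: FastBlend is a model-free toolkit for smoothing videos; combined with Diffusion models, it forms a powerful video processing pipeline. It effectively removes flicker from videos, interpolates keyframe sequences, and can process an entire video based on a single image.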
+    ```python
+    from diffsynth import VideoData, ModelManager, save_video
+    from diffsynth.extensions.RIFE import RIFEInterpolater

+    model_manager = ModelManager(model_id_list=["RIFE"])
+    rife = RIFEInterpolater.from_model_manager(model_manager)
+    video = VideoData("input_video.mp4", height=512, width=768).raw_data()
+    video = rife.interpolate(video)
+    save_video(video, "output_video.mp4", fps=60)
+    ```

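+- **[ESRGAN](https://github.com/xinntao/ESRGAN)**: ESRGAN is an image super-resolution model that achieves a 4x resolution increase. By optimizing the network architecture, adversarial loss, and perceptual loss, it significantly improves the realism of generated images.
+
+    In this code, we use the ESRGAN model to upscale the image to four times its original resolution.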
+
+    ```python
+    from PIL import Image
+    from diffsynth import ModelManager
+    from diffsynth.extensions.ESRGAN import ESRGAN
+
+    model_manager = ModelManager(model_id_list=["ESRGAN_x4"])
+    rife = ESRGAN.from_model_manager(model_manager)
+    image = Image.open("input_image.jpg")
+    image = rife.upscale(image)
+    image.save("output_image.jpg")
+    ```
+
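+- **[FastBlend](https://arxiv.org/abs/2311.09265)**: FastBlend is a model-free video deflickering algorithm. Videos processed frame by frame with an image generation model (style videos) usually flicker; FastBlend removes the flicker in the style video using the motion features of the original video (guide video).
+
+    In this code, we use FastBlend to remove the flicker from the style video.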
+
+    ```python
+    from diffsynth import VideoData, save_video
+    from diffsynth.extensions.FastBlend import FastBlendSmoother
+
+    fastblend = FastBlendSmoother()
+    guide_video = VideoData("guide_video.mp4", height=512, width=768).raw_data()
+    style_video = VideoData("style_video.mp4", height=512, width=768).raw_data()
+    output_video = fastblend(style_video, original_frames=guide_video)
+    save_video(output_video, "output_video.mp4", fps=30)
+    ```
--- a/docs/source/tutorial/Installation.md
+++ b/docs/source/tutorial/Installation.md
@@ -19,7 +19,7 @@

 ## Download via PyPI

-Install directly from PyPI:
+Install directly from PyPI (feature updates may be delayed):

 ```bash
 pip install diffsynth
--- a/docs/source/tutorial/Models.md
+++ b/docs/source/tutorial/Models.md
@@ -2,7 +2,7 @@

 DiffSynth Studio currently supports the following models:

-* [CogVideo](https://huggingface.co/THUDM/CogVideoX-5b)
+* [CogVideoX](https://huggingface.co/THUDM/CogVideoX-5b)
 * [FLUX](https://huggingface.co/black-forest-labs/FLUX.1-dev)
 * [ExVideo](https://huggingface.co/ECNU-CILab/ExVideo-SVD-128f-v1)
 * [Kolors](https://huggingface.co/Kwai-Kolors/Kolors)
--- a/docs/source/tutorial/Schedulers.md
+++ b/docs/source/tutorial/Schedulers.md
@@ -1,6 +1,6 @@
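 # Schedulers

-The scheduler controls the model's entire denoising (or sampling) process. When a Pipeline is loaded, DiffSynth automatically selects the scheduler best suited to that Pipeline; no additional configuration is required.
+The scheduler controls the model's entire denoising (or sampling) process. When a Pipeline is loaded, DiffSynth automatically selects the scheduler best suited to that Pipeline; ``no additional configuration is required``.

 The supported schedulers include: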