update doc

2026-04-08 08:58:20 +00:00 · 2025-12-03 18:36:31 +08:00
parent d5a0aab2b2
commit 5c37fdcd8f
26 changed files with 150 additions and 114 deletions
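
Most hunks below replace relative Markdown link targets (`./X.md`, `../Y/X.md`) and language-less absolute ones (`/docs/Y/X.md`) with repository-absolute targets under `/docs/zh/`. The following is a minimal sketch of how such a rewrite could be automated. It is not taken from this commit, which may well have been edited by hand. The names `absolutize` and `rewrite_links` and the `lang_root` parameter are hypothetical. The sketch assumes inline links of the form `[text](target)` with no parentheses or spaces inside the target.

```python
import posixpath
import re

# Inline Markdown link: group 1 = "[text](", group 2 = target, group 3 = ")".
LINK_RE = re.compile(r"(\[[^\]]*\]\()([^)\s]+)(\))")


def absolutize(target: str, doc_path: str, lang_root: str = "/docs/zh") -> str:
    """Return a repository-absolute target for a link found in doc_path.

    doc_path is the file's path relative to the repository root,
    e.g. "docs/zh/API_Reference/core/loader.md" (hypothetical convention).
    """
    path, sep, anchor = target.partition("#")
    if not path or "://" in path:
        return target  # same-page anchor or external URL: leave as-is
    if path.startswith("/docs/"):
        # Already absolute. Add the language segment only to .md files;
        # directory links such as /docs/Training/ are left unchanged here.
        if path.startswith(lang_root + "/") or not path.endswith(".md"):
            return target
        path = lang_root + path[len("/docs"):]
    elif path.startswith("/"):
        return target  # other absolute paths, e.g. /examples/...
    else:
        # Relative link: resolve against the directory of the current file.
        base = posixpath.dirname("/" + doc_path)
        path = posixpath.normpath(posixpath.join(base, path))
    return path + sep + anchor


def rewrite_links(text: str, doc_path: str) -> str:
    """Rewrite every inline link target in text."""
    return LINK_RE.sub(
        lambda m: m.group(1) + absolutize(m.group(2), doc_path) + m.group(3),
        text,
    )


if __name__ == "__main__":
    line = "详见[启用显存管理](./Enabling_VRAM_management.md)。"
    print(rewrite_links(line, "docs/zh/Developer_Guide/Integrating_Your_Model.md"))
```

Purely mechanical resolution like this would not reproduce every change in the diff. Some hunks also correct the destination itself: the link in `attention.md`, for example, moves from `../Environment_Variables.md` to `/docs/zh/Pipeline_Usage/Environment_Variables.md`. Those cases still need manual review.
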
--- a/docs/zh/API_Reference/core/attention.md
+++ b/docs/zh/API_Reference/core/attention.md
@@ -1,6 +1,6 @@
 # `diffsynth.core.attention`: 注意力机制实现

-`diffsynth.core.attention` 提供了注意力机制实现的路由机制，根据 `Python` 环境中的可用包和[环境变量](../Environment_Variables.md#diffsynth_attention_implementation)自动选择高效的注意力机制实现。
+`diffsynth.core.attention` 提供了注意力机制实现的路由机制，根据 `Python` 环境中的可用包和[环境变量](/docs/zh/Pipeline_Usage/Environment_Variables.md#diffsynth_attention_implementation)自动选择高效的注意力机制实现。

 ## 注意力机制

@@ -46,7 +46,7 @@ output_1 = attention(query, key, value)
 * xFormers：[GitHub](https://github.com/facebookresearch/xformers)、[文档](https://facebookresearch.github.io/xformers/components/ops.html#module-xformers.ops)
 * PyTorch：[GitHub](https://github.com/pytorch/pytorch)、[文档](https://docs.pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html)

-如需调用除 `PyTorch` 外的其他注意力实现，请按照其 GitHub 页面的指引安装对应的包。`DiffSynth-Studio` 会自动根据 Python 环境中的可用包路由到对应的实现上，也可通过[环境变量](/docs/Pipeline_Usage/Environment_Variables.md#diffsynth_attention_implementation)控制。
+如需调用除 `PyTorch` 外的其他注意力实现，请按照其 GitHub 页面的指引安装对应的包。`DiffSynth-Studio` 会自动根据 Python 环境中的可用包路由到对应的实现上，也可通过[环境变量](/docs/zh/Pipeline_Usage/Environment_Variables.md#diffsynth_attention_implementation)控制。

 ```python
 from diffsynth.core.attention import attention_forward
--- a/docs/zh/API_Reference/core/loader.md
+++ b/docs/zh/API_Reference/core/loader.md
@@ -8,9 +8,9 @@

 ### 从远程下载并加载模型

-以模型[DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Canny](https://www.modelscope.cn/models/DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Canny) 为例，在 `ModelConfig` 中填写 `model_id` 和 `origin_file_pattern` 后即可自动下载模型。默认下载到 `./models` 路径，该路径可通过[环境变量 DIFFSYNTH_MODEL_BASE_PATH](/docs/Pipeline_Usage/Environment_Variables.md#diffsynth_model_base_path) 修改。
+以模型[DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Canny](https://www.modelscope.cn/models/DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Canny) 为例，在 `ModelConfig` 中填写 `model_id` 和 `origin_file_pattern` 后即可自动下载模型。默认下载到 `./models` 路径，该路径可通过[环境变量 DIFFSYNTH_MODEL_BASE_PATH](/docs/zh/Pipeline_Usage/Environment_Variables.md#diffsynth_model_base_path) 修改。

-默认情况下，即使模型已经下载完毕，程序仍会向远程查询是否有遗漏文件，如果要完全关闭远程请求，请将[环境变量 DIFFSYNTH_SKIP_DOWNLOAD](/docs/Pipeline_Usage/Environment_Variables.md#diffsynth_skip_download) 设置为 `True`。
+默认情况下，即使模型已经下载完毕，程序仍会向远程查询是否有遗漏文件，如果要完全关闭远程请求，请将[环境变量 DIFFSYNTH_SKIP_DOWNLOAD](/docs/zh/Pipeline_Usage/Environment_Variables.md#diffsynth_skip_download) 设置为 `True`。

 ```python
 from diffsynth.core import ModelConfig
@@ -51,7 +51,7 @@ config = ModelConfig(path=[

 ### 显存管理配置

-`ModelConfig` 也包含了显存管理配置信息，详见[显存管理](/docs/Pipeline_Usage/VRAM_management.md#更多使用方式)。
+`ModelConfig` 也包含了显存管理配置信息，详见[显存管理](/docs/zh/Pipeline_Usage/VRAM_management.md#更多使用方式)。

 ## 模型文件加载

@@ -103,11 +103,11 @@ print(hash_model_file([

 模型哈希值只与模型文件中 state dict 的 keys 和 tensor shape 有关，与模型参数的数值、文件保存时间等信息无关。在计算 `.safetensors` 格式文件的模型哈希值时，`hash_model_file` 是几乎瞬间完成的，无需读取模型的参数；但在计算 `.bin`、`.pth`、`.ckpt` 等二进制文件的模型哈希值时，则需要读取全部模型参数，因此**我们不建议开发者继续使用这些格式的文件。**

-通过[编写模型 Config](/docs/Developer_Guide/Integrating_Your_Model.md#step-3-编写模型-config)并将模型哈希值等信息填入 `diffsynth/configs/model_configs.py`，开发者可以让 `DiffSynth-Studio` 自动识别模型类型并加载。
+通过[编写模型 Config](/docs/zh/Developer_Guide/Integrating_Your_Model.md#step-3-编写模型-config)并将模型哈希值等信息填入 `diffsynth/configs/model_configs.py`，开发者可以让 `DiffSynth-Studio` 自动识别模型类型并加载。

 ## 模型加载

-`load_model` 是 `diffsynth.core.loader` 中加载模型的外部入口，它会调用 [skip_model_initialization](./vram.md#跳过模型参数初始化) 跳过模型参数初始化。如果启用了 [Disk Offload](/docs/Pipeline_Usage/VRAM_management.md#disk-offload)，则调用 [DiskMap](./vram.md#state-dict-硬盘映射) 进行惰性加载；如果没有启用 Disk Offload，则调用 [load_state_dict](#模型文件加载) 加载模型参数。如果需要的话，还会调用 [state dict converter](/docs/Developer_Guide/Integrating_Your_Model.md#step-2-模型文件格式转换) 进行模型格式转换。最后调用 `model.eval()` 将其切换到推理模式。
+`load_model` 是 `diffsynth.core.loader` 中加载模型的外部入口，它会调用 [skip_model_initialization](/docs/zh/API_Reference/core/vram.md#跳过模型参数初始化) 跳过模型参数初始化。如果启用了 [Disk Offload](/docs/zh/Pipeline_Usage/VRAM_management.md#disk-offload)，则调用 [DiskMap](/docs/zh/API_Reference/core/vram.md#state-dict-硬盘映射) 进行惰性加载；如果没有启用 Disk Offload，则调用 [load_state_dict](#模型文件加载) 加载模型参数。如果需要的话，还会调用 [state dict converter](/docs/zh/Developer_Guide/Integrating_Your_Model.md#step-2-模型文件格式转换) 进行模型格式转换。最后调用 `model.eval()` 将其切换到推理模式。

 以下是一个启用了 Disk Offload 的使用案例：

--- a/docs/zh/API_Reference/core/vram.md
+++ b/docs/zh/API_Reference/core/vram.md
@@ -31,7 +31,7 @@ state_dict = load_state_dict(path, device="cpu")
 model.load_state_dict(state_dict, assign=True)
 ```

-在 `DiffSynth-Studio` 中，所有预训练模型都遵循这一加载逻辑。开发者在[接入模型](/docs/Developer_Guide/Integrating_Your_Model.md)完毕后即可直接以这种方式快速加载模型。
+在 `DiffSynth-Studio` 中，所有预训练模型都遵循这一加载逻辑。开发者在[接入模型](/docs/zh/Developer_Guide/Integrating_Your_Model.md)完毕后即可直接以这种方式快速加载模型。

 ## State Dict 硬盘映射

@@ -57,10 +57,10 @@ state_dict = DiskMap(path, device="cpu") # Fast
 print(state_dict["img_in.weight"])
 ```

-`DiskMap` 是 `DiffSynth-Studio` 中 Disk Offload 的基本组件，开发者在[配置细粒度显存管理方案](/docs/Developer_Guide/Enabling_VRAM_management.md)后即可直接启用 Disk Offload。
+`DiskMap` 是 `DiffSynth-Studio` 中 Disk Offload 的基本组件，开发者在[配置细粒度显存管理方案](/docs/zh/Developer_Guide/Enabling_VRAM_management.md)后即可直接启用 Disk Offload。

 `DiskMap` 是利用 `.safetensors` 文件的特性实现的功能，因此在使用 `.bin`、`.pth`、`.ckpt` 等二进制文件时，模型的参数是全量加载的，这也导致 Disk Offload 不支持这些格式的文件。**我们不建议开发者继续使用这些格式的文件。**

 ## 显存管理可替换模块

-在启用 `DiffSynth-Studio` 的显存管理后，模型内部的模块会被替换为 `diffsynth.core.vram.layers` 中的可替换模块，其使用方式详见[细粒度显存管理方案](/docs/Developer_Guide/Enabling_VRAM_management.md#编写细粒度显存管理方案)。
+在启用 `DiffSynth-Studio` 的显存管理后，模型内部的模块会被替换为 `diffsynth.core.vram.layers` 中的可替换模块，其使用方式详见[细粒度显存管理方案](/docs/zh/Developer_Guide/Enabling_VRAM_management.md#编写细粒度显存管理方案)。
--- a/docs/zh/Developer_Guide/Building_a_Pipeline.md
+++ b/docs/zh/Developer_Guide/Building_a_Pipeline.md
@@ -1,6 +1,6 @@
 # 接入 Pipeline

-在[将 Pipeline 所需的模型接入](./Integrating_Your_Model.md)之后，还需构建 `Pipeline` 用于模型推理，本文档提供 `Pipeline` 构建的标准化流程，开发者也可参考现有的 `Pipeline` 进行构建。
+在[将 Pipeline 所需的模型接入](/docs/zh/Developer_Guide/Integrating_Your_Model.md)之后，还需构建 `Pipeline` 用于模型推理，本文档提供 `Pipeline` 构建的标准化流程，开发者也可参考现有的 `Pipeline` 进行构建。

 `Pipeline` 的实现位于 `diffsynth/pipelines`，每个 `Pipeline` 包含以下必要的关键组件：

@@ -79,7 +79,7 @@ class NewDiffSynthPipeline(BasePipeline):
        return pipe
 ```

-开发者需要实现其中获取模型的逻辑，对应的模型名称即为[模型接入时填写的模型 Config](Integrating_Your_Model.md#step-3-编写模型-config) 中的 `"model_name"`。
+开发者需要实现其中获取模型的逻辑，对应的模型名称即为[模型接入时填写的模型 Config](/docs/zh/Developer_Guide/Integrating_Your_Model.md#step-3-编写模型-config) 中的 `"model_name"`。

 部分模型还需要加载 `tokenizer`，可根据需要在 `from_pretrained` 上添加额外的 `tokenizer_config` 参数并在获取模型后实现这部分。

--- a/docs/zh/Developer_Guide/Enabling_VRAM_management.md
+++ b/docs/zh/Developer_Guide/Enabling_VRAM_management.md
@@ -1,6 +1,6 @@
 # 细粒度显存管理方案

-本文档介绍如何为模型编写合理的细粒度显存管理方案，以及如何将 `DiffSynth-Studio` 中的显存管理功能用于外部的其他代码库，在阅读本文档前，请先阅读文档[显存管理](../Pipeline_Usage/VRAM_management.md)。
+本文档介绍如何为模型编写合理的细粒度显存管理方案，以及如何将 `DiffSynth-Studio` 中的显存管理功能用于外部的其他代码库，在阅读本文档前，请先阅读文档[显存管理](/docs/zh/Pipeline_Usage/VRAM_management.md)。

 ## 20B 模型需要多少显存？

@@ -124,7 +124,7 @@ module_map={
 }
 ```

-此外，还需要提供 `vram_config` 与 `vram_limit`，这两个参数在[显存管理](../Pipeline_Usage/VRAM_management.md#更多使用方式)中已有介绍。
+此外，还需要提供 `vram_config` 与 `vram_limit`，这两个参数在[显存管理](/docs/zh/Pipeline_Usage/VRAM_management.md#更多使用方式)中已有介绍。

 调用 `enable_vram_management` 即可启用显存管理，注意此时模型加载时的 `device` 为 `cpu`，与 `offload_device` 一致：

@@ -171,7 +171,7 @@ with torch.no_grad():

 ## Disk Offload

-[Disk Offload](../Pipeline_Usage/VRAM_management.md#disk-offload) 是特殊的显存管理方案，需在模型加载过程中启用，而非模型加载完毕后。通常，在以上代码能够顺利运行的前提下，Disk Offload 可以直接启用：
+[Disk Offload](/docs/zh/Pipeline_Usage/VRAM_management.md#disk-offload) 是特殊的显存管理方案，需在模型加载过程中启用，而非模型加载完毕后。通常，在以上代码能够顺利运行的前提下，Disk Offload 可以直接启用：

 ```python
 from diffsynth.core import load_model, enable_vram_management, AutoWrappedLinear, AutoWrappedModule
@@ -212,7 +212,7 @@ with torch.no_grad():
    output = model(**inputs)
 ```

-Disk Offload 是极为特殊的显存管理方案，只支持 `.safetensors` 格式文件，不支持 `.bin`、`.pth`、`.ckpt` 等二进制文件，不支持带 Tensor reshape 的 [state dict converter](../Developer_Guide/Integrating_Your_Model.md#step-2-模型文件格式转换)。
+Disk Offload 是极为特殊的显存管理方案，只支持 `.safetensors` 格式文件，不支持 `.bin`、`.pth`、`.ckpt` 等二进制文件，不支持带 Tensor reshape 的 [state dict converter](/docs/zh/Developer_Guide/Integrating_Your_Model.md#step-2-模型文件格式转换)。

 如果出现非 Disk Offload 能正常运行但 Disk Offload 不能正常运行的情况，请在 GitHub 上给我们提 issue。

--- a/docs/zh/Developer_Guide/Integrating_Your_Model.md
+++ b/docs/zh/Developer_Guide/Integrating_Your_Model.md
@@ -183,4 +183,4 @@ Loaded model: {

 ## Step 5: 编写模型显存管理方案

-`DiffSynth-Studio` 支持复杂的显存管理，详见[启用显存管理](./Enabling_VRAM_management.md)。
+`DiffSynth-Studio` 支持复杂的显存管理，详见[启用显存管理](/docs/zh/Developer_Guide/Enabling_VRAM_management.md)。
--- a/docs/zh/Developer_Guide/Training_Diffusion_Models.md
+++ b/docs/zh/Developer_Guide/Training_Diffusion_Models.md
@@ -1,6 +1,6 @@
 # 接入模型训练

-在[接入模型](./Integrating_Your_Model.md)并[实现 Pipeline](./Building_a_Pipeline.md)后，接下来接入模型训练功能。
+在[接入模型](/docs/zh/Developer_Guide/Integrating_Your_Model.md)并[实现 Pipeline](/docs/zh/Developer_Guide/Building_a_Pipeline.md)后，接下来接入模型训练功能。

 ## 训推一致的 Pipeline 改造

--- a/docs/zh/Model_Details/FLUX.md
+++ b/docs/zh/Model_Details/FLUX.md
@@ -14,7 +14,7 @@ cd DiffSynth-Studio
 pip install -e .
 ```

-更多关于安装的信息，请参考[安装依赖](/docs/Pipeline_Usage/Setup.md)。
+更多关于安装的信息，请参考[安装依赖](/docs/zh/Pipeline_Usage/Setup.md)。

 ## 快速开始

@@ -107,14 +107,14 @@ graph LR;

 特殊训练脚本：

-* 差分 LoRA 训练：[doc](/docs/Training/Differential_LoRA.md)、[code](/examples/flux/model_training/special/differential_training/)
-* FP8 精度训练：[doc](/docs/Training/FP8_Precision.md)、[code](/examples/flux/model_training/special/fp8_training/)
-* 两阶段拆分训练：[doc](/docs/Training/Split_Training.md)、[code](/examples/flux/model_training/special/split_training/)
-* 端到端直接蒸馏：[doc](/docs/Training/Direct_Distill.md)、[code](/examples/flux/model_training/lora/FLUX.1-dev-Distill-LoRA.sh)
+* 差分 LoRA 训练：[doc](/docs/zh/Training/Differential_LoRA.md)、[code](/examples/flux/model_training/special/differential_training/)
+* FP8 精度训练：[doc](/docs/zh/Training/FP8_Precision.md)、[code](/examples/flux/model_training/special/fp8_training/)
+* 两阶段拆分训练：[doc](/docs/zh/Training/Split_Training.md)、[code](/examples/flux/model_training/special/split_training/)
+* 端到端直接蒸馏：[doc](/docs/zh/Training/Direct_Distill.md)、[code](/examples/flux/model_training/lora/FLUX.1-dev-Distill-LoRA.sh)

 ## 模型推理

-模型通过 `FluxImagePipeline.from_pretrained` 加载，详见[加载模型](/docs/Pipeline_Usage/Model_Inference.md#加载模型)。
+模型通过 `FluxImagePipeline.from_pretrained` 加载，详见[加载模型](/docs/zh/Pipeline_Usage/Model_Inference.md#加载模型)。

 `FluxImagePipeline` 推理的输入参数包括：

@@ -152,7 +152,7 @@ graph LR;
 * `flex_control_stop`: Flex 模型的控制停止时间步。
 * `nexus_gen_reference_image`: Nexus-Gen 模型的参考图像。

-如果显存不足，请开启[显存管理](/docs/Pipeline_Usage/VRAM_management.md)，我们在示例代码中提供了每个模型推荐的低显存配置，详见前文"模型总览"中的表格。
+如果显存不足，请开启[显存管理](/docs/zh/Pipeline_Usage/VRAM_management.md)，我们在示例代码中提供了每个模型推荐的低显存配置，详见前文"模型总览"中的表格。

 ## 模型训练

@@ -207,4 +207,4 @@ FLUX 系列模型统一通过 [`examples/flux/model_training/train.py`](/example
 modelscope download --dataset DiffSynth-Studio/example_image_dataset --local_dir ./data/example_image_dataset
 ```

-我们为每个模型编写了推荐的训练脚本，请参考前文"模型总览"中的表格。关于如何编写模型训练脚本，请参考[模型训练](/docs/Pipeline_Usage/Model_Training.md)；更多高阶训练算法，请参考[训练框架详解](/docs/Training/)。
+我们为每个模型编写了推荐的训练脚本，请参考前文"模型总览"中的表格。关于如何编写模型训练脚本，请参考[模型训练](/docs/zh/Pipeline_Usage/Model_Training.md)；更多高阶训练算法，请参考[训练框架详解](/docs/Training/)。
--- a/docs/zh/Model_Details/FLUX2.md
+++ b/docs/zh/Model_Details/FLUX2.md
@@ -12,7 +12,7 @@ cd DiffSynth-Studio
 pip install -e .
 ```

-更多关于安装的信息，请参考[安装依赖](/docs/Pipeline_Usage/Setup.md)。
+更多关于安装的信息，请参考[安装依赖](/docs/zh/Pipeline_Usage/Setup.md)。

 ## 快速开始

@@ -56,14 +56,14 @@ image.save("image_FLUX.2-dev.jpg")

 特殊训练脚本：

-* 差分 LoRA 训练：[doc](/docs/Training/Differential_LoRA.md)、[code](/examples/flux/model_training/special/differential_training/)
-* FP8 精度训练：[doc](/docs/Training/FP8_Precision.md)、[code](/examples/flux/model_training/special/fp8_training/)
-* 两阶段拆分训练：[doc](/docs/Training/Split_Training.md)、[code](/examples/flux/model_training/special/split_training/)
-* 端到端直接蒸馏：[doc](/docs/Training/Direct_Distill.md)、[code](/examples/flux/model_training/lora/FLUX.1-dev-Distill-LoRA.sh)
+* 差分 LoRA 训练：[doc](/docs/zh/Training/Differential_LoRA.md)、[code](/examples/flux/model_training/special/differential_training/)
+* FP8 精度训练：[doc](/docs/zh/Training/FP8_Precision.md)、[code](/examples/flux/model_training/special/fp8_training/)
+* 两阶段拆分训练：[doc](/docs/zh/Training/Split_Training.md)、[code](/examples/flux/model_training/special/split_training/)
+* 端到端直接蒸馏：[doc](/docs/zh/Training/Direct_Distill.md)、[code](/examples/flux/model_training/lora/FLUX.1-dev-Distill-LoRA.sh)

 ## 模型推理

-模型通过 `Flux2ImagePipeline.from_pretrained` 加载，详见[加载模型](/docs/Pipeline_Usage/Model_Inference.md#加载模型)。
+模型通过 `Flux2ImagePipeline.from_pretrained` 加载，详见[加载模型](/docs/zh/Pipeline_Usage/Model_Inference.md#加载模型)。

 `Flux2ImagePipeline` 推理的输入参数包括：

@@ -82,7 +82,7 @@ image.save("image_FLUX.2-dev.jpg")
 * `tile_stride`: VAE 编解码阶段的分块步长，默认为 64，仅在 `tiled=True` 时生效，需保证其数值小于或等于 `tile_size`。
 * `progress_bar_cmd`: 进度条，默认为 `tqdm.tqdm`。可通过设置为 `lambda x:x` 来屏蔽进度条。

-如果显存不足，请开启[显存管理](/docs/Pipeline_Usage/VRAM_management.md)，我们在示例代码中提供了每个模型推荐的低显存配置，详见前文"模型总览"中的表格。
+如果显存不足，请开启[显存管理](/docs/zh/Pipeline_Usage/VRAM_management.md)，我们在示例代码中提供了每个模型推荐的低显存配置，详见前文"模型总览"中的表格。

 ## 模型训练

@@ -135,4 +135,4 @@ FLUX.2 系列模型统一通过 [`examples/flux2/model_training/train.py`](/exam
 modelscope download --dataset DiffSynth-Studio/example_image_dataset --local_dir ./data/example_image_dataset
 ```

-我们为每个模型编写了推荐的训练脚本，请参考前文"模型总览"中的表格。关于如何编写模型训练脚本，请参考[模型训练](/docs/Pipeline_Usage/Model_Training.md)；更多高阶训练算法，请参考[训练框架详解](/docs/Training/)。
+我们为每个模型编写了推荐的训练脚本，请参考前文"模型总览"中的表格。关于如何编写模型训练脚本，请参考[模型训练](/docs/zh/Pipeline_Usage/Model_Training.md)；更多高阶训练算法，请参考[训练框架详解](/docs/Training/)。
--- a/docs/zh/Model_Details/Overview.md
+++ b/docs/zh/Model_Details/Overview.md
@@ -2,7 +2,7 @@

 ## Qwen-Image

-文档：[./Qwen-Image.md](./Qwen-Image.md)
+文档：[./Qwen-Image.md](/docs/zh/Model_Details/Qwen-Image.md)

 <details>

@@ -85,7 +85,7 @@ graph LR;

 ## FLUX 系列

-文档：[./FLUX.md](./FLUX.md)
+文档：[./FLUX.md](/docs/zh/Model_Details/FLUX.md)

 <details>

@@ -166,7 +166,7 @@ graph LR;

 ## Wan 系列

-文档：[./Wan.md](./Wan.md)
+文档：[./Wan.md](/docs/zh/Model_Details/Wan.md)

 <details>

--- a/docs/zh/Model_Details/Qwen-Image.md
+++ b/docs/zh/Model_Details/Qwen-Image.md
@@ -14,7 +14,7 @@ cd DiffSynth-Studio
 pip install -e .
 ```

-更多关于安装的信息，请参考[安装依赖](/docs/Pipeline_Usage/Setup.md)。
+更多关于安装的信息，请参考[安装依赖](/docs/zh/Pipeline_Usage/Setup.md)。

 ## 快速开始

@@ -96,14 +96,14 @@ graph LR;

 特殊训练脚本：

-* 差分 LoRA 训练：[doc](/docs/Training/Differential_LoRA.md)、[code](/examples/qwen_image/model_training/special/differential_training/)
-* FP8 精度训练：[doc](/docs/Training/FP8_Precision.md)、[code](/examples/qwen_image/model_training/special/fp8_training/)
-* 两阶段拆分训练：[doc](/docs/Training/Split_Training.md)、[code](/examples/qwen_image/model_training/special/split_training/)
-* 端到端直接蒸馏：[doc](/docs/Training/Direct_Distill.md)、[code](/examples/qwen_image/model_training/lora/Qwen-Image-Distill-LoRA.sh)
+* 差分 LoRA 训练：[doc](/docs/zh/Training/Differential_LoRA.md)、[code](/examples/qwen_image/model_training/special/differential_training/)
+* FP8 精度训练：[doc](/docs/zh/Training/FP8_Precision.md)、[code](/examples/qwen_image/model_training/special/fp8_training/)
+* 两阶段拆分训练：[doc](/docs/zh/Training/Split_Training.md)、[code](/examples/qwen_image/model_training/special/split_training/)
+* 端到端直接蒸馏：[doc](/docs/zh/Training/Direct_Distill.md)、[code](/examples/qwen_image/model_training/lora/Qwen-Image-Distill-LoRA.sh)

 ## 模型推理

-模型通过 `QwenImagePipeline.from_pretrained` 加载，详见[加载模型](/docs/Pipeline_Usage/Model_Inference.md#加载模型)。
+模型通过 `QwenImagePipeline.from_pretrained` 加载，详见[加载模型](/docs/zh/Pipeline_Usage/Model_Inference.md#加载模型)。

 `QwenImagePipeline` 推理的输入参数包括：

@@ -134,7 +134,7 @@ graph LR;
 * `tile_stride`: VAE 编解码阶段的分块步长，默认为 64，仅在 `tiled=True` 时生效，需保证其数值小于或等于 `tile_size`。
 * `progress_bar_cmd`: 进度条，默认为 `tqdm.tqdm`。可通过设置为 `lambda x:x` 来屏蔽进度条。

-如果显存不足，请开启[显存管理](/docs/Pipeline_Usage/VRAM_management.md)，我们在示例代码中提供了每个模型推荐的低显存配置，详见前文“模型总览”中的表格。
+如果显存不足，请开启[显存管理](/docs/zh/Pipeline_Usage/VRAM_management.md)，我们在示例代码中提供了每个模型推荐的低显存配置，详见前文“模型总览”中的表格。

 ## 模型训练

@@ -188,4 +188,4 @@ Qwen-Image 系列模型统一通过 [`examples/qwen_image/model_training/train.p
 modelscope download --dataset DiffSynth-Studio/example_image_dataset --local_dir ./data/example_image_dataset
 ```

-我们为每个模型编写了推荐的训练脚本，请参考前文“模型总览”中的表格。关于如何编写模型训练脚本，请参考[模型训练](/docs/Pipeline_Usage/Model_Training.md)；更多高阶训练算法，请参考[训练框架详解](/docs/Training/)。
+我们为每个模型编写了推荐的训练脚本，请参考前文“模型总览”中的表格。关于如何编写模型训练脚本，请参考[模型训练](/docs/zh/Pipeline_Usage/Model_Training.md)；更多高阶训练算法，请参考[训练框架详解](/docs/Training/)。
--- a/docs/zh/Model_Details/Wan.md
+++ b/docs/zh/Model_Details/Wan.md
@@ -14,7 +14,7 @@ cd DiffSynth-Studio
 pip install -e .
 ```

-更多关于安装的信息，请参考[安装依赖](/docs/Pipeline_Usage/Setup.md)。
+更多关于安装的信息，请参考[安装依赖](/docs/zh/Pipeline_Usage/Setup.md)。

 ## 快速开始

@@ -140,13 +140,13 @@ graph LR;
 |[PAI/Wan2.2-Fun-A14B-Control](https://modelscope.cn/models/PAI/Wan2.2-Fun-A14B-Control)|`control_video`, `reference_image`|[code](/examples/wanvideo/model_inference/Wan2.2-Fun-A14B-Control.py)|[code](/examples/wanvideo/model_training/full/Wan2.2-Fun-A14B-Control.sh)|[code](/examples/wanvideo/model_training/validate_full/Wan2.2-Fun-A14B-Control.py)|[code](/examples/wanvideo/model_training/lora/Wan2.2-Fun-A14B-Control.sh)|[code](/examples/wanvideo/model_training/validate_lora/Wan2.2-Fun-A14B-Control.py)|
 |[PAI/Wan2.2-Fun-A14B-Control-Camera](https://modelscope.cn/models/PAI/Wan2.2-Fun-A14B-Control-Camera)|`control_camera_video`, `input_image`|[code](/examples/wanvideo/model_inference/Wan2.2-Fun-A14B-Control-Camera.py)|[code](/examples/wanvideo/model_training/full/Wan2.2-Fun-A14B-Control-Camera.sh)|[code](/examples/wanvideo/model_training/validate_full/Wan2.2-Fun-A14B-Control-Camera.py)|[code](/examples/wanvideo/model_training/lora/Wan2.2-Fun-A14B-Control-Camera.sh)|[code](/examples/wanvideo/model_training/validate_lora/Wan2.2-Fun-A14B-Control-Camera.py)|

-* FP8 精度训练：[doc](/docs/Training/FP8_Precision.md)、[code](/examples/wanvideo/model_training/special/fp8_training/)
-* 两阶段拆分训练：[doc](/docs/Training/Split_Training.md)、[code](/examples/wanvideo/model_training/special/split_training/)
-* 端到端直接蒸馏：[doc](/docs/Training/Direct_Distill.md)、[code](/examples/wanvideo/model_training/special/direct_distill/)
+* FP8 精度训练：[doc](/docs/zh/Training/FP8_Precision.md)、[code](/examples/wanvideo/model_training/special/fp8_training/)
+* 两阶段拆分训练：[doc](/docs/zh/Training/Split_Training.md)、[code](/examples/wanvideo/model_training/special/split_training/)
+* 端到端直接蒸馏：[doc](/docs/zh/Training/Direct_Distill.md)、[code](/examples/wanvideo/model_training/special/direct_distill/)

 ## 模型推理

-模型通过 `WanVideoPipeline.from_pretrained` 加载，详见[加载模型](/docs/Pipeline_Usage/Model_Inference.md#加载模型)。
+模型通过 `WanVideoPipeline.from_pretrained` 加载，详见[加载模型](/docs/zh/Pipeline_Usage/Model_Inference.md#加载模型)。

 `WanVideoPipeline` 推理的输入参数包括：

@@ -196,7 +196,7 @@ graph LR;
 * `tea_cache_model_id`: TeaCache 使用的模型 ID。
 * `progress_bar_cmd`: 进度条，默认为 `tqdm.tqdm`。可通过设置为 `lambda x:x` 来屏蔽进度条。

-如果显存不足，请开启[显存管理](/docs/Pipeline_Usage/VRAM_management.md)，我们在示例代码中提供了每个模型推荐的低显存配置，详见前文"模型总览"中的表格。
+如果显存不足，请开启[显存管理](/docs/zh/Pipeline_Usage/VRAM_management.md)，我们在示例代码中提供了每个模型推荐的低显存配置，详见前文"模型总览"中的表格。

 ## 模型训练

@@ -251,4 +251,4 @@ Wan 系列模型统一通过 [`examples/wanvideo/model_training/train.py`](/exam
 modelscope download --dataset DiffSynth-Studio/example_video_dataset --local_dir ./data/example_video_dataset
 ```

-我们为每个模型编写了推荐的训练脚本，请参考前文"模型总览"中的表格。关于如何编写模型训练脚本，请参考[模型训练](/docs/Pipeline_Usage/Model_Training.md)；更多高阶训练算法，请参考[训练框架详解](/docs/Training/)。
+我们为每个模型编写了推荐的训练脚本，请参考前文"模型总览"中的表格。关于如何编写模型训练脚本，请参考[模型训练](/docs/zh/Pipeline_Usage/Model_Training.md)；更多高阶训练算法，请参考[训练框架详解](/docs/Training/)。
--- a/docs/zh/Model_Details/Z-Image.md
+++ b/docs/zh/Model_Details/Z-Image.md
@@ -12,7 +12,7 @@ cd DiffSynth-Studio
 pip install -e .
 ```

-更多关于安装的信息，请参考[安装依赖](/docs/Pipeline_Usage/Setup.md)。
+更多关于安装的信息，请参考[安装依赖](/docs/zh/Pipeline_Usage/Setup.md)。

 ## 快速开始

@@ -46,12 +46,12 @@ image.save("image.jpg")

 特殊训练脚本：

-* 差分 LoRA 训练：[doc](/docs/Training/Differential_LoRA.md)、[code](/examples/z_image/model_training/special/differential_training/)
+* 差分 LoRA 训练：[doc](/docs/zh/Training/Differential_LoRA.md)、[code](/examples/z_image/model_training/special/differential_training/)
 * 轨迹模仿蒸馏训练（实验性功能）：[code](/examples/z_image/model_training/special/trajectory_imitation/)

 ## 模型推理

-模型通过 `ZImagePipeline.from_pretrained` 加载，详见[加载模型](/docs/Pipeline_Usage/Model_Inference.md#加载模型)。
+模型通过 `ZImagePipeline.from_pretrained` 加载，详见[加载模型](/docs/zh/Pipeline_Usage/Model_Inference.md#加载模型)。

 `ZImagePipeline` 推理的输入参数包括：

@@ -66,7 +66,7 @@ image.save("image.jpg")
 * `rand_device`: 生成随机高斯噪声矩阵的计算设备，默认为 `"cpu"`。当设置为 `cuda` 时，在不同 GPU 上会导致不同的生成结果。
 * `num_inference_steps`: 推理次数，默认值为 8。

-如果显存不足，请开启[显存管理](/docs/Pipeline_Usage/VRAM_management.md)，我们在示例代码中提供了每个模型推荐的低显存配置，详见前文"模型总览"中的表格。
+如果显存不足，请开启[显存管理](/docs/zh/Pipeline_Usage/VRAM_management.md)，我们在示例代码中提供了每个模型推荐的低显存配置，详见前文"模型总览"中的表格。

 ## 模型训练

@@ -119,7 +119,7 @@ Z-Image 系列模型统一通过 [`examples/z_image/model_training/train.py`](/e
 modelscope download --dataset DiffSynth-Studio/example_image_dataset --local_dir ./data/example_image_dataset
 ```

-我们为每个模型编写了推荐的训练脚本，请参考前文"模型总览"中的表格。关于如何编写模型训练脚本，请参考[模型训练](/docs/Pipeline_Usage/Model_Training.md)；更多高阶训练算法，请参考[训练框架详解](/docs/Training/)。
+我们为每个模型编写了推荐的训练脚本，请参考前文"模型总览"中的表格。关于如何编写模型训练脚本，请参考[模型训练](/docs/zh/Pipeline_Usage/Model_Training.md)；更多高阶训练算法，请参考[训练框架详解](/docs/Training/)。

 训练技巧：

--- a/docs/zh/Pipeline_Usage/Environment_Variables.md
+++ b/docs/zh/Pipeline_Usage/Environment_Variables.md
@@ -28,7 +28,7 @@ DIFFSYNTH_MODEL_BASE_PATH="./path_to_my_models" python xxx.py

 ## `DIFFSYNTH_ATTENTION_IMPLEMENTATION`

-注意力机制实现的方式，可以设置为 `flash_attention_3`、`flash_attention_2`、`sage_attention`、`xformers`、`torch`。详见 [`./core/attention.md`](./core/attention.md).
+注意力机制实现的方式，可以设置为 `flash_attention_3`、`flash_attention_2`、`sage_attention`、`xformers`、`torch`。详见 [`./core/attention.md`](/docs/zh/API_Reference/core/attention.md).

 ## `DIFFSYNTH_DISK_MAP_BUFFER_SIZE`

--- a/docs/zh/Pipeline_Usage/Model_Inference.md
+++ b/docs/zh/Pipeline_Usage/Model_Inference.md
@@ -22,7 +22,7 @@ pipe = QwenImagePipeline.from_pretrained(
 )
 ```

-其中 `torch_dtype` 和 `device` 是计算精度和计算设备（不是模型的精度和设备）。`model_configs` 可通过多种方式配置模型路径，关于本项目内部是如何加载模型的，请参考 [`diffsynth.core.loader`](/docs/API_Reference/core/loader.md)。
+其中 `torch_dtype` 和 `device` 是计算精度和计算设备（不是模型的精度和设备）。`model_configs` 可通过多种方式配置模型路径，关于本项目内部是如何加载模型的，请参考 [`diffsynth.core.loader`](/docs/zh/API_Reference/core/loader.md)。

 <details>

@@ -34,7 +34,7 @@ pipe = QwenImagePipeline.from_pretrained(
 > ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),
 > ```
 > 
-> 模型文件默认下载到 `./models` 路径，该路径可通过[环境变量 DIFFSYNTH_MODEL_BASE_PATH](/docs/Pipeline_Usage/Environment_Variables.md#diffsynth_model_base_path) 修改。
+> 模型文件默认下载到 `./models` 路径，该路径可通过[环境变量 DIFFSYNTH_MODEL_BASE_PATH](/docs/zh/Pipeline_Usage/Environment_Variables.md#diffsynth_model_base_path) 修改。

 </details>

@@ -61,7 +61,7 @@ pipe = QwenImagePipeline.from_pretrained(

 </details>

-默认情况下，即使模型已经下载完毕，程序仍会向远程查询是否有遗漏文件，如果要完全关闭远程请求，请将[环境变量 DIFFSYNTH_SKIP_DOWNLOAD](/docs/Pipeline_Usage/Environment_Variables.md#diffsynth_skip_download) 设置为 `True`。
+默认情况下，即使模型已经下载完毕，程序仍会向远程查询是否有遗漏文件，如果要完全关闭远程请求，请将[环境变量 DIFFSYNTH_SKIP_DOWNLOAD](/docs/zh/Pipeline_Usage/Environment_Variables.md#diffsynth_skip_download) 设置为 `True`。

 ```shell
 import os
@@ -69,7 +69,7 @@ os.environ["DIFFSYNTH_SKIP_DOWNLOAD"] = "True"
 import diffsynth
 ```

-如需从 [HuggingFace](https://huggingface.co/) 下载模型，请将[环境变量 DIFFSYNTH_DOWNLOAD_RESOURCE](/docs/Pipeline_Usage/Environment_Variables.md#diffsynth_download_resource) 设置为 `huggingface`。
+如需从 [HuggingFace](https://huggingface.co/) 下载模型，请将[环境变量 DIFFSYNTH_DOWNLOAD_RESOURCE](/docs/zh/Pipeline_Usage/Environment_Variables.md#diffsynth_download_resource) 设置为 `huggingface`。

 ```shell
 import os
@@ -102,4 +102,4 @@ image.save("image.jpg")

 每个模型 `Pipeline` 的输入参数不同，请参考各模型的文档。

-如果模型参数量太大，导致显存不足，请开启[显存管理](./VRAM_management.md)。
+如果模型参数量太大，导致显存不足，请开启[显存管理](/docs/zh/Pipeline_Usage/VRAM_management.md)。
--- a/docs/zh/Pipeline_Usage/Model_Training.md
+++ b/docs/zh/Pipeline_Usage/Model_Training.md
@@ -65,7 +65,7 @@ image_1.jpg,"a dog"
 image_2.jpg,"a cat"
 ```

-我们构建了样例数据集，以方便您进行测试。了解通用数据集架构是如何实现的，请参考 [`diffsynth.core.data`](/docs/API_Reference/core/data.md)。
+我们构建了样例数据集，以方便您进行测试。了解通用数据集架构是如何实现的，请参考 [`diffsynth.core.data`](/docs/zh/API_Reference/core/data.md)。

 <details>

@@ -93,7 +93,7 @@ image_2.jpg,"a cat"

 ## 加载模型

-类似于[推理时的模型加载](./Model_Inference.md#加载模型)，我们支持多种方式配置模型路径，两种方式是可以混用的。
+类似于[推理时的模型加载](/docs/zh/Pipeline_Usage/Model_Inference.md#加载模型)，我们支持多种方式配置模型路径，两种方式是可以混用的。

 <details>

@@ -115,9 +115,9 @@ image_2.jpg,"a cat"
 > --model_id_with_origin_paths "Qwen/Qwen-Image:transformer/diffusion_pytorch_model*.safetensors,Qwen/Qwen-Image:text_encoder/model*.safetensors,Qwen/Qwen-Image:vae/diffusion_pytorch_model.safetensors"
 > ```
 > 
-> 模型文件默认下载到 `./models` 路径，该路径可通过[环境变量 DIFFSYNTH_MODEL_BASE_PATH](/docs/Pipeline_Usage/Environment_Variables.md#diffsynth_model_base_path) 修改。
+> 模型文件默认下载到 `./models` 路径，该路径可通过[环境变量 DIFFSYNTH_MODEL_BASE_PATH](/docs/zh/Pipeline_Usage/Environment_Variables.md#diffsynth_model_base_path) 修改。
 > 
-> 默认情况下，即使模型已经下载完毕，程序仍会向远程查询是否有遗漏文件，如果要完全关闭远程请求，请将[环境变量 DIFFSYNTH_SKIP_DOWNLOAD](/docs/Pipeline_Usage/Environment_Variables.md#diffsynth_skip_download) 设置为 `True`。
+> 默认情况下，即使模型已经下载完毕，程序仍会向远程查询是否有遗漏文件，如果要完全关闭远程请求，请将[环境变量 DIFFSYNTH_SKIP_DOWNLOAD](/docs/zh/Pipeline_Usage/Environment_Variables.md#diffsynth_skip_download) 设置为 `True`。

 </details>

@@ -235,11 +235,11 @@ accelerate launch --config_file examples/qwen_image/model_training/full/accelera

 ## 训练注意事项

-* 数据集的元数据除 `csv` 格式外，还支持 `json`、`jsonl` 格式，关于如何选择最佳的元数据格式，请参考[](/docs/API_Reference/core/data.md#元数据)
+* 数据集的元数据除 `csv` 格式外，还支持 `json`、`jsonl` 格式，关于如何选择最佳的元数据格式，请参考[](/docs/zh/API_Reference/core/data.md#元数据)
 * 通常训练效果与训练步数强相关，与 epoch 数量弱相关，因此我们更推荐使用参数 `--save_steps` 按训练步数间隔来保存模型文件。
 * 当数据量 * `dataset_repeat` 超过 $10^9$ 时，我们观测到数据集的速度明显变慢，这似乎是 `PyTorch` 的 bug，我们尚不确定新版本的 `PyTorch` 是否已经修复了这一问题。
 * 学习率 `--learning_rate` 在 LoRA 训练中建议设置为 `1e-4`，在全量训练中建议设置为 `1e-5`。
-* 训练框架不支持 batch size > 1，原因是复杂的，详见 [Q&A: 为什么训练框架不支持 batch size > 1？](/docs/QA.md#为什么训练框架不支持-batch-size--1)
+* 训练框架不支持 batch size > 1，原因是复杂的，详见 [Q&A: 为什么训练框架不支持 batch size > 1？](/docs/zh/QA.md#为什么训练框架不支持-batch-size--1)
 * 少数模型包含冗余参数，例如 Qwen-Image 的 DiT 部分最后一层的文本编码部分，在训练这些模型时，需设置 `--find_unused_parameters` 避免在多 GPU 训练中报错。出于对开源社区模型兼容性的考虑，我们不打算删除这些冗余参数。
 * Diffusion 模型的损失函数值与实际效果的关系不大，因此我们在训练过程中不会记录损失函数值。我们建议把 `--num_epochs` 设置为足够大的数值，边训边测，直至效果收敛后手动关闭训练程序。
-* `--use_gradient_checkpointing` 通常是开启的，除非 GPU 显存足够；`--use_gradient_checkpointing_offload` 则按需开启，详见 [`diffsynth.core.gradient`](/docs/API_Reference/core/gradient.md)。
+* `--use_gradient_checkpointing` 通常是开启的，除非 GPU 显存足够；`--use_gradient_checkpointing_offload` 则按需开启，详见 [`diffsynth.core.gradient`](/docs/zh/API_Reference/core/gradient.md)。
--- a/docs/zh/Pipeline_Usage/VRAM_management.md
+++ b/docs/zh/Pipeline_Usage/VRAM_management.md
@@ -140,7 +140,7 @@ image.save("image.jpg")

 在更为极端的情况下，当内存也不足以存储整个模型时，Disk Offload 功能可以让模型参数惰性加载，即，模型中的每个 Layer 仅在调用 forward 时才会从硬盘中读取相应的参数。启用这一功能时，我们建议使用高速的 SSD 硬盘。

-Disk Offload 是极为特殊的显存管理方案，只支持 `.safetensors` 格式文件，不支持 `.bin`、`.pth`、`.ckpt` 等二进制文件，不支持带 Tensor reshape 的 [state dict converter](../Developer_Guide/Integrating_Your_Model.md#step-2-模型文件格式转换)。
+Disk Offload 是极为特殊的显存管理方案，只支持 `.safetensors` 格式文件，不支持 `.bin`、`.pth`、`.ckpt` 等二进制文件，不支持带 Tensor reshape 的 [state dict converter](/docs/zh/Developer_Guide/Integrating_Your_Model.md#step-2-模型文件格式转换)。

 ```python
 from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig
@@ -196,7 +196,7 @@ vram_config = {
 * Preparing：Onload 和 Computation 的中间状态，在显存允许的前提下的暂存状态，这个状态由显存管理机制控制切换，当且仅当【vram_limit 设置为无限制】或【vram_limit 已设置且有空余显存】时会进入这一状态
 * Computation：模型正在计算过程中，这个状态由显存管理机制控制切换，仅在 `forward` 中临时进入

-如果你是模型开发者，希望自行控制某个模型的显存管理粒度，请参考[../Developer_Guide/Enabling_VRAM_management.md](../Developer_Guide/Enabling_VRAM_management.md)。
+如果你是模型开发者，希望自行控制某个模型的显存管理粒度，请参考[../Developer_Guide/Enabling_VRAM_management.md](/docs/zh/Developer_Guide/Enabling_VRAM_management.md)。

 ## 最佳实践

--- a/docs/zh/README.md
+++ b/docs/zh/README.md
@@ -26,51 +26,51 @@ graph LR;

 本节介绍 `DiffSynth-Studio` 的基本使用方式，包括如何启用显存管理从而在极低显存的 GPU 上进行推理，以及如何训练任意基础模型、LoRA、ControlNet 等模型。

-* [安装依赖](./Pipeline_Usage/Setup.md)
-* [模型推理](./Pipeline_Usage/Model_Inference.md)
-* [显存管理](./Pipeline_Usage/VRAM_management.md)
-* [模型训练](./Pipeline_Usage/Model_Training.md)
-* [环境变量](./Pipeline_Usage/Environment_Variables.md)
+* [安装依赖](/docs/zh/Pipeline_Usage/Setup.md)
+* [模型推理](/docs/zh/Pipeline_Usage/Model_Inference.md)
+* [显存管理](/docs/zh/Pipeline_Usage/VRAM_management.md)
+* [模型训练](/docs/zh/Pipeline_Usage/Model_Training.md)
+* [环境变量](/docs/zh/Pipeline_Usage/Environment_Variables.md)

 ## Section 2: 模型详解

 本节介绍 `DiffSynth-Studio` 所支持的 Diffusion 模型，部分模型 Pipeline 具备可控生成、并行加速等特色功能。

-* [FLUX.1](./Model_Details/FLUX.md)
-* [Wan](./Model_Details/Wan.md)
-* [Qwen-Image](./Model_Details/Qwen-Image.md)
-* [FLUX.2](./Model_Details/FLUX2.md)
-* [Z-Image](./Model_Details/Z-Image.md)
+* [FLUX.1](/docs/zh/Model_Details/FLUX.md)
+* [Wan](/docs/zh/Model_Details/Wan.md)
+* [Qwen-Image](/docs/zh/Model_Details/Qwen-Image.md)
+* [FLUX.2](/docs/zh/Model_Details/FLUX2.md)
+* [Z-Image](/docs/zh/Model_Details/Z-Image.md)

 ## Section 3: 训练框架

 本节介绍 `DiffSynth-Studio` 中训练框架的设计思路，帮助开发者理解 Diffusion 模型训练算法的原理。

-* [Diffusion 模型基本原理](./Training/Understanding_Diffusion_models.md)
-* [标准监督训练](./Training/Supervised_Fine_Tuning.md)
-* [在训练中启用 FP8 精度](./Training/FP8_Precision.md)
-* [端到端的蒸馏加速训练](./Training/Direct_Distill.md)
-* [两阶段拆分训练](./Training/Split_Training.md)
-* [差分 LoRA 训练](./Training/Differential_LoRA.md)
+* [Diffusion 模型基本原理](/docs/zh/Training/Understanding_Diffusion_models.md)
+* [标准监督训练](/docs/zh/Training/Supervised_Fine_Tuning.md)
+* [在训练中启用 FP8 精度](/docs/zh/Training/FP8_Precision.md)
+* [端到端的蒸馏加速训练](/docs/zh/Training/Direct_Distill.md)
+* [两阶段拆分训练](/docs/zh/Training/Split_Training.md)
+* [差分 LoRA 训练](/docs/zh/Training/Differential_LoRA.md)

 ## Section 4: 模型接入

 本节介绍如何将模型接入 `DiffSynth-Studio` 从而使用框架基础功能，帮助开发者为本项目提供新模型的支持，或进行私有化模型的推理和训练。

-* [接入模型结构](./Developer_Guide/Integrating_Your_Model.md)
-* [接入 Pipeline](./Developer_Guide/Building_a_Pipeline.md)
-* [接入细粒度显存管理](./Developer_Guide/Enabling_VRAM_management.md)
-* [接入模型训练](./Developer_Guide/Training_Diffusion_Models.md)
+* [接入模型结构](/docs/zh/Developer_Guide/Integrating_Your_Model.md)
+* [接入 Pipeline](/docs/zh/Developer_Guide/Building_a_Pipeline.md)
+* [接入细粒度显存管理](/docs/zh/Developer_Guide/Enabling_VRAM_management.md)
+* [接入模型训练](/docs/zh/Developer_Guide/Training_Diffusion_Models.md)

 ## Section 5: API Reference

 This section covers `diffsynth.core`, the standalone core module of `DiffSynth-Studio`, and explains how its internals are designed and how they work. Developers can reuse its functional modules in other codebases as needed.

-* [`diffsynth.core.attention`](./API_Reference/core/attention.md): Attention mechanism implementations
-* [`diffsynth.core.data`](./API_Reference/core/data.md): Data processing operators and general-purpose datasets
-* [`diffsynth.core.gradient`](./API_Reference/core/gradient.md): Gradient checkpointing
-* [`diffsynth.core.loader`](./API_Reference/core/loader.md): Model downloading and loading
-* [`diffsynth.core.vram`](./API_Reference/core/vram.md): VRAM management
+* [`diffsynth.core.attention`](/docs/zh/API_Reference/core/attention.md): Attention mechanism implementations
+* [`diffsynth.core.data`](/docs/zh/API_Reference/core/data.md): Data processing operators and general-purpose datasets
+* [`diffsynth.core.gradient`](/docs/zh/API_Reference/core/gradient.md): Gradient checkpointing
+* [`diffsynth.core.loader`](/docs/zh/API_Reference/core/loader.md): Model downloading and loading
+* [`diffsynth.core.vram`](/docs/zh/API_Reference/core/vram.md): VRAM management

 ## Section 6: Academic Guide

@@ -85,4 +85,4 @@ graph LR;

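 This section summarizes common questions from developers. If you run into a problem while using or developing with the project, check here first; if the problem remains unsolved, please open an issue on GitHub.

-* [Frequently Asked Questions](./QA.md)
+* [Frequently Asked Questions](/docs/zh/QA.md)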
--- a/docs/zh/Training/Differential_LoRA.md
+++ b/docs/zh/Training/Differential_LoRA.md
@@ -8,8 +8,8 @@

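 Suppose we have two images with similar content, Image 1 and Image 2. For example, both show a car, but Image 1 has less detail and Image 2 has more. Differential LoRA training proceeds in two steps:

-* Train LoRA 1 on Image 1 using [standard supervised training](./Supervised_Fine_Tuning.md)
-* Merge LoRA 1 into the base model, then train LoRA 2 on Image 2 using [standard supervised training](./Supervised_Fine_Tuning.md)
+* Train LoRA 1 on Image 1 using [standard supervised training](/docs/zh/Training/Supervised_Fine_Tuning.md)
+* Merge LoRA 1 into the base model, then train LoRA 2 on Image 2 using [standard supervised training](/docs/zh/Training/Supervised_Fine_Tuning.md)

 In the first step, the training data is a single image, so the LoRA overfits easily; after training, LoRA 1 makes the model generate Image 1 without hesitation, whatever the random seed. In the second step, the LoRA overfits again, so with LoRA 1 and LoRA 2 applied together the model generates Image 2 without hesitation. In short: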

--- a/docs/zh/Training/Direct_Distill.md
+++ b/docs/zh/Training/Direct_Distill.md
@@ -44,7 +44,7 @@ loss = torch.nn.functional.mse_loss(image_1, image_2)

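 ## Using Distillation-Accelerated Training in the Training Framework

-First, generate the training data. Refer to the [Model Inference](/docs/Pipeline_Usage/Model_Inference.md) section to write inference code, and generate the training data with a sufficiently large number of inference steps.
+First, generate the training data. Refer to the [Model Inference](/docs/zh/Pipeline_Usage/Model_Inference.md) section to write inference code, and generate the training data with a sufficiently large number of inference steps.

 Taking Qwen-Image as an example, the following code generates an image: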

@@ -67,7 +67,7 @@ image = pipe(prompt, seed=0, num_inference_steps=40)
 image.save("image.jpg")
 ```

-Next, write the required information into a [metadata file](/docs/API_Reference/core/data.md#元数据):
+Next, write the required information into a [metadata file](/docs/zh/API_Reference/core/data.md#元数据):

 ```csv
 image,prompt,seed,rand_device,num_inference_steps,cfg_scale
@@ -86,11 +86,11 @@ modelscope download --dataset DiffSynth-Studio/example_image_dataset --local_dir
 bash examples/qwen_image/model_training/lora/Qwen-Image-Distill-LoRA.sh
 ```

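-Note that among the [training script parameters](/docs/Pipeline_Usage/Model_Training.md#脚本参数), the dataset's image resolution settings must not trigger rescaling. When `--height` and `--width` are set to enable a fixed resolution, all training data must have been generated at exactly the same width and height; when `--max_pixels` is set to enable dynamic resolution, its value must be greater than or equal to the pixel area of every training image.
+Note that among the [training script parameters](/docs/zh/Pipeline_Usage/Model_Training.md#脚本参数), the dataset's image resolution settings must not trigger rescaling. When `--height` and `--width` are set to enable a fixed resolution, all training data must have been generated at exactly the same width and height; when `--max_pixels` is set to enable dynamic resolution, its value must be greater than or equal to the pixel area of every training image.

 ## Training Framework Design

-Compared with [standard supervised training](./Supervised_Fine_Tuning.md), direct distillation differs only in the training loss function: it uses `DirectDistillLoss` from `diffsynth.diffusion.loss`.
+Compared with [standard supervised training](/docs/zh/Training/Supervised_Fine_Tuning.md), direct distillation differs only in the training loss function: it uses `DirectDistillLoss` from `diffsynth.diffusion.loss`.

 ## Future Work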

--- a/docs/zh/Training/FP8_Precision.md
+++ b/docs/zh/Training/FP8_Precision.md
@@ -1,8 +1,8 @@
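 # Enabling FP8 Precision in Training

-Although `DiffSynth-Studio` supports [VRAM management](/docs/Pipeline_Usage/VRAM_management.md) for model inference, most of its memory-saving techniques are unsuitable for training; offloading makes training extremely slow.
+Although `DiffSynth-Studio` supports [VRAM management](/docs/zh/Pipeline_Usage/VRAM_management.md) for model inference, most of its memory-saving techniques are unsuitable for training; offloading makes training extremely slow.

-FP8 precision is the only VRAM management strategy that can be enabled during training. However, the framework does not currently support native FP8 training (for the reasons, see [Q&A: Why doesn't the training framework support native FP8 precision training?](/docs/QA.md#为什么训练框架不支持原生-fp8-精度训练)); it only supports storing in FP8 precision those models whose parameters are not updated by gradients (models that need no gradient backpropagation, or whose gradients update only their LoRA).
+FP8 precision is the only VRAM management strategy that can be enabled during training. However, the framework does not currently support native FP8 training (for the reasons, see [Q&A: Why doesn't the training framework support native FP8 precision training?](/docs/zh/QA.md#为什么训练框架不支持原生-fp8-精度训练)); it only supports storing in FP8 precision those models whose parameters are not updated by gradients (models that need no gradient backpropagation, or whose gradients update only their LoRA).

 ## Enabling FP8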

--- a/docs/zh/Training/Split_Training.md
+++ b/docs/zh/Training/Split_Training.md
@@ -8,7 +8,7 @@

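 In the training of most models, a large amount of computation takes place in "preprocessing", that is, computation unrelated to the denoising model, such as VAE encoding and text encoding. When the corresponding model parameters are frozen, this computation is redundant: each data sample yields exactly the same result in every epoch. We therefore provide a "split training" feature that automatically analyzes and splits the training process.

-For standard supervised training of an ordinary text-to-image model, the split is very simple: move the computation of all [`Pipeline Units`](/docs/Developer_Guide/Building_a_Pipeline.md#units) into the first stage, store the results on disk, then read them back in the second stage and carry out the remaining computation. If the preprocessing requires gradient backpropagation, however, things become extremely complicated, so we introduce a computation graph splitting algorithm to determine how to split the computation.
+For standard supervised training of an ordinary text-to-image model, the split is very simple: move the computation of all [`Pipeline Units`](/docs/zh/Developer_Guide/Building_a_Pipeline.md#units) into the first stage, store the results on disk, then read them back in the second stage and carry out the remaining computation. If the preprocessing requires gradient backpropagation, however, things become extremely complicated, so we introduce a computation graph splitting algorithm to determine how to split the computation.

 ## Computation Graph Splitting Algorithm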

@@ -16,7 +16,7 @@

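 ## Using Split Training

-Split training already supports [standard supervised training](./Supervised_Fine_Tuning.md) and [direct distillation training](./Direct_Distill.md), and is controlled through the `--task` parameter in the training command. Taking LoRA training of the Qwen-Image model as an example, the training command before splitting is:
+Split training already supports [standard supervised training](/docs/zh/Training/Supervised_Fine_Tuning.md) and [direct distillation training](/docs/zh/Training/Direct_Distill.md), and is controlled through the `--task` parameter in the training command. Taking LoRA training of the Qwen-Image model as an example, the training command before splitting is: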

 ```shell
 accelerate launch examples/qwen_image/model_training/train.py \
--- a/docs/zh/Training/Supervised_Fine_Tuning.md
+++ b/docs/zh/Training/Supervised_Fine_Tuning.md
@@ -1,10 +1,10 @@
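 # Standard Supervised Training

-With the [basic principles of Diffusion models](./Understanding_Diffusion_models.md) understood, this document describes how the framework implements Diffusion model training. It explains the principles behind the framework to help developers write new training code; to use the default training features we provide, see [Model Training](/docs/Pipeline_Usage/Model_Training.md).
+With the [basic principles of Diffusion models](/docs/zh/Training/Understanding_Diffusion_models.md) understood, this document describes how the framework implements Diffusion model training. It explains the principles behind the framework to help developers write new training code; to use the default training features we provide, see [Model Training](/docs/zh/Pipeline_Usage/Model_Training.md).

 Recalling the training pseudocode from the previous document, things become extremely complicated once we write real code. Some models take additional guidance conditions that need preprocessing, such as ControlNet; some require computation interleaved with the denoising model, such as VACE; and some need Gradient Checkpointing because of their large VRAM requirements, such as the DiT of Qwen-Image.

-To keep inference and training strictly consistent, we abstracted and encapsulated components such as `Pipeline`, and training reuses a large amount of inference code. See [Integrating a Pipeline](/docs/Developer_Guide/Building_a_Pipeline.md) for the design of the `Pipeline` component. Next, we describe how the training framework uses the `Pipeline` component to build training algorithms.
+To keep inference and training strictly consistent, we abstracted and encapsulated components such as `Pipeline`, and training reuses a large amount of inference code. See [Integrating a Pipeline](/docs/zh/Developer_Guide/Building_a_Pipeline.md) for the design of the `Pipeline` component. Next, we describe how the training framework uses the `Pipeline` component to build training algorithms.

 ## Framework Design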

@@ -48,13 +48,13 @@ class QwenImageTrainingModule(DiffusionTrainingModule):
        )
 ```

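-The model loading logic is essentially the same as for inference and supports loading from both remote and local paths; see [Model Inference](/docs/Pipeline_Usage/Model_Inference.md) for details. Note, however, that [VRAM management](/docs/Pipeline_Usage/VRAM_management.md) must not be enabled.
+The model loading logic is essentially the same as for inference and supports loading from both remote and local paths; see [Model Inference](/docs/zh/Pipeline_Usage/Model_Inference.md) for details. Note, however, that [VRAM management](/docs/zh/Pipeline_Usage/VRAM_management.md) must not be enabled.

 `switch_pipe_to_training_mode` switches the model to training mode; see `switch_pipe_to_training_mode` for details.

 ### `forward`

-`forward` must compute the loss value: it runs preprocessing first, then computes the loss through the `Pipeline`'s [`model_fn`](/docs/Developer_Guide/Building_a_Pipeline.md#model_fn).
+`forward` must compute the loss value: it runs preprocessing first, then computes the loss through the `Pipeline`'s [`model_fn`](/docs/zh/Developer_Guide/Building_a_Pipeline.md#model_fn).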

 ```python
    def forward(self, data):
@@ -90,7 +90,7 @@ class QwenImageTrainingModule(DiffusionTrainingModule):
 The training framework also needs other modules, including:

 * accelerator: the training launcher provided by `accelerate`; see [`accelerate`](https://huggingface.co/docs/accelerate/index)
-* dataset: a general-purpose dataset; see [`diffsynth.core.data`](/docs/API_Reference/core/data.md)
+* dataset: a general-purpose dataset; see [`diffsynth.core.data`](/docs/zh/API_Reference/core/data.md)
 * model_logger: the model logger; see `diffsynth.diffusion.logger`

 ```python
--- a/docs/zh/Training/Understanding_Diffusion_models.md
+++ b/docs/zh/Training/Understanding_Diffusion_models.md
@@ -140,4 +140,4 @@ $$

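 ## How Does This Project Encapsulate and Implement Model Training?

-Continue with the next document: [Standard Supervised Training](./Supervised_Fine_Tuning.md)
+Continue with the next document: [Standard Supervised Training](/docs/zh/Training/Supervised_Fine_Tuning.md)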
--- a/examples/dev_tools/fix_path.py
+++ b/examples/dev_tools/fix_path.py
@@ -0,0 +1,36 @@
+import re, os
+
+
+def read_file(path):
+    with open(path, "r", encoding="utf-8-sig") as f:
+        context = f.read()
+    return context
+
+def get_files(files, path):
+    if os.path.isdir(path):
+        for folder in os.listdir(path):
+            get_files(files, os.path.join(path, folder))
+    elif path.endswith(".md"):
+        files.append(path)
+        
+
+test_str = read_file("docs/zh/API_Reference/core/attention.md")
+files = []
+get_files(files, "docs/zh")
+file_map = {}
+for file in files:
+    name = file.split("/")[-1]
+    file_map[name] = "/" + file
+
+pattern = re.compile(r'\]\([^)]*\.md')
+for file in files:
+    context = read_file(file)
+    matches = pattern.findall(context)
+    
+    for match in matches:
+        target = "](" + file_map[match.split("/")[-1].replace("](", "")]
+        context = context.replace(match, target)
+        print(match, target)
+    
+    with open(file, "w", encoding="utf-8") as f:
+        f.write(context)
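Note on `fix_path.py`: the script looks up every matched file name in `file_map` directly, so a link to a `.md` file that is not under `docs/zh` (an external URL, for instance) would raise `KeyError`, and two files sharing a name in different folders would silently map to whichever was listed last. A more defensive variant could look roughly like the sketch below. This is not part of the commit; `build_file_map` and `rewrite_links` are hypothetical helper names, and the policy of leaving unknown or ambiguous links untouched is an assumption.

```python
import os
import re

# Start of a Markdown link target ending in ".md", e.g. "](./Training/FP8_Precision.md".
LINK_PATTERN = re.compile(r"\]\(([^)\s]*\.md)")


def build_file_map(paths):
    """Map each file name to a repository-absolute path; ambiguous names map to None."""
    file_map = {}
    for path in paths:
        name = os.path.basename(path)
        absolute = "/" + path.replace(os.sep, "/")
        # A name seen more than once cannot be resolved safely, so mark it as None.
        file_map[name] = None if name in file_map else absolute
    return file_map


def rewrite_links(text, file_map):
    """Rewrite links to known, unambiguous files and leave all other links unchanged."""
    def replace(match):
        target = match.group(1)
        if "://" in target:
            return match.group(0)  # external URL: keep as written
        resolved = file_map.get(target.split("/")[-1])
        if resolved is None:
            return match.group(0)  # unknown or ambiguous file name: keep as written
        return "](" + resolved
    return LINK_PATTERN.sub(replace, text)


if __name__ == "__main__":
    demo_map = build_file_map(["docs/zh/QA.md", "docs/zh/Training/FP8_Precision.md"])
    print(rewrite_links("[FAQ](./QA.md#faq) [ext](https://example.com/README.md)", demo_map))
```

Using `re.sub` with a callback rewrites each link occurrence once, rather than calling `str.replace` repeatedly on the whole file, so an already rewritten target cannot be matched and replaced a second time.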
--- a/examples/dev_tools/unit_test.py
+++ b/examples/dev_tools/unit_test.py