accelerate

2026-03-24 10:18:12 +00:00 · 2025-06-12 10:37:33 +08:00
parent 6a833c7134
commit b25c66b303
5 changed files with 90 additions and 80 deletions
--- a/examples/wanvideo/README_zh.md
+++ b/examples/wanvideo/README_zh.md
@@ -78,6 +78,7 @@ ModelConfig(path=[
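 * `local_model_path`: Path where downloaded models are saved. Defaults to `"./models"`.
 * `skip_download`: Whether to skip downloading. Defaults to `False`. If your network cannot reach [ModelScope](https://modelscope.cn/), download the required files manually and set this to `True`.
 * `redirect_common_files`: Whether to redirect duplicate model files. Defaults to `True`. The Wan series includes several base models that share identical modules such as the text encoder, so model paths are redirected to avoid downloading the same files more than once.
+* `use_usp`: Whether to enable Unified Sequence Parallel. Defaults to `False`. Used for multi-GPU parallel inference.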

 </details>

@@ -142,6 +143,23 @@ FP8 quantization substantially reduces VRAM usage but does not speed up inference; some models
 </details>


+<details>
+
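+<summary>Inference Acceleration</summary>
+
+Wan supports several acceleration options, including:
+
+* Efficient attention implementations: if any of the following are installed in your Python environment, they are enabled automatically in this order of priority.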
+    * [Flash Attention 3](https://github.com/Dao-AILab/flash-attention)
+    * [Flash Attention 2](https://github.com/Dao-AILab/flash-attention)
+    * [Sage Attention](https://github.com/thu-ml/SageAttention)
+    * [torch SDPA](https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html) (the default; `torch>=2.5.0` is recommended)
+* Unified Sequence Parallel: sequence parallelism built on [xDiT](https://github.com/xdit-project/xDiT). See the [example code](./acceleration/unified_sequence_parallel.py).
+* TeaCache: the [TeaCache](https://github.com/ali-vilab/TeaCache) acceleration technique. See the [example code](./acceleration/teacache.py).
+
+</details>
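
The attention priority listed in this hunk (Flash Attention 3, then Flash Attention 2, then Sage Attention, then torch SDPA) can be pictured as a simple fallback chain. The sketch below is illustrative only and is not the code in this commit: the function names, the backend labels, and the module names `flash_attn_interface`, `flash_attn`, and `sageattention` are assumptions about how these packages are typically imported.

```python
from importlib.util import find_spec
from typing import Callable

# Priority order follows the README list. The module names are assumptions
# about how each package is usually imported, not taken from this repository.
BACKEND_PRIORITY = [
    ("flash_attn_3", "flash_attn_interface"),
    ("flash_attn_2", "flash_attn"),
    ("sage_attn", "sageattention"),
]


def is_installed(module_name: str) -> bool:
    """Return True if the top-level module can be located without importing it."""
    return find_spec(module_name) is not None


def select_attention_backend(available: Callable[[str], bool] = is_installed) -> str:
    """Return the first available backend label, falling back to torch SDPA."""
    for backend, module_name in BACKEND_PRIORITY:
        if available(module_name):
            return backend
    return "torch_sdpa"


if __name__ == "__main__":
    # Reports which backend this hypothetical selector would pick locally.
    print(select_attention_backend())
```

Passing the availability check in as a parameter keeps the selection logic testable without installing any of the optional packages.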
+
+
 <details>

 <summary>Input Parameters</summary>