diffsynth 2.0 prototype

2026-03-19 23:08:13 +00:00 · 2025-11-04 10:59:29 +08:00
parent a30ed9093f
commit 288fb7604c
664 changed files with 3581 additions and 2237905 deletions
--- a/docs/API_Reference/Environment_Variables.md
+++ b/docs/API_Reference/Environment_Variables.md
@@ -0,0 +1,35 @@
+# Environment Variables
+
+`DiffSynth-Studio` reads several settings from environment variables.
+
+In Python code, set them through `os.environ`. Note that environment variables must be set before `import diffsynth`.
+
+```python
+import os
+os.environ["DIFFSYNTH_MODEL_BASE_PATH"] = "./path_to_my_models"
+import diffsynth
+```
+
+On Linux, you can also set an environment variable temporarily on the command line:
+
+```shell
+DIFFSYNTH_MODEL_BASE_PATH="./path_to_my_models" python xxx.py
+```
+
+`DiffSynth-Studio` supports the following environment variables.
+
+## `DIFFSYNTH_SKIP_DOWNLOAD`
+
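+Whether to skip model downloads. Accepted values are `True`, `true`, `False`, and `false`. If `skip_download` is not set in `ModelConfig`, this environment variable decides whether the download is skipped.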
+
+## `DIFFSYNTH_MODEL_BASE_PATH`
+
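+The root directory for model downloads. It can be any local path. If `local_model_path` is not set in `ModelConfig`, model files are downloaded to the path this variable points to. If neither is set, model files are downloaded to `./models`.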
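+
+The fallback rules for `DIFFSYNTH_SKIP_DOWNLOAD` and `DIFFSYNTH_MODEL_BASE_PATH` can be sketched as follows. This is a minimal illustration, not the library's code: `resolve_skip_download` and `resolve_model_dir` are hypothetical helpers, and treating an unset `DIFFSYNTH_SKIP_DOWNLOAD` as `False` is an assumption.
+
+```python
+import os
+
+def resolve_skip_download(skip_download=None, env=None):
+    """Hypothetical helper: an explicit ModelConfig value wins over the environment."""
+    env = os.environ if env is None else env
+    if skip_download is not None:
+        return skip_download
+    # Assumption: an unset variable means "do not skip".
+    return env.get("DIFFSYNTH_SKIP_DOWNLOAD", "false").lower() == "true"
+
+def resolve_model_dir(local_model_path=None, env=None):
+    """Hypothetical helper: ModelConfig value, then the environment, then ./models."""
+    env = os.environ if env is None else env
+    if local_model_path is not None:
+        return local_model_path
+    return env.get("DIFFSYNTH_MODEL_BASE_PATH", "./models")
+
+print(resolve_model_dir(env={}))  # ./models
+print(resolve_skip_download(env={"DIFFSYNTH_SKIP_DOWNLOAD": "True"}))  # True
+```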
+
+## `DIFFSYNTH_ATTENTION_IMPLEMENTATION`
+
+The attention implementation to use. Accepted values are `flash_attention_3`, `flash_attention_2`, `sage_attention`, `xformers`, and `torch`. See [`./core/attention.md`](./core/attention.md) for details.
+
+## `DIFFSYNTH_DISK_MAP_BUFFER_SIZE`
+
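+The buffer size used for direct disk access (disk map). The default is 1B (1000000000, i.e. one billion). A larger value uses more memory and runs faster.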
--- a/docs/API_Reference/core/attention.md
+++ b/docs/API_Reference/core/attention.md
@@ -0,0 +1,73 @@
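+# `diffsynth.core.attention`: Attention Implementations
+
+`diffsynth.core.attention` provides a routing layer for attention: based on the packages available in the Python environment and the [environment variable](../Environment_Variables.md#diffsynth_attention_implementation), it automatically selects an efficient attention implementation.
+
+## The Attention Mechanism
+
+Attention is a model structure introduced in the paper [Attention Is All You Need](https://arxiv.org/abs/1706.03762). In the original paper, it is computed with the following formula: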
+$$
+\text{Attention}(Q, K, V) = \text{Softmax}\left(
+    \frac{QK^T}{\sqrt{d_k}}
+\right)
+V.
+$$
+In `PyTorch`, it can be implemented as follows:
+```python
+import torch
+
+def attention(query, key, value):
+    scale_factor = 1 / query.size(-1)**0.5
+    attn_weight = query @ key.transpose(-2, -1) * scale_factor
+    attn_weight = torch.softmax(attn_weight, dim=-1)
+    return attn_weight @ value
+
+query = torch.rand(32, 8, 128, 64, dtype=torch.bfloat16, device="cuda")
+key = torch.rand(32, 8, 128, 64, dtype=torch.bfloat16, device="cuda")
+value = torch.rand(32, 8, 128, 64, dtype=torch.bfloat16, device="cuda")
+output_1 = attention(query, key, value)
+```
+
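+Here `query`, `key`, and `value` have shape $(b, n, s, d)$:
+* $b$: batch size
+* $n$: number of attention heads
+* $s$: sequence length
+* $d$: dimension of each attention head
+
+This computation contains no trainable parameters. Modern transformer models apply Linear layers before and after it; the "attention mechanism" discussed here excludes those layers and covers only the computation in the code above.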
+
+## More Efficient Implementations
+
+Note that the attention score (the $\text{Softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)$ term in the formula, `attn_weight` in the code) has shape $(b, n, s, s)$, and the sequence length $s$ is usually very large, so the time and space complexity of the computation is quadratic in $s$. Take image generation models as an example: each time the image width and height double, the sequence length grows 4x and the compute and VRAM requirements grow 16x. To avoid this cost, a more efficient attention implementation is needed. Options include:
+* Flash Attention 3: [GitHub](https://github.com/Dao-AILab/flash-attention), [paper](https://arxiv.org/abs/2407.08608)
+* Flash Attention 2: [GitHub](https://github.com/Dao-AILab/flash-attention), [paper](https://arxiv.org/abs/2307.08691)
+* Sage Attention: [GitHub](https://github.com/thu-ml/SageAttention), [paper](https://arxiv.org/abs/2505.11594)
+* xFormers: [GitHub](https://github.com/facebookresearch/xformers), [docs](https://facebookresearch.github.io/xformers/components/ops.html#module-xformers.ops)
+* PyTorch: [GitHub](https://github.com/pytorch/pytorch), [docs](https://docs.pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html)
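+
+The quadratic growth of the $(b, n, s, s)$ attention score can be checked with a few lines of arithmetic. The sketch below is illustrative only: `attention_score_elements` is a hypothetical helper, and it assumes one token per spatial position, whereas real models typically derive the sequence length from latents or patches.
+
+```python
+def attention_score_elements(b, n, height, width):
+    """Number of elements in the (b, n, s, s) attention score tensor."""
+    s = height * width  # assumption: one token per spatial position
+    return b * n * s * s
+
+base = attention_score_elements(b=1, n=8, height=64, width=64)
+doubled = attention_score_elements(b=1, n=8, height=128, width=128)
+
+# Doubling width and height multiplies s by 4 and the score tensor by 16.
+print(doubled // base)  # 16
+```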
+
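+To use any attention implementation other than `PyTorch`, install the corresponding package by following the instructions on its GitHub page. `DiffSynth-Studio` automatically routes to the matching implementation based on the packages available in the Python environment; the choice can also be controlled through the [environment variable](../Environment_Variables.md#diffsynth_attention_implementation).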
+
+```python
+from diffsynth.core.attention import attention_forward
+import torch
+
+def attention(query, key, value):
+    scale_factor = 1 / query.size(-1)**0.5
+    attn_weight = query @ key.transpose(-2, -1) * scale_factor
+    attn_weight = torch.softmax(attn_weight, dim=-1)
+    return attn_weight @ value
+
+query = torch.rand(32, 8, 128, 64, dtype=torch.bfloat16, device="cuda")
+key = torch.rand(32, 8, 128, 64, dtype=torch.bfloat16, device="cuda")
+value = torch.rand(32, 8, 128, 64, dtype=torch.bfloat16, device="cuda")
+output_1 = attention(query, key, value)
+output_2 = attention_forward(query, key, value)
+print((output_1 - output_2).abs().mean())
+```
+
+Note that acceleration also introduces numerical error, but in most cases the error is negligible.
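+
+The package-based routing described above could look roughly like the sketch below. It is not the actual `diffsynth.core.attention` code: the helper names, the module name assumed for each option, and the priority order are all assumptions made for illustration.
+
+```python
+import importlib.util
+import os
+
+# Assumed mapping from option name to the module that provides it,
+# listed in an assumed priority order.
+CANDIDATES = [
+    ("flash_attention_3", "flash_attn_interface"),
+    ("flash_attention_2", "flash_attn"),
+    ("sage_attention", "sageattention"),
+    ("xformers", "xformers"),
+    ("torch", "torch"),
+]
+
+def is_available(module_name):
+    """Return True if the top-level module can be found in this environment."""
+    return importlib.util.find_spec(module_name) is not None
+
+def choose_implementation(env=None, available=is_available):
+    """Hypothetical router: the environment variable wins, else the first available package."""
+    env = os.environ if env is None else env
+    requested = env.get("DIFFSYNTH_ATTENTION_IMPLEMENTATION")
+    if requested is not None:
+        return requested
+    for name, module_name in CANDIDATES:
+        if available(module_name):
+            return name
+    raise RuntimeError("no attention implementation found")
+```
+
+Passing `available` as a parameter keeps the sketch easy to exercise without installing any of the packages.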
+
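+## Best Practices
+
+**In most cases, we recommend using the native `PyTorch` implementation directly, with no extra packages to install.** Other attention implementations can be faster, but the speedup is fairly limited, and in a few cases they cause compatibility or precision problems.
+
+In addition, efficient attention implementations are gradually being integrated into `PyTorch`; `scaled_dot_product_attention` in `PyTorch` `2.9.0` already integrates Flash Attention 2. We still provide this interface in `DiffSynth-Studio` so that aggressive acceleration schemes can reach applications quickly, even though their stability still needs time to be verified.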
--- a/docs/API_Reference/core/data.md
+++ b/docs/API_Reference/core/data.md
@@ -0,0 +1,3 @@
+# `diffsynth.core.data`: General-Purpose Datasets and Data Processing Operators
+
+
--- a/docs/API_Reference/core/gradient.md
+++ b/docs/API_Reference/core/gradient.md
--- a/docs/API_Reference/core/loader.md
+++ b/docs/API_Reference/core/loader.md
--- a/docs/API_Reference/core/vram.md
+++ b/docs/API_Reference/core/vram.md
@@ -0,0 +1,3 @@
+# `diffsynth.core.data`: General-Purpose Datasets and Data Processing Operators