Merge pull request #1169 from Feng0w0/sample_add

Docs: Supplement NPU training script samples and documentation instructions
2026-03-22 16:50:47 +00:00 · 2026-01-12 10:08:38 +08:00
parent 8cc3bece6d 62c3d406d9
commit 00f2d1aa5d
13 changed files with 286 additions and 9 deletions
--- a/docs/en/Pipeline_Usage/GPU_support.md
+++ b/docs/en/Pipeline_Usage/GPU_support.md
@@ -13,7 +13,7 @@ All sample code provided by this project supports NVIDIA GPUs by default, requir
 AMD provides PyTorch packages based on ROCm, so most models can run without code changes. A small number of models may not be compatible due to their reliance on CUDA-specific instructions.

 ## Ascend NPU
-
+### Inference
 When using Ascend NPU, you need to replace `"cuda"` with `"npu"` in your code.

 For example, here is the inference code for **Wan2.1-T2V-1.3B**, modified for Ascend NPU:
@@ -22,6 +22,7 @@ For example, here is the inference code for **Wan2.1-T2V-1.3B**, modified for As
 import torch
 from diffsynth.utils.data import save_video, VideoData
 from diffsynth.pipelines.wan_video import WanVideoPipeline, ModelConfig
+from diffsynth.core.device.npu_compatible_device import get_device_name

 vram_config = {
    "offload_dtype": "disk",
@@ -46,7 +47,7 @@ pipe = WanVideoPipeline.from_pretrained(
    ],
    tokenizer_config=ModelConfig(model_id="Wan-AI/Wan2.1-T2V-1.3B", origin_file_pattern="google/umt5-xxl/"),
 -   vram_limit=torch.cuda.mem_get_info("cuda")[1] / (1024 ** 3) - 2,
-+   vram_limit=torch.npu.mem_get_info("npu:0")[1] / (1024 ** 3) - 2,
+   vram_limit=torch.npu.mem_get_info(get_device_name())[1] / (1024 ** 3) - 2,
 )

 video = pipe(
@@ -56,3 +57,28 @@ video = pipe(
 )
 save_video(video, "video.mp4", fps=15, quality=5)
 ```
+
+### Training
+NPU startup script samples have been added for each type of model. The scripts are stored in `examples/xxx/special/npu_scripts`, for example `examples/wanvideo/model_training/special/npu_scripts/Wan2.2-T2V-A14B-NPU.sh`.
+
+The NPU training scripts set NPU-specific environment variables that can improve performance, and they enable the relevant parameters for specific models.
+
+#### Environment variables
+```shell
+export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
+```
+`expandable_segments:<value>`: Enables the memory pool's expandable segments function, that is, the virtual memory feature. The scripts set it to `True`.
+
+```shell
+export CPU_AFFINITY_CONF=1
+```
+`0` or unset: core binding is disabled.

+`1`: coarse-grained core binding is enabled.

+`2`: fine-grained core binding is enabled.
+
+#### Parameters for specific models
+| Model          | Parameter                 | Note              |
+|----------------|---------------------------|-------------------|
+| Wan 14B series | --initialize_model_on_cpu | The 14B model needs to be initialized on the CPU |
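
As a rough illustration of how the settings above fit together, the following sketch exports both environment variables and appends the model-specific flag to a launch command. `TRAIN_CMD` and `MODEL_ARGS` are hypothetical names, and `TRAIN_CMD` is only an `echo` placeholder, not the command used by the scripts in the repository; see the actual files under `npu_scripts` for the real invocation.

```shell
# Sketch only: TRAIN_CMD is a placeholder, not a command from the repository.

# Environment variables described above.
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
export CPU_AFFINITY_CONF=1

# Hypothetical stand-in for the real training command.
TRAIN_CMD="echo train-placeholder"

# Flag listed in the table for the Wan 14B series.
MODEL_ARGS="--initialize_model_on_cpu"

# Run the (placeholder) command with the model-specific flag appended.
$TRAIN_CMD $MODEL_ARGS
```

Exporting the variables before the launch command means the training process inherits them from the shell.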