mirror of
https://github.com/modelscope/DiffSynth-Studio.git
synced 2026-03-22 16:50:47 +00:00
Merge pull request #1169 from Feng0w0/sample_add
Docs:Supplement NPU training script samples and documentation instruction
This commit is contained in:
@@ -13,7 +13,7 @@ All sample code provided by this project supports NVIDIA GPUs by default, requir
|
||||
AMD provides PyTorch packages based on ROCm, so most models can run without code changes. A small number of models may not be compatible due to their reliance on CUDA-specific instructions.
|
||||
|
||||
## Ascend NPU
|
||||
|
||||
### Inference
|
||||
When using Ascend NPU, you need to replace `"cuda"` with `"npu"` in your code.
|
||||
|
||||
For example, here is the inference code for **Wan2.1-T2V-1.3B**, modified for Ascend NPU:
|
||||
@@ -22,6 +22,7 @@ For example, here is the inference code for **Wan2.1-T2V-1.3B**, modified for As
|
||||
import torch
|
||||
from diffsynth.utils.data import save_video, VideoData
|
||||
from diffsynth.pipelines.wan_video import WanVideoPipeline, ModelConfig
|
||||
from diffsynth.core.device.npu_compatible_device import get_device_name
|
||||
|
||||
vram_config = {
|
||||
"offload_dtype": "disk",
|
||||
@@ -46,7 +47,7 @@ pipe = WanVideoPipeline.from_pretrained(
|
||||
],
|
||||
tokenizer_config=ModelConfig(model_id="Wan-AI/Wan2.1-T2V-1.3B", origin_file_pattern="google/umt5-xxl/"),
|
||||
- vram_limit=torch.cuda.mem_get_info("cuda")[1] / (1024 ** 3) - 2,
|
||||
+ vram_limit=torch.npu.mem_get_info("npu:0")[1] / (1024 ** 3) - 2,
|
||||
+ vram_limit=torch.npu.mem_get_info(get_device_name())[1] / (1024 ** 3) - 2,
|
||||
)
|
||||
|
||||
video = pipe(
|
||||
@@ -56,3 +57,28 @@ video = pipe(
|
||||
)
|
||||
save_video(video, "video.mp4", fps=15, quality=5)
|
||||
```
|
||||
|
||||
### Training
|
||||
NPU startup script samples have been added for each type of model,the scripts are stored in the `examples/xxx/special/npu_scripts`, for example `examples/wanvideo/model_training/special/npu_scripts/Wan2.2-T2V-A14B-NPU.sh`.
|
||||
|
||||
In the NPU training scripts, NPU specific environment variables that can optimize performance have been added, and relevant parameters have been enabled for specific models.
|
||||
|
||||
#### Environment variables
|
||||
```shell
|
||||
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
|
||||
```
|
||||
`expandable_segments:<value>`: Enable the memory pool expansion segment function, which is the virtual memory feature.
|
||||
|
||||
```shell
|
||||
export CPU_AFFINITY_CONF=1
|
||||
```
|
||||
Set 0 or not set: indicates not enabling the binding function
|
||||
|
||||
1: Indicates enabling coarse-grained kernel binding
|
||||
|
||||
2: Indicates enabling fine-grained kernel binding
|
||||
|
||||
#### Parameters for specific models
|
||||
| Model | Parameter | Note |
|
||||
|----------------|---------------------------|-------------------|
|
||||
| Wan 14B series | --initialize_model_on_cpu | The 14B model needs to be initialized on the CPU |
|
||||
Reference in New Issue
Block a user