mirror of
https://github.com/modelscope/DiffSynth-Studio.git
synced 2026-03-18 22:08:13 +00:00
fix wans2v bug and update readme
41
README.md
@@ -760,6 +760,37 @@ Example code for Wan is available at: [/examples/wanvideo/](/examples/wanvideo/)
 
 DiffSynth-Studio is not just an engineered model framework, but also an incubator for innovative achievements.
 
+<details>
+
+<summary>Spectral Evolution Search: Efficient Inference-Time Scaling for Reward-Aligned Image Generation</summary>
+
+- Paper: [Spectral Evolution Search: Efficient Inference-Time Scaling for Reward-Aligned Image Generation
+](https://arxiv.org/abs/2602.03208)
+- Sample Code: []()
+
+|FLUX|FLUX + SES|Qwen-Image|Qwen-Image + SES|
+|-|-|-|-|
+|||||
+
+</details>
+
+
+<details>
+
+<summary>VIRAL: Visual In-Context Reasoning via Analogy in Diffusion Transformers</summary>
+
+- Paper: [VIRAL: Visual In-Context Reasoning via Analogy in Diffusion Transformers
+](https://arxiv.org/abs/2602.03210)
+- Sample code: [/examples/flux/model_inference/FLUX.1-dev-LoRA-Fusion.py](/examples/flux/model_inference/FLUX.1-dev-LoRA-Fusion.py)
+- Model: [ModelScope](https://www.modelscope.cn/models/DiffSynth-Studio/Qwen-Image-Edit-2511-ICEdit-LoRA)
+
+|Example 1|Example 2|Query|Output|
+|-|-|-|-|
+|||||
+
+</details>
+
+
 <details>
 
 <summary>AttriCtrl: Attribute Intensity Control for Image Generation Models</summary>
@@ -770,7 +801,7 @@ DiffSynth-Studio is not just an engineered model framework, but also an incubato
 
 |brightness scale = 0.1|brightness scale = 0.3|brightness scale = 0.5|brightness scale = 0.7|brightness scale = 0.9|
 |-|-|-|-|-|
 ||||||
 
 </details>
 
@@ -785,10 +816,10 @@ DiffSynth-Studio is not just an engineered model framework, but also an incubato
 
 ||[LoRA 1](https://modelscope.cn/models/cancel13/cxsk)|[LoRA 2](https://modelscope.cn/models/wy413928499/xuancai2)|[LoRA 3](https://modelscope.cn/models/DiffSynth-Studio/ArtAug-lora-FLUX.1dev-v1)|[LoRA 4](https://modelscope.cn/models/hongyanbujian/JPL)|
 |-|-|-|-|-|
 |[LoRA 1](https://modelscope.cn/models/cancel13/cxsk) |||||
 |[LoRA 2](https://modelscope.cn/models/wy413928499/xuancai2) |||||
 |[LoRA 3](https://modelscope.cn/models/DiffSynth-Studio/ArtAug-lora-FLUX.1dev-v1) |||||
 |[LoRA 4](https://modelscope.cn/models/hongyanbujian/JPL) |||||
 
 </details>
 
39
README_zh.md
@@ -762,6 +762,35 @@ DiffSynth-Studio 不仅仅是一个工程化的模型框架,更是创新成果
 
 <details>
 
+<summary>Spectral Evolution Search: 用于奖励对齐图像生成的高效推理阶段缩放</summary>
+
+- 论文:[Spectral Evolution Search: Efficient Inference-Time Scaling for Reward-Aligned Image Generation
+](https://arxiv.org/abs/2602.03208)
+- 代码样例:[]()
+
+|FLUX|FLUX + SES|Qwen-Image|Qwen-Image + SES|
+|-|-|-|-|
+|||||
+
+</details>
+
+<details>
+
+<summary>VIRAL:基于DiT模型的类比视觉上下文推理</summary>
+
+- 论文:[VIRAL: Visual In-Context Reasoning via Analogy in Diffusion Transformers
+](https://arxiv.org/abs/2602.03210)
+- 代码样例:[/examples/qwen_image/model_inference/Qwen-Image-Edit-2511-ICEdit.py](/examples/qwen_image/model_inference/Qwen-Image-Edit-2511-ICEdit.py)
+- 模型:[ModelScope](https://www.modelscope.cn/models/DiffSynth-Studio/Qwen-Image-Edit-2511-ICEdit-LoRA)
+
+|Example 1|Example 2|Query|Output|
+|-|-|-|-|
+|||||
+
+</details>
+
+<details>
+
 <summary>AttriCtrl: 图像生成模型的属性强度控制</summary>
 
 - 论文:[AttriCtrl: Fine-Grained Control of Aesthetic Attribute Intensity in Diffusion Models
@@ -771,7 +800,7 @@ DiffSynth-Studio 不仅仅是一个工程化的模型框架,更是创新成果
 
 |brightness scale = 0.1|brightness scale = 0.3|brightness scale = 0.5|brightness scale = 0.7|brightness scale = 0.9|
 |-|-|-|-|-|
 ||||||
 
 </details>
 
@@ -787,10 +816,10 @@ DiffSynth-Studio 不仅仅是一个工程化的模型框架,更是创新成果
 
 ||[LoRA 1](https://modelscope.cn/models/cancel13/cxsk)|[LoRA 2](https://modelscope.cn/models/wy413928499/xuancai2)|[LoRA 3](https://modelscope.cn/models/DiffSynth-Studio/ArtAug-lora-FLUX.1dev-v1)|[LoRA 4](https://modelscope.cn/models/hongyanbujian/JPL)|
 |-|-|-|-|-|
 |[LoRA 1](https://modelscope.cn/models/cancel13/cxsk) |||||
 |[LoRA 2](https://modelscope.cn/models/wy413928499/xuancai2) |||||
 |[LoRA 3](https://modelscope.cn/models/DiffSynth-Studio/ArtAug-lora-FLUX.1dev-v1) |||||
 |[LoRA 4](https://modelscope.cn/models/hongyanbujian/JPL) |||||
 
 </details>
 
@@ -0,0 +1,47 @@
+from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig
+from modelscope import snapshot_download
+from PIL import Image
+import torch
+
+# Load models
+pipe = QwenImagePipeline.from_pretrained(
+    torch_dtype=torch.bfloat16,
+    device="cuda",
+    model_configs=[
+        ModelConfig(model_id="Qwen/Qwen-Image-Edit-2511", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),
+        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
+        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
+    ],
+    processor_config=ModelConfig(model_id="Qwen/Qwen-Image-Edit", origin_file_pattern="processor/"),
+)
+lora = ModelConfig(
+    model_id="DiffSynth-Studio/Qwen-Image-Edit-2511-ICEdit-LoRA",
+    origin_file_pattern="model.safetensors"
+)
+pipe.load_lora(pipe.dit, lora)
+
+# Load images
+snapshot_download(
+    "DiffSynth-Studio/Qwen-Image-Edit-2511-ICEdit-LoRA",
+    local_dir="./data",
+    allow_file_pattern="assets/*"
+)
+edit_image = [
+    Image.open("data/assets/image1_original.png"),
+    Image.open("data/assets/image1_edit_1.png"),
+    Image.open("data/assets/image2_original.png")
+]
+prompt = "Edit image 3 based on the transformation from image 1 to image 2."
+negative_prompt = "泛黄,AI感,不真实,丑陋,油腻的皮肤,异常的肢体,不协调的肢体"
+
+# Generate
+image_4 = pipe(
+    prompt=prompt, negative_prompt=negative_prompt,
+    edit_image=edit_image,
+    seed=1,
+    num_inference_steps=50,
+    height=1280,
+    width=720,
+    zero_cond_t=True,
+)
+image_4.save("image.png")
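The new example script passes three images in a fixed order (example before, example after, query) and refers to them by position in the prompt. A minimal sketch of that ordering convention follows. `build_analogy_inputs` is a hypothetical helper written for illustration and is not part of DiffSynth-Studio. The prompt string is copied from the script, and the arguments can be any image objects or paths.

```python
from typing import List, Tuple, TypeVar

T = TypeVar("T")

# Prompt wording copied from the example script in this commit.
ANALOGY_PROMPT = "Edit image 3 based on the transformation from image 1 to image 2."


def build_analogy_inputs(example_before: T, example_after: T, query: T) -> Tuple[List[T], str]:
    """Hypothetical helper: order the inputs the way the prompt numbers them.

    Image 1 is the example before editing, image 2 is the same example
    after editing, and image 3 is the query image to transform.
    """
    named = (("example_before", example_before),
             ("example_after", example_after),
             ("query", query))
    for name, image in named:
        if image is None:
            raise ValueError(f"{name} must not be None")
    return [example_before, example_after, query], ANALOGY_PROMPT


if __name__ == "__main__":
    images, prompt = build_analogy_inputs("a_before.png", "a_after.png", "b_before.png")
    print(images)
    print(prompt)
```

Swapping the first two entries would describe the reverse transformation, so keeping the ordering in one place makes that mistake harder to introduce.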
@@ -7,6 +7,7 @@ accelerate launch --config_file examples/wanvideo/model_training/full/accelerate
   --num_frames 81 \
   --dataset_repeat 100 \
   --model_id_with_origin_paths "Wan-AI/Wan2.2-S2V-14B:diffusion_pytorch_model*.safetensors,Wan-AI/Wan2.2-S2V-14B:wav2vec2-large-xlsr-53-english/model.safetensors,Wan-AI/Wan2.2-S2V-14B:models_t5_umt5-xxl-enc-bf16.pth,Wan-AI/Wan2.2-S2V-14B:Wan2.1_VAE.pth" \
+  --audio_processor_path "Wan-AI/Wan2.2-S2V-14B:wav2vec2-large-xlsr-53-english/" \
   --learning_rate 1e-5 \
   --num_epochs 1 \
   --trainable_models "dit" \
@@ -7,6 +7,7 @@ accelerate launch --config_file examples/wanvideo/model_training/full/accelerate
   --num_frames 81 \
   --dataset_repeat 100 \
   --model_id_with_origin_paths "Wan-AI/Wan2.2-S2V-14B:diffusion_pytorch_model*.safetensors,Wan-AI/Wan2.2-S2V-14B:wav2vec2-large-xlsr-53-english/model.safetensors,Wan-AI/Wan2.2-S2V-14B:models_t5_umt5-xxl-enc-bf16.pth,Wan-AI/Wan2.2-S2V-14B:Wan2.1_VAE.pth" \
+  --audio_processor_path "Wan-AI/Wan2.2-S2V-14B:wav2vec2-large-xlsr-53-english/" \
   --learning_rate 1e-4 \
   --num_epochs 5 \
   --remove_prefix_in_ckpt "pipe.dit." \
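Both training scripts pass models as comma-separated `model_id:origin_file_pattern` entries, and the new `--audio_processor_path` flag uses the same `model_id:pattern` shape for a single entry. As a rough illustration of how such a value could be split, here is a minimal Python sketch. `split_model_specs` is a hypothetical helper written for this note, not DiffSynth-Studio's own parser, and it assumes that model IDs contain no colon and that entries contain no comma.

```python
from typing import List, Tuple


def split_model_specs(value: str) -> List[Tuple[str, str]]:
    """Split 'model_id:pattern,model_id:pattern,...' into (model_id, pattern) pairs.

    Hypothetical illustration only; the project's real parsing may differ.
    """
    pairs = []
    for spec in value.split(","):
        spec = spec.strip()
        if not spec:
            continue  # tolerate a trailing comma
        # Split on the first colon: everything before it is the model ID.
        model_id, sep, pattern = spec.partition(":")
        if not sep or not model_id or not pattern:
            raise ValueError(f"expected 'model_id:origin_file_pattern', got {spec!r}")
        pairs.append((model_id, pattern))
    return pairs


if __name__ == "__main__":
    value = (
        "Wan-AI/Wan2.2-S2V-14B:diffusion_pytorch_model*.safetensors,"
        "Wan-AI/Wan2.2-S2V-14B:Wan2.1_VAE.pth"
    )
    for model_id, pattern in split_model_specs(value):
        print(model_id, "->", pattern)
```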
@@ -33,7 +33,7 @@ class WanTrainingModule(DiffusionTrainingModule):
         # Load models
         model_configs = self.parse_model_configs(model_paths, model_id_with_origin_paths, fp8_models=fp8_models, offload_models=offload_models, device=device)
         tokenizer_config = ModelConfig(model_id="Wan-AI/Wan2.1-T2V-1.3B", origin_file_pattern="google/umt5-xxl/") if tokenizer_path is None else ModelConfig(tokenizer_path)
-        audio_processor_config = ModelConfig(model_id="Wan-AI/Wan2.2-S2V-14B", origin_file_pattern="wav2vec2-large-xlsr-53-english/") if audio_processor_path is None else ModelConfig(audio_processor_path)
+        audio_processor_config = self.parse_path_or_model_id(audio_processor_path)
         self.pipe = WanVideoPipeline.from_pretrained(torch_dtype=torch.bfloat16, device=device, model_configs=model_configs, tokenizer_config=tokenizer_config, audio_processor_config=audio_processor_config)
         self.pipe = self.split_pipeline_units(task, self.pipe, trainable_models, lora_base_model)
 
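The changed line hands `audio_processor_path` to `self.parse_path_or_model_id`, whose body is not part of this diff. The removed line fell back to a hard-coded `Wan-AI/Wan2.2-S2V-14B` default when the path was `None`, and the updated shell scripts now pass that value explicitly. The sketch below shows one way such a resolver could behave, with three cases: no value, a local path, or a `model_id:origin_file_pattern` string. Everything in it is an assumption made for illustration: `SpecSketch` stands in for the project's `ModelConfig`, and `resolve_path_or_model_id` is a hypothetical name, not the library's method.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpecSketch:
    """Hypothetical stand-in for the project's ModelConfig."""
    model_id: Optional[str] = None
    origin_file_pattern: Optional[str] = None
    path: Optional[str] = None


def resolve_path_or_model_id(value: Optional[str],
                             default: Optional[SpecSketch] = None) -> Optional[SpecSketch]:
    """Resolve a CLI value into a spec (illustrative sketch, not the real method)."""
    if value is None:
        # Nothing was passed on the command line: use the caller's default.
        return default
    if os.path.exists(value):
        # An existing local file or directory is used directly.
        return SpecSketch(path=value)
    # Otherwise expect 'model_id:origin_file_pattern'.
    model_id, sep, pattern = value.partition(":")
    if not sep or not model_id:
        raise ValueError(f"expected a local path or 'model_id:origin_file_pattern', got {value!r}")
    return SpecSketch(model_id=model_id, origin_file_pattern=pattern)


if __name__ == "__main__":
    print(resolve_path_or_model_id("Wan-AI/Wan2.2-S2V-14B:wav2vec2-large-xlsr-53-english/"))
    print(resolve_path_or_model_id("."))
    print(resolve_path_or_model_id(None))
```

Handling the local-path and remote-ID cases in one helper keeps the call site to a single expression, which matches the shape of the changed line.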