Model Directory

Qwen-Image

Documentation: ./Qwen-Image.md

Effect Preview

Quick Start

from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig
from PIL import Image
import torch

pipe = QwenImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
    ],
    tokenizer_config=ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="tokenizer/"),
)
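# English gloss of the prompt below: "Exquisite portrait, an underwater girl, flowing blue dress,
# hair drifting gently, clear light and shadow, surrounded by bubbles, serene face, fine details,
# dreamy and beautiful."
prompt = "精致肖像，水下少女，蓝裙飘逸，发丝轻扬，光影透澈，气泡环绕，面容恬静，细节精致，梦幻唯美。"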
image = pipe(
    prompt, seed=0, num_inference_steps=40,
    # edit_image=Image.open("xxx.jpg").resize((1328, 1328)) # For Qwen-Image-Edit
)
image.save("image.jpg")
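
The `origin_file_pattern` values in the quick start read like shell-style wildcards. As a rough illustration only, assuming the patterns are matched against repository-relative paths with glob semantics (how the library actually resolves them is not shown on this page), the standard-library `fnmatch` module shows which files each pattern would pick up. The file listing and the `select_files` helper are hypothetical.

```python
from fnmatch import fnmatch

# Hypothetical file listing for a model repository; real contents may differ.
repo_files = [
    "transformer/diffusion_pytorch_model-00001-of-00002.safetensors",
    "transformer/diffusion_pytorch_model-00002-of-00002.safetensors",
    "transformer/config.json",
    "text_encoder/model-00001-of-00002.safetensors",
    "text_encoder/model-00002-of-00002.safetensors",
    "vae/diffusion_pytorch_model.safetensors",
]


def select_files(files, pattern):
    """Return the paths that match a shell-style wildcard pattern, sorted."""
    return sorted(f for f in files if fnmatch(f, pattern))


# A wildcard pattern collects every shard of a sharded checkpoint.
transformer_shards = select_files(
    repo_files, "transformer/diffusion_pytorch_model*.safetensors"
)
# A pattern without wildcards selects exactly one file.
vae_files = select_files(repo_files, "vae/diffusion_pytorch_model.safetensors")

for path in transformer_shards + vae_files:
    print(path)
```

Under this reading, the wildcard form is what lets a single `ModelConfig` entry cover a checkpoint split across several `.safetensors` shards.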

Model Lineage

graph LR;
    Qwen/Qwen-Image-->Qwen/Qwen-Image-Edit;
    Qwen/Qwen-Image-Edit-->Qwen/Qwen-Image-Edit-2509;
    Qwen/Qwen-Image-->EliGen-Series;
    EliGen-Series-->DiffSynth-Studio/Qwen-Image-EliGen;
    DiffSynth-Studio/Qwen-Image-EliGen-->DiffSynth-Studio/Qwen-Image-EliGen-V2;
    EliGen-Series-->DiffSynth-Studio/Qwen-Image-EliGen-Poster;
    Qwen/Qwen-Image-->Distill-Series;
    Distill-Series-->DiffSynth-Studio/Qwen-Image-Distill-Full;
    Distill-Series-->DiffSynth-Studio/Qwen-Image-Distill-LoRA;
    Qwen/Qwen-Image-->ControlNet-Series;
    ControlNet-Series-->Blockwise-ControlNet-Series;
    Blockwise-ControlNet-Series-->DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Canny;
    Blockwise-ControlNet-Series-->DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Depth;
    Blockwise-ControlNet-Series-->DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Inpaint;
    ControlNet-Series-->DiffSynth-Studio/Qwen-Image-In-Context-Control-Union;
    Qwen/Qwen-Image-->DiffSynth-Studio/Qwen-Image-Edit-Lowres-Fix;
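
The lineage graph uses Mermaid `parent-->child;` edges, so it can be queried programmatically. The sketch below parses a few edges copied from the graph above and lists everything derived from a given node. The helper names `parse_edges` and `descendants` are illustrative and not part of DiffSynth-Studio.

```python
from collections import defaultdict

# A few edges copied from the lineage graph above.
MERMAID = """
graph LR;
    Qwen/Qwen-Image-->Qwen/Qwen-Image-Edit;
    Qwen/Qwen-Image-Edit-->Qwen/Qwen-Image-Edit-2509;
    Qwen/Qwen-Image-->EliGen-Series;
    EliGen-Series-->DiffSynth-Studio/Qwen-Image-EliGen;
    DiffSynth-Studio/Qwen-Image-EliGen-->DiffSynth-Studio/Qwen-Image-EliGen-V2;
"""


def parse_edges(text):
    """Extract (parent, child) pairs from 'parent-->child;' lines."""
    edges = []
    for line in text.splitlines():
        line = line.strip().rstrip(";")
        if "-->" in line:
            parent, child = line.split("-->", 1)
            edges.append((parent, child))
    return edges


def descendants(edges, root):
    """Return every node reachable from root, in discovery order."""
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
    seen, stack = [], [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            if child not in seen:
                seen.append(child)
                stack.append(child)
    return seen


edges = parse_edges(MERMAID)
print(descendants(edges, "EliGen-Series"))
```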

Model ID	Inference	Low VRAM Inference	Full Training	Validation After Full Training	LoRA Training	Validation After LoRA Training
Qwen/Qwen-Image	code	code	code	code	code	code
Qwen/Qwen-Image-Edit	code	code	code	code	code	code
Qwen/Qwen-Image-Edit-2509	code	code	code	code	code	code
DiffSynth-Studio/Qwen-Image-EliGen	code	code	-	-	code	code
DiffSynth-Studio/Qwen-Image-EliGen-V2	code	code	-	-	code	code
DiffSynth-Studio/Qwen-Image-EliGen-Poster	code	code	-	-	code	code
DiffSynth-Studio/Qwen-Image-Distill-Full	code	code	code	code	code	code
DiffSynth-Studio/Qwen-Image-Distill-LoRA	code	code	-	-	code	code
DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Canny	code	code	code	code	code	code
DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Depth	code	code	code	code	code	code
DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Inpaint	code	code	code	code	code	code
DiffSynth-Studio/Qwen-Image-In-Context-Control-Union	code	code	-	-	code	code
DiffSynth-Studio/Qwen-Image-Edit-Lowres-Fix	code	code	-	-	-	-
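
The table above can be filtered by workflow. The sketch below reads a `-` cell as "no example script provided", which is an interpretation and not something the table states. It uses three rows copied from the table, and `models_supporting` is a hypothetical helper.

```python
import csv
import io

# Header and three rows copied from the table above (tab-separated cells).
TABLE = (
    "Model ID\tInference\tLow VRAM Inference\tFull Training\t"
    "Validation After Full Training\tLoRA Training\tValidation After LoRA Training\n"
    "Qwen/Qwen-Image\tcode\tcode\tcode\tcode\tcode\tcode\n"
    "DiffSynth-Studio/Qwen-Image-EliGen\tcode\tcode\t-\t-\tcode\tcode\n"
    "DiffSynth-Studio/Qwen-Image-Edit-Lowres-Fix\tcode\tcode\t-\t-\t-\t-\n"
)


def models_supporting(table_text, column):
    """List the model IDs whose cell in the given column is not '-'."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    return [row["Model ID"] for row in reader if row[column] != "-"]


print(models_supporting(TABLE, "Full Training"))
print(models_supporting(TABLE, "LoRA Training"))
```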

FLUX Series

Documentation: ./FLUX.md

Effect Preview

Quick Start

import torch
from diffsynth.pipelines.flux_image import FluxImagePipeline, ModelConfig

pipe = FluxImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/*.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors"),
    ],
)

image = pipe(prompt="a cat", seed=0)
image.save("image.jpg")

Model Lineage

graph LR;
    FLUX.1-Series-->black-forest-labs/FLUX.1-dev;
    FLUX.1-Series-->black-forest-labs/FLUX.1-Krea-dev;
    FLUX.1-Series-->black-forest-labs/FLUX.1-Kontext-dev;
    black-forest-labs/FLUX.1-dev-->FLUX.1-dev-ControlNet-Series;
    FLUX.1-dev-ControlNet-Series-->alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta;
    FLUX.1-dev-ControlNet-Series-->InstantX/FLUX.1-dev-Controlnet-Union-alpha;
    FLUX.1-dev-ControlNet-Series-->jasperai/Flux.1-dev-Controlnet-Upscaler;
    black-forest-labs/FLUX.1-dev-->InstantX/FLUX.1-dev-IP-Adapter;
    black-forest-labs/FLUX.1-dev-->ByteDance/InfiniteYou;
    black-forest-labs/FLUX.1-dev-->DiffSynth-Studio/Eligen;
    black-forest-labs/FLUX.1-dev-->DiffSynth-Studio/LoRA-Encoder-FLUX.1-Dev;
    black-forest-labs/FLUX.1-dev-->DiffSynth-Studio/LoRAFusion-preview-FLUX.1-dev;
    black-forest-labs/FLUX.1-dev-->ostris/Flex.2-preview;
    black-forest-labs/FLUX.1-dev-->stepfun-ai/Step1X-Edit;
    Qwen/Qwen2.5-VL-7B-Instruct-->stepfun-ai/Step1X-Edit;
    black-forest-labs/FLUX.1-dev-->DiffSynth-Studio/Nexus-GenV2;
    Qwen/Qwen2.5-VL-7B-Instruct-->DiffSynth-Studio/Nexus-GenV2;

Model ID	Extra Parameters	Inference	Low VRAM Inference	Full Training	Validation After Full Training	LoRA Training	Validation After LoRA Training
black-forest-labs/FLUX.1-dev		code	code	code	code	code	code
black-forest-labs/FLUX.1-Krea-dev		code	code	code	code	code	code
black-forest-labs/FLUX.1-Kontext-dev	`kontext_images`	code	code	code	code	code	code
alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta	`controlnet_inputs`	code	code	code	code	code	code
InstantX/FLUX.1-dev-Controlnet-Union-alpha	`controlnet_inputs`	code	code	code	code	code	code
jasperai/Flux.1-dev-Controlnet-Upscaler	`controlnet_inputs`	code	code	code	code	code	code
InstantX/FLUX.1-dev-IP-Adapter	`ipadapter_images`, `ipadapter_scale`	code	code	code	code	code	code
ByteDance/InfiniteYou	`infinityou_id_image`, `infinityou_guidance`, `controlnet_inputs`	code	code	code	code	code	code
DiffSynth-Studio/Eligen	`eligen_entity_prompts`, `eligen_entity_masks`, `eligen_enable_on_negative`, `eligen_enable_inpaint`	code	code	-	-	code	code
DiffSynth-Studio/LoRA-Encoder-FLUX.1-Dev	`lora_encoder_inputs`, `lora_encoder_scale`	code	code	code	code	-	-
DiffSynth-Studio/LoRAFusion-preview-FLUX.1-dev		code	-	-	-	-	-
stepfun-ai/Step1X-Edit	`step1x_reference_image`	code	code	code	code	code	code
ostris/Flex.2-preview	`flex_inpaint_image`, `flex_inpaint_mask`, `flex_control_image`, `flex_control_strength`, `flex_control_stop`	code	code	code	code	code	code
DiffSynth-Studio/Nexus-GenV2	`nexus_gen_reference_image`	code	code	code	code	code	code
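
The "Extra Parameters" column names the additional pipeline arguments each model accepts. To see which models share an argument, the column can be inverted into a parameter-to-models index. The sketch below does this for a handful of rows copied from the table. The helper names are illustrative, and the sketch says nothing about the value each argument expects.

```python
import re
from collections import defaultdict

# (model ID, "Extra Parameters" cell) pairs copied from the table above.
ROWS = [
    ("black-forest-labs/FLUX.1-dev", ""),
    ("alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta", "`controlnet_inputs`"),
    ("InstantX/FLUX.1-dev-Controlnet-Union-alpha", "`controlnet_inputs`"),
    ("InstantX/FLUX.1-dev-IP-Adapter", "`ipadapter_images`, `ipadapter_scale`"),
    ("ByteDance/InfiniteYou",
     "`infinityou_id_image`, `infinityou_guidance`, `controlnet_inputs`"),
]


def extra_parameters(cell):
    """Extract the backtick-quoted parameter names from a table cell."""
    return re.findall(r"`([^`]+)`", cell)


def index_by_parameter(rows):
    """Map each extra parameter name to the models that list it."""
    index = defaultdict(list)
    for model_id, cell in rows:
        for name in extra_parameters(cell):
            index[name].append(model_id)
    return dict(index)


param_index = index_by_parameter(ROWS)
print(param_index["controlnet_inputs"])
```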

Wan Series

Documentation: ./Wan.md

Effect Preview

https://github.com/user-attachments/assets/1d66ae74-3b02-40a9-acc3-ea95fc039314

Quick Start

import torch
from diffsynth.utils.data import save_video
from diffsynth.pipelines.wan_video import WanVideoPipeline, ModelConfig

pipe = WanVideoPipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="Wan-AI/Wan2.1-T2V-1.3B", origin_file_pattern="diffusion_pytorch_model*.safetensors"),
        ModelConfig(model_id="Wan-AI/Wan2.1-T2V-1.3B", origin_file_pattern="models_t5_umt5-xxl-enc-bf16.pth"),
        ModelConfig(model_id="Wan-AI/Wan2.1-T2V-1.3B", origin_file_pattern="Wan2.1_VAE.pth"),
    ],
)

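# English gloss of the prompt: "Documentary photography style: a lively puppy runs swiftly across
# a lush green lawn. The puppy has brownish-yellow fur, upright ears, and a focused, joyful
# expression. Sunlight falls on it, making its fur look especially soft and shiny. The background
# is an open meadow dotted with occasional wildflowers, with blue sky and a few white clouds
# faintly visible in the distance. Strong sense of perspective, capturing the motion of the running
# puppy and the vitality of the surrounding grass. Medium shot, side-on moving viewpoint."
# English gloss of the negative prompt: "Garish colors, overexposed, static, blurry details,
# subtitles, style, artwork, painting, picture, still, overall gray, worst quality, low quality,
# JPEG compression artifacts, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn
# face, deformed, disfigured, malformed limbs, fused fingers, motionless frame, cluttered
# background, three legs, many people in the background, walking backwards."
video = pipe(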
    prompt="纪实摄影风格画面，一只活泼的小狗在绿茵茵的草地上迅速奔跑。小狗毛色棕黄，两只耳朵立起，神情专注而欢快。阳光洒在它身上，使得毛发看上去格外柔软而闪亮。背景是一片开阔的草地，偶尔点缀着几朵野花，远处隐约可见蓝天和几片白云。透视感鲜明，捕捉小狗奔跑时的动感和四周草地的生机。中景侧面移动视角。",
    negative_prompt="色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走",
    seed=0, tiled=True,
)
save_video(video, "video.mp4", fps=15, quality=5)
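
The quick start writes the clip at `fps=15`, so its playback length depends on how many frames the pipeline returns. The arithmetic is sketched below. The frame count of 81 is an illustrative assumption that this page does not state, and `clip_duration_seconds` is a hypothetical helper.

```python
def clip_duration_seconds(num_frames: int, fps: float) -> float:
    """Playback length of a clip with num_frames frames shown at fps frames per second."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


# 81 frames is an assumed value for illustration; use len(video) for a real result.
duration = clip_duration_seconds(num_frames=81, fps=15)
print(f"{duration:.1f} s")
```

Saving the same frames at a higher `fps` shortens the clip and makes motion look faster, while a lower `fps` stretches it.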

Model Lineage

graph LR;
    Wan-Series-->Wan2.1-Series;
    Wan-Series-->Wan2.2-Series;
    Wan2.1-Series-->Wan-AI/Wan2.1-T2V-1.3B;
    Wan2.1-Series-->Wan-AI/Wan2.1-T2V-14B;
    Wan-AI/Wan2.1-T2V-14B-->Wan-AI/Wan2.1-I2V-14B-480P;
    Wan-AI/Wan2.1-I2V-14B-480P-->Wan-AI/Wan2.1-I2V-14B-720P;
    Wan-AI/Wan2.1-T2V-14B-->Wan-AI/Wan2.1-FLF2V-14B-720P;
    Wan-AI/Wan2.1-T2V-1.3B-->iic/VACE-Wan2.1-1.3B-Preview;
    iic/VACE-Wan2.1-1.3B-Preview-->Wan-AI/Wan2.1-VACE-1.3B;
    Wan-AI/Wan2.1-T2V-14B-->Wan-AI/Wan2.1-VACE-14B;
    Wan-AI/Wan2.1-T2V-1.3B-->Wan2.1-Fun-1.3B-Series;
    Wan2.1-Fun-1.3B-Series-->PAI/Wan2.1-Fun-1.3B-InP;
    Wan2.1-Fun-1.3B-Series-->PAI/Wan2.1-Fun-1.3B-Control;
    Wan-AI/Wan2.1-T2V-14B-->Wan2.1-Fun-14B-Series;
    Wan2.1-Fun-14B-Series-->PAI/Wan2.1-Fun-14B-InP;
    Wan2.1-Fun-14B-Series-->PAI/Wan2.1-Fun-14B-Control;
    Wan-AI/Wan2.1-T2V-1.3B-->Wan2.1-Fun-V1.1-1.3B-Series;
    Wan2.1-Fun-V1.1-1.3B-Series-->PAI/Wan2.1-Fun-V1.1-1.3B-Control;
    Wan2.1-Fun-V1.1-1.3B-Series-->PAI/Wan2.1-Fun-V1.1-1.3B-InP;
    Wan2.1-Fun-V1.1-1.3B-Series-->PAI/Wan2.1-Fun-V1.1-1.3B-Control-Camera;
    Wan-AI/Wan2.1-T2V-14B-->Wan2.1-Fun-V1.1-14B-Series;
    Wan2.1-Fun-V1.1-14B-Series-->PAI/Wan2.1-Fun-V1.1-14B-Control;
    Wan2.1-Fun-V1.1-14B-Series-->PAI/Wan2.1-Fun-V1.1-14B-InP;
    Wan2.1-Fun-V1.1-14B-Series-->PAI/Wan2.1-Fun-V1.1-14B-Control-Camera;
    Wan-AI/Wan2.1-T2V-1.3B-->DiffSynth-Studio/Wan2.1-1.3b-speedcontrol-v1;
    Wan-AI/Wan2.1-T2V-14B-->krea/krea-realtime-video;
    Wan-AI/Wan2.1-I2V-14B-720P-->ByteDance/Video-As-Prompt-Wan2.1-14B;
    Wan-AI/Wan2.1-T2V-14B-->Wan-AI/Wan2.2-Animate-14B;
    Wan-AI/Wan2.1-T2V-14B-->Wan-AI/Wan2.2-S2V-14B;
    Wan2.2-Series-->Wan-AI/Wan2.2-T2V-A14B;
    Wan2.2-Series-->Wan-AI/Wan2.2-I2V-A14B;
    Wan2.2-Series-->Wan-AI/Wan2.2-TI2V-5B;
    Wan-AI/Wan2.2-T2V-A14B-->Wan2.2-Fun-Series;
    Wan2.2-Fun-Series-->PAI/Wan2.2-VACE-Fun-A14B;
    Wan2.2-Fun-Series-->PAI/Wan2.2-Fun-A14B-InP;
    Wan2.2-Fun-Series-->PAI/Wan2.2-Fun-A14B-Control;
    Wan2.2-Fun-Series-->PAI/Wan2.2-Fun-A14B-Control-Camera;

Model ID	Extra Parameters	Inference	Full Training	Validation After Full Training	LoRA Training	Validation After LoRA Training
Wan-AI/Wan2.1-T2V-1.3B		code	code	code	code	code
Wan-AI/Wan2.1-T2V-14B		code	code	code	code	code
Wan-AI/Wan2.1-I2V-14B-480P	`input_image`	code	code	code	code	code
Wan-AI/Wan2.1-I2V-14B-720P	`input_image`	code	code	code	code	code
Wan-AI/Wan2.1-FLF2V-14B-720P	`input_image`, `end_image`	code	code	code	code	code
iic/VACE-Wan2.1-1.3B-Preview	`vace_control_video`, `vace_reference_image`	code	code	code	code	code
Wan-AI/Wan2.1-VACE-1.3B	`vace_control_video`, `vace_reference_image`	code	code	code	code	code
Wan-AI/Wan2.1-VACE-14B	`vace_control_video`, `vace_reference_image`	code	code	code	code	code
PAI/Wan2.1-Fun-1.3B-InP	`input_image`, `end_image`	code	code	code	code	code
PAI/Wan2.1-Fun-1.3B-Control	`control_video`	code	code	code	code	code
PAI/Wan2.1-Fun-14B-InP	`input_image`, `end_image`	code	code	code	code	code
PAI/Wan2.1-Fun-14B-Control	`control_video`	code	code	code	code	code
PAI/Wan2.1-Fun-V1.1-1.3B-Control	`control_video`, `reference_image`	code	code	code	code	code
PAI/Wan2.1-Fun-V1.1-14B-Control	`control_video`, `reference_image`	code	code	code	code	code
PAI/Wan2.1-Fun-V1.1-1.3B-InP	`input_image`, `end_image`	code	code	code	code	code
PAI/Wan2.1-Fun-V1.1-14B-InP	`input_image`, `end_image`	code	code	code	code	code
PAI/Wan2.1-Fun-V1.1-1.3B-Control-Camera	`control_camera_video`, `input_image`	code	code	code	code	code
PAI/Wan2.1-Fun-V1.1-14B-Control-Camera	`control_camera_video`, `input_image`	code	code	code	code	code
DiffSynth-Studio/Wan2.1-1.3b-speedcontrol-v1	`motion_bucket_id`	code	code	code	code	code
krea/krea-realtime-video		code	code	code	code	code
meituan-longcat/LongCat-Video	`longcat_video`	code	code	code	code	code
ByteDance/Video-As-Prompt-Wan2.1-14B	`vap_video`, `vap_prompt`	code	code	code	code	code
Wan-AI/Wan2.2-T2V-A14B		code	code	code	code	code
Wan-AI/Wan2.2-I2V-A14B	`input_image`	code	code	code	code	code
Wan-AI/Wan2.2-TI2V-5B	`input_image`	code	code	code	code	code
Wan-AI/Wan2.2-Animate-14B	`input_image`, `animate_pose_video`, `animate_face_video`, `animate_inpaint_video`, `animate_mask_video`	code	code	code	code	code
Wan-AI/Wan2.2-S2V-14B	`input_image`, `input_audio`, `audio_sample_rate`, `s2v_pose_video`	code	code	code	code	code
PAI/Wan2.2-VACE-Fun-A14B	`vace_control_video`, `vace_reference_image`	code	code	code	code	code
PAI/Wan2.2-Fun-A14B-InP	`input_image`, `end_image`	code	code	code	code	code
PAI/Wan2.2-Fun-A14B-Control	`control_video`, `reference_image`	code	code	code	code	code
PAI/Wan2.2-Fun-A14B-Control-Camera	`control_camera_video`, `input_image`	code	code	code	code	code

FP8 Precision Training: doc, code
Two-stage Split Training: doc, code
End-to-end Direct Distillation: doc, code
