DiffSynth-Studio/docs/en/Model_Details/Overview.md
Artiprocher 9ecb9d8fe7 update doc
2025-12-03 19:29:18 +08:00


Model Directory

Qwen-Image

Documentation: ./Qwen-Image.md

Effect Preview

Image

Quick Start
from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig
from PIL import Image
import torch

pipe = QwenImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
    ],
    tokenizer_config=ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="tokenizer/"),
)
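# English gloss of the prompt: "Exquisite portrait, an underwater girl, flowing blue dress, hair gently drifting,
# crystal-clear light and shadow, surrounded by bubbles, serene face, fine details, dreamy and beautiful."
prompt = "精致肖像,水下少女,蓝裙飘逸,发丝轻扬,光影透澈,气泡环绕,面容恬静,细节精致,梦幻唯美。"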
image = pipe(
    prompt, seed=0, num_inference_steps=40,
    # edit_image=Image.open("xxx.jpg").resize((1328, 1328)) # For Qwen-Image-Edit
)
image.save("image.jpg")
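
The `origin_file_pattern` values in the quick start read like shell-style glob patterns. As a rough illustration of how such a pattern could select sharded weight files, here is a minimal sketch using Python's standard `fnmatch` module. The file listing and the `select_files` helper are hypothetical, and the way DiffSynth-Studio actually resolves these patterns may differ.

```python
from fnmatch import fnmatchcase

# Hypothetical repository listing; the real repository may contain different files.
repo_files = [
    "transformer/diffusion_pytorch_model-00001-of-00009.safetensors",
    "transformer/diffusion_pytorch_model-00002-of-00009.safetensors",
    "transformer/diffusion_pytorch_model.safetensors.index.json",
    "text_encoder/model-00001-of-00004.safetensors",
    "vae/diffusion_pytorch_model.safetensors",
    "tokenizer/vocab.json",
]


def select_files(pattern, files):
    """Return the files selected by a pattern (hypothetical helper).

    Assumption: a pattern ending in '/' selects everything under that
    directory; any other pattern is treated as a case-sensitive glob.
    """
    if pattern.endswith("/"):
        return [f for f in files if f.startswith(pattern)]
    return [f for f in files if fnmatchcase(f, pattern)]


transformer_shards = select_files(
    "transformer/diffusion_pytorch_model*.safetensors", repo_files
)
for path in transformer_shards:
    print(path)
```

Under these assumptions the wildcard picks up both shards but skips the `.index.json` file, which is presumably why the transformer and text encoder patterns use `*` while the single-file VAE pattern does not.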
Model Lineage
graph LR;
    Qwen/Qwen-Image-->Qwen/Qwen-Image-Edit;
    Qwen/Qwen-Image-Edit-->Qwen/Qwen-Image-Edit-2509;
    Qwen/Qwen-Image-->EliGen-Series;
    EliGen-Series-->DiffSynth-Studio/Qwen-Image-EliGen;
    DiffSynth-Studio/Qwen-Image-EliGen-->DiffSynth-Studio/Qwen-Image-EliGen-V2;
    EliGen-Series-->DiffSynth-Studio/Qwen-Image-EliGen-Poster;
    Qwen/Qwen-Image-->Distill-Series;
    Distill-Series-->DiffSynth-Studio/Qwen-Image-Distill-Full;
    Distill-Series-->DiffSynth-Studio/Qwen-Image-Distill-LoRA;
    Qwen/Qwen-Image-->ControlNet-Series;
    ControlNet-Series-->Blockwise-ControlNet-Series;
    Blockwise-ControlNet-Series-->DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Canny;
    Blockwise-ControlNet-Series-->DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Depth;
    Blockwise-ControlNet-Series-->DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Inpaint;
    ControlNet-Series-->DiffSynth-Studio/Qwen-Image-In-Context-Control-Union;
    Qwen/Qwen-Image-->DiffSynth-Studio/Qwen-Image-Edit-Lowres-Fix;
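
To make the lineage graph easier to query, the sketch below parses a few of the Mermaid edges above into a child-to-parents mapping and walks from a model back to its root. The helper names `parse_edges` and `lineage` are made up for this illustration and are not part of DiffSynth-Studio. The walk follows only the first listed parent and assumes the graph has no cycles.

```python
# A subset of the edges from the Qwen-Image lineage graph.
MERMAID_EDGES = """
Qwen/Qwen-Image-->Qwen/Qwen-Image-Edit;
Qwen/Qwen-Image-Edit-->Qwen/Qwen-Image-Edit-2509;
Qwen/Qwen-Image-->EliGen-Series;
EliGen-Series-->DiffSynth-Studio/Qwen-Image-EliGen;
DiffSynth-Studio/Qwen-Image-EliGen-->DiffSynth-Studio/Qwen-Image-EliGen-V2;
"""


def parse_edges(text):
    """Map each child node to the list of its parents."""
    parents = {}
    for line in text.splitlines():
        line = line.strip().rstrip(";")
        if "-->" not in line:
            continue  # skip blank lines and the 'graph LR' header
        parent, _, child = line.partition("-->")
        parents.setdefault(child, []).append(parent)
    return parents


def lineage(model, parents):
    """Return the chain from the root down to `model` (first parent only)."""
    chain = [model]
    while chain[-1] in parents:
        chain.append(parents[chain[-1]][0])
    return chain[::-1]


parents = parse_edges(MERMAID_EDGES)
print(" -> ".join(lineage("DiffSynth-Studio/Qwen-Image-EliGen-V2", parents)))
```

Nodes such as `EliGen-Series` are grouping labels in the graph rather than downloadable models, so a real tool would probably want to filter them out.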

| Model ID | Inference | Low VRAM Inference | Full Training | Validation After Full Training | LoRA Training | Validation After LoRA Training |
| - | - | - | - | - | - | - |
| Qwen/Qwen-Image | code | code | code | code | code | code |
| Qwen/Qwen-Image-Edit | code | code | code | code | code | code |
| Qwen/Qwen-Image-Edit-2509 | code | code | code | code | code | code |
| DiffSynth-Studio/Qwen-Image-EliGen | code | code | - | - | code | code |
| DiffSynth-Studio/Qwen-Image-EliGen-V2 | code | code | - | - | code | code |
| DiffSynth-Studio/Qwen-Image-EliGen-Poster | code | code | - | - | code | code |
| DiffSynth-Studio/Qwen-Image-Distill-Full | code | code | code | code | code | code |
| DiffSynth-Studio/Qwen-Image-Distill-LoRA | code | code | - | - | code | code |
| DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Canny | code | code | code | code | code | code |
| DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Depth | code | code | code | code | code | code |
| DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Inpaint | code | code | code | code | code | code |
| DiffSynth-Studio/Qwen-Image-In-Context-Control-Union | code | code | - | - | code | code |
| DiffSynth-Studio/Qwen-Image-Edit-Lowres-Fix | code | code | - | - | - | - |
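
In the support table, `code` marks a column for which an example script is linked and `-` marks one without. As a small sketch of how the table could be queried programmatically, the snippet below encodes four of its rows and lists the models that have a script for a given capability. The column keys and the helpers `parse_row` and `models_with` are illustrative names, not anything provided by DiffSynth-Studio.

```python
# Column keys are shorthand (an assumption) for the table headers, in order.
COLUMNS = (
    "inference",
    "low_vram_inference",
    "full_training",
    "full_validation",
    "lora_training",
    "lora_validation",
)

# Four rows copied from the table; "code" = script available, "-" = not available.
ROWS = {
    "Qwen/Qwen-Image": "code code code code code code",
    "DiffSynth-Studio/Qwen-Image-EliGen": "code code - - code code",
    "DiffSynth-Studio/Qwen-Image-Distill-LoRA": "code code - - code code",
    "DiffSynth-Studio/Qwen-Image-Edit-Lowres-Fix": "code code - - - -",
}


def parse_row(cells):
    """Turn a row of 'code' / '-' cells into a {column: bool} mapping."""
    return {col: cell == "code" for col, cell in zip(COLUMNS, cells.split())}


support = {model: parse_row(cells) for model, cells in ROWS.items()}


def models_with(capability):
    """Return the sorted model IDs that have a script for `capability`."""
    return sorted(m for m, caps in support.items() if caps[capability])


print(models_with("full_training"))
print(models_with("lora_training"))
```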

FLUX Series

Documentation: ./FLUX.md

Effect Preview

Image

Quick Start
import torch
from diffsynth.pipelines.flux_image import FluxImagePipeline, ModelConfig

pipe = FluxImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/*.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors"),
    ],
)

image = pipe(prompt="a cat", seed=0)
image.save("image.jpg")
Model Lineage
graph LR;
    FLUX.1-Series-->black-forest-labs/FLUX.1-dev;
    FLUX.1-Series-->black-forest-labs/FLUX.1-Krea-dev;
    FLUX.1-Series-->black-forest-labs/FLUX.1-Kontext-dev;
    black-forest-labs/FLUX.1-dev-->FLUX.1-dev-ControlNet-Series;
    FLUX.1-dev-ControlNet-Series-->alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta;
    FLUX.1-dev-ControlNet-Series-->InstantX/FLUX.1-dev-Controlnet-Union-alpha;
    FLUX.1-dev-ControlNet-Series-->jasperai/Flux.1-dev-Controlnet-Upscaler;
    black-forest-labs/FLUX.1-dev-->InstantX/FLUX.1-dev-IP-Adapter;
    black-forest-labs/FLUX.1-dev-->ByteDance/InfiniteYou;
    black-forest-labs/FLUX.1-dev-->DiffSynth-Studio/Eligen;
    black-forest-labs/FLUX.1-dev-->DiffSynth-Studio/LoRA-Encoder-FLUX.1-Dev;
    black-forest-labs/FLUX.1-dev-->DiffSynth-Studio/LoRAFusion-preview-FLUX.1-dev;
    black-forest-labs/FLUX.1-dev-->ostris/Flex.2-preview;
    black-forest-labs/FLUX.1-dev-->stepfun-ai/Step1X-Edit;
    Qwen/Qwen2.5-VL-7B-Instruct-->stepfun-ai/Step1X-Edit;
    black-forest-labs/FLUX.1-dev-->DiffSynth-Studio/Nexus-GenV2;
    Qwen/Qwen2.5-VL-7B-Instruct-->DiffSynth-Studio/Nexus-GenV2;

| Model ID | Extra Parameters | Inference | Low VRAM Inference | Full Training | Validation After Full Training | LoRA Training | Validation After LoRA Training |
| - | - | - | - | - | - | - | - |
| black-forest-labs/FLUX.1-dev | | code | code | code | code | code | code |
| black-forest-labs/FLUX.1-Krea-dev | | code | code | code | code | code | code |
| black-forest-labs/FLUX.1-Kontext-dev | `kontext_images` | code | code | code | code | code | code |
| alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta | `controlnet_inputs` | code | code | code | code | code | code |
| InstantX/FLUX.1-dev-Controlnet-Union-alpha | `controlnet_inputs` | code | code | code | code | code | code |
| jasperai/Flux.1-dev-Controlnet-Upscaler | `controlnet_inputs` | code | code | code | code | code | code |
| InstantX/FLUX.1-dev-IP-Adapter | `ipadapter_images`, `ipadapter_scale` | code | code | code | code | code | code |
| ByteDance/InfiniteYou | `infinityou_id_image`, `infinityou_guidance`, `controlnet_inputs` | code | code | code | code | code | code |
| DiffSynth-Studio/Eligen | `eligen_entity_prompts`, `eligen_entity_masks`, `eligen_enable_on_negative`, `eligen_enable_inpaint` | code | code | - | - | code | code |
| DiffSynth-Studio/LoRA-Encoder-FLUX.1-Dev | `lora_encoder_inputs`, `lora_encoder_scale` | code | code | code | code | - | - |
| DiffSynth-Studio/LoRAFusion-preview-FLUX.1-dev | | code | - | - | - | - | - |
| stepfun-ai/Step1X-Edit | `step1x_reference_image` | code | code | code | code | code | code |
| ostris/Flex.2-preview | `flex_inpaint_image`, `flex_inpaint_mask`, `flex_control_image`, `flex_control_strength`, `flex_control_stop` | code | code | code | code | code | code |
| DiffSynth-Studio/Nexus-GenV2 | `nexus_gen_reference_image` | code | code | code | code | code | code |
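
The "Extra Parameters" column associates each model with the additional arguments it introduces. One way to put that to use is a small pre-flight check that flags arguments belonging to a different model. The sketch below covers four rows of the table. The `misplaced_extras` helper is hypothetical and only compares names against the table. It does not reflect how `FluxImagePipeline` validates its inputs.

```python
# Extra parameters per model, copied from four rows of the table.
EXTRA_PARAMETERS = {
    "black-forest-labs/FLUX.1-dev": set(),
    "black-forest-labs/FLUX.1-Kontext-dev": {"kontext_images"},
    "InstantX/FLUX.1-dev-IP-Adapter": {"ipadapter_images", "ipadapter_scale"},
    "DiffSynth-Studio/Eligen": {
        "eligen_entity_prompts",
        "eligen_entity_masks",
        "eligen_enable_on_negative",
        "eligen_enable_inpaint",
    },
}


def misplaced_extras(model_id, kwargs):
    """Return argument names that the table lists for other models only.

    Names that appear nowhere in the table (for example `prompt` or `seed`)
    are treated as general arguments and are never flagged.
    """
    all_extras = set().union(*EXTRA_PARAMETERS.values())
    return sorted((set(kwargs) & all_extras) - EXTRA_PARAMETERS[model_id])


call_kwargs = {"prompt": "a cat", "seed": 0, "kontext_images": None, "ipadapter_scale": 0.5}
print(misplaced_extras("black-forest-labs/FLUX.1-Kontext-dev", call_kwargs))
```

Because several of these models are add-ons that can be loaded alongside `black-forest-labs/FLUX.1-dev`, a fuller version would likely merge the allowed sets of every model loaded into the pipeline.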

Wan Series

Documentation: ./Wan.md

Effect Preview

https://github.com/user-attachments/assets/1d66ae74-3b02-40a9-acc3-ea95fc039314

Quick Start
import torch
from diffsynth.utils.data import save_video
from diffsynth.pipelines.wan_video import WanVideoPipeline, ModelConfig

pipe = WanVideoPipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="Wan-AI/Wan2.1-T2V-1.3B", origin_file_pattern="diffusion_pytorch_model*.safetensors"),
        ModelConfig(model_id="Wan-AI/Wan2.1-T2V-1.3B", origin_file_pattern="models_t5_umt5-xxl-enc-bf16.pth"),
        ModelConfig(model_id="Wan-AI/Wan2.1-T2V-1.3B", origin_file_pattern="Wan2.1_VAE.pth"),
    ],
)

video = pipe(
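    # English gloss of the prompt: "Documentary photography style: a lively puppy runs swiftly across a lush green
    # lawn. Its coat is brownish-yellow, its ears are pricked up, and its expression is focused and joyful. Sunlight
    # falls on it, making its fur look exceptionally soft and shiny. The background is an open grassland dotted with
    # a few wildflowers, with blue sky and a few white clouds faintly visible in the distance. Strong perspective,
    # capturing the motion of the running puppy and the vitality of the surrounding grass. Medium shot, side-on
    # tracking view."
    prompt="纪实摄影风格画面,一只活泼的小狗在绿茵茵的草地上迅速奔跑。小狗毛色棕黄,两只耳朵立起,神情专注而欢快。阳光洒在它身上,使得毛发看上去格外柔软而闪亮。背景是一片开阔的草地,偶尔点缀着几朵野花,远处隐约可见蓝天和几片白云。透视感鲜明,捕捉小狗奔跑时的动感和四周草地的生机。中景侧面移动视角。",
    # English gloss of the negative prompt: "Vivid tones, overexposed, static, blurry details, subtitles, style,
    # artwork, painting, still frame, overall grayish, worst quality, low quality, JPEG compression artifacts, ugly,
    # mutilated, extra fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, malformed limbs, fused
    # fingers, motionless frame, cluttered background, three legs, many people in the background, walking backwards."
    negative_prompt="色调艳丽过曝静态细节模糊不清字幕风格作品画作画面静止整体发灰最差质量低质量JPEG压缩残留丑陋的残缺的多余的手指画得不好的手部画得不好的脸部畸形的毁容的形态畸形的肢体手指融合静止不动的画面杂乱的背景三条腿背景人很多倒着走",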
    seed=0, tiled=True,
)
save_video(video, "video.mp4", fps=15, quality=5)
Model Lineage
graph LR;
    Wan-Series-->Wan2.1-Series;
    Wan-Series-->Wan2.2-Series;
    Wan2.1-Series-->Wan-AI/Wan2.1-T2V-1.3B;
    Wan2.1-Series-->Wan-AI/Wan2.1-T2V-14B;
    Wan-AI/Wan2.1-T2V-14B-->Wan-AI/Wan2.1-I2V-14B-480P;
    Wan-AI/Wan2.1-I2V-14B-480P-->Wan-AI/Wan2.1-I2V-14B-720P;
    Wan-AI/Wan2.1-T2V-14B-->Wan-AI/Wan2.1-FLF2V-14B-720P;
    Wan-AI/Wan2.1-T2V-1.3B-->iic/VACE-Wan2.1-1.3B-Preview;
    iic/VACE-Wan2.1-1.3B-Preview-->Wan-AI/Wan2.1-VACE-1.3B;
    Wan-AI/Wan2.1-T2V-14B-->Wan-AI/Wan2.1-VACE-14B;
    Wan-AI/Wan2.1-T2V-1.3B-->Wan2.1-Fun-1.3B-Series;
    Wan2.1-Fun-1.3B-Series-->PAI/Wan2.1-Fun-1.3B-InP;
    Wan2.1-Fun-1.3B-Series-->PAI/Wan2.1-Fun-1.3B-Control;
    Wan-AI/Wan2.1-T2V-14B-->Wan2.1-Fun-14B-Series;
    Wan2.1-Fun-14B-Series-->PAI/Wan2.1-Fun-14B-InP;
    Wan2.1-Fun-14B-Series-->PAI/Wan2.1-Fun-14B-Control;
    Wan-AI/Wan2.1-T2V-1.3B-->Wan2.1-Fun-V1.1-1.3B-Series;
    Wan2.1-Fun-V1.1-1.3B-Series-->PAI/Wan2.1-Fun-V1.1-1.3B-Control;
    Wan2.1-Fun-V1.1-1.3B-Series-->PAI/Wan2.1-Fun-V1.1-1.3B-InP;
    Wan2.1-Fun-V1.1-1.3B-Series-->PAI/Wan2.1-Fun-V1.1-1.3B-Control-Camera;
    Wan-AI/Wan2.1-T2V-14B-->Wan2.1-Fun-V1.1-14B-Series;
    Wan2.1-Fun-V1.1-14B-Series-->PAI/Wan2.1-Fun-V1.1-14B-Control;
    Wan2.1-Fun-V1.1-14B-Series-->PAI/Wan2.1-Fun-V1.1-14B-InP;
    Wan2.1-Fun-V1.1-14B-Series-->PAI/Wan2.1-Fun-V1.1-14B-Control-Camera;
    Wan-AI/Wan2.1-T2V-1.3B-->DiffSynth-Studio/Wan2.1-1.3b-speedcontrol-v1;
    Wan-AI/Wan2.1-T2V-14B-->krea/krea-realtime-video;
    Wan-AI/Wan2.1-I2V-14B-720P-->ByteDance/Video-As-Prompt-Wan2.1-14B;
    Wan-AI/Wan2.1-T2V-14B-->Wan-AI/Wan2.2-Animate-14B;
    Wan-AI/Wan2.1-T2V-14B-->Wan-AI/Wan2.2-S2V-14B;
    Wan2.2-Series-->Wan-AI/Wan2.2-T2V-A14B;
    Wan2.2-Series-->Wan-AI/Wan2.2-I2V-A14B;
    Wan2.2-Series-->Wan-AI/Wan2.2-TI2V-5B;
    Wan-AI/Wan2.2-T2V-A14B-->Wan2.2-Fun-Series;
    Wan2.2-Fun-Series-->PAI/Wan2.2-VACE-Fun-A14B;
    Wan2.2-Fun-Series-->PAI/Wan2.2-Fun-A14B-InP;
    Wan2.2-Fun-Series-->PAI/Wan2.2-Fun-A14B-Control;
    Wan2.2-Fun-Series-->PAI/Wan2.2-Fun-A14B-Control-Camera;

| Model ID | Extra Parameters | Inference | Full Training | Validation After Full Training | LoRA Training | Validation After LoRA Training |
| - | - | - | - | - | - | - |
| Wan-AI/Wan2.1-T2V-1.3B | | code | code | code | code | code |
| Wan-AI/Wan2.1-T2V-14B | | code | code | code | code | code |
| Wan-AI/Wan2.1-I2V-14B-480P | `input_image` | code | code | code | code | code |
| Wan-AI/Wan2.1-I2V-14B-720P | `input_image` | code | code | code | code | code |
| Wan-AI/Wan2.1-FLF2V-14B-720P | `input_image`, `end_image` | code | code | code | code | code |
| iic/VACE-Wan2.1-1.3B-Preview | `vace_control_video`, `vace_reference_image` | code | code | code | code | code |
| Wan-AI/Wan2.1-VACE-1.3B | `vace_control_video`, `vace_reference_image` | code | code | code | code | code |
| Wan-AI/Wan2.1-VACE-14B | `vace_control_video`, `vace_reference_image` | code | code | code | code | code |
| PAI/Wan2.1-Fun-1.3B-InP | `input_image`, `end_image` | code | code | code | code | code |
| PAI/Wan2.1-Fun-1.3B-Control | `control_video` | code | code | code | code | code |
| PAI/Wan2.1-Fun-14B-InP | `input_image`, `end_image` | code | code | code | code | code |
| PAI/Wan2.1-Fun-14B-Control | `control_video` | code | code | code | code | code |
| PAI/Wan2.1-Fun-V1.1-1.3B-Control | `control_video`, `reference_image` | code | code | code | code | code |
| PAI/Wan2.1-Fun-V1.1-14B-Control | `control_video`, `reference_image` | code | code | code | code | code |
| PAI/Wan2.1-Fun-V1.1-1.3B-InP | `input_image`, `end_image` | code | code | code | code | code |
| PAI/Wan2.1-Fun-V1.1-14B-InP | `input_image`, `end_image` | code | code | code | code | code |
| PAI/Wan2.1-Fun-V1.1-1.3B-Control-Camera | `control_camera_video`, `input_image` | code | code | code | code | code |
| PAI/Wan2.1-Fun-V1.1-14B-Control-Camera | `control_camera_video`, `input_image` | code | code | code | code | code |
| DiffSynth-Studio/Wan2.1-1.3b-speedcontrol-v1 | `motion_bucket_id` | code | code | code | code | code |
| krea/krea-realtime-video | | code | code | code | code | code |
| meituan-longcat/LongCat-Video | `longcat_video` | code | code | code | code | code |
| ByteDance/Video-As-Prompt-Wan2.1-14B | `vap_video`, `vap_prompt` | code | code | code | code | code |
| Wan-AI/Wan2.2-T2V-A14B | | code | code | code | code | code |
| Wan-AI/Wan2.2-I2V-A14B | `input_image` | code | code | code | code | code |
| Wan-AI/Wan2.2-TI2V-5B | `input_image` | code | code | code | code | code |
| Wan-AI/Wan2.2-Animate-14B | `input_image`, `animate_pose_video`, `animate_face_video`, `animate_inpaint_video`, `animate_mask_video` | code | code | code | code | code |
| Wan-AI/Wan2.2-S2V-14B | `input_image`, `input_audio`, `audio_sample_rate`, `s2v_pose_video` | code | code | code | code | code |
| PAI/Wan2.2-VACE-Fun-A14B | `vace_control_video`, `vace_reference_image` | code | code | code | code | code |
| PAI/Wan2.2-Fun-A14B-InP | `input_image`, `end_image` | code | code | code | code | code |
| PAI/Wan2.2-Fun-A14B-Control | `control_video`, `reference_image` | code | code | code | code | code |
| PAI/Wan2.2-Fun-A14B-Control-Camera | `control_camera_video`, `input_image` | code | code | code | code | code |

  • FP8 Precision Training: doc, code
  • Two-stage Split Training: doc, code
  • End-to-end Direct Distillation: doc, code