Model Inference
This document uses the Qwen-Image model as an example to introduce how to use DiffSynth-Studio for model inference.
Loading Models
Models are loaded through from_pretrained:
from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig
import torch

pipe = QwenImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
    ],
    tokenizer_config=ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="tokenizer/"),
)
Here, torch_dtype and device specify the computation precision and computation device (not the precision or device of the stored model files). model_configs accepts model paths in several forms. For details on how models are loaded internally, see diffsynth.core.loader.
Download and load models from remote sources
DiffSynth-Studio downloads and loads models from ModelScope by default. Fill in model_id and origin_file_pattern, for example:

ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors")

Model files are downloaded to the ./models path by default, which can be changed through the environment variable DIFFSYNTH_MODEL_BASE_PATH.
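As a minimal sketch, the download directory can be redirected by setting this variable before importing diffsynth (the target directory name below is an assumption for illustration):

```python
import os

# Assumption: redirect downloaded model files to a custom directory.
# This must be set before `import diffsynth` so the loader picks it up.
os.environ["DIFFSYNTH_MODEL_BASE_PATH"] = "/data/diffsynth_models"
```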
Load models from local file paths
Fill in path, for example:

ModelConfig(path="models/xxx.safetensors")

For models loaded from multiple files, use a list, for example:

ModelConfig(path=[
    "models/Qwen/Qwen-Image/text_encoder/model-00001-of-00004.safetensors",
    "models/Qwen/Qwen-Image/text_encoder/model-00002-of-00004.safetensors",
    "models/Qwen/Qwen-Image/text_encoder/model-00003-of-00004.safetensors",
    "models/Qwen/Qwen-Image/text_encoder/model-00004-of-00004.safetensors",
])
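Rather than listing each shard by hand, the file list can be built with Python's standard glob module. This is a small sketch, not part of the DiffSynth-Studio API; the helper name and pattern are illustrative:

```python
import glob

def collect_shards(pattern: str) -> list:
    """Return model shard files matching a glob pattern, in shard order."""
    # sorted() keeps shards in "model-00001-of-..." order.
    return sorted(glob.glob(pattern))

# The resulting list can be passed directly, e.g.:
# ModelConfig(path=collect_shards("models/Qwen/Qwen-Image/text_encoder/model-*-of-*.safetensors"))
```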
By default, even after models have been downloaded, the program still queries the remote source for missing files. To disable remote requests entirely, set the environment variable DIFFSYNTH_SKIP_DOWNLOAD to True:
import os
os.environ["DIFFSYNTH_SKIP_DOWNLOAD"] = "True"
import diffsynth
To download models from HuggingFace instead, set the environment variable DIFFSYNTH_DOWNLOAD_SOURCE to huggingface:
import os
os.environ["DIFFSYNTH_DOWNLOAD_SOURCE"] = "huggingface"
import diffsynth
Starting Inference
Input a prompt to start the inference process and generate an image.
from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig
import torch
pipe = QwenImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
    ],
    tokenizer_config=ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="tokenizer/"),
)
prompt = "Exquisite portrait, underwater girl, blue dress flowing, hair floating, translucent light, bubbles surrounding, peaceful face, intricate details, dreamy and ethereal."
image = pipe(prompt, seed=0, num_inference_steps=40)
image.save("image.jpg")
Each model Pipeline has different input parameters. Please refer to the documentation for each model.
If the model is too large to fit in available VRAM, please enable VRAM management.