mirror of
https://github.com/modelscope/DiffSynth-Studio.git
synced 2026-04-16 15:28:21 +00:00
# Template Model Inference

## Enabling Template Models on Base Model Pipelines

Using the base model [black-forest-labs/FLUX.2-klein-base-4B](https://modelscope.cn/models/black-forest-labs/FLUX.2-klein-base-4B) as an example, first generate an image with only the base model:

```python
from diffsynth.diffusion.template import TemplatePipeline
from diffsynth.pipelines.flux2_image import Flux2ImagePipeline, ModelConfig
import torch

# Load the base model
pipe = Flux2ImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="black-forest-labs/FLUX.2-klein-4B", origin_file_pattern="text_encoder/*.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.2-klein-base-4B", origin_file_pattern="transformer/*.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.2-klein-4B", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
    ],
    tokenizer_config=ModelConfig(model_id="black-forest-labs/FLUX.2-klein-4B", origin_file_pattern="tokenizer/"),
)

# Generate an image
image = pipe(
    prompt="a cat",
    seed=0, cfg_scale=4,
    height=1024, width=1024,
)
image.save("image.png")
```

The Template model [DiffSynth-Studio/F2KB4B-Template-Brightness](https://modelscope.cn/models/DiffSynth-Studio/F2KB4B-Template-Brightness) can control image brightness during generation. Through `TemplatePipeline`, it can be loaded from ModelScope (via `ModelConfig(model_id="xxx/xxx")`) or from a local path (via `ModelConfig(path="xxx")`). Passing `scale=0.8` increases image brightness. Note that the input parameters previously passed to `pipe` must now be passed to `template_pipeline`, with `template_inputs` added.

```python
# Load the Template model
template_pipeline = TemplatePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="DiffSynth-Studio/F2KB4B-Template-Brightness"),
    ],
)
# Generate an image
image = template_pipeline(
    pipe,
    prompt="a cat",
    seed=0, cfg_scale=4,
    height=1024, width=1024,
    template_inputs=[{"scale": 0.8}],
)
image.save("image_0.8.png")
```

## CFG Enhancement for Template Models

Template models can enable CFG (classifier-free guidance) to make their control effects more pronounced. For example, with [DiffSynth-Studio/F2KB4B-Template-Brightness](https://modelscope.cn/models/DiffSynth-Studio/F2KB4B-Template-Brightness), adding `negative_template_inputs` to the `TemplatePipeline` call and setting its `scale` to 0.5 contrasts the two branches, producing images with a more noticeable brightness change.

```python
# Generate an image with CFG
image = template_pipeline(
    pipe,
    prompt="a cat",
    seed=0, cfg_scale=4,
    height=1024, width=1024,
    template_inputs=[{"scale": 0.8}],
    negative_template_inputs=[{"scale": 0.5}],
)
image.save("image_0.8_cfg.png")
```
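Under CFG, the pipeline evaluates one denoising branch with `template_inputs` and one with `negative_template_inputs`, then extrapolates from the negative prediction toward the positive one. Below is a minimal sketch of that combining arithmetic, using plain scalars in place of prediction tensors (`cfg_combine` is an illustrative helper, not a DiffSynth-Studio API):

```python
def cfg_combine(positive, negative, cfg_scale):
    # Classifier-free guidance: start at the negative-branch prediction and
    # extrapolate toward the positive branch; larger cfg_scale values push
    # the result further, making the control contrast more pronounced.
    return negative + cfg_scale * (positive - negative)

# Contrasting scale=0.8 against scale=0.5 at cfg_scale=4 overshoots both
# inputs, which is why the brightness effect looks stronger with CFG.
strengthened = cfg_combine(0.8, 0.5, 4.0)
```

This is why `cfg_scale` and the gap between the positive and negative `scale` values jointly determine how strong the control effect appears.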

## Low VRAM Support

Template models currently do not support the main framework's VRAM management, but lazy loading can be used: a Template model is loaded only when inference actually needs it. This significantly reduces VRAM requirements when multiple Template models are enabled, since peak VRAM usage equals that of a single Template model. Add the parameter `lazy_loading=True` to enable it.

```python
template_pipeline = TemplatePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="DiffSynth-Studio/F2KB4B-Template-Brightness"),
    ],
    lazy_loading=True,
)
```

The base model's Pipeline and the Template Pipeline are completely independent, so VRAM management can be enabled for each on demand.

When a Template model's output places LoRA weights in the Template Cache, you need to enable VRAM management for the base model's Pipeline or enable LoRA hot loading (using the code below); otherwise the LoRA weights will be stacked on top of one another across calls.

```python
pipe.dit = pipe.enable_lora_hot_loading(pipe.dit)
```

## Enabling Multiple Template Models

`TemplatePipeline` can load multiple Template models. During inference, use `model_id` in `template_inputs` to distinguish the inputs for each Template model.

After enabling VRAM management for the base model's Pipeline and lazy loading for the Template Pipeline, you can load any number of Template models.

```python
from diffsynth.diffusion.template import TemplatePipeline
from diffsynth.pipelines.flux2_image import Flux2ImagePipeline, ModelConfig
import torch
from PIL import Image

vram_config = {
    "offload_dtype": "disk",
    "offload_device": "disk",
    "onload_dtype": torch.bfloat16,
    "onload_device": "cuda",
    "preparing_dtype": torch.bfloat16,
    "preparing_device": "cuda",
    "computation_dtype": torch.bfloat16,
    "computation_device": "cuda",
}
pipe = Flux2ImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="black-forest-labs/FLUX.2-klein-base-4B", origin_file_pattern="transformer/*.safetensors", **vram_config),
        ModelConfig(model_id="black-forest-labs/FLUX.2-klein-4B", origin_file_pattern="text_encoder/*.safetensors", **vram_config),
        ModelConfig(model_id="black-forest-labs/FLUX.2-klein-4B", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
    ],
    tokenizer_config=ModelConfig(model_id="black-forest-labs/FLUX.2-klein-4B", origin_file_pattern="tokenizer/"),
)
pipe.dit = pipe.enable_lora_hot_loading(pipe.dit)
template = TemplatePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    lazy_loading=True,
    model_configs=[
        ModelConfig(model_id="DiffSynth-Studio/Template-KleinBase4B-Brightness"),
        ModelConfig(model_id="DiffSynth-Studio/Template-KleinBase4B-ControlNet"),
        ModelConfig(model_id="DiffSynth-Studio/Template-KleinBase4B-Edit"),
        ModelConfig(model_id="DiffSynth-Studio/Template-KleinBase4B-Upscaler"),
        ModelConfig(model_id="DiffSynth-Studio/Template-KleinBase4B-SoftRGB"),
        ModelConfig(model_id="DiffSynth-Studio/Template-KleinBase4B-Sharpness"),
        ModelConfig(model_id="DiffSynth-Studio/Template-KleinBase4B-Inpaint"),
        ModelConfig(model_id="DiffSynth-Studio/Template-KleinBase4B-Aesthetic"),
        ModelConfig(model_id="DiffSynth-Studio/Template-KleinBase4B-PandaMeme"),
    ],
)
```
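In the examples that follow, `model_id` appears to be the zero-based position of the Template model in the `model_configs` list above (e.g. `model_id: 3` selects the Upscaler and `model_id: 5` the Sharpness model). A small lookup table for this particular configuration (an illustrative convenience, not a DiffSynth-Studio API):

```python
# Zero-based index of each Template model in the model_configs list above.
TEMPLATE_MODEL_IDS = {
    "Brightness": 0,
    "ControlNet": 1,
    "Edit": 2,
    "Upscaler": 3,
    "SoftRGB": 4,
    "Sharpness": 5,
    "Inpaint": 6,
    "Aesthetic": 7,
    "PandaMeme": 8,
}
```

Writing `{"model_id": TEMPLATE_MODEL_IDS["Upscaler"], ...}` reads more clearly than a bare integer, and stays correct as long as the order of `model_configs` is unchanged.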

### Super-Resolution + Sharpness Enhancement

Combining [DiffSynth-Studio/Template-KleinBase4B-Upscaler](https://modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-Upscaler) and [DiffSynth-Studio/Template-KleinBase4B-Sharpness](https://modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-Sharpness) upscales blurry images while improving detail clarity.

```python
image = template(
    pipe,
    prompt="A cat is sitting on a stone.",
    seed=0, cfg_scale=4, num_inference_steps=50,
    template_inputs=[
        {
            "model_id": 3,
            "image": Image.open("data/assets/image_lowres_100.jpg"),
            "prompt": "A cat is sitting on a stone.",
        },
        {
            "model_id": 5,
            "scale": 1,
        },
    ],
    negative_template_inputs=[
        {
            "model_id": 3,
            "image": Image.open("data/assets/image_lowres_100.jpg"),
            "prompt": "",
        },
        {
            "model_id": 5,
            "scale": 0,
        },
    ],
)
image.save("image_Upscaler_Sharpness.png")
```

| Low Resolution Input | High Resolution Output |
|----------------------|------------------------|
|  |  |
### Structure Control + Aesthetic Alignment + Sharpness Enhancement

[DiffSynth-Studio/Template-KleinBase4B-ControlNet](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-ControlNet) controls composition, [DiffSynth-Studio/Template-KleinBase4B-Aesthetic](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-Aesthetic) fills in details, and [DiffSynth-Studio/Template-KleinBase4B-Sharpness](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-Sharpness) ensures clarity. Combining these three Template models produces exquisite images.

```python
image = template(
    pipe,
    prompt="A cat is sitting on a stone, bathed in bright sunshine.",
    seed=0, cfg_scale=4, num_inference_steps=50,
    template_inputs=[
        {
            "model_id": 1,
            "image": Image.open("data/assets/image_depth.jpg"),
            "prompt": "A cat is sitting on a stone, bathed in bright sunshine.",
        },
        {
            "model_id": 7,
            "lora_ids": list(range(1, 180, 2)),
            "lora_scales": 2.0,
            "merge_type": "mean",
        },
        {
            "model_id": 5,
            "scale": 0.8,
        },
    ],
    negative_template_inputs=[
        {
            "model_id": 1,
            "image": Image.open("data/assets/image_depth.jpg"),
            "prompt": "",
        },
        {
            "model_id": 7,
            "lora_ids": list(range(1, 180, 2)),
            "lora_scales": 2.0,
            "merge_type": "mean",
        },
        {
            "model_id": 5,
            "scale": 0,
        },
    ],
)
image.save("image_Controlnet_Aesthetic_Sharpness.png")
```

| Structure Control Image | Output Image |
|-------------------------|--------------|
|  |  |
### Structure Control + Image Editing + Color Adjustment

[DiffSynth-Studio/Template-KleinBase4B-ControlNet](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-ControlNet) controls composition, [DiffSynth-Studio/Template-KleinBase4B-Edit](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-Edit) preserves original image details such as fur texture, and [DiffSynth-Studio/Template-KleinBase4B-SoftRGB](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-SoftRGB) controls color tones, creating an artistic masterpiece.

```python
image = template(
    pipe,
    prompt="A cat is sitting on a stone. Colored ink painting.",
    seed=0, cfg_scale=4, num_inference_steps=50,
    template_inputs=[
        {
            "model_id": 1,
            "image": Image.open("data/assets/image_depth.jpg"),
            "prompt": "A cat is sitting on a stone. Colored ink painting.",
        },
        {
            "model_id": 2,
            "image": Image.open("data/assets/image_reference.jpg"),
            "prompt": "Convert the image style to colored ink painting.",
        },
        {
            "model_id": 4,
            "R": 0.9,
            "G": 0.5,
            "B": 0.3,
        },
    ],
    negative_template_inputs=[
        {
            "model_id": 1,
            "image": Image.open("data/assets/image_depth.jpg"),
            "prompt": "",
        },
        {
            "model_id": 2,
            "image": Image.open("data/assets/image_reference.jpg"),
            "prompt": "",
        },
    ],
)
image.save("image_Controlnet_Edit_SoftRGB.png")
```

| Structure Control Image | Editing Input Image | Output Image |
|-------------------------|---------------------|--------------|
|  |  |  |
### Brightness Control + Image Editing + Local Redrawing

[DiffSynth-Studio/Template-KleinBase4B-Brightness](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-Brightness) generates bright scenes, [DiffSynth-Studio/Template-KleinBase4B-Edit](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-Edit) references the original image's layout, and [DiffSynth-Studio/Template-KleinBase4B-Inpaint](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-Inpaint) keeps the background unchanged, generating content that crosses from the photographic into the anime style.

```python
image = template(
    pipe,
    prompt="A cat is sitting on a stone. Flat anime style.",
    seed=0, cfg_scale=4, num_inference_steps=50,
    template_inputs=[
        {
            "model_id": 0,
            "scale": 0.6,
        },
        {
            "model_id": 2,
            "image": Image.open("data/assets/image_reference.jpg"),
            "prompt": "Convert the image style to flat anime style.",
        },
        {
            "model_id": 6,
            "image": Image.open("data/assets/image_reference.jpg"),
            "mask": Image.open("data/assets/image_mask_1.jpg"),
            "force_inpaint": True,
        },
    ],
    negative_template_inputs=[
        {
            "model_id": 0,
            "scale": 0.5,
        },
        {
            "model_id": 2,
            "image": Image.open("data/assets/image_reference.jpg"),
            "prompt": "",
        },
        {
            "model_id": 6,
            "image": Image.open("data/assets/image_reference.jpg"),
            "mask": Image.open("data/assets/image_mask_1.jpg"),
        },
    ],
)
image.save("image_Brightness_Edit_Inpaint.png")
```

| Reference Image | Redrawing Area | Output Image |
|-----------------|----------------|--------------|
|  |  |  |