add a new model

2026-04-21 11:46:58 +00:00 · 2026-04-20 10:56:29 +08:00
parent f58ba5a784
commit 13f2618da2
19 changed files with 433 additions and 7 deletions
--- a/docs/en/Diffusion_Templates/Introducing_Diffusion_Templates.md
+++ b/docs/en/Diffusion_Templates/Introducing_Diffusion_Templates.md
@@ -0,0 +1,67 @@
+# Diffusion Templates
+
+Diffusion Templates is a controllable generation plugin framework for Diffusion models in DiffSynth-Studio, providing additional controllable generation capabilities for base models.
+
+* Open Source Code: [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio)
+* Technical Report: coming soon
+* Documentation Reference
+    * Introducing Diffusion Templates: [English Version](https://diffsynth-studio-doc.readthedocs.io/en/latest/Diffusion_Templates/Introducing_Diffusion_Templates.html), [中文版](https://diffsynth-studio-doc.readthedocs.io/zh-cn/latest/Diffusion_Templates/Introducing_Diffusion_Templates.html)
+    * Diffusion Templates Architecture Details: [English Version](https://diffsynth-studio-doc.readthedocs.io/en/latest/Diffusion_Templates/Understanding_Diffusion_Templates.html), [中文版](https://diffsynth-studio-doc.readthedocs.io/zh-cn/latest/Diffusion_Templates/Understanding_Diffusion_Templates.html)
+    * Template Model Inference: [English Version](https://diffsynth-studio-doc.readthedocs.io/en/latest/Diffusion_Templates/Template_Model_Inference.html), [中文版](https://diffsynth-studio-doc.readthedocs.io/zh-cn/latest/Diffusion_Templates/Template_Model_Inference.html)
+    * Template Model Training: [English Version](https://diffsynth-studio-doc.readthedocs.io/en/latest/Diffusion_Templates/Template_Model_Training.html), [中文版](https://diffsynth-studio-doc.readthedocs.io/zh-cn/latest/Diffusion_Templates/Template_Model_Training.html)
+* Online Demo: [ModelScope Creative Space](https://modelscope.cn/studios/DiffSynth-Studio/Diffusion-Templates)
+* Models: [Collection](https://modelscope.cn/collections/DiffSynth-Studio/KleinBase4B-Templates)
+    * Structure Control: [DiffSynth-Studio/Template-KleinBase4B-ControlNet](https://modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-ControlNet)
+    * Brightness Adjustment: [DiffSynth-Studio/Template-KleinBase4B-Brightness](https://modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-Brightness)
+    * Color Adjustment: [DiffSynth-Studio/Template-KleinBase4B-SoftRGB](https://modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-SoftRGB)
+    * Image Editing: [DiffSynth-Studio/Template-KleinBase4B-Edit](https://modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-Edit)
+    * Super Resolution: [DiffSynth-Studio/Template-KleinBase4B-Upscaler](https://modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-Upscaler)
+    * Sharpness Enhancement: [DiffSynth-Studio/Template-KleinBase4B-Sharpness](https://modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-Sharpness)
+    * Aesthetic Alignment: [DiffSynth-Studio/Template-KleinBase4B-Aesthetic](https://modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-Aesthetic)
+    * Inpainting: [DiffSynth-Studio/Template-KleinBase4B-Inpaint](https://modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-Inpaint)
+    * Content Reference: [DiffSynth-Studio/Template-KleinBase4B-ContentRef](https://modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-ContentRef)
+    * Panda Meme (Easter Egg Model): [DiffSynth-Studio/Template-KleinBase4B-PandaMeme](https://modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-PandaMeme)
+* Datasets: [Collection](https://modelscope.cn/collections/DiffSynth-Studio/ImagePulseV2--shujuji)
+    * [DiffSynth-Studio/ImagePulseV2-Edit-Inpaint](https://modelscope.cn/datasets/DiffSynth-Studio/ImagePulseV2-Edit-Inpaint)
+    * [DiffSynth-Studio/ImagePulseV2-TextImage](https://modelscope.cn/datasets/DiffSynth-Studio/ImagePulseV2-TextImage)
+    * [DiffSynth-Studio/ImagePulseV2-Edit-Background](https://modelscope.cn/datasets/DiffSynth-Studio/ImagePulseV2-Edit-Background)
+    * [DiffSynth-Studio/ImagePulseV2-Edit-Clothes](https://modelscope.cn/datasets/DiffSynth-Studio/ImagePulseV2-Edit-Clothes)
+    * [DiffSynth-Studio/ImagePulseV2-Edit-Pose](https://modelscope.cn/datasets/DiffSynth-Studio/ImagePulseV2-Edit-Pose)
+    * [DiffSynth-Studio/ImagePulseV2-Edit-Change](https://modelscope.cn/datasets/DiffSynth-Studio/ImagePulseV2-Edit-Change)
+    * [DiffSynth-Studio/ImagePulseV2-Edit-AddRemove](https://modelscope.cn/datasets/DiffSynth-Studio/ImagePulseV2-Edit-AddRemove)
+    * [DiffSynth-Studio/ImagePulseV2-Edit-Upscale](https://modelscope.cn/datasets/DiffSynth-Studio/ImagePulseV2-Edit-Upscale)
+    * [DiffSynth-Studio/ImagePulseV2-TextImage-Human](https://modelscope.cn/datasets/DiffSynth-Studio/ImagePulseV2-TextImage-Human)
+    * [DiffSynth-Studio/ImagePulseV2-Edit-Crop](https://modelscope.cn/datasets/DiffSynth-Studio/ImagePulseV2-Edit-Crop)
+    * [DiffSynth-Studio/ImagePulseV2-Edit-Light](https://modelscope.cn/datasets/DiffSynth-Studio/ImagePulseV2-Edit-Light)
+    * [DiffSynth-Studio/ImagePulseV2-Edit-Structure](https://modelscope.cn/datasets/DiffSynth-Studio/ImagePulseV2-Edit-Structure)
+    * [DiffSynth-Studio/ImagePulseV2-Edit-HumanFace](https://modelscope.cn/datasets/DiffSynth-Studio/ImagePulseV2-Edit-HumanFace)
+    * [DiffSynth-Studio/ImagePulseV2-Edit-Angle](https://modelscope.cn/datasets/DiffSynth-Studio/ImagePulseV2-Edit-Angle)
+    * [DiffSynth-Studio/ImagePulseV2-Edit-Style](https://modelscope.cn/datasets/DiffSynth-Studio/ImagePulseV2-Edit-Style)
+    * [DiffSynth-Studio/ImagePulseV2-TextImage-MultiResolution](https://modelscope.cn/datasets/DiffSynth-Studio/ImagePulseV2-TextImage-MultiResolution)
+    * [DiffSynth-Studio/ImagePulseV2-Edit-Merge](https://modelscope.cn/datasets/DiffSynth-Studio/ImagePulseV2-Edit-Merge)
+
+## Model Gallery
+
+* Super Resolution + Sharpness Enhancement: Generate ultra-high-clarity images
+
+|Low Resolution Input|High Resolution Output|
+|-|-|
+|![](https://modelscope.cn/datasets/DiffSynth-Studio/examples_in_diffsynth/resolve/master/templates/image_lowres_100.jpg)|![](https://modelscope.cn/datasets/DiffSynth-Studio/examples_in_diffsynth/resolve/master/templates/image_Upscaler_Sharpness.png)|
+
+* Structure Control + Aesthetic Alignment + Sharpness Enhancement: Fully equipped ControlNet
+
+|Structure Control Image|Output Image|
+|-|-|
+|![](https://modelscope.cn/datasets/DiffSynth-Studio/examples_in_diffsynth/resolve/master/templates/image_depth.jpg)|![](https://modelscope.cn/datasets/DiffSynth-Studio/examples_in_diffsynth/resolve/master/templates/image_Controlnet_Aesthetic_Sharpness.png)|
+
+* Structure Control + Image Editing + Color Adjustment: Artistic style creation at will
+
+|Structure Control Image|Editing Input Image|Output Image|
+|-|-|-|
+|![](https://modelscope.cn/datasets/DiffSynth-Studio/examples_in_diffsynth/resolve/master/templates/image_depth.jpg)|![](https://modelscope.cn/datasets/DiffSynth-Studio/examples_in_diffsynth/resolve/master/templates/image_reference.jpg)|![](https://modelscope.cn/datasets/DiffSynth-Studio/examples_in_diffsynth/resolve/master/templates/image_Controlnet_Edit_SoftRGB.png)|
+
+* Brightness Adjustment + Image Editing + Inpainting: Transport elements across dimensions
+
+|Reference Image|Inpaint Region|Output Image|
+|-|-|-|
+|![](https://modelscope.cn/datasets/DiffSynth-Studio/examples_in_diffsynth/resolve/master/templates/image_reference.jpg)|![](https://modelscope.cn/datasets/DiffSynth-Studio/examples_in_diffsynth/resolve/master/templates/image_mask_1.jpg)|![](https://modelscope.cn/datasets/DiffSynth-Studio/examples_in_diffsynth/resolve/master/templates/image_Brightness_Edit_Inpaint.png)|
--- a/docs/en/Diffusion_Templates/Template_Model_Training.md
+++ b/docs/en/Diffusion_Templates/Template_Model_Training.md
@@ -228,9 +228,56 @@ TEMPLATE_MODEL = CustomizedTemplateModel

 Set `--trainable_models template_model.mlp` to train only the MLP component.

+### Training on Low VRAM Devices
+
+The framework supports splitting Template model training into two stages: the first stage performs gradient-free computation, and the second stage performs gradient updates. For more information, refer to the documentation: [Two-stage Split Training](https://diffsynth-studio-doc.readthedocs.io/en/latest/Training/Split_Training.html). Here's a sample script:
+
+```shell
+modelscope download --dataset DiffSynth-Studio/diffsynth_example_dataset --include "flux2/Template-KleinBase4B-Brightness/*" --local_dir ./data/diffsynth_example_dataset
+
+accelerate launch examples/flux2/model_training/train.py \
+  --dataset_base_path data/diffsynth_example_dataset/flux2/Template-KleinBase4B-Brightness \
+  --dataset_metadata_path data/diffsynth_example_dataset/flux2/Template-KleinBase4B-Brightness/metadata.jsonl \
+  --extra_inputs "template_inputs" \
+  --max_pixels 1048576 \
+  --dataset_repeat 1 \
+  --model_id_with_origin_paths "black-forest-labs/FLUX.2-klein-4B:text_encoder/*.safetensors,black-forest-labs/FLUX.2-klein-4B:vae/diffusion_pytorch_model.safetensors" \
+  --template_model_id_or_path "DiffSynth-Studio/Template-KleinBase4B-Brightness:" \
+  --tokenizer_path "black-forest-labs/FLUX.2-klein-4B:tokenizer/" \
+  --learning_rate 1e-4 \
+  --num_epochs 2 \
+  --remove_prefix_in_ckpt "pipe.template_model." \
+  --output_path "./models/train/Template-KleinBase4B-Brightness_full_cache" \
+  --trainable_models "template_model" \
+  --use_gradient_checkpointing \
+  --find_unused_parameters \
+  --task "sft:data_process"
+
+accelerate launch examples/flux2/model_training/train.py \
+  --dataset_base_path "./models/train/Template-KleinBase4B-Brightness_full_cache" \
+  --extra_inputs "template_inputs" \
+  --max_pixels 1048576 \
+  --dataset_repeat 50 \
+  --model_id_with_origin_paths "black-forest-labs/FLUX.2-klein-base-4B:transformer/*.safetensors" \
+  --template_model_id_or_path "DiffSynth-Studio/Template-KleinBase4B-Brightness:" \
+  --tokenizer_path "black-forest-labs/FLUX.2-klein-4B:tokenizer/" \
+  --learning_rate 1e-4 \
+  --num_epochs 2 \
+  --remove_prefix_in_ckpt "pipe.template_model." \
+  --output_path "./models/train/Template-KleinBase4B-Brightness_full" \
+  --trainable_models "template_model" \
+  --use_gradient_checkpointing \
+  --find_unused_parameters \
+  --task "sft:train"
+```
+
+Two-stage split training reduces VRAM requirements and can improve training speed. The split does not affect numerical precision, but the cache files written by the first stage require significant disk space.
+
+To further reduce VRAM requirements, you can enable fp8 precision by adding `--fp8_models "black-forest-labs/FLUX.2-klein-4B:text_encoder/*.safetensors,black-forest-labs/FLUX.2-klein-4B:vae/diffusion_pytorch_model.safetensors"` to the first stage (`sft:data_process`) and `--fp8_models "black-forest-labs/FLUX.2-klein-base-4B:transformer/*.safetensors"` to the second stage (`sft:train`). Note that fp8 precision can only be enabled on non-trainable model components and introduces minor numerical errors.
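
As a rough sketch of how the two `--fp8_models` values line up with the two stages, the snippet below only prints the extra arguments and does not launch training. The variable names `STAGE1_FP8` and `STAGE2_FP8` are hypothetical helpers introduced here. The flag values are copied from the paragraph above, and the pairing assumes each list goes to the stage that loads the same components through `--model_id_with_origin_paths`.

```shell
# Hypothetical helper variables; only the values come from the documentation text.
# Stage 1 (sft:data_process) loads the text encoder and VAE, both non-trainable.
STAGE1_FP8="black-forest-labs/FLUX.2-klein-4B:text_encoder/*.safetensors,black-forest-labs/FLUX.2-klein-4B:vae/diffusion_pytorch_model.safetensors"
# Stage 2 (sft:train) loads the base transformer, also non-trainable here.
STAGE2_FP8="black-forest-labs/FLUX.2-klein-base-4B:transformer/*.safetensors"

# Dry run: print the argument to append to each stage's command.
# The double quotes keep "*.safetensors" from being expanded by the shell.
printf 'stage 1 (sft:data_process): --fp8_models "%s"\n' "$STAGE1_FP8"
printf 'stage 2 (sft:train): --fp8_models "%s"\n' "$STAGE2_FP8"
```

Append each printed argument to the matching `accelerate launch` command in the sample script above. The trainable `template_model` is left out of both lists.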
+
 ### Uploading Template Models

-After training, follow these steps to upload to ModelScope:
+After training, follow these steps to upload Template models to ModelScope for wider distribution:

 1. Set model path in `model.py`:
 ```python
--- a/docs/en/Diffusion_Templates/Understanding_Diffusion_Templates.md
+++ b/docs/en/Diffusion_Templates/Understanding_Diffusion_Templates.md
@@ -1,4 +1,4 @@
-# Understanding Diffusion Templates
+# Diffusion Templates Architecture Details

 The Diffusion Templates framework is a controllable generation plugin framework in DiffSynth-Studio that provides additional controllable generation capabilities for Diffusion models.

--- a/docs/en/Model_Details/FLUX2.md
+++ b/docs/en/Model_Details/FLUX2.md
@@ -75,6 +75,7 @@ image.save("image.jpg")
 |[DiffSynth-Studio/Template-KleinBase4B-Sharpness](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-Sharpness)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_inference/Template-KleinBase4B-Sharpness.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_inference_low_vram/Template-KleinBase4B-Sharpness.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_training/full/Template-KleinBase4B-Sharpness.sh)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_training/validate_full/Template-KleinBase4B-Sharpness.py)|-|-|
 |[DiffSynth-Studio/Template-KleinBase4B-SoftRGB](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-SoftRGB)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_inference/Template-KleinBase4B-SoftRGB.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_inference_low_vram/Template-KleinBase4B-SoftRGB.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_training/full/Template-KleinBase4B-SoftRGB.sh)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_training/validate_full/Template-KleinBase4B-SoftRGB.py)|-|-|
 |[DiffSynth-Studio/Template-KleinBase4B-Upscaler](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-Upscaler)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_inference/Template-KleinBase4B-Upscaler.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_inference_low_vram/Template-KleinBase4B-Upscaler.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_training/full/Template-KleinBase4B-Upscaler.sh)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_training/validate_full/Template-KleinBase4B-Upscaler.py)|-|-|
+|[DiffSynth-Studio/Template-KleinBase4B-ContentRef](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-ContentRef)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_inference/Template-KleinBase4B-ContentRef.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_inference_low_vram/Template-KleinBase4B-ContentRef.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_training/full/Template-KleinBase4B-ContentRef.sh)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_training/validate_full/Template-KleinBase4B-ContentRef.py)|-|-|

 Special Training Scripts:

--- a/docs/en/README.md
+++ b/docs/en/README.md
@@ -82,7 +82,8 @@ This section introduces the independent core module `diffsynth.core` in `DiffSyn

 This section introduces the controllable generation plugin framework for Diffusion models, explaining the framework's operation mechanism and how to use Template models for inference and training.

-* [Understanding Diffusion Templates](./Diffusion_Templates/Understanding_Diffusion_Templates.md)
+* [Introducing Diffusion Templates](./Diffusion_Templates/Introducing_Diffusion_Templates.md)
+* [Diffusion Templates Architecture Details](./Diffusion_Templates/Understanding_Diffusion_Templates.md)
 * [Template Model Inference](./Diffusion_Templates/Template_Model_Inference.md)
 * [Template Model Training](./Diffusion_Templates/Template_Model_Training.md)

--- a/docs/en/index.rst
+++ b/docs/en/index.rst
@@ -64,6 +64,7 @@ Welcome to DiffSynth-Studio's Documentation
   :maxdepth: 2
   :caption: Diffusion Templates

+   Diffusion_Templates/Introducing_Diffusion_Templates.md
   Diffusion_Templates/Understanding_Diffusion_Templates.md
   Diffusion_Templates/Template_Model_Inference.md
   Diffusion_Templates/Template_Model_Training.md