mirror of
https://github.com/modelscope/DiffSynth-Studio.git
synced 2026-04-16 23:38:19 +00:00
update docs
330
docs/zh/Diffusion_Templates/Template_Model_Inference.md
Normal file
@@ -0,0 +1,330 @@
# Template Model Inference

## Enabling a Template Model on a Base-Model Pipeline

We take the base model [black-forest-labs/FLUX.2-klein-base-4B](https://modelscope.cn/models/black-forest-labs/FLUX.2-klein-base-4B) as an example. Generating an image with the base model alone looks like this:

```python
from diffsynth.diffusion.template import TemplatePipeline
from diffsynth.pipelines.flux2_image import Flux2ImagePipeline, ModelConfig
import torch

# Load the base model
pipe = Flux2ImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="black-forest-labs/FLUX.2-klein-4B", origin_file_pattern="text_encoder/*.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.2-klein-base-4B", origin_file_pattern="transformer/*.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.2-klein-4B", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
    ],
    tokenizer_config=ModelConfig(model_id="black-forest-labs/FLUX.2-klein-4B", origin_file_pattern="tokenizer/"),
)
# Generate an image
image = pipe(
    prompt="a cat",
    seed=0, cfg_scale=4,
    height=1024, width=1024,
)
image.save("image.png")
```

The Template model [DiffSynth-Studio/F2KB4B-Template-Brightness](https://modelscope.cn/models/DiffSynth-Studio/F2KB4B-Template-Brightness) controls the brightness of generated images. Through `TemplatePipeline`, a Template model can be loaded either from the ModelScope hub (`ModelConfig(model_id="xxx/xxx")`) or from a local path (`ModelConfig(path="xxx")`). Passing `scale=0.8` increases the brightness of the image. Note that in the code below, the arguments previously passed to `pipe` are now passed to `template_pipeline`, with `template_inputs` added.

```python
# Load the Template model
template_pipeline = TemplatePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="DiffSynth-Studio/F2KB4B-Template-Brightness")
    ],
)
# Generate an image
image = template_pipeline(
    pipe,
    prompt="a cat",
    seed=0, cfg_scale=4,
    height=1024, width=1024,
    template_inputs=[{"scale": 0.8}],
)
image.save("image_0.8.png")
```

## CFG Enhancement for Template Models

A Template model can enable CFG (Classifier-Free Guidance) to make its control effect more pronounced. For example, with [DiffSynth-Studio/F2KB4B-Template-Brightness](https://modelscope.cn/models/DiffSynth-Studio/F2KB4B-Template-Brightness), add `negative_template_inputs` to the `TemplatePipeline` call and set its scale to 0.5. The model then contrasts the positive and negative branches and generates an image with a more pronounced brightness change.

```python
# Generate an image with CFG
image = template_pipeline(
    pipe,
    prompt="a cat",
    seed=0, cfg_scale=4,
    height=1024, width=1024,
    template_inputs=[{"scale": 0.8}],
    negative_template_inputs=[{"scale": 0.5}],
)
image.save("image_0.8_cfg.png")
```

## Low-VRAM Support

Template models do not yet support the main framework's VRAM management, but lazy loading is available: each Template model is loaded only when it is actually needed for inference. When multiple Template models are enabled, this significantly reduces VRAM requirements; peak VRAM usage equals the footprint of a single Template model. Simply add the parameter `lazy_loading=True`.

```python
template_pipeline = TemplatePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="DiffSynth-Studio/F2KB4B-Template-Brightness")
    ],
    lazy_loading=True,
)
```

The base-model pipeline and the Template pipeline are completely independent, so VRAM management can be enabled on the base-model pipeline as needed.

When the Template Cache produced by a Template model contains LoRA, you must either enable VRAM management on the base-model pipeline or enable LoRA hot loading (with the code below); otherwise LoRA weights would be stacked on top of each other across calls.

```python
pipe.dit = pipe.enable_lora_hot_loading(pipe.dit)
```

## Enabling Multiple Template Models

`TemplatePipeline` can load multiple Template models. At inference time, use the `model_id` field inside `template_inputs` to route inputs to the corresponding Template model (`model_id` is the 0-based index of the Template model in the `model_configs` list).

With VRAM management enabled on the base-model pipeline and lazy loading enabled on the Template pipeline, you can load any number of Template models.

```python
from diffsynth.diffusion.template import TemplatePipeline
from diffsynth.pipelines.flux2_image import Flux2ImagePipeline, ModelConfig
import torch
from PIL import Image

vram_config = {
    "offload_dtype": "disk",
    "offload_device": "disk",
    "onload_dtype": torch.bfloat16,
    "onload_device": "cuda",
    "preparing_dtype": torch.bfloat16,
    "preparing_device": "cuda",
    "computation_dtype": torch.bfloat16,
    "computation_device": "cuda",
}
pipe = Flux2ImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="black-forest-labs/FLUX.2-klein-base-4B", origin_file_pattern="transformer/*.safetensors", **vram_config),
        ModelConfig(model_id="black-forest-labs/FLUX.2-klein-4B", origin_file_pattern="text_encoder/*.safetensors", **vram_config),
        ModelConfig(model_id="black-forest-labs/FLUX.2-klein-4B", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
    ],
    tokenizer_config=ModelConfig(model_id="black-forest-labs/FLUX.2-klein-4B", origin_file_pattern="tokenizer/"),
)
pipe.dit = pipe.enable_lora_hot_loading(pipe.dit)
template = TemplatePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    lazy_loading=True,
    model_configs=[
        ModelConfig(model_id="DiffSynth-Studio/Template-KleinBase4B-Brightness"),
        ModelConfig(model_id="DiffSynth-Studio/Template-KleinBase4B-ControlNet"),
        ModelConfig(model_id="DiffSynth-Studio/Template-KleinBase4B-Edit"),
        ModelConfig(model_id="DiffSynth-Studio/Template-KleinBase4B-Upscaler"),
        ModelConfig(model_id="DiffSynth-Studio/Template-KleinBase4B-SoftRGB"),
        ModelConfig(model_id="DiffSynth-Studio/Template-KleinBase4B-Sharpness"),
        ModelConfig(model_id="DiffSynth-Studio/Template-KleinBase4B-Inpaint"),
        ModelConfig(model_id="DiffSynth-Studio/Template-KleinBase4B-Aesthetic"),
        ModelConfig(model_id="DiffSynth-Studio/Template-KleinBase4B-PandaMeme"),
    ],
)
```

### Super-Resolution + Sharpness Boost

Combining [DiffSynth-Studio/Template-KleinBase4B-Upscaler](https://modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-Upscaler) and [DiffSynth-Studio/Template-KleinBase4B-Sharpness](https://modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-Sharpness) upscales a blurry image while sharpening its fine details.

```python
image = template(
    pipe,
    prompt="A cat is sitting on a stone.",
    seed=0, cfg_scale=4, num_inference_steps=50,
    template_inputs=[
        {
            "model_id": 3,  # Upscaler
            "image": Image.open("data/assets/image_lowres_100.jpg"),
            "prompt": "A cat is sitting on a stone.",
        },
        {
            "model_id": 5,  # Sharpness
            "scale": 1,
        },
    ],
    negative_template_inputs=[
        {
            "model_id": 3,
            "image": Image.open("data/assets/image_lowres_100.jpg"),
            "prompt": "",
        },
        {
            "model_id": 5,
            "scale": 0,
        },
    ],
)
image.save("image_Upscaler_Sharpness.png")
```

|Low-resolution input|High-resolution output|
|-|-|
|||

### Structure Control + Aesthetic Alignment + Sharpness Boost

[DiffSynth-Studio/Template-KleinBase4B-ControlNet](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-ControlNet) controls the composition, [DiffSynth-Studio/Template-KleinBase4B-Aesthetic](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-Aesthetic) fills in the details, and [DiffSynth-Studio/Template-KleinBase4B-Sharpness](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-Sharpness) keeps the image crisp. Fusing the three Template models produces a refined image.

```python
image = template(
    pipe,
    prompt="A cat is sitting on a stone, bathed in bright sunshine.",
    seed=0, cfg_scale=4, num_inference_steps=50,
    template_inputs=[
        {
            "model_id": 1,  # ControlNet
            "image": Image.open("data/assets/image_depth.jpg"),
            "prompt": "A cat is sitting on a stone, bathed in bright sunshine.",
        },
        {
            "model_id": 7,  # Aesthetic
            "lora_ids": list(range(1, 180, 2)),
            "lora_scales": 2.0,
            "merge_type": "mean",
        },
        {
            "model_id": 5,  # Sharpness
            "scale": 0.8,
        },
    ],
    negative_template_inputs=[
        {
            "model_id": 1,
            "image": Image.open("data/assets/image_depth.jpg"),
            "prompt": "",
        },
        {
            "model_id": 7,
            "lora_ids": list(range(1, 180, 2)),
            "lora_scales": 2.0,
            "merge_type": "mean",
        },
        {
            "model_id": 5,
            "scale": 0,
        },
    ],
)
image.save("image_Controlnet_Aesthetic_Sharpness.png")
```

|Structure control image|Output image|
|-|-|
|||

### Structure Control + Image Editing + Color Adjustment

[DiffSynth-Studio/Template-KleinBase4B-ControlNet](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-ControlNet) controls the composition, [DiffSynth-Studio/Template-KleinBase4B-Edit](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-Edit) preserves details of the original image such as fur texture, and [DiffSynth-Studio/Template-KleinBase4B-SoftRGB](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-SoftRGB) controls the color tone, rendering a highly artistic painting.

```python
image = template(
    pipe,
    prompt="A cat is sitting on a stone. Colored ink painting.",
    seed=0, cfg_scale=4, num_inference_steps=50,
    template_inputs=[
        {
            "model_id": 1,  # ControlNet
            "image": Image.open("data/assets/image_depth.jpg"),
            "prompt": "A cat is sitting on a stone. Colored ink painting.",
        },
        {
            "model_id": 2,  # Edit
            "image": Image.open("data/assets/image_reference.jpg"),
            "prompt": "Convert the image style to colored ink painting.",
        },
        {
            "model_id": 4,  # SoftRGB
            "R": 0.9,
            "G": 0.5,
            "B": 0.3,
        },
    ],
    negative_template_inputs=[
        {
            "model_id": 1,
            "image": Image.open("data/assets/image_depth.jpg"),
            "prompt": "",
        },
        {
            "model_id": 2,
            "image": Image.open("data/assets/image_reference.jpg"),
            "prompt": "",
        },
    ],
)
image.save("image_Controlnet_Edit_SoftRGB.png")
```

|Structure control image|Edit input image|Output image|
|-|-|-|
||||

### Brightness Control + Image Editing + Inpainting

[DiffSynth-Studio/Template-KleinBase4B-Brightness](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-Brightness) produces a bright image, [DiffSynth-Studio/Template-KleinBase4B-Edit](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-Edit) follows the layout of the reference image, and [DiffSynth-Studio/Template-KleinBase4B-Inpaint](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-Inpaint) keeps the background unchanged, generating content that crosses over into the anime world.

```python
image = template(
    pipe,
    prompt="A cat is sitting on a stone. Flat anime style.",
    seed=0, cfg_scale=4, num_inference_steps=50,
    template_inputs=[
        {
            "model_id": 0,  # Brightness
            "scale": 0.6,
        },
        {
            "model_id": 2,  # Edit
            "image": Image.open("data/assets/image_reference.jpg"),
            "prompt": "Convert the image style to flat anime style.",
        },
        {
            "model_id": 6,  # Inpaint
            "image": Image.open("data/assets/image_reference.jpg"),
            "mask": Image.open("data/assets/image_mask_1.jpg"),
            "force_inpaint": True,
        },
    ],
    negative_template_inputs=[
        {
            "model_id": 0,
            "scale": 0.5,
        },
        {
            "model_id": 2,
            "image": Image.open("data/assets/image_reference.jpg"),
            "prompt": "",
        },
        {
            "model_id": 6,
            "image": Image.open("data/assets/image_reference.jpg"),
            "mask": Image.open("data/assets/image_mask_1.jpg"),
        },
    ],
)
image.save("image_Brightness_Edit_Inpaint.png")
```

|Reference image|Inpainted region|Output image|
|-|-|-|
||||
317
docs/zh/Diffusion_Templates/Template_Model_Training.md
Normal file
@@ -0,0 +1,317 @@
# Template Model Training

DiffSynth-Studio currently provides full Template training support for [black-forest-labs/FLUX.2-klein-base-4B](https://www.modelscope.cn/models/black-forest-labs/FLUX.2-klein-base-4B); support for more models is on the way.

## Continuing Training from a Pretrained Template Model

To continue training from one of our pretrained models, refer to the table in [FLUX.2](../Model_Details/FLUX2.md#模型总览) to find the corresponding training script.

## Building a New Template Model

### Template Model Component Format

A Template model is bound to one model repository (or one local folder), in which the code file `model.py` serves as the single entry point. The template for `model.py` looks like this:

```python
import torch


class CustomizedTemplateModel(torch.nn.Module):
    def __init__(self):
        super().__init__()

    @torch.no_grad()
    def process_inputs(self, xxx, **kwargs):
        yyy = xxx
        return {"yyy": yyy}

    def forward(self, yyy, **kwargs):
        zzz = yyy
        return {"zzz": zzz}


class DataProcessor:
    def __call__(self, www, **kwargs):
        xxx = www
        return {"xxx": xxx}


TEMPLATE_MODEL = CustomizedTemplateModel
TEMPLATE_MODEL_PATH = "model.safetensors"
TEMPLATE_DATA_PROCESSOR = DataProcessor
```

During Template model inference, the Template Input passes first through `process_inputs` and then through `forward` of `TEMPLATE_MODEL` to produce the Template Cache.

```mermaid
flowchart LR;
i@{shape: text, label: "Template Input"}-->p[process_inputs];
subgraph TEMPLATE_MODEL
p[process_inputs]-->f[forward]
end
f[forward]-->c@{shape: text, label: "Template Cache"};
```
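The two-stage call above can be sketched without the framework. `BrightnessTemplate` and `run_template` below are illustrative stand-ins rather than DiffSynth-Studio APIs, and the brightness mapping is invented for the example:

```python
# Minimal framework-free sketch of the two-stage Template call.
# BrightnessTemplate and run_template are illustrative stand-ins,
# not part of the DiffSynth-Studio API.

class BrightnessTemplate:
    def process_inputs(self, scale, **kwargs):
        # Gradient-free preprocessing: normalize the user-facing input.
        return {"brightness": 2 * scale - 1}  # map [0, 1] -> [-1, 1]

    def forward(self, brightness, **kwargs):
        # Gradient-carrying computation: produce the Template Cache,
        # a dict whose keys must be valid base-pipeline arguments.
        return {"template_kv_cache": [brightness] * 4}  # toy per-layer cache


def run_template(model, template_input):
    cached = model.process_inputs(**template_input)  # stage 1
    return model.forward(**cached)                   # stage 2


cache = run_template(BrightnessTemplate(), {"scale": 0.8})
```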

During Template model training, the Template Input no longer comes from the user; it is computed from the dataset by `TEMPLATE_DATA_PROCESSOR`.

```mermaid
flowchart LR;
d@{shape: text, label: "Dataset"}-->dp[TEMPLATE_DATA_PROCESSOR]-->p[process_inputs];
subgraph TEMPLATE_MODEL
p[process_inputs]-->f[forward]
end
f[forward]-->c@{shape: text, label: "Template Cache"};
```

#### `TEMPLATE_MODEL`

`TEMPLATE_MODEL` is the code implementation of the Template model. It must inherit from `torch.nn.Module` and implement two methods, `process_inputs` and `forward`, which together form the complete Template inference path. We split inference into these two parts to make it easy to adopt [two-stage split training](https://diffsynth-studio-doc.readthedocs.io/zh-cn/latest/Training/Split_Training.html).

* `process_inputs` must carry the `@torch.no_grad()` decorator and perform only gradient-free computation.
* `forward` must contain all gradient computation needed to train the model; its inputs are exactly the outputs of `process_inputs`.

Both `process_inputs` and `forward` must accept `**kwargs` for compatibility. In addition, the following parameter names are reserved:

* To interact with the base-model pipeline inside `process_inputs` or `forward` (for example, calling the pipeline's text encoder), add a `pipe` parameter to their signatures.
* To enable gradient checkpointing during training, add `use_gradient_checkpointing` and `use_gradient_checkpointing_offload` parameters to `forward`.
* Multiple Template models are distinguished via the `model_id` field in the Template Inputs; do not use this field as a parameter name in `process_inputs` or `forward`.

#### `TEMPLATE_MODEL_PATH` (optional)

`TEMPLATE_MODEL_PATH` is the relative path of the pretrained weight file, for example:

```python
TEMPLATE_MODEL_PATH = "model.safetensors"
```

To load from multiple weight files, use a list:

```python
TEMPLATE_MODEL_PATH = [
    "model-00001-of-00003.safetensors",
    "model-00002-of-00003.safetensors",
    "model-00003-of-00003.safetensors",
]
```

If the parameters should be randomly initialized (the model has not been trained yet), or no initialization is needed, set it to `None` or omit it:

```python
TEMPLATE_MODEL_PATH = None
```

#### `TEMPLATE_DATA_PROCESSOR` (optional)

To train a Template model with DiffSynth-Studio, you need a training dataset whose `metadata.json` contains a `template_inputs` field. The `template_inputs` in `metadata.json` is not passed directly to the Template model's `process_inputs`; it is the input to `TEMPLATE_DATA_PROCESSOR`, which computes the actual arguments for `process_inputs`.

For example, the brightness-control model [DiffSynth-Studio/F2KB4B-Template-Brightness](https://modelscope.cn/models/DiffSynth-Studio/F2KB4B-Template-Brightness) takes a `scale` parameter, the brightness value of the image. `scale` can be written directly into `metadata.json`, in which case `TEMPLATE_DATA_PROCESSOR` only needs to pass it through:

```json
[
    {
        "image": "images/image_1.jpg",
        "prompt": "a cat",
        "template_inputs": {"scale": 0.2}
    },
    {
        "image": "images/image_2.jpg",
        "prompt": "a dog",
        "template_inputs": {"scale": 0.6}
    }
]
```

```python
class DataProcessor:
    def __call__(self, scale, **kwargs):
        return {"scale": scale}


TEMPLATE_DATA_PROCESSOR = DataProcessor
```

Alternatively, you can put the image path into `metadata.json` and compute `scale` on the fly during training:

```json
[
    {
        "image": "images/image_1.jpg",
        "prompt": "a cat",
        "template_inputs": {"image": "/path/to/your/dataset/images/image_1.jpg"}
    },
    {
        "image": "images/image_2.jpg",
        "prompt": "a dog",
        "template_inputs": {"image": "/path/to/your/dataset/images/image_2.jpg"}
    }
]
```

```python
import numpy as np
from PIL import Image


class DataProcessor:
    def __call__(self, image, **kwargs):
        image = Image.open(image)
        image = np.array(image)
        # The mean pixel value, normalized to [0, 1], serves as the brightness scale.
        return {"scale": image.astype(np.float32).mean() / 255}


TEMPLATE_DATA_PROCESSOR = DataProcessor
```

### Training a Template Model

A sufficient condition for a Template model to be "trainable" is that the computation of the variables in its Template Cache is completely decoupled from the base-model pipeline: during inference, once these variables are passed to the base-model pipeline, they do not participate in any Pipeline Unit's computation and go straight to `model_fn`.

If a Template model is trainable, it can be trained with DiffSynth-Studio. Taking the base model [black-forest-labs/FLUX.2-klein-base-4B](https://www.modelscope.cn/models/black-forest-labs/FLUX.2-klein-base-4B) as an example, fill in the following fields in the training script:

* `--extra_inputs`: extra inputs. Use `template_inputs` when training a Template model for a text-to-image model, and `edit_image,template_inputs` when training one for an image-editing model.
* `--template_model_id_or_path`: the ModelScope model ID or local path of the Template model. The framework matches the local path first and downloads the model from the ModelScope hub if the local path does not exist. When passing a model ID, end it with ":", e.g. `"DiffSynth-Studio/Template-KleinBase4B-Brightness:"`.
* `--remove_prefix_in_ckpt`: the state-dict key prefix to strip when saving checkpoints; use `"pipe.template_model."`.
* `--trainable_models`: the trainable models; use `template_model`. To train only certain components, list them comma-separated, e.g. `template_model.xxx,template_model.yyy`.
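As a rough illustration of the resolution rule for `--template_model_id_or_path` (local path first, then ModelScope ID marked by a trailing ":"), here is a hypothetical sketch; the actual framework internals may well differ:

```python
import os


def resolve_template_model(id_or_path):
    # Hypothetical sketch of how --template_model_id_or_path could be
    # resolved; not the actual DiffSynth-Studio implementation.
    if os.path.exists(id_or_path):
        # An existing local path wins over a hub download.
        return ("local", id_or_path)
    # A trailing ":" marks a ModelScope model ID such as "org/name:".
    model_id = id_or_path.split(":")[0]
    return ("modelscope", model_id)
```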

The following sample training script automatically downloads a sample dataset, randomly initializes the model weights, and then trains a brightness-control model:

```shell
modelscope download --dataset DiffSynth-Studio/diffsynth_example_dataset --include "flux2/Template-KleinBase4B-Brightness/*" --local_dir ./data/diffsynth_example_dataset

accelerate launch examples/flux2/model_training/train.py \
--dataset_base_path data/diffsynth_example_dataset/flux2/Template-KleinBase4B-Brightness \
--dataset_metadata_path data/diffsynth_example_dataset/flux2/Template-KleinBase4B-Brightness/metadata.jsonl \
--extra_inputs "template_inputs" \
--max_pixels 1048576 \
--dataset_repeat 50 \
--model_id_with_origin_paths "black-forest-labs/FLUX.2-klein-4B:text_encoder/*.safetensors,black-forest-labs/FLUX.2-klein-base-4B:transformer/*.safetensors,black-forest-labs/FLUX.2-klein-4B:vae/diffusion_pytorch_model.safetensors" \
--template_model_id_or_path "examples/flux2/model_training/scripts/brightness" \
--tokenizer_path "black-forest-labs/FLUX.2-klein-4B:tokenizer/" \
--learning_rate 1e-4 \
--num_epochs 2 \
--remove_prefix_in_ckpt "pipe.template_model." \
--output_path "./models/train/Template-KleinBase4B-Brightness_example" \
--trainable_models "template_model" \
--use_gradient_checkpointing \
--find_unused_parameters
```

### Interacting with Base-Model Pipeline Components

The Diffusion Template framework allows a Template model to interact with the base-model pipeline. For example, you may want to encode text with the text encoder of the base-model pipeline; simply use the reserved `pipe` parameter in `process_inputs` and `forward`.

```python
import torch


class CustomizedTemplateModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.xxx = xxx()

    @torch.no_grad()
    def process_inputs(self, text, pipe, **kwargs):
        input_ids = pipe.tokenizer(text)
        text_emb = pipe.text_encoder(input_ids)
        return {"text_emb": text_emb}

    def forward(self, text_emb, pipe, **kwargs):
        kv_cache = self.xxx(text_emb)
        return {"kv_cache": kv_cache}


TEMPLATE_MODEL = CustomizedTemplateModel
```

### Using Frozen Model Components

When designing a Template model, you may want to use a pretrained component whose parameters should not be updated during training, for example:

```python
import torch


class CustomizedTemplateModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.image_encoder = XXXEncoder.from_pretrained(xxx)
        self.mlp = MLP()

    @torch.no_grad()
    def process_inputs(self, image, **kwargs):
        emb = self.image_encoder(image)
        return {"emb": emb}

    def forward(self, emb, **kwargs):
        kv_cache = self.mlp(emb)
        return {"kv_cache": kv_cache}


TEMPLATE_MODEL = CustomizedTemplateModel
```

In this case, pass `--trainable_models template_model.mlp` in the training command to train only the `mlp` part.

### Uploading a Template Model

After training, upload the Template model to the ModelScope community as follows.

Step 1: Fill the trained weight filename into `model.py`, for example:

```python
TEMPLATE_MODEL_PATH = "model.safetensors"
```

Step 2: Upload `model.py` with the following command, where the `--token ms-xxx` value is obtained from https://modelscope.cn/my/access/token

```shell
modelscope upload user_name/your_model_id /path/to/your/model.py model.py --token ms-xxx
```

Step 3: Confirm the model file.

Decide which model file to upload, e.g. `epoch-1.safetensors` or `step-2000.safetensors`.

Note that checkpoints saved by DiffSynth-Studio contain only the trainable parameters. If the model also contains frozen parameters, they must be packed back in before inference; you can do so with the following code:

```python
from diffsynth.diffusion.template import load_template_model, load_state_dict
from safetensors.torch import save_file
import torch

model = load_template_model("path/to/your/template/model", torch_dtype=torch.bfloat16, device="cpu")
state_dict = load_state_dict("path/to/your/ckpt/epoch-1.safetensors", torch_dtype=torch.bfloat16, device="cpu")
state_dict.update(model.state_dict())
save_file(state_dict, "model.safetensors")
```

Step 4: Upload the model file.

```shell
modelscope upload user_name/your_model_id /path/to/your/model/epoch-1.safetensors model.safetensors --token ms-xxx
```

Step 5: Verify inference.

```python
from diffsynth.diffusion.template import TemplatePipeline
from diffsynth.pipelines.flux2_image import Flux2ImagePipeline, ModelConfig
import torch

# Load the base model
pipe = Flux2ImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="black-forest-labs/FLUX.2-klein-4B", origin_file_pattern="text_encoder/*.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.2-klein-base-4B", origin_file_pattern="transformer/*.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.2-klein-4B", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
    ],
    tokenizer_config=ModelConfig(model_id="black-forest-labs/FLUX.2-klein-4B", origin_file_pattern="tokenizer/"),
)
# Load the Template model
template_pipeline = TemplatePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="user_name/your_model_id")
    ],
)
# Generate an image
image = template_pipeline(
    pipe,
    prompt="a cat",
    seed=0, cfg_scale=4,
    height=1024, width=1024,
    template_inputs=[{xxx}],
)
image.save("image.png")
```

@@ -0,0 +1,61 @@
# Understanding Diffusion Templates

## Framework Structure

The structure of the Diffusion Templates framework is shown below:

```mermaid
flowchart TD;
subgraph Template Pipeline
si@{shape: text, label: "Template Input"}-->i1@{shape: text, label: "Template Input 1"};
si@{shape: text, label: "Template Input"}-->i2@{shape: text, label: "Template Input 2"};
si@{shape: text, label: "Template Input"}-->i3@{shape: text, label: "Template Input 3"};
i1@{shape: text, label: "Template Input 1"}-->m1[Template Model 1]-->c1@{shape: text, label: "Template Cache 1"};
i2@{shape: text, label: "Template Input 2"}-->m2[Template Model 2]-->c2@{shape: text, label: "Template Cache 2"};
i3@{shape: text, label: "Template Input 3"}-->m3[Template Model 3]-->c3@{shape: text, label: "Template Cache 3"};
c1-->c@{shape: text, label: "Template Cache"};
c2-->c;
c3-->c;
end
i@{shape: text, label: "Model Input"}-->m[Diffusion Pipeline]-->o@{shape: text, label: "Model Output"};
c-->m;
```

The framework consists of the following modules:

* Template Input: the input to a Template model. It is a Python dict whose fields are defined by each Template model itself, e.g. `{"scale": 0.8}`
* Template Model: a Template model, loadable from the ModelScope hub (`ModelConfig(model_id="xxx/xxx")`) or from a local path (`ModelConfig(path="xxx")`)
* Template Cache: the output of a Template model. It is a Python dict whose fields must be input-parameter fields of the corresponding base-model pipeline
* Template Pipeline: the module that orchestrates multiple Template models; it loads the Template models and merges their outputs
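How the Template Pipeline might merge several Template Caches into pipeline keyword arguments can be sketched as follows; `merge_template_caches` and the field whitelist are illustrative assumptions, not DiffSynth-Studio APIs:

```python
# Hypothetical sketch of how a Template Pipeline could merge per-model
# Template Caches into keyword arguments for the Diffusion Pipeline.
# The function name and whitelist are illustrative assumptions.

ALLOWED_PIPELINE_FIELDS = {"template_kv_cache", "lora"}  # assumed whitelist


def merge_template_caches(caches):
    merged = {}
    for cache in caches:
        for key, value in cache.items():
            if key not in ALLOWED_PIPELINE_FIELDS:
                # Template Cache fields must be valid pipeline inputs.
                raise KeyError(f"{key} is not a pipeline input parameter")
            # Entries from several Template models are concatenated,
            # e.g. KV-Cache entries are joined at the sequence level.
            merged.setdefault(key, []).extend(value)
    return merged


merged = merge_template_caches([
    {"template_kv_cache": ["kv_a"]},
    {"template_kv_cache": ["kv_b"]},
])
```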

When the Diffusion Templates framework is disabled, the base-model components (Text Encoder, DiT, VAE, etc.) are loaded into the Diffusion Pipeline, which takes a Model Input (prompt, height, width, etc.) and produces a Model Output (e.g. an image).

When the framework is enabled, a number of Template models are loaded into the Template Pipeline. The Template Pipeline outputs a Template Cache (a subset of the Diffusion Pipeline's input parameters), which the Diffusion Pipeline then processes further. The Template Pipeline achieves controllable generation by taking over part of the Diffusion Pipeline's input parameters.

## Capability Carriers

Note that the Template Cache format is defined as a subset of the Diffusion Pipeline's input parameters. This is the basic guarantee of the framework's generality: we restrict what a Template model can feed into the Diffusion Pipeline to the pipeline's own input parameters. We therefore need to design additional input parameters for the Diffusion Pipeline to serve as carriers of model capability. Among these, KV-Cache is particularly well suited to Diffusion models:

* The approach has already been validated by LLM Skills; prompts fed into an LLM are likewise implicitly turned into a KV-Cache
* KV-Cache has "high privilege" within a Diffusion model: in an image-generation model it can directly influence or even fully control the result, guaranteeing a sufficiently high capability ceiling for Diffusion Template models
* KV-Caches can be concatenated directly at the sequence level, letting multiple Template models take effect simultaneously
* KV-Cache requires little framework-level work: adding one pipeline input parameter and threading it through to the model suffices, so new Diffusion base models can be adapted quickly
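The sequence-level concatenation mentioned above can be illustrated with a toy, framework-free example; the cache layout (per-layer lists of token vectors) is an assumption for illustration only:

```python
# Toy illustration of sequence-level KV-Cache concatenation.
# Each cache holds per-layer (keys, values) as lists of token vectors;
# the layout and merge rule are assumptions for illustration only.

def concat_kv_caches(caches):
    """Join several per-layer KV caches along the sequence axis."""
    num_layers = len(caches[0])
    merged = []
    for layer in range(num_layers):
        keys, values = [], []
        for cache in caches:
            k, v = cache[layer]
            keys.extend(k)    # sequence-level concatenation of keys
            values.extend(v)  # and of the matching values
        merged.append((keys, values))
    return merged


# Two Template models, each contributing a 1-layer cache:
# one with 2 tokens, one with 1 token.
cache_a = [([[1.0], [2.0]], [[0.1], [0.2]])]
cache_b = [([[3.0]], [[0.3]])]
merged = concat_kv_caches([cache_a, cache_b])
```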

Other carriers can also serve Templates:

* Residual: widely used in ControlNet and well suited to point-to-point control; compared with KV-Cache, its drawbacks are that it cannot support arbitrary resolutions and that multiple Residuals may conflict when fused
* LoRA: treat it not as part of the model but as a model input parameter; a LoRA is essentially a set of tensors and can likewise carry model capability

**Currently, KV-Cache and LoRA are supported as Template Caches only on the FLUX.2 pipeline; support for more models and more capability carriers will be considered later.**

## Template Model Format

A Template model has the following layout:

```
Template_Model
├── model.py
└── model.safetensors
```

Here `model.py` is the model's entry point and `model.safetensors` is the Template model's weight file. To learn how to build a Template model, see [Template Model Training](Template_Model_Training.md) or refer to an [existing Template model](https://modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-Brightness).
@@ -66,6 +66,15 @@ image.save("image.jpg")
|[black-forest-labs/FLUX.2-klein-9B](https://www.modelscope.cn/models/black-forest-labs/FLUX.2-klein-9B)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_inference/FLUX.2-klein-9B.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_inference_low_vram/FLUX.2-klein-9B.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_training/full/FLUX.2-klein-9B.sh)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_training/validate_full/FLUX.2-klein-9B.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_training/lora/FLUX.2-klein-9B.sh)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_training/validate_lora/FLUX.2-klein-9B.py)|
|[black-forest-labs/FLUX.2-klein-base-4B](https://www.modelscope.cn/models/black-forest-labs/FLUX.2-klein-base-4B)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_inference/FLUX.2-klein-base-4B.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_inference_low_vram/FLUX.2-klein-base-4B.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_training/full/FLUX.2-klein-base-4B.sh)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_training/validate_full/FLUX.2-klein-base-4B.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_training/lora/FLUX.2-klein-base-4B.sh)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_training/validate_lora/FLUX.2-klein-base-4B.py)|
|[black-forest-labs/FLUX.2-klein-base-9B](https://www.modelscope.cn/models/black-forest-labs/FLUX.2-klein-base-9B)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_inference/FLUX.2-klein-base-9B.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_inference_low_vram/FLUX.2-klein-base-9B.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_training/full/FLUX.2-klein-base-9B.sh)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_training/validate_full/FLUX.2-klein-base-9B.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_training/lora/FLUX.2-klein-base-9B.sh)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_training/validate_lora/FLUX.2-klein-base-9B.py)|
|[DiffSynth-Studio/Template-KleinBase4B-Aesthetic](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-Aesthetic)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_inference/Template-KleinBase4B-Aesthetic.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_inference_low_vram/Template-KleinBase4B-Aesthetic.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_training/full/Template-KleinBase4B-Aesthetic.sh)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_training/validate_full/Template-KleinBase4B-Aesthetic.py)|-|-|
|[DiffSynth-Studio/Template-KleinBase4B-Brightness](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-Brightness)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_inference/Template-KleinBase4B-Brightness.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_inference_low_vram/Template-KleinBase4B-Brightness.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_training/full/Template-KleinBase4B-Brightness.sh)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_training/validate_full/Template-KleinBase4B-Brightness.py)|-|-|
|[DiffSynth-Studio/Template-KleinBase4B-ControlNet](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-ControlNet)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_inference/Template-KleinBase4B-ControlNet.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_inference_low_vram/Template-KleinBase4B-ControlNet.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_training/full/Template-KleinBase4B-ControlNet.sh)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_training/validate_full/Template-KleinBase4B-ControlNet.py)|-|-|
|
||||
|[DiffSynth-Studio/Template-KleinBase4B-Edit](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-Edit)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_inference/Template-KleinBase4B-Edit.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_inference_low_vram/Template-KleinBase4B-Edit.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_training/full/Template-KleinBase4B-Edit.sh)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_training/validate_full/Template-KleinBase4B-Edit.py)|-|-|
|
||||
|[DiffSynth-Studio/Template-KleinBase4B-Inpaint](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-Inpaint)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_inference/Template-KleinBase4B-Inpaint.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_inference_low_vram/Template-KleinBase4B-Inpaint.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_training/full/Template-KleinBase4B-Inpaint.sh)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_training/validate_full/Template-KleinBase4B-Inpaint.py)|-|-|
|
||||
|[DiffSynth-Studio/Template-KleinBase4B-PandaMeme](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-PandaMeme)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_inference/Template-KleinBase4B-PandaMeme.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_inference_low_vram/Template-KleinBase4B-PandaMeme.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_training/full/Template-KleinBase4B-PandaMeme.sh)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_training/validate_full/Template-KleinBase4B-PandaMeme.py)|-|-|
|
||||
|[DiffSynth-Studio/Template-KleinBase4B-Sharpness](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-Sharpness)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_inference/Template-KleinBase4B-Sharpness.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_inference_low_vram/Template-KleinBase4B-Sharpness.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_training/full/Template-KleinBase4B-Sharpness.sh)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_training/validate_full/Template-KleinBase4B-Sharpness.py)|-|-|
|
||||
|[DiffSynth-Studio/Template-KleinBase4B-SoftRGB](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-SoftRGB)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_inference/Template-KleinBase4B-SoftRGB.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_inference_low_vram/Template-KleinBase4B-SoftRGB.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_training/full/Template-KleinBase4B-SoftRGB.sh)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_training/validate_full/Template-KleinBase4B-SoftRGB.py)|-|-|
|
||||
|[DiffSynth-Studio/Template-KleinBase4B-Upscaler](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-Upscaler)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_inference/Template-KleinBase4B-Upscaler.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_inference_low_vram/Template-KleinBase4B-Upscaler.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_training/full/Template-KleinBase4B-Upscaler.sh)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/flux2/model_training/validate_full/Template-KleinBase4B-Upscaler.py)|-|-|
Special training scripts:
@@ -16,8 +16,9 @@ graph LR;
 我想要基于此框架进行二次开发-->sec5[Section 5: API Reference];
 我想要基于本项目探索新的技术-->sec4[Section 4: Model Integration];
 我想要基于本项目探索新的技术-->sec5[Section 5: API Reference];
-我想要基于本项目探索新的技术-->sec6[Section 6: Academic Guide];
-我遇到了问题-->sec7[Section 7: FAQ];
+我想要基于本项目探索新的技术-->sec6[Section 6: Diffusion Templates]
+我想要基于本项目探索新的技术-->sec7[Section 7: Academic Guide];
+我遇到了问题-->sec8[Section 8: FAQ];
 ```
 
 </details>
@@ -75,7 +76,15 @@ graph LR;
 * [`diffsynth.core.loader`](./API_Reference/core/loader.md): model download and loading
 * [`diffsynth.core.vram`](./API_Reference/core/vram.md): VRAM management
 
-## Section 6: Academic Guide
+## Section 6: Diffusion Templates
+
+This section introduces Diffusion Templates, a plugin framework for controllable generation with Diffusion models. It explains how the Diffusion Templates framework works and shows how to run inference and training with Template models.
+
+* [Understanding Diffusion Templates](./Diffusion_Templates/Understanding_Diffusion_Templates.md)
+* [Template Model Inference](./Diffusion_Templates/Template_Model_Inference.md)
+* [Template Model Training](./Diffusion_Templates/Template_Model_Training.md)
+
+## Section 7: Academic Guide
 
 This section explains how to use `DiffSynth-Studio` to train new models, helping researchers explore new modeling techniques.
 
@@ -84,7 +93,7 @@ graph LR;
 * Designing controllable generation models [coming soon]
 * Creating new training paradigms [coming soon]
 
-## Section 7: FAQ
+## Section 8: FAQ
 
 This section summarizes common developer questions. If you run into a problem while using or developing with the project, please refer to this section; if it still cannot be resolved, please open an issue for us on GitHub.
 
@@ -60,6 +60,14 @@
    API_Reference/core/loader
    API_Reference/core/vram
 
+.. toctree::
+   :maxdepth: 2
+   :caption: Diffusion Templates
+
+   Diffusion_Templates/Understanding_Diffusion_Templates.md
+   Diffusion_Templates/Template_Model_Inference.md
+   Diffusion_Templates/Template_Model_Training.md
+
 .. toctree::
    :maxdepth: 2
    :caption: Academic Guide