From 54345f8678683e7e3156770fb0d82039bba0a838 Mon Sep 17 00:00:00 2001
From: mi804 <1576993271@qq.com>
Date: Fri, 24 Apr 2026 15:41:13 +0800
Subject: [PATCH] sd docs
---
README.md | 125 ++++++++++++++++
README_zh.md | 125 ++++++++++++++++
docs/en/Model_Details/Stable-Diffusion-XL.md | 141 +++++++++++++++++++
docs/en/Model_Details/Stable-Diffusion.md | 138 ++++++++++++++++++
docs/en/index.rst | 2 +
docs/zh/Model_Details/Stable-Diffusion-XL.md | 141 +++++++++++++++++++
docs/zh/Model_Details/Stable-Diffusion.md | 138 ++++++++++++++++++
docs/zh/index.rst | 2 +
8 files changed, 812 insertions(+)
create mode 100644 docs/en/Model_Details/Stable-Diffusion-XL.md
create mode 100644 docs/en/Model_Details/Stable-Diffusion.md
create mode 100644 docs/zh/Model_Details/Stable-Diffusion-XL.md
create mode 100644 docs/zh/Model_Details/Stable-Diffusion.md
diff --git a/README.md b/README.md
index ff46977..b0fc163 100644
--- a/README.md
+++ b/README.md
@@ -34,6 +34,8 @@ We believe that a well-developed open-source code framework can lower the thresh
> Currently, the development personnel of this project are limited, with most of the work handled by [Artiprocher](https://github.com/Artiprocher) and [mi804](https://github.com/mi804). Therefore, the progress of new feature development will be relatively slow, and the speed of responding to and resolving issues is limited. We apologize for this and ask developers to understand.
+- **April 24, 2026** Added support for Stable Diffusion v1.5 and SDXL, including inference, low VRAM inference, and training capabilities. For details, please refer to the [Stable Diffusion documentation](/docs/en/Model_Details/Stable-Diffusion.md), the [Stable Diffusion XL documentation](/docs/en/Model_Details/Stable-Diffusion-XL.md), and the [example code](/examples/stable_diffusion/).
+
- **April 14, 2026** JoyAI-Image open-sourced, welcome a new member to the image editing model family! Support includes instruction-guided image editing, low VRAM inference, and training capabilities. For details, please refer to the [documentation](/docs/en/Model_Details/JoyAI-Image.md) and [example code](/examples/joyai_image/).
- **March 19, 2026**: Added support for [openmoss/MOVA-720p](https://modelscope.cn/models/openmoss/MOVA-720p) and [openmoss/MOVA-360p](https://modelscope.cn/models/openmoss/MOVA-360p) models, including training and inference capabilities. [Documentation](/docs/en/Model_Details/Wan.md) and [example code](/examples/mova/) are now available.
@@ -299,6 +301,129 @@ Example code for Z-Image is available at: [/examples/z_image/](/examples/z_image
+#### Stable Diffusion: [/docs/en/Model_Details/Stable-Diffusion.md](/docs/en/Model_Details/Stable-Diffusion.md)
+
+
+
+Quick Start
+
+Running the following code will quickly load the [AI-ModelScope/stable-diffusion-v1-5](https://www.modelscope.cn/models/AI-ModelScope/stable-diffusion-v1-5) model for inference. With VRAM management enabled, the framework automatically controls parameter loading based on available VRAM, so a minimum of 2GB of VRAM is sufficient.
+
+```python
+import torch
+from diffsynth.core import ModelConfig
+from diffsynth.pipelines.stable_diffusion import StableDiffusionPipeline
+
+vram_config = {
+ "offload_dtype": torch.float32,
+ "offload_device": "cpu",
+ "onload_dtype": torch.float32,
+ "onload_device": "cpu",
+ "preparing_dtype": torch.float32,
+ "preparing_device": "cuda",
+ "computation_dtype": torch.float32,
+ "computation_device": "cuda",
+}
+pipe = StableDiffusionPipeline.from_pretrained(
+ torch_dtype=torch.float32,
+ model_configs=[
+ ModelConfig(model_id="AI-ModelScope/stable-diffusion-v1-5", origin_file_pattern="text_encoder/model.safetensors", **vram_config),
+ ModelConfig(model_id="AI-ModelScope/stable-diffusion-v1-5", origin_file_pattern="unet/diffusion_pytorch_model.safetensors", **vram_config),
+ ModelConfig(model_id="AI-ModelScope/stable-diffusion-v1-5", origin_file_pattern="vae/diffusion_pytorch_model.safetensors", **vram_config),
+ ],
+ tokenizer_config=ModelConfig(model_id="AI-ModelScope/stable-diffusion-v1-5", origin_file_pattern="tokenizer/"),
+ vram_limit=torch.cuda.mem_get_info("cuda")[1] / (1024 ** 3) - 0.5,
+)
+
+image = pipe(
+ prompt="a photo of an astronaut riding a horse on mars, high quality, detailed",
+ negative_prompt="blurry, low quality, deformed",
+ cfg_scale=7.5,
+ height=512,
+ width=512,
+ seed=42,
+ rand_device="cuda",
+ num_inference_steps=50,
+)
+image.save("image.jpg")
+```
+
+
+
+
+
+Examples
+
+Example code for Stable Diffusion is available at: [/examples/stable_diffusion/](/examples/stable_diffusion/)
+
+|Model ID|Inference|Low VRAM Inference|Full Training|Full Training Validation|LoRA Training|LoRA Training Validation|
+|-|-|-|-|-|-|-|
+|[AI-ModelScope/stable-diffusion-v1-5](https://www.modelscope.cn/models/AI-ModelScope/stable-diffusion-v1-5)|[code](/examples/stable_diffusion/model_inference/stable-diffusion-v1-5.py)|[code](/examples/stable_diffusion/model_inference_low_vram/stable-diffusion-v1-5.py)|[code](/examples/stable_diffusion/model_training/full/stable-diffusion-v1-5.sh)|[code](/examples/stable_diffusion/model_training/validate_full/stable-diffusion-v1-5.py)|[code](/examples/stable_diffusion/model_training/lora/stable-diffusion-v1-5.sh)|[code](/examples/stable_diffusion/model_training/validate_lora/stable-diffusion-v1-5.py)|
+
+
+
+#### Stable Diffusion XL: [/docs/en/Model_Details/Stable-Diffusion-XL.md](/docs/en/Model_Details/Stable-Diffusion-XL.md)
+
+
+
+Quick Start
+
+Running the following code will quickly load the [stabilityai/stable-diffusion-xl-base-1.0](https://www.modelscope.cn/models/stabilityai/stable-diffusion-xl-base-1.0) model for inference. With VRAM management enabled, the framework automatically controls parameter loading based on available VRAM, so a minimum of 6GB of VRAM is sufficient.
+
+```python
+import torch
+from diffsynth.core import ModelConfig
+from diffsynth.pipelines.stable_diffusion_xl import StableDiffusionXLPipeline
+
+vram_config = {
+ "offload_dtype": torch.float32,
+ "offload_device": "cpu",
+ "onload_dtype": torch.float32,
+ "onload_device": "cpu",
+ "preparing_dtype": torch.float32,
+ "preparing_device": "cuda",
+ "computation_dtype": torch.float32,
+ "computation_device": "cuda",
+}
+pipe = StableDiffusionXLPipeline.from_pretrained(
+ torch_dtype=torch.float32,
+ model_configs=[
+ ModelConfig(model_id="stabilityai/stable-diffusion-xl-base-1.0", origin_file_pattern="text_encoder/model.safetensors", **vram_config),
+ ModelConfig(model_id="stabilityai/stable-diffusion-xl-base-1.0", origin_file_pattern="text_encoder_2/model.safetensors", **vram_config),
+ ModelConfig(model_id="stabilityai/stable-diffusion-xl-base-1.0", origin_file_pattern="unet/diffusion_pytorch_model.safetensors", **vram_config),
+ ModelConfig(model_id="stabilityai/stable-diffusion-xl-base-1.0", origin_file_pattern="vae/diffusion_pytorch_model.safetensors", **vram_config),
+ ],
+ tokenizer_config=ModelConfig(model_id="stabilityai/stable-diffusion-xl-base-1.0", origin_file_pattern="tokenizer/"),
+ tokenizer_2_config=ModelConfig(model_id="stabilityai/stable-diffusion-xl-base-1.0", origin_file_pattern="tokenizer_2/"),
+ vram_limit=torch.cuda.mem_get_info("cuda")[1] / (1024 ** 3) - 0.5,
+)
+
+image = pipe(
+ prompt="a photo of an astronaut riding a horse on mars",
+ negative_prompt="",
+ cfg_scale=5.0,
+ height=1024,
+ width=1024,
+ seed=42,
+ num_inference_steps=50,
+)
+image.save("image.jpg")
+```
+
+
+
+
+
+Examples
+
+Example code for Stable Diffusion XL is available at: [/examples/stable_diffusion_xl/](/examples/stable_diffusion_xl/)
+
+|Model ID|Inference|Low VRAM Inference|Full Training|Full Training Validation|LoRA Training|LoRA Training Validation|
+|-|-|-|-|-|-|-|
+|[stabilityai/stable-diffusion-xl-base-1.0](https://www.modelscope.cn/models/stabilityai/stable-diffusion-xl-base-1.0)|[code](/examples/stable_diffusion_xl/model_inference/stable-diffusion-xl-base-1.0.py)|[code](/examples/stable_diffusion_xl/model_inference_low_vram/stable-diffusion-xl-base-1.0.py)|[code](/examples/stable_diffusion_xl/model_training/full/stable-diffusion-xl-base-1.0.sh)|[code](/examples/stable_diffusion_xl/model_training/validate_full/stable-diffusion-xl-base-1.0.py)|[code](/examples/stable_diffusion_xl/model_training/lora/stable-diffusion-xl-base-1.0.sh)|[code](/examples/stable_diffusion_xl/model_training/validate_lora/stable-diffusion-xl-base-1.0.py)|
+
+
+
#### FLUX.2: [/docs/en/Model_Details/FLUX2.md](/docs/en/Model_Details/FLUX2.md)
diff --git a/README_zh.md b/README_zh.md
index 77dfaf0..6156623 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -34,6 +34,8 @@ DiffSynth 目前包括两个开源项目:
> 目前本项目的开发人员有限,大部分工作由 [Artiprocher](https://github.com/Artiprocher) 和 [mi804](https://github.com/mi804) 负责,因此新功能的开发进展会比较缓慢,issue 的回复和解决速度有限,我们对此感到非常抱歉,请各位开发者理解。
+- **2026年4月24日** 我们新增对 Stable Diffusion v1.5 和 SDXL 的支持,包括推理、低显存推理和训练能力。详情请参考 [Stable Diffusion 文档](/docs/zh/Model_Details/Stable-Diffusion.md)、[Stable Diffusion XL 文档](/docs/zh/Model_Details/Stable-Diffusion-XL.md)和[示例代码](/examples/stable_diffusion/)。
+
- **2026年4月14日** JoyAI-Image 开源,欢迎加入图像编辑模型家族!支持指令引导的图像编辑推理、低显存推理和训练能力。详情请参考[文档](/docs/zh/Model_Details/JoyAI-Image.md)和[示例代码](/examples/joyai_image/)。
- **2026年3月19日** 新增对 [openmoss/MOVA-720p](https://modelscope.cn/models/openmoss/MOVA-720p) 和 [openmoss/MOVA-360p](https://modelscope.cn/models/openmoss/MOVA-360p) 模型的支持,包括完整的训练和推理功能。[文档](/docs/zh/Model_Details/Wan.md)和[示例代码](/examples/mova/)现已可用。
@@ -299,6 +301,129 @@ Z-Image 的示例代码位于:[/examples/z_image/](/examples/z_image/)
+#### Stable Diffusion:[/docs/zh/Model_Details/Stable-Diffusion.md](/docs/zh/Model_Details/Stable-Diffusion.md)
+
+
+
+快速开始
+
+运行以下代码可以快速加载 [AI-ModelScope/stable-diffusion-v1-5](https://www.modelscope.cn/models/AI-ModelScope/stable-diffusion-v1-5) 模型并进行推理。显存管理已启用,框架会自动根据剩余显存控制模型参数的加载,最低 2GB 显存即可运行。
+
+```python
+import torch
+from diffsynth.core import ModelConfig
+from diffsynth.pipelines.stable_diffusion import StableDiffusionPipeline
+
+vram_config = {
+ "offload_dtype": torch.float32,
+ "offload_device": "cpu",
+ "onload_dtype": torch.float32,
+ "onload_device": "cpu",
+ "preparing_dtype": torch.float32,
+ "preparing_device": "cuda",
+ "computation_dtype": torch.float32,
+ "computation_device": "cuda",
+}
+pipe = StableDiffusionPipeline.from_pretrained(
+ torch_dtype=torch.float32,
+ model_configs=[
+ ModelConfig(model_id="AI-ModelScope/stable-diffusion-v1-5", origin_file_pattern="text_encoder/model.safetensors", **vram_config),
+ ModelConfig(model_id="AI-ModelScope/stable-diffusion-v1-5", origin_file_pattern="unet/diffusion_pytorch_model.safetensors", **vram_config),
+ ModelConfig(model_id="AI-ModelScope/stable-diffusion-v1-5", origin_file_pattern="vae/diffusion_pytorch_model.safetensors", **vram_config),
+ ],
+ tokenizer_config=ModelConfig(model_id="AI-ModelScope/stable-diffusion-v1-5", origin_file_pattern="tokenizer/"),
+ vram_limit=torch.cuda.mem_get_info("cuda")[1] / (1024 ** 3) - 0.5,
+)
+
+image = pipe(
+ prompt="a photo of an astronaut riding a horse on mars, high quality, detailed",
+ negative_prompt="blurry, low quality, deformed",
+ cfg_scale=7.5,
+ height=512,
+ width=512,
+ seed=42,
+ rand_device="cuda",
+ num_inference_steps=50,
+)
+image.save("image.jpg")
+```
+
+
+
+
+
+示例代码
+
+Stable Diffusion 的示例代码位于:[/examples/stable_diffusion/](/examples/stable_diffusion/)
+
+| 模型 ID | 推理 | 低显存推理 | 全量训练 | 全量训练后验证 | LoRA 训练 | LoRA 训练后验证 |
+|-|-|-|-|-|-|-|
+|[AI-ModelScope/stable-diffusion-v1-5](https://www.modelscope.cn/models/AI-ModelScope/stable-diffusion-v1-5)|[code](/examples/stable_diffusion/model_inference/stable-diffusion-v1-5.py)|[code](/examples/stable_diffusion/model_inference_low_vram/stable-diffusion-v1-5.py)|[code](/examples/stable_diffusion/model_training/full/stable-diffusion-v1-5.sh)|[code](/examples/stable_diffusion/model_training/validate_full/stable-diffusion-v1-5.py)|[code](/examples/stable_diffusion/model_training/lora/stable-diffusion-v1-5.sh)|[code](/examples/stable_diffusion/model_training/validate_lora/stable-diffusion-v1-5.py)|
+
+
+
+#### Stable Diffusion XL:[/docs/zh/Model_Details/Stable-Diffusion-XL.md](/docs/zh/Model_Details/Stable-Diffusion-XL.md)
+
+
+
+快速开始
+
+运行以下代码可以快速加载 [stabilityai/stable-diffusion-xl-base-1.0](https://www.modelscope.cn/models/stabilityai/stable-diffusion-xl-base-1.0) 模型并进行推理。显存管理已启用,框架会自动根据剩余显存控制模型参数的加载,最低 6GB 显存即可运行。
+
+```python
+import torch
+from diffsynth.core import ModelConfig
+from diffsynth.pipelines.stable_diffusion_xl import StableDiffusionXLPipeline
+
+vram_config = {
+ "offload_dtype": torch.float32,
+ "offload_device": "cpu",
+ "onload_dtype": torch.float32,
+ "onload_device": "cpu",
+ "preparing_dtype": torch.float32,
+ "preparing_device": "cuda",
+ "computation_dtype": torch.float32,
+ "computation_device": "cuda",
+}
+pipe = StableDiffusionXLPipeline.from_pretrained(
+ torch_dtype=torch.float32,
+ model_configs=[
+ ModelConfig(model_id="stabilityai/stable-diffusion-xl-base-1.0", origin_file_pattern="text_encoder/model.safetensors", **vram_config),
+ ModelConfig(model_id="stabilityai/stable-diffusion-xl-base-1.0", origin_file_pattern="text_encoder_2/model.safetensors", **vram_config),
+ ModelConfig(model_id="stabilityai/stable-diffusion-xl-base-1.0", origin_file_pattern="unet/diffusion_pytorch_model.safetensors", **vram_config),
+ ModelConfig(model_id="stabilityai/stable-diffusion-xl-base-1.0", origin_file_pattern="vae/diffusion_pytorch_model.safetensors", **vram_config),
+ ],
+ tokenizer_config=ModelConfig(model_id="stabilityai/stable-diffusion-xl-base-1.0", origin_file_pattern="tokenizer/"),
+ tokenizer_2_config=ModelConfig(model_id="stabilityai/stable-diffusion-xl-base-1.0", origin_file_pattern="tokenizer_2/"),
+ vram_limit=torch.cuda.mem_get_info("cuda")[1] / (1024 ** 3) - 0.5,
+)
+
+image = pipe(
+ prompt="a photo of an astronaut riding a horse on mars",
+ negative_prompt="",
+ cfg_scale=5.0,
+ height=1024,
+ width=1024,
+ seed=42,
+ num_inference_steps=50,
+)
+image.save("image.jpg")
+```
+
+
+
+
+
+示例代码
+
+Stable Diffusion XL 的示例代码位于:[/examples/stable_diffusion_xl/](/examples/stable_diffusion_xl/)
+
+| 模型 ID | 推理 | 低显存推理 | 全量训练 | 全量训练后验证 | LoRA 训练 | LoRA 训练后验证 |
+|-|-|-|-|-|-|-|
+|[stabilityai/stable-diffusion-xl-base-1.0](https://www.modelscope.cn/models/stabilityai/stable-diffusion-xl-base-1.0)|[code](/examples/stable_diffusion_xl/model_inference/stable-diffusion-xl-base-1.0.py)|[code](/examples/stable_diffusion_xl/model_inference_low_vram/stable-diffusion-xl-base-1.0.py)|[code](/examples/stable_diffusion_xl/model_training/full/stable-diffusion-xl-base-1.0.sh)|[code](/examples/stable_diffusion_xl/model_training/validate_full/stable-diffusion-xl-base-1.0.py)|[code](/examples/stable_diffusion_xl/model_training/lora/stable-diffusion-xl-base-1.0.sh)|[code](/examples/stable_diffusion_xl/model_training/validate_lora/stable-diffusion-xl-base-1.0.py)|
+
+
+
#### FLUX.2: [/docs/zh/Model_Details/FLUX2.md](/docs/zh/Model_Details/FLUX2.md)
diff --git a/docs/en/Model_Details/Stable-Diffusion-XL.md b/docs/en/Model_Details/Stable-Diffusion-XL.md
new file mode 100644
index 0000000..3d765bc
--- /dev/null
+++ b/docs/en/Model_Details/Stable-Diffusion-XL.md
@@ -0,0 +1,141 @@
+# Stable Diffusion XL
+
+Stable Diffusion XL (SDXL) is an open-source diffusion-based text-to-image generation model developed by Stability AI, supporting high-quality text-to-image generation at 1024x1024 resolution and built on a dual text-encoder (CLIP-L + CLIP-bigG) architecture.
+
+## Installation
+
+Before performing model inference and training, please install DiffSynth-Studio first.
+
+```shell
+git clone https://github.com/modelscope/DiffSynth-Studio.git
+cd DiffSynth-Studio
+pip install -e .
+```
+
+For more information on installation, please refer to [Setup Dependencies](../Pipeline_Usage/Setup.md).
+
+## Quick Start
+
+Running the following code will quickly load the [stabilityai/stable-diffusion-xl-base-1.0](https://www.modelscope.cn/models/stabilityai/stable-diffusion-xl-base-1.0) model for inference. With VRAM management enabled, the framework automatically controls parameter loading based on available VRAM, so a minimum of 6GB of VRAM is sufficient.
+
+```python
+import torch
+from diffsynth.core import ModelConfig
+from diffsynth.pipelines.stable_diffusion_xl import StableDiffusionXLPipeline
+
+vram_config = {
+ "offload_dtype": torch.float32,
+ "offload_device": "cpu",
+ "onload_dtype": torch.float32,
+ "onload_device": "cpu",
+ "preparing_dtype": torch.float32,
+ "preparing_device": "cuda",
+ "computation_dtype": torch.float32,
+ "computation_device": "cuda",
+}
+pipe = StableDiffusionXLPipeline.from_pretrained(
+ torch_dtype=torch.float32,
+ model_configs=[
+ ModelConfig(model_id="stabilityai/stable-diffusion-xl-base-1.0", origin_file_pattern="text_encoder/model.safetensors", **vram_config),
+ ModelConfig(model_id="stabilityai/stable-diffusion-xl-base-1.0", origin_file_pattern="text_encoder_2/model.safetensors", **vram_config),
+ ModelConfig(model_id="stabilityai/stable-diffusion-xl-base-1.0", origin_file_pattern="unet/diffusion_pytorch_model.safetensors", **vram_config),
+ ModelConfig(model_id="stabilityai/stable-diffusion-xl-base-1.0", origin_file_pattern="vae/diffusion_pytorch_model.safetensors", **vram_config),
+ ],
+ tokenizer_config=ModelConfig(model_id="stabilityai/stable-diffusion-xl-base-1.0", origin_file_pattern="tokenizer/"),
+ tokenizer_2_config=ModelConfig(model_id="stabilityai/stable-diffusion-xl-base-1.0", origin_file_pattern="tokenizer_2/"),
+ vram_limit=torch.cuda.mem_get_info("cuda")[1] / (1024 ** 3) - 0.5,
+)
+
+image = pipe(
+ prompt="a photo of an astronaut riding a horse on mars",
+ negative_prompt="",
+ cfg_scale=5.0,
+ height=1024,
+ width=1024,
+ seed=42,
+ num_inference_steps=50,
+)
+image.save("image.jpg")
+```
+
+## Model Overview
+
+|Model ID|Inference|Low VRAM Inference|Full Training|Full Training Validation|LoRA Training|LoRA Training Validation|
+|-|-|-|-|-|-|-|
+|[stabilityai/stable-diffusion-xl-base-1.0](https://www.modelscope.cn/models/stabilityai/stable-diffusion-xl-base-1.0)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/stable_diffusion_xl/model_inference/stable-diffusion-xl-base-1.0.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/stable_diffusion_xl/model_inference_low_vram/stable-diffusion-xl-base-1.0.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/stable_diffusion_xl/model_training/full/stable-diffusion-xl-base-1.0.sh)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/stable_diffusion_xl/model_training/validate_full/stable-diffusion-xl-base-1.0.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/stable_diffusion_xl/model_training/lora/stable-diffusion-xl-base-1.0.sh)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/stable_diffusion_xl/model_training/validate_lora/stable-diffusion-xl-base-1.0.py)|
+
+## Model Inference
+
+The model is loaded via `StableDiffusionXLPipeline.from_pretrained`, see [Loading Models](../Pipeline_Usage/Model_Inference.md#loading-models) for details.
+
+The input parameters for `StableDiffusionXLPipeline` inference include:
+
+* `prompt`: Text prompt.
+* `negative_prompt`: Negative prompt, defaults to an empty string.
+* `cfg_scale`: Classifier-Free Guidance scale factor, default 5.0.
+* `height`: Output image height, default 1024.
+* `width`: Output image width, default 1024.
+* `seed`: Random seed, defaults to a random value if not set.
+* `rand_device`: Noise generation device, defaults to "cpu".
+* `num_inference_steps`: Number of inference steps, default 50.
+* `guidance_rescale`: Guidance rescale factor, default 0.0.
+* `progress_bar_cmd`: Progress bar callback function.
+
+> `StableDiffusionXLPipeline` requires dual tokenizer configurations (`tokenizer_config` and `tokenizer_2_config`), corresponding to the CLIP-L and CLIP-bigG text encoders.
+
+## Model Training
+
+Models in the stable_diffusion_xl series are trained via `examples/stable_diffusion_xl/model_training/train.py`. The script parameters include:
+
+* General Training Parameters
+ * Dataset Configuration
+ * `--dataset_base_path`: Root directory of the dataset.
+ * `--dataset_metadata_path`: Path to the dataset metadata file.
+ * `--dataset_repeat`: Number of dataset repeats per epoch.
+ * `--dataset_num_workers`: Number of processes per DataLoader.
+ * `--data_file_keys`: Field names to load from metadata, typically paths to image or video files, separated by `,`.
+ * Model Loading Configuration
+ * `--model_paths`: Paths to load models from, in JSON format.
+ * `--model_id_with_origin_paths`: Model IDs with original paths, separated by commas.
+ * `--extra_inputs`: Additional input parameters required by the model Pipeline, separated by `,`.
+ * `--fp8_models`: Models to load in FP8 format, currently only supported for models whose parameters are not updated by gradients.
+ * Basic Training Configuration
+ * `--learning_rate`: Learning rate.
+ * `--num_epochs`: Number of epochs.
+ * `--trainable_models`: Trainable models, e.g., `dit`, `vae`, `text_encoder`.
+ * `--find_unused_parameters`: Whether unused parameters exist in DDP training.
+ * `--weight_decay`: Weight decay magnitude.
+ * `--task`: Training task, defaults to `sft`.
+ * Output Configuration
+ * `--output_path`: Path to save the model.
+ * `--remove_prefix_in_ckpt`: Remove prefix in the model's state dict.
+ * `--save_steps`: Interval in training steps to save the model.
+ * LoRA Configuration
+ * `--lora_base_model`: Which model to add LoRA to.
+ * `--lora_target_modules`: Which layers to add LoRA to.
+ * `--lora_rank`: Rank of LoRA.
+ * `--lora_checkpoint`: Path to LoRA checkpoint.
+ * `--preset_lora_path`: Path to preset LoRA checkpoint for LoRA differential training.
+ * `--preset_lora_model`: Which model to integrate preset LoRA into, e.g., `dit`.
+ * Gradient Configuration
+ * `--use_gradient_checkpointing`: Whether to enable gradient checkpointing.
+ * `--use_gradient_checkpointing_offload`: Whether to offload gradient checkpointing to CPU memory.
+ * `--gradient_accumulation_steps`: Number of gradient accumulation steps.
+ * Resolution Configuration
+ * `--height`: Height of the image/video. Leave empty to enable dynamic resolution.
+ * `--width`: Width of the image/video. Leave empty to enable dynamic resolution.
+ * `--max_pixels`: Maximum pixel area, images larger than this will be scaled down during dynamic resolution.
+ * `--num_frames`: Number of frames for video (video generation models only).
+* Stable Diffusion XL Specific Parameters
+ * `--tokenizer_path`: Path to the first tokenizer.
+ * `--tokenizer_2_path`: Path to the second tokenizer, defaults to `stabilityai/stable-diffusion-xl-base-1.0:tokenizer_2/`.
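The dynamic-resolution behavior of `--max_pixels` can be sketched as follows. This is a minimal illustration assuming aspect-ratio-preserving downscaling; the function name is hypothetical and the framework's actual resize rule may differ in detail.

```python
import math

def fit_to_max_pixels(width, height, max_pixels):
    """Scale (width, height) down so that width * height <= max_pixels,
    preserving aspect ratio. Images already within budget are untouched."""
    if width * height <= max_pixels:
        return width, height
    scale = math.sqrt(max_pixels / (width * height))
    return int(width * scale), int(height * scale)

# A 2048x1536 image under a 1024*1024 pixel budget:
print(fit_to_max_pixels(2048, 1536, 1024 * 1024))  # -> (1182, 886)
```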
+
+Example dataset download:
+
+```shell
+modelscope download --dataset DiffSynth-Studio/diffsynth_example_dataset --include "stable_diffusion_xl/*" --local_dir ./data/diffsynth_example_dataset
+```
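Putting the parameters above together, a LoRA fine-tuning run might look like the sketch below. All flag values (metadata filename, target modules, hyperparameters, output path) are illustrative placeholders, and the `accelerate launch` entry point is an assumption; consult the linked training script for the recommended settings.

```shell
# Illustrative LoRA run -- flag values are placeholders, not recommendations.
accelerate launch examples/stable_diffusion_xl/model_training/train.py \
  --dataset_base_path ./data/diffsynth_example_dataset \
  --dataset_metadata_path ./data/diffsynth_example_dataset/metadata.csv \
  --height 1024 --width 1024 \
  --learning_rate 1e-4 \
  --num_epochs 1 \
  --lora_base_model unet \
  --lora_target_modules "to_q,to_k,to_v" \
  --lora_rank 32 \
  --use_gradient_checkpointing \
  --output_path ./models/train/sdxl_lora
```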
+
+[stable-diffusion-xl-base-1.0 training script](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/stable_diffusion_xl/model_training/lora/stable-diffusion-xl-base-1.0.sh)
+
+We provide recommended training scripts for each model; please refer to the table in "Model Overview" above. For guidance on writing model training scripts, see [Model Training](../Pipeline_Usage/Model_Training.md); for more advanced training algorithms, see [Training Framework Overview](https://github.com/modelscope/DiffSynth-Studio/tree/main/docs/en/Training/).
diff --git a/docs/en/Model_Details/Stable-Diffusion.md b/docs/en/Model_Details/Stable-Diffusion.md
new file mode 100644
index 0000000..6d45edb
--- /dev/null
+++ b/docs/en/Model_Details/Stable-Diffusion.md
@@ -0,0 +1,138 @@
+# Stable Diffusion
+
+Stable Diffusion is an open-source diffusion-based text-to-image generation model developed by Stability AI, supporting text-to-image generation at 512x512 resolution.
+
+## Installation
+
+Before performing model inference and training, please install DiffSynth-Studio first.
+
+```shell
+git clone https://github.com/modelscope/DiffSynth-Studio.git
+cd DiffSynth-Studio
+pip install -e .
+```
+
+For more information on installation, please refer to [Setup Dependencies](../Pipeline_Usage/Setup.md).
+
+## Quick Start
+
+Running the following code will quickly load the [AI-ModelScope/stable-diffusion-v1-5](https://www.modelscope.cn/models/AI-ModelScope/stable-diffusion-v1-5) model for inference. With VRAM management enabled, the framework automatically controls parameter loading based on available VRAM, so a minimum of 2GB of VRAM is sufficient.
+
+```python
+import torch
+from diffsynth.core import ModelConfig
+from diffsynth.pipelines.stable_diffusion import StableDiffusionPipeline
+
+vram_config = {
+ "offload_dtype": torch.float32,
+ "offload_device": "cpu",
+ "onload_dtype": torch.float32,
+ "onload_device": "cpu",
+ "preparing_dtype": torch.float32,
+ "preparing_device": "cuda",
+ "computation_dtype": torch.float32,
+ "computation_device": "cuda",
+}
+pipe = StableDiffusionPipeline.from_pretrained(
+ torch_dtype=torch.float32,
+ model_configs=[
+ ModelConfig(model_id="AI-ModelScope/stable-diffusion-v1-5", origin_file_pattern="text_encoder/model.safetensors", **vram_config),
+ ModelConfig(model_id="AI-ModelScope/stable-diffusion-v1-5", origin_file_pattern="unet/diffusion_pytorch_model.safetensors", **vram_config),
+ ModelConfig(model_id="AI-ModelScope/stable-diffusion-v1-5", origin_file_pattern="vae/diffusion_pytorch_model.safetensors", **vram_config),
+ ],
+ tokenizer_config=ModelConfig(model_id="AI-ModelScope/stable-diffusion-v1-5", origin_file_pattern="tokenizer/"),
+ vram_limit=torch.cuda.mem_get_info("cuda")[1] / (1024 ** 3) - 0.5,
+)
+
+image = pipe(
+ prompt="a photo of an astronaut riding a horse on mars, high quality, detailed",
+ negative_prompt="blurry, low quality, deformed",
+ cfg_scale=7.5,
+ height=512,
+ width=512,
+ seed=42,
+ rand_device="cuda",
+ num_inference_steps=50,
+)
+image.save("image.jpg")
+```
+
+## Model Overview
+
+|Model ID|Inference|Low VRAM Inference|Full Training|Full Training Validation|LoRA Training|LoRA Training Validation|
+|-|-|-|-|-|-|-|
+|[AI-ModelScope/stable-diffusion-v1-5](https://www.modelscope.cn/models/AI-ModelScope/stable-diffusion-v1-5)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/stable_diffusion/model_inference/stable-diffusion-v1-5.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/stable_diffusion/model_inference_low_vram/stable-diffusion-v1-5.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/stable_diffusion/model_training/full/stable-diffusion-v1-5.sh)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/stable_diffusion/model_training/validate_full/stable-diffusion-v1-5.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/stable_diffusion/model_training/lora/stable-diffusion-v1-5.sh)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/stable_diffusion/model_training/validate_lora/stable-diffusion-v1-5.py)|
+
+## Model Inference
+
+The model is loaded via `StableDiffusionPipeline.from_pretrained`, see [Loading Models](../Pipeline_Usage/Model_Inference.md#loading-models) for details.
+
+The input parameters for `StableDiffusionPipeline` inference include:
+
+* `prompt`: Text prompt.
+* `negative_prompt`: Negative prompt, defaults to an empty string.
+* `cfg_scale`: Classifier-Free Guidance scale factor, default 7.5.
+* `height`: Output image height, default 512.
+* `width`: Output image width, default 512.
+* `seed`: Random seed, defaults to a random value if not set.
+* `rand_device`: Noise generation device, defaults to "cpu".
+* `num_inference_steps`: Number of inference steps, default 50.
+* `eta`: DDIM scheduler eta parameter, default 0.0.
+* `guidance_rescale`: Guidance rescale factor, default 0.0.
+* `progress_bar_cmd`: Progress bar callback function.
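To make `cfg_scale` and `guidance_rescale` concrete, the sketch below reproduces the standard classifier-free-guidance combination and the rescaling trick from Lin et al. ("Common Diffusion Noise Schedules and Sample Steps are Flawed") on plain Python lists; the pipeline applies the same arithmetic to noise-prediction tensors, and the function names here are illustrative, not the pipeline's API.

```python
import statistics

def apply_cfg(noise_cond, noise_uncond, cfg_scale):
    # eps = eps_uncond + cfg_scale * (eps_cond - eps_uncond);
    # cfg_scale=1.0 reduces to the conditional prediction alone.
    return [u + cfg_scale * (c - u) for c, u in zip(noise_cond, noise_uncond)]

def rescale_guidance(noise_cfg, noise_cond, guidance_rescale):
    # Shrink the CFG output's standard deviation back toward that of the
    # conditional prediction, then blend; guidance_rescale=0.0 is a no-op.
    factor = statistics.pstdev(noise_cond) / statistics.pstdev(noise_cfg)
    return [guidance_rescale * (x * factor) + (1 - guidance_rescale) * x
            for x in noise_cfg]

cond, uncond = [0.2, -0.1, 0.4], [0.1, 0.0, 0.3]
guided = apply_cfg(cond, uncond, 7.5)
```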
+
+## Model Training
+
+Models in the stable_diffusion series are trained via `examples/stable_diffusion/model_training/train.py`. The script parameters include:
+
+* General Training Parameters
+ * Dataset Configuration
+ * `--dataset_base_path`: Root directory of the dataset.
+ * `--dataset_metadata_path`: Path to the dataset metadata file.
+ * `--dataset_repeat`: Number of dataset repeats per epoch.
+ * `--dataset_num_workers`: Number of processes per DataLoader.
+ * `--data_file_keys`: Field names to load from metadata, typically paths to image or video files, separated by `,`.
+ * Model Loading Configuration
+ * `--model_paths`: Paths to load models from, in JSON format.
+ * `--model_id_with_origin_paths`: Model IDs with original paths, separated by commas.
+ * `--extra_inputs`: Additional input parameters required by the model Pipeline, separated by `,`.
+ * `--fp8_models`: Models to load in FP8 format, currently only supported for models whose parameters are not updated by gradients.
+ * Basic Training Configuration
+ * `--learning_rate`: Learning rate.
+ * `--num_epochs`: Number of epochs.
+ * `--trainable_models`: Trainable models, e.g., `dit`, `vae`, `text_encoder`.
+ * `--find_unused_parameters`: Whether unused parameters exist in DDP training.
+ * `--weight_decay`: Weight decay magnitude.
+ * `--task`: Training task, defaults to `sft`.
+ * Output Configuration
+ * `--output_path`: Path to save the model.
+ * `--remove_prefix_in_ckpt`: Remove prefix in the model's state dict.
+ * `--save_steps`: Interval in training steps to save the model.
+ * LoRA Configuration
+ * `--lora_base_model`: Which model to add LoRA to.
+ * `--lora_target_modules`: Which layers to add LoRA to.
+ * `--lora_rank`: Rank of LoRA.
+ * `--lora_checkpoint`: Path to LoRA checkpoint.
+ * `--preset_lora_path`: Path to preset LoRA checkpoint for LoRA differential training.
+ * `--preset_lora_model`: Which model to integrate preset LoRA into, e.g., `dit`.
+ * Gradient Configuration
+ * `--use_gradient_checkpointing`: Whether to enable gradient checkpointing.
+ * `--use_gradient_checkpointing_offload`: Whether to offload gradient checkpointing to CPU memory.
+ * `--gradient_accumulation_steps`: Number of gradient accumulation steps.
+ * Resolution Configuration
+ * `--height`: Height of the image/video. Leave empty to enable dynamic resolution.
+ * `--width`: Width of the image/video. Leave empty to enable dynamic resolution.
+ * `--max_pixels`: Maximum pixel area, images larger than this will be scaled down during dynamic resolution.
+ * `--num_frames`: Number of frames for video (video generation models only).
+* Stable Diffusion Specific Parameters
+ * `--tokenizer_path`: Tokenizer path, defaults to `AI-ModelScope/stable-diffusion-v1-5:tokenizer/`.
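As a concrete example of the `--model_paths` JSON format described above, the snippet below assembles the argument value programmatically. The local paths are placeholders; point them at wherever you downloaded the weights.

```python
import json

# Placeholder local paths -- substitute your own download locations.
model_paths = json.dumps([
    "models/stable-diffusion-v1-5/text_encoder/model.safetensors",
    "models/stable-diffusion-v1-5/unet/diffusion_pytorch_model.safetensors",
    "models/stable-diffusion-v1-5/vae/diffusion_pytorch_model.safetensors",
])
print(model_paths)  # pass this string to --model_paths
```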
+
+Example dataset download:
+
+```shell
+modelscope download --dataset DiffSynth-Studio/diffsynth_example_dataset --include "stable_diffusion/*" --local_dir ./data/diffsynth_example_dataset
+```
+
+[stable-diffusion-v1-5 training script](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/stable_diffusion/model_training/lora/stable-diffusion-v1-5.sh)
+
+We provide recommended training scripts for each model; please refer to the table in "Model Overview" above. For guidance on writing model training scripts, see [Model Training](../Pipeline_Usage/Model_Training.md); for more advanced training algorithms, see [Training Framework Overview](https://github.com/modelscope/DiffSynth-Studio/tree/main/docs/en/Training/).
diff --git a/docs/en/index.rst b/docs/en/index.rst
index ad333be..0449463 100644
--- a/docs/en/index.rst
+++ b/docs/en/index.rst
@@ -32,6 +32,8 @@ Welcome to DiffSynth-Studio's Documentation
Model_Details/LTX-2
Model_Details/ERNIE-Image
Model_Details/JoyAI-Image
+ Model_Details/Stable-Diffusion
+ Model_Details/Stable-Diffusion-XL
.. toctree::
:maxdepth: 2
diff --git a/docs/zh/Model_Details/Stable-Diffusion-XL.md b/docs/zh/Model_Details/Stable-Diffusion-XL.md
new file mode 100644
index 0000000..9464dbd
--- /dev/null
+++ b/docs/zh/Model_Details/Stable-Diffusion-XL.md
@@ -0,0 +1,141 @@
+# Stable Diffusion XL
+
+Stable Diffusion XL (SDXL) is an open-source diffusion-based text-to-image generation model developed by Stability AI. It supports high-quality text-to-image generation at 1024x1024 resolution and adopts a dual text encoder (CLIP-L + CLIP-bigG) architecture.
+
+## Installation
+
+Before running model inference and training with this project, please install DiffSynth-Studio first.
+
+```shell
+git clone https://github.com/modelscope/DiffSynth-Studio.git
+cd DiffSynth-Studio
+pip install -e .
+```
+
+For more information about installation, please refer to [Installation](../Pipeline_Usage/Setup.md).
+
+## Quick Start
+
+Running the following code quickly loads the [stabilityai/stable-diffusion-xl-base-1.0](https://www.modelscope.cn/models/stabilityai/stable-diffusion-xl-base-1.0) model and runs inference. VRAM management is enabled: the framework automatically controls how model parameters are loaded according to the remaining VRAM, so the model runs with as little as 6GB of VRAM.
+
+```python
+import torch
+from diffsynth.core import ModelConfig
+from diffsynth.pipelines.stable_diffusion_xl import StableDiffusionXLPipeline
+
+vram_config = {
+ "offload_dtype": torch.float32,
+ "offload_device": "cpu",
+ "onload_dtype": torch.float32,
+ "onload_device": "cpu",
+ "preparing_dtype": torch.float32,
+ "preparing_device": "cuda",
+ "computation_dtype": torch.float32,
+ "computation_device": "cuda",
+}
+pipe = StableDiffusionXLPipeline.from_pretrained(
+ torch_dtype=torch.float32,
+ model_configs=[
+ ModelConfig(model_id="stabilityai/stable-diffusion-xl-base-1.0", origin_file_pattern="text_encoder/model.safetensors", **vram_config),
+ ModelConfig(model_id="stabilityai/stable-diffusion-xl-base-1.0", origin_file_pattern="text_encoder_2/model.safetensors", **vram_config),
+ ModelConfig(model_id="stabilityai/stable-diffusion-xl-base-1.0", origin_file_pattern="unet/diffusion_pytorch_model.safetensors", **vram_config),
+ ModelConfig(model_id="stabilityai/stable-diffusion-xl-base-1.0", origin_file_pattern="vae/diffusion_pytorch_model.safetensors", **vram_config),
+ ],
+ tokenizer_config=ModelConfig(model_id="stabilityai/stable-diffusion-xl-base-1.0", origin_file_pattern="tokenizer/"),
+ tokenizer_2_config=ModelConfig(model_id="stabilityai/stable-diffusion-xl-base-1.0", origin_file_pattern="tokenizer_2/"),
+ vram_limit=torch.cuda.mem_get_info("cuda")[1] / (1024 ** 3) - 0.5,
+)
+
+image = pipe(
+ prompt="a photo of an astronaut riding a horse on mars",
+ negative_prompt="",
+ cfg_scale=5.0,
+ height=1024,
+ width=1024,
+ seed=42,
+ num_inference_steps=50,
+)
+image.save("image.jpg")
+```
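The `vram_limit` argument above converts the total device memory reported by `torch.cuda.mem_get_info` (a `(free_bytes, total_bytes)` tuple) into GiB and keeps 0.5 GiB of headroom free. The arithmetic can be checked without a GPU; `vram_limit_gb` is a hypothetical name used only for this sketch.

```python
def vram_limit_gb(total_bytes: int, headroom_gb: float = 0.5) -> float:
    # bytes -> GiB, minus headroom left free for activations and fragmentation
    return total_bytes / (1024 ** 3) - headroom_gb

# Stand-in for torch.cuda.mem_get_info("cuda")[1] on a 24 GiB card:
print(vram_limit_gb(24 * 1024 ** 3))  # 23.5
```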
+
+## Model Overview
+
+|Model ID|Inference|Low-VRAM Inference|Full Training|Validation After Full Training|LoRA Training|Validation After LoRA Training|
+|-|-|-|-|-|-|-|
+|[stabilityai/stable-diffusion-xl-base-1.0](https://www.modelscope.cn/models/stabilityai/stable-diffusion-xl-base-1.0)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/stable_diffusion_xl/model_inference/stable-diffusion-xl-base-1.0.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/stable_diffusion_xl/model_inference_low_vram/stable-diffusion-xl-base-1.0.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/stable_diffusion_xl/model_training/full/stable-diffusion-xl-base-1.0.sh)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/stable_diffusion_xl/model_training/validate_full/stable-diffusion-xl-base-1.0.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/stable_diffusion_xl/model_training/lora/stable-diffusion-xl-base-1.0.sh)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/stable_diffusion_xl/model_training/validate_lora/stable-diffusion-xl-base-1.0.py)|
+
+## Model Inference
+
+Models are loaded via `StableDiffusionXLPipeline.from_pretrained`; see [Loading Models](../Pipeline_Usage/Model_Inference.md#加载模型) for details.
+
+The inference parameters of `StableDiffusionXLPipeline` include:
+
+* `prompt`: Text prompt.
+* `negative_prompt`: Negative prompt, defaults to an empty string.
+* `cfg_scale`: Classifier-Free Guidance scale, defaults to 5.0.
+* `height`: Output image height, defaults to 1024.
+* `width`: Output image width, defaults to 1024.
+* `seed`: Random seed; if not set, a random seed is used.
+* `rand_device`: Device for noise generation, defaults to "cpu".
+* `num_inference_steps`: Number of inference steps, defaults to 50.
+* `guidance_rescale`: Guidance rescale factor, defaults to 0.0.
+* `progress_bar_cmd`: Progress bar callback.
+
+> `StableDiffusionXLPipeline` requires two tokenizer configurations (`tokenizer_config` and `tokenizer_2_config`), corresponding to the CLIP-L and CLIP-bigG text encoders respectively.
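For intuition, `cfg_scale` and `guidance_rescale` combine the conditional and unconditional noise predictions roughly as below, here on plain Python lists standing in for noise tensors. This is the standard classifier-free guidance formula together with the rescaling of Lin et al. ("Common Diffusion Noise Schedules and Sample Steps are Flawed"), not DiffSynth-Studio's exact code.

```python
from statistics import pstdev

def apply_cfg(cond, uncond, cfg_scale, guidance_rescale=0.0):
    # Classifier-free guidance: move cfg_scale times further along the
    # direction from the unconditional to the conditional prediction.
    guided = [u + cfg_scale * (c - u) for c, u in zip(cond, uncond)]
    if guidance_rescale > 0:
        # Pull the result back toward the conditional branch's standard
        # deviation to counteract over-saturation at high cfg_scale.
        ratio = pstdev(cond) / pstdev(guided)
        guided = [guidance_rescale * (g * ratio) + (1 - guidance_rescale) * g
                  for g in guided]
    return guided

print(apply_cfg([1.0, 2.0], [0.5, 1.0], cfg_scale=5.0))  # [3.0, 6.0]
```

With `guidance_rescale=0.0` (the default) the second branch is skipped and only the plain guidance formula applies.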
+
+## Model Training
+
+Stable Diffusion XL series models are trained via `examples/stable_diffusion_xl/model_training/train.py`; the script's parameters include:
+
+* General Training Parameters
+  * Dataset Basic Configuration
+    * `--dataset_base_path`: Root directory of the dataset.
+    * `--dataset_metadata_path`: Path to the dataset's metadata file.
+    * `--dataset_repeat`: Number of times the dataset repeats per epoch.
+    * `--dataset_num_workers`: Number of worker processes per Dataloader.
+    * `--data_file_keys`: Field names to load from the metadata, usually paths to image or video files, separated by `,`.
+  * Model Loading Configuration
+    * `--model_paths`: Paths of models to load, in JSON format.
+    * `--model_id_with_origin_paths`: Model IDs with original paths, separated by commas.
+    * `--extra_inputs`: Extra input parameters required by the model pipeline, separated by `,`.
+    * `--fp8_models`: Models loaded in FP8 format; currently only supported for models whose parameters are not updated by gradients.
+  * Basic Training Configuration
+    * `--learning_rate`: Learning rate.
+    * `--num_epochs`: Number of epochs.
+    * `--trainable_models`: Trainable models, e.g. `dit`, `vae`, `text_encoder`.
+    * `--find_unused_parameters`: Whether there are unused parameters in DDP training.
+    * `--weight_decay`: Weight decay.
+    * `--task`: Training task, defaults to `sft`.
+  * Output Configuration
+    * `--output_path`: Model save path.
+    * `--remove_prefix_in_ckpt`: Prefix to remove from the state dict of the model file.
+    * `--save_steps`: Interval of training steps between model saves.
+  * LoRA Configuration
+    * `--lora_base_model`: The model to which LoRA is added.
+    * `--lora_target_modules`: The layers to which LoRA is added.
+    * `--lora_rank`: Rank of the LoRA.
+    * `--lora_checkpoint`: Path to the LoRA checkpoint.
+    * `--preset_lora_path`: Path to a preset LoRA checkpoint, used for LoRA differential training.
+    * `--preset_lora_model`: The model the preset LoRA is merged into, e.g. `dit`.
+  * Gradient Configuration
+    * `--use_gradient_checkpointing`: Whether to enable gradient checkpointing.
+    * `--use_gradient_checkpointing_offload`: Whether to offload gradient checkpointing to CPU memory.
+    * `--gradient_accumulation_steps`: Number of gradient accumulation steps.
+  * Resolution Configuration
+    * `--height`: Height of the image/video. Leave empty to enable dynamic resolution.
+    * `--width`: Width of the image/video. Leave empty to enable dynamic resolution.
+    * `--max_pixels`: Maximum pixel area; when dynamic resolution is enabled, images larger than this are scaled down.
+    * `--num_frames`: Number of frames for video (video generation models only).
+* Stable Diffusion XL Specific Parameters
+  * `--tokenizer_path`: Path to the first tokenizer.
+  * `--tokenizer_2_path`: Path to the second tokenizer, defaults to `stabilityai/stable-diffusion-xl-base-1.0:tokenizer_2/`.
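As a sanity check on `--gradient_accumulation_steps`: averaging the gradients of equally sized micro-batches is mathematically the same as taking the gradient of the combined batch, which is why accumulation lets a small-VRAM setup emulate a larger effective batch size. The toy 1-D least-squares model below is purely illustrative.

```python
def grad(w, xs, ys):
    # d/dw of the mean squared error of a 1-D linear model y ≈ w * x
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)

xs, ys, w = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], 0.5
full_batch = grad(w, xs, ys)
# Two accumulation steps over equally sized micro-batches, then average:
accumulated = (grad(w, xs[:2], ys[:2]) + grad(w, xs[2:], ys[2:])) / 2
print(full_batch == accumulated)  # True
```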
+
+Example dataset download:
+
+```shell
+modelscope download --dataset DiffSynth-Studio/diffsynth_example_dataset --include "stable_diffusion_xl/*" --local_dir ./data/diffsynth_example_dataset
+```
+
+[stable-diffusion-xl-base-1.0 training script](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/stable_diffusion_xl/model_training/lora/stable-diffusion-xl-base-1.0.sh)
+
+We provide recommended training scripts for each model; please refer to the table in "Model Overview" above. For guidance on writing model training scripts, see [Model Training](../Pipeline_Usage/Model_Training.md); for more advanced training algorithms, see [Training Framework Overview](https://github.com/modelscope/DiffSynth-Studio/tree/main/docs/zh/Training/).
diff --git a/docs/zh/Model_Details/Stable-Diffusion.md b/docs/zh/Model_Details/Stable-Diffusion.md
new file mode 100644
index 0000000..3a3ee70
--- /dev/null
+++ b/docs/zh/Model_Details/Stable-Diffusion.md
@@ -0,0 +1,138 @@
+# Stable Diffusion
+
+Stable Diffusion is an open-source diffusion-based text-to-image generation model developed by Stability AI, supporting text-to-image generation at 512x512 resolution.
+
+## Installation
+
+Before running model inference and training with this project, please install DiffSynth-Studio first.
+
+```shell
+git clone https://github.com/modelscope/DiffSynth-Studio.git
+cd DiffSynth-Studio
+pip install -e .
+```
+
+For more information about installation, please refer to [Installation](../Pipeline_Usage/Setup.md).
+
+## Quick Start
+
+Running the following code quickly loads the [AI-ModelScope/stable-diffusion-v1-5](https://www.modelscope.cn/models/AI-ModelScope/stable-diffusion-v1-5) model and runs inference. VRAM management is enabled: the framework automatically controls how model parameters are loaded according to the remaining VRAM, so the model runs with as little as 2GB of VRAM.
+
+```python
+import torch
+from diffsynth.core import ModelConfig
+from diffsynth.pipelines.stable_diffusion import StableDiffusionPipeline
+
+vram_config = {
+ "offload_dtype": torch.float32,
+ "offload_device": "cpu",
+ "onload_dtype": torch.float32,
+ "onload_device": "cpu",
+ "preparing_dtype": torch.float32,
+ "preparing_device": "cuda",
+ "computation_dtype": torch.float32,
+ "computation_device": "cuda",
+}
+pipe = StableDiffusionPipeline.from_pretrained(
+ torch_dtype=torch.float32,
+ model_configs=[
+ ModelConfig(model_id="AI-ModelScope/stable-diffusion-v1-5", origin_file_pattern="text_encoder/model.safetensors", **vram_config),
+ ModelConfig(model_id="AI-ModelScope/stable-diffusion-v1-5", origin_file_pattern="unet/diffusion_pytorch_model.safetensors", **vram_config),
+ ModelConfig(model_id="AI-ModelScope/stable-diffusion-v1-5", origin_file_pattern="vae/diffusion_pytorch_model.safetensors", **vram_config),
+ ],
+ tokenizer_config=ModelConfig(model_id="AI-ModelScope/stable-diffusion-v1-5", origin_file_pattern="tokenizer/"),
+ vram_limit=torch.cuda.mem_get_info("cuda")[1] / (1024 ** 3) - 0.5,
+)
+
+image = pipe(
+ prompt="a photo of an astronaut riding a horse on mars, high quality, detailed",
+ negative_prompt="blurry, low quality, deformed",
+ cfg_scale=7.5,
+ height=512,
+ width=512,
+ seed=42,
+ rand_device="cuda",
+ num_inference_steps=50,
+)
+image.save("image.jpg")
+```
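The `seed=42` in the call above makes generation reproducible: the initial latent noise is drawn from a seeded generator, so the same seed on the same `rand_device` yields the same noise and hence the same image. The snippet below illustrates the principle with Python's `random` module standing in for the CUDA generator; `latent_noise` is a hypothetical name for this sketch only.

```python
import random

def latent_noise(seed: int, n: int = 4):
    """Stand-in for seeded latent-noise sampling: identical seeds
    produce identical noise draws."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

print(latent_noise(42) == latent_noise(42))  # True
print(latent_noise(42) == latent_noise(43))  # False
```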
+
+## Model Overview
+
+|Model ID|Inference|Low-VRAM Inference|Full Training|Validation After Full Training|LoRA Training|Validation After LoRA Training|
+|-|-|-|-|-|-|-|
+|[AI-ModelScope/stable-diffusion-v1-5](https://www.modelscope.cn/models/AI-ModelScope/stable-diffusion-v1-5)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/stable_diffusion/model_inference/stable-diffusion-v1-5.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/stable_diffusion/model_inference_low_vram/stable-diffusion-v1-5.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/stable_diffusion/model_training/full/stable-diffusion-v1-5.sh)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/stable_diffusion/model_training/validate_full/stable-diffusion-v1-5.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/stable_diffusion/model_training/lora/stable-diffusion-v1-5.sh)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/stable_diffusion/model_training/validate_lora/stable-diffusion-v1-5.py)|
+
+## Model Inference
+
+Models are loaded via `StableDiffusionPipeline.from_pretrained`; see [Loading Models](../Pipeline_Usage/Model_Inference.md#加载模型) for details.
+
+The inference parameters of `StableDiffusionPipeline` include:
+
+* `prompt`: Text prompt.
+* `negative_prompt`: Negative prompt, defaults to an empty string.
+* `cfg_scale`: Classifier-Free Guidance scale, defaults to 7.5.
+* `height`: Output image height, defaults to 512.
+* `width`: Output image width, defaults to 512.
+* `seed`: Random seed; if not set, a random seed is used.
+* `rand_device`: Device for noise generation, defaults to "cpu".
+* `num_inference_steps`: Number of inference steps, defaults to 50.
+* `eta`: The eta parameter of the DDIM scheduler, defaults to 0.0.
+* `guidance_rescale`: Guidance rescale factor, defaults to 0.0.
+* `progress_bar_cmd`: Progress bar callback.
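The `eta` parameter follows the DDIM formulation (Song et al., 2020): it scales the per-step noise term, with `eta=0` giving fully deterministic sampling and `eta=1` recovering DDPM-like stochasticity. A sketch of the noise-scale formula, with illustrative `alpha_bar` values rather than a real noise schedule:

```python
import math

def ddim_sigma(eta: float, alpha_bar_t: float, alpha_bar_prev: float) -> float:
    # sigma_t in DDIM: zero when eta = 0, i.e. a deterministic update
    return (eta
            * math.sqrt((1 - alpha_bar_prev) / (1 - alpha_bar_t))
            * math.sqrt(1 - alpha_bar_t / alpha_bar_prev))

print(ddim_sigma(0.0, 0.2, 0.5))  # 0.0 -> deterministic sampling
```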
+
+## Model Training
+
+Stable Diffusion series models are trained via `examples/stable_diffusion/model_training/train.py`; the script's parameters include:
+
+* General Training Parameters
+  * Dataset Basic Configuration
+    * `--dataset_base_path`: Root directory of the dataset.
+    * `--dataset_metadata_path`: Path to the dataset's metadata file.
+    * `--dataset_repeat`: Number of times the dataset repeats per epoch.
+    * `--dataset_num_workers`: Number of worker processes per Dataloader.
+    * `--data_file_keys`: Field names to load from the metadata, usually paths to image or video files, separated by `,`.
+  * Model Loading Configuration
+    * `--model_paths`: Paths of models to load, in JSON format.
+    * `--model_id_with_origin_paths`: Model IDs with original paths, separated by commas.
+    * `--extra_inputs`: Extra input parameters required by the model pipeline, separated by `,`.
+    * `--fp8_models`: Models loaded in FP8 format; currently only supported for models whose parameters are not updated by gradients.
+  * Basic Training Configuration
+    * `--learning_rate`: Learning rate.
+    * `--num_epochs`: Number of epochs.
+    * `--trainable_models`: Trainable models, e.g. `dit`, `vae`, `text_encoder`.
+    * `--find_unused_parameters`: Whether there are unused parameters in DDP training.
+    * `--weight_decay`: Weight decay.
+    * `--task`: Training task, defaults to `sft`.
+  * Output Configuration
+    * `--output_path`: Model save path.
+    * `--remove_prefix_in_ckpt`: Prefix to remove from the state dict of the model file.
+    * `--save_steps`: Interval of training steps between model saves.
+  * LoRA Configuration
+    * `--lora_base_model`: The model to which LoRA is added.
+    * `--lora_target_modules`: The layers to which LoRA is added.
+    * `--lora_rank`: Rank of the LoRA.
+    * `--lora_checkpoint`: Path to the LoRA checkpoint.
+    * `--preset_lora_path`: Path to a preset LoRA checkpoint, used for LoRA differential training.
+    * `--preset_lora_model`: The model the preset LoRA is merged into, e.g. `dit`.
+  * Gradient Configuration
+    * `--use_gradient_checkpointing`: Whether to enable gradient checkpointing.
+    * `--use_gradient_checkpointing_offload`: Whether to offload gradient checkpointing to CPU memory.
+    * `--gradient_accumulation_steps`: Number of gradient accumulation steps.
+  * Resolution Configuration
+    * `--height`: Height of the image/video. Leave empty to enable dynamic resolution.
+    * `--width`: Width of the image/video. Leave empty to enable dynamic resolution.
+    * `--max_pixels`: Maximum pixel area; when dynamic resolution is enabled, images larger than this are scaled down.
+    * `--num_frames`: Number of frames for video (video generation models only).
+* Stable Diffusion Specific Parameters
+  * `--tokenizer_path`: Tokenizer path, defaults to `AI-ModelScope/stable-diffusion-v1-5:tokenizer/`.
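To see why a small `--lora_rank` keeps training cheap: for a linear layer, LoRA learns a low-rank update ΔW = B @ A, with A of shape (rank, in_features) and B of shape (out_features, rank), so the trainable parameter count grows linearly with the rank instead of quadratically with the layer width. Illustrative arithmetic only:

```python
def lora_param_count(in_features: int, out_features: int, rank: int) -> int:
    # A: (rank, in_features) plus B: (out_features, rank)
    return rank * in_features + out_features * rank

full = 768 * 768                       # full fine-tuning of one 768x768 layer
lora = lora_param_count(768, 768, 16)  # rank-16 LoRA for the same layer
print(lora, full // lora)  # 24576 24
```

A rank-16 adapter here trains about 1/24 of the layer's parameters, which is the usual motivation for ranks in the 4-64 range.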
+
+Example dataset download:
+
+```shell
+modelscope download --dataset DiffSynth-Studio/diffsynth_example_dataset --include "stable_diffusion/*" --local_dir ./data/diffsynth_example_dataset
+```
+
+[stable-diffusion-v1-5 training script](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/stable_diffusion/model_training/lora/stable-diffusion-v1-5.sh)
+
+We provide recommended training scripts for each model; please refer to the table in "Model Overview" above. For guidance on writing model training scripts, see [Model Training](../Pipeline_Usage/Model_Training.md); for more advanced training algorithms, see [Training Framework Overview](https://github.com/modelscope/DiffSynth-Studio/tree/main/docs/zh/Training/).
diff --git a/docs/zh/index.rst b/docs/zh/index.rst
index 526f0fb..71498d5 100644
--- a/docs/zh/index.rst
+++ b/docs/zh/index.rst
@@ -32,6 +32,8 @@
Model_Details/LTX-2
Model_Details/ERNIE-Image
Model_Details/JoyAI-Image
+ Model_Details/Stable-Diffusion
+ Model_Details/Stable-Diffusion-XL
.. toctree::
:maxdepth: 2