diff --git a/docs/source/GetStarted/A_simple_example.md b/docs/source/GetStarted/A_simple_example.md deleted file mode 100644 index 6a62b75..0000000 --- a/docs/source/GetStarted/A_simple_example.md +++ /dev/null @@ -1,87 +0,0 @@ - -# 基于Flux的文生图示例 - -以下是如何使用FLUX.1模型进行文生图任务的示例。该脚本提供了一个简单的设置,用于从文本描述生成图像。包括下载必要的模型、配置pipeline,以及在启用和禁用 classifier-free guidance 的情况下生成图像。 - -其他 DiffSynth 支持的模型详见 [模型.md](模型.md) - -## 准备 - -首先,确保已下载并配置了必要的模型: - -```python -import torch -from diffsynth import ModelManager, FluxImagePipeline, download_models - -# Download the FLUX.1-dev model files -download_models(["FLUX.1-dev"]) -``` - -下载模型的用法详见 [下载模型.md](下载模型.md) - -## 加载模型 - -使用您的设备和数据类型初始化模型管理器 - -```python -model_manager = ModelManager(torch_dtype=torch.bfloat16, device="cuda") -model_manager.load_models([ - "models/FLUX/FLUX.1-dev/text_encoder/model.safetensors", - "models/FLUX/FLUX.1-dev/text_encoder_2", - "models/FLUX/FLUX.1-dev/ae.safetensors", - "models/FLUX/FLUX.1-dev/flux1-dev.safetensors" -]) -``` - -模型加载的用法详见 [ModelManager.md](ModelManager.md) - -## 创建 Pipeline - -从加载的模型管理器中创建FluxImagePipeline实例: - -```python -pipe = FluxImagePipeline.from_model_manager(model_manager) -``` - -Pipeline 的用法详见 [Pipeline.md](Pipeline.md) - -## 文生图 - -使用简短的提示语生成图像。以下是启用和禁用 classifier-free guidance 的图像生成示例。 - -### 基础文生图 - -```python -prompt = "A cute little turtle" -negative_prompt = "" - -torch.manual_seed(6) -image = pipe( - prompt=prompt, - num_inference_steps=30, embedded_guidance=3.5 -) -image.save("image_1024.jpg") -``` - -### 使用 Classifier-Free Guidance 生成 -```python -torch.manual_seed(6) -image = pipe( - prompt=prompt, negative_prompt=negative_prompt, - num_inference_steps=30, cfg_scale=2.0, embedded_guidance=3.5 -) -image.save("image_1024_cfg.jpg") -``` - -### 高分辨率修复 - -```python -torch.manual_seed(7) -image = pipe( - prompt=prompt, - num_inference_steps=30, embedded_guidance=3.5, - input_image=image.resize((2048, 2048)), height=2048, width=2048, denoising_strength=0.6, tiled=True -) 
-image.save("image_2048_highres.jpg") -``` - diff --git a/docs/source/GetStarted/Download_models.md b/docs/source/GetStarted/Download_models.md deleted file mode 100644 index dc089df..0000000 --- a/docs/source/GetStarted/Download_models.md +++ /dev/null @@ -1,20 +0,0 @@ -# 下载模型 - -下载预设模型,模型ID可参考 [config file](/diffsynth/configs/model_config.py). - -```python -from diffsynth import download_models - -download_models(["FLUX.1-dev", "Kolors"]) -``` - -下载非预设模型,可以选择 [ModelScope](https://modelscope.cn/models) 和 [HuggingFace](https://huggingface.co/models) 两个下载源中的模型。 - -```python -from diffsynth.models.downloader import download_from_huggingface, download_from_modelscope - -# From Modelscope (recommended) -download_from_modelscope("Kwai-Kolors/Kolors", "vae/diffusion_pytorch_model.fp16.bin", "models/kolors/Kolors/vae") -# From Huggingface -download_from_huggingface("Kwai-Kolors/Kolors", "vae/diffusion_pytorch_model.fp16.safetensors", "models/kolors/Kolors/vae") -``` diff --git a/docs/source/GetStarted/Fine-Tuning.md b/docs/source/GetStarted/Fine-Tuning.md deleted file mode 100644 index 22d0b5a..0000000 --- a/docs/source/GetStarted/Fine-Tuning.md +++ /dev/null @@ -1,431 +0,0 @@ -# 微调 - -我们实现了一个用于文本到图像扩散模型的训练框架,使用户能够轻松地使用我们的框架训练 LoRA 模型。我们提供的脚本具有以下特点: - -* **全面功能与用户友好性**:我们的训练框架支持多GPU和多机器配置,便于使用 DeepSpeed 加速,并包括梯度检查点优化,适用于内存需求较大的模型。 -* **代码简洁与研究者可及性**:我们避免了大块复杂的代码。通用模块实现于 `diffsynth/trainers/text_to_image.py` 中,而模型特定的训练脚本仅包含与模型架构相关的最少代码,便于研究人员使用。 -* **模块化设计与开发者灵活性**:基于通用的 Pytorch-Lightning 框架,我们的训练框架在功能上是解耦的,允许开发者通过修改我们的脚本轻松引入额外的训练技术,以满足他们的需求。 - -LoRA 微调的图像示例。提示词为 "一只小狗蹦蹦跳跳,周围是姹紫嫣红的鲜花,远处是山脉"(针对中文模型)或 "a dog is jumping, flowers around the dog, the background is mountains and clouds"(针对英文模型)。 - -||Kolors|Stable Diffusion 3|Hunyuan-DiT| -|-|-|-|-| -|Without 
LoRA|![image_without_lora](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/9d79ed7a-e8cf-4d98-800a-f182809db318)|![image_without_lora](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/ddb834a5-6366-412b-93dc-6d957230d66e)|![image_without_lora](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/1aa21de5-a992-4b66-b14f-caa44e08876e)| -|With LoRA|![image_with_lora](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/02f62323-6ee5-4788-97a1-549732dbe4f0)|![image_with_lora](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/8e7b2888-d874-4da4-a75b-11b6b214b9bf)|![image_with_lora](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/83a0a41a-691f-4610-8e7b-d8e17c50a282)| - -## 下载需要的包 - -```bash -pip install peft lightning -``` - -## 准备你的数据 - -我们提供了一个 [示例数据集](https://modelscope.cn/datasets/buptwq/lora-stable-diffusion-finetune/files)。你需要将训练数据集按照如下形式组织: - - -``` -data/dog/ -└── train - ├── 00.jpg - ├── 01.jpg - ├── 02.jpg - ├── 03.jpg - ├── 04.jpg - └── metadata.csv -``` - -`metadata.csv`: - -``` -file_name,text -00.jpg,a dog -01.jpg,a dog -02.jpg,a dog -03.jpg,a dog -04.jpg,a dog -``` - -请注意,如果模型是中文模型(例如,Hunyuan-DiT 和 Kolors),我们建议在数据集中使用中文文本。例如: - -``` -file_name,text -00.jpg,一只小狗 -01.jpg,一只小狗 -02.jpg,一只小狗 -03.jpg,一只小狗 -04.jpg,一只小狗 -``` - -## 训练 LoRA 模型 - -参数选项: - -``` - --lora_target_modules LORA_TARGET_MODULES - LoRA 模块所在的层。 - --dataset_path DATASET_PATH - 数据集的路径。 - --output_path OUTPUT_PATH - 模型保存路径。 - --steps_per_epoch STEPS_PER_EPOCH - 每个周期的步数。 - --height HEIGHT 图像高度。 - --width WIDTH 图像宽度。 - --center_crop 是否将输入图像中心裁剪到指定分辨率。如果未设置,图像将被随机裁剪。图像会在裁剪前先调整到指定分辨率。 - --random_flip 是否随机水平翻转图像。 - --batch_size BATCH_SIZE - 训练数据加载器的批量大小(每设备)。 - --dataloader_num_workers DATALOADER_NUM_WORKERS - 数据加载使用的子进程数量。0 表示数据将在主进程中加载。 - --precision {32,16,16-mixed} - 训练精度。 - --learning_rate LEARNING_RATE - 学习率。 - --lora_rank LORA_RANK - LoRA 更新矩阵的维度。 - --lora_alpha LORA_ALPHA - LoRA 更新矩阵的权重。 - 
--use_gradient_checkpointing - 是否使用梯度检查点。 - --accumulate_grad_batches ACCUMULATE_GRAD_BATCHES - 梯度累积的批次数量。 - --training_strategy {auto,deepspeed_stage_1,deepspeed_stage_2,deepspeed_stage_3} - 训练策略。 - --max_epochs MAX_EPOCHS - 训练周期数。 - --modelscope_model_id MODELSCOPE_MODEL_ID - ModelScope 上的模型 ID (https://www.modelscope.cn/)。如果提供模型 ID,模型将自动上传到 ModelScope。 - -``` - -### Kolors - -以下文件将用于构建 Kolors。你可以从 [HuggingFace](https://huggingface.co/Kwai-Kolors/Kolors) 或 [ModelScope](https://modelscope.cn/models/Kwai-Kolors/Kolors) 下载 Kolors。由于精度溢出问题,我们需要下载额外的 VAE 模型(从 [HuggingFace](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix) 或 [ModelScope](https://modelscope.cn/models/AI-ModelScope/sdxl-vae-fp16-fix))。你可以使用以下代码下载这些文件: - - -```python -from diffsynth import download_models - -download_models(["Kolors", "SDXL-vae-fp16-fix"]) -``` - -``` -models -├── kolors -│ └── Kolors -│ ├── text_encoder -│ │ ├── config.json -│ │ ├── pytorch_model-00001-of-00007.bin -│ │ ├── pytorch_model-00002-of-00007.bin -│ │ ├── pytorch_model-00003-of-00007.bin -│ │ ├── pytorch_model-00004-of-00007.bin -│ │ ├── pytorch_model-00005-of-00007.bin -│ │ ├── pytorch_model-00006-of-00007.bin -│ │ ├── pytorch_model-00007-of-00007.bin -│ │ └── pytorch_model.bin.index.json -│ ├── unet -│ │ └── diffusion_pytorch_model.safetensors -│ └── vae -│ └── diffusion_pytorch_model.safetensors -└── sdxl-vae-fp16-fix - └── diffusion_pytorch_model.safetensors -``` - -使用下面的命令启动训练任务: - -``` -CUDA_VISIBLE_DEVICES="0" python examples/train/kolors/train_kolors_lora.py \ - --pretrained_unet_path models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors \ - --pretrained_text_encoder_path models/kolors/Kolors/text_encoder \ - --pretrained_fp16_vae_path models/sdxl-vae-fp16-fix/diffusion_pytorch_model.safetensors \ - --dataset_path data/dog \ - --output_path ./models \ - --max_epochs 1 \ - --steps_per_epoch 500 \ - --height 1024 \ - --width 1024 \ - --center_crop \ - --precision "16-mixed" \ - --learning_rate 1e-4 \ - 
--lora_rank 4 \ - --lora_alpha 4 \ - --use_gradient_checkpointing -``` - -有关参数的更多信息,请使用 `python examples/train/kolors/train_kolors_lora.py -h` 查看详细信息。 - -训练完成后,使用 `model_manager.load_lora` 加载 LoRA 以进行推理。 - - - -```python -from diffsynth import ModelManager, SD3ImagePipeline -import torch - -model_manager = ModelManager(torch_dtype=torch.float16, device="cuda", - file_path_list=["models/stable_diffusion_3/sd3_medium_incl_clips.safetensors"]) -model_manager.load_lora("models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt", lora_alpha=1.0) -pipe = SD3ImagePipeline.from_model_manager(model_manager) - -torch.manual_seed(0) -image = pipe( - prompt="a dog is jumping, flowers around the dog, the background is mountains and clouds", - negative_prompt="bad quality, poor quality, doll, disfigured, jpg, toy, bad anatomy, missing limbs, missing fingers, 3d, cgi, extra tails", - cfg_scale=7.5, - num_inference_steps=100, width=1024, height=1024, -) -image.save("image_with_lora.jpg") -``` - - -### Stable Diffusion 3 - -训练脚本只需要一个文件。你可以使用 [`sd3_medium_incl_clips.safetensors`](https://huggingface.co/stabilityai/stable-diffusion-3-medium/resolve/main/sd3_medium_incl_clips.safetensors)(没有 T5 Encoder)或 [`sd3_medium_incl_clips_t5xxlfp16.safetensors`](https://huggingface.co/stabilityai/stable-diffusion-3-medium/resolve/main/sd3_medium_incl_clips_t5xxlfp16.safetensors)(有 T5 Encoder)。请使用以下代码下载这些文件: - - -```python -from diffsynth import download_models - -download_models(["StableDiffusion3", "StableDiffusion3_without_T5"]) -``` - -``` -models/stable_diffusion_3/ -├── Put Stable Diffusion 3 checkpoints here.txt -├── sd3_medium_incl_clips.safetensors -└── sd3_medium_incl_clips_t5xxlfp16.safetensors -``` - -使用下面的命令启动训练任务: - -``` -CUDA_VISIBLE_DEVICES="0" python examples/train/stable_diffusion_3/train_sd3_lora.py \ - --pretrained_path models/stable_diffusion_3/sd3_medium_incl_clips.safetensors \ - --dataset_path data/dog \ - --output_path ./models \ - --max_epochs 1 \ - 
--steps_per_epoch 500 \ - --height 1024 \ - --width 1024 \ - --center_crop \ - --precision "16-mixed" \ - --learning_rate 1e-4 \ - --lora_rank 4 \ - --lora_alpha 4 \ - --use_gradient_checkpointing -``` - -有关参数的更多信息,请使用 `python examples/train/stable_diffusion_3/train_sd3_lora.py -h` 查看详细信息。 - -训练完成后,使用 `model_manager.load_lora` 加载 LoRA 以进行推理。 - -```python -from diffsynth import ModelManager, SD3ImagePipeline -import torch - -model_manager = ModelManager(torch_dtype=torch.float16, device="cuda", - file_path_list=["models/stable_diffusion_3/sd3_medium_incl_clips.safetensors"]) -model_manager.load_lora("models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt", lora_alpha=1.0) -pipe = SD3ImagePipeline.from_model_manager(model_manager) - -torch.manual_seed(0) -image = pipe( - prompt="a dog is jumping, flowers around the dog, the background is mountains and clouds", - negative_prompt="bad quality, poor quality, doll, disfigured, jpg, toy, bad anatomy, missing limbs, missing fingers, 3d, cgi, extra tails", - cfg_scale=7.5, - num_inference_steps=100, width=1024, height=1024, -) -image.save("image_with_lora.jpg") -``` - -### Hunyuan-DiT - -构建 Hunyuan DiT 需要四个文件。你可以从 [HuggingFace](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT) 或 [ModelScope](https://www.modelscope.cn/models/modelscope/HunyuanDiT/summary) 下载这些文件。你可以使用以下代码下载这些文件: - - -```python -from diffsynth import download_models - -download_models(["HunyuanDiT"]) -``` - -``` -models/HunyuanDiT/ -├── Put Hunyuan DiT checkpoints here.txt -└── t2i - ├── clip_text_encoder - │ └── pytorch_model.bin - ├── model - │ └── pytorch_model_ema.pt - ├── mt5 - │ └── pytorch_model.bin - └── sdxl-vae-fp16-fix - └── diffusion_pytorch_model.bin -``` - -Launch the training task using the following command: - -``` -CUDA_VISIBLE_DEVICES="0" python examples/train/hunyuan_dit/train_hunyuan_dit_lora.py \ - --pretrained_path models/HunyuanDiT/t2i \ - --dataset_path data/dog \ - --output_path ./models \ - --max_epochs 1 \ - 
--steps_per_epoch 500 \ - --height 1024 \ - --width 1024 \ - --center_crop \ - --precision "16-mixed" \ - --learning_rate 1e-4 \ - --lora_rank 4 \ - --lora_alpha 4 \ - --use_gradient_checkpointing -``` - -有关参数的更多信息,请使用 `python examples/train/hunyuan_dit/train_hunyuan_dit_lora.py -h` 查看详细信息。 - -训练完成后,使用 `model_manager.load_lora` 加载 LoRA 以进行推理。 - - -```python -from diffsynth import ModelManager, HunyuanDiTImagePipeline -import torch - -model_manager = ModelManager(torch_dtype=torch.float16, device="cuda", - file_path_list=[ - "models/HunyuanDiT/t2i/clip_text_encoder/pytorch_model.bin", - "models/HunyuanDiT/t2i/model/pytorch_model_ema.pt", - "models/HunyuanDiT/t2i/mt5/pytorch_model.bin", - "models/HunyuanDiT/t2i/sdxl-vae-fp16-fix/diffusion_pytorch_model.bin" - ]) -model_manager.load_lora("models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt", lora_alpha=1.0) -pipe = HunyuanDiTImagePipeline.from_model_manager(model_manager) - -torch.manual_seed(0) -image = pipe( - prompt="一只小狗蹦蹦跳跳,周围是姹紫嫣红的鲜花,远处是山脉", - negative_prompt="", - cfg_scale=7.5, - num_inference_steps=100, width=1024, height=1024, -) -image.save("image_with_lora.jpg") -``` - -### Stable Diffusion - -训练脚本只需要一个文件。我们支持 [CivitAI](https://civitai.com/) 中的主流检查点。默认情况下,我们使用基础的 Stable Diffusion v1.5。你可以从 [HuggingFace](https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors) 或 [ModelScope](https://www.modelscope.cn/models/AI-ModelScope/stable-diffusion-v1-5/resolve/master/v1-5-pruned-emaonly.safetensors) 下载。你可以使用以下代码下载这个文件: - -```python -from diffsynth import download_models - -download_models(["StableDiffusion_v15"]) -``` - -``` -models/stable_diffusion -├── Put Stable Diffusion checkpoints here.txt -└── v1-5-pruned-emaonly.safetensors -``` - -Launch the training task using the following command: - -``` -CUDA_VISIBLE_DEVICES="0" python examples/train/stable_diffusion/train_sd_lora.py \ - --pretrained_path models/stable_diffusion/v1-5-pruned-emaonly.safetensors \ - 
--dataset_path data/dog \ - --output_path ./models \ - --max_epochs 1 \ - --steps_per_epoch 500 \ - --height 512 \ - --width 512 \ - --center_crop \ - --precision "16-mixed" \ - --learning_rate 1e-4 \ - --lora_rank 4 \ - --lora_alpha 4 \ - --use_gradient_checkpointing -``` - -有关参数的更多信息,请使用 `python examples/train/stable_diffusion/train_sd_lora.py -h` 查看详细信息。 - -训练完成后,使用 `model_manager.load_lora` 加载 LoRA 以进行推理。 - - - -```python -from diffsynth import ModelManager, SDImagePipeline -import torch - -model_manager = ModelManager(torch_dtype=torch.float16, device="cuda", - file_path_list=["models/stable_diffusion/v1-5-pruned-emaonly.safetensors"]) -model_manager.load_lora("models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt", lora_alpha=1.0) -pipe = SDImagePipeline.from_model_manager(model_manager) - -torch.manual_seed(0) -image = pipe( - prompt="a dog is jumping, flowers around the dog, the background is mountains and clouds", - negative_prompt="bad quality, poor quality, doll, disfigured, jpg, toy, bad anatomy, missing limbs, missing fingers, 3d, cgi, extra tails", - cfg_scale=7.5, - num_inference_steps=100, width=512, height=512, -) -image.save("image_with_lora.jpg") -``` - -### Stable Diffusion XL - -训练脚本只需要一个文件。我们支持 [CivitAI](https://civitai.com/) 中的主流检查点。默认情况下,我们使用基础的 Stable Diffusion XL。你可以从 [HuggingFace](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors) 或 [ModelScope](https://www.modelscope.cn/models/AI-ModelScope/stable-diffusion-xl-base-1.0/resolve/master/sd_xl_base_1.0.safetensors) 下载。也可以使用以下代码下载这个文件: - -```python -from diffsynth import download_models - -download_models(["StableDiffusionXL_v1"]) -``` - -``` -models/stable_diffusion_xl -├── Put Stable Diffusion XL checkpoints here.txt -└── sd_xl_base_1.0.safetensors -``` - -We observed that Stable Diffusion XL is not float16-safe, thus we recommand users to use float32. 
- -``` -CUDA_VISIBLE_DEVICES="0" python examples/train/stable_diffusion_xl/train_sdxl_lora.py \ - --pretrained_path models/stable_diffusion_xl/sd_xl_base_1.0.safetensors \ - --dataset_path data/dog \ - --output_path ./models \ - --max_epochs 1 \ - --steps_per_epoch 500 \ - --height 1024 \ - --width 1024 \ - --center_crop \ - --precision "32" \ - --learning_rate 1e-4 \ - --lora_rank 4 \ - --lora_alpha 4 \ - --use_gradient_checkpointing -``` - -有关参数的更多信息,请使用 `python examples/train/stable_diffusion_xl/train_sdxl_lora.py -h` 查看详细信息。 - -训练完成后,使用 `model_manager.load_lora` 加载 LoRA 以进行推理。 - -```python -from diffsynth import ModelManager, SDXLImagePipeline -import torch - -model_manager = ModelManager(torch_dtype=torch.float16, device="cuda", - file_path_list=["models/stable_diffusion_xl/sd_xl_base_1.0.safetensors"]) -model_manager.load_lora("models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt", lora_alpha=1.0) -pipe = SDXLImagePipeline.from_model_manager(model_manager) - -torch.manual_seed(0) -image = pipe( - prompt="a dog is jumping, flowers around the dog, the background is mountains and clouds", - negative_prompt="bad quality, poor quality, doll, disfigured, jpg, toy, bad anatomy, missing limbs, missing fingers, 3d, cgi, extra tails", - cfg_scale=7.5, - num_inference_steps=100, width=1024, height=1024, -) -image.save("image_with_lora.jpg") -``` diff --git a/docs/source/GetStarted/WebUI.md b/docs/source/GetStarted/WebUI.md deleted file mode 100644 index e69de29..0000000 diff --git a/docs/source/finetune/overview.md b/docs/source/finetune/overview.md new file mode 100644 index 0000000..ded131c --- /dev/null +++ b/docs/source/finetune/overview.md @@ -0,0 +1,98 @@ +# 训练框架 + +我们实现了一个用于文本到图像扩散模型的训练框架,使用户能够轻松地使用我们的框架训练 LoRA 模型。我们提供的脚本具有以下特点: + +* **功能全面**:我们的训练框架支持多GPU和多机器配置,便于使用 DeepSpeed 加速,并包括梯度检查点优化,适用于内存需求较大的模型。 +* **代码简洁**:我们避免了大块复杂的代码。通用模块实现于 `diffsynth/trainers/text_to_image.py` 中,而模型特定的训练脚本仅包含与模型架构相关的最少代码,便于学术研究人员使用。 +* **模块化设计**:基于通用的 Pytorch-Lightning 
框架,我们的训练框架在功能上是解耦的,允许开发者通过修改我们的脚本轻松引入额外的训练技术,以满足他们的需求。 + +LoRA 微调的图像示例。提示词为 "一只小狗蹦蹦跳跳,周围是姹紫嫣红的鲜花,远处是山脉"(针对中文模型)或 "a dog is jumping, flowers around the dog, the background is mountains and clouds"(针对英文模型)。 + +||FLUX.1-dev|Kolors|Stable Diffusion 3|Hunyuan-DiT| +|-|-|-|-|-| +|Without LoRA|![image_without_lora](https://github.com/user-attachments/assets/df62cef6-d54f-4e3d-a602-5dd290079d49)|![image_without_lora](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/9d79ed7a-e8cf-4d98-800a-f182809db318)|![image_without_lora](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/ddb834a5-6366-412b-93dc-6d957230d66e)|![image_without_lora](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/1aa21de5-a992-4b66-b14f-caa44e08876e)| +|With LoRA|![image_with_lora](https://github.com/user-attachments/assets/4fd39890-0291-4d19-8a88-d70d0ae18533)|![image_with_lora](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/02f62323-6ee5-4788-97a1-549732dbe4f0)|![image_with_lora](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/8e7b2888-d874-4da4-a75b-11b6b214b9bf)|![image_with_lora](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/83a0a41a-691f-4610-8e7b-d8e17c50a282)| + +## 安装额外包 + +``` +pip install peft lightning +``` + +## 准备数据集 + +我们提供了一个[示例数据集](https://modelscope.cn/datasets/buptwq/lora-stable-diffusion-finetune/files)。你需要将训练数据集按照如下形式组织: + +``` +data/dog/ +└── train + ├── 00.jpg + ├── 01.jpg + ├── 02.jpg + ├── 03.jpg + ├── 04.jpg + └── metadata.csv +``` + +`metadata.csv`: + +``` +file_name,text +00.jpg,a dog +01.jpg,a dog +02.jpg,a dog +03.jpg,a dog +04.jpg,a dog +``` + +请注意,如果模型是中文模型(例如,Hunyuan-DiT 和 Kolors),我们建议在数据集中使用中文文本。例如: + +``` +file_name,text +00.jpg,一只小狗 +01.jpg,一只小狗 +02.jpg,一只小狗 +03.jpg,一只小狗 +04.jpg,一只小狗 +``` + +## 训练 LoRA 模型 + +通用参数选项: + +``` + --lora_target_modules LORA_TARGET_MODULES + LoRA 模块所在的层。 + --dataset_path DATASET_PATH + 数据集的路径。 + --output_path OUTPUT_PATH + 模型保存路径。 + --steps_per_epoch 
STEPS_PER_EPOCH + 每个周期的步数。 + --height HEIGHT 图像高度。 + --width WIDTH 图像宽度。 + --center_crop 是否将输入图像中心裁剪到指定分辨率。如果未设置,图像将被随机裁剪。图像会在裁剪前先调整到指定分辨率。 + --random_flip 是否随机水平翻转图像。 + --batch_size BATCH_SIZE + 训练数据加载器的批量大小(每设备)。 + --dataloader_num_workers DATALOADER_NUM_WORKERS + 数据加载使用的子进程数量。0 表示数据将在主进程中加载。 + --precision {32,16,16-mixed} + 训练精度。 + --learning_rate LEARNING_RATE + 学习率。 + --lora_rank LORA_RANK + LoRA 更新矩阵的维度。 + --lora_alpha LORA_ALPHA + LoRA 更新矩阵的权重。 + --use_gradient_checkpointing + 是否使用梯度检查点。 + --accumulate_grad_batches ACCUMULATE_GRAD_BATCHES + 梯度累积的批次数量。 + --training_strategy {auto,deepspeed_stage_1,deepspeed_stage_2,deepspeed_stage_3} + 训练策略。 + --max_epochs MAX_EPOCHS + 训练轮数。 + --modelscope_model_id MODELSCOPE_MODEL_ID + ModelScope 上的模型 ID (https://www.modelscope.cn/)。如果提供模型 ID,模型将自动上传到 ModelScope。 +``` diff --git a/docs/source/finetune/train_flux_lora.md b/docs/source/finetune/train_flux_lora.md new file mode 100644 index 0000000..89ae5cf --- /dev/null +++ b/docs/source/finetune/train_flux_lora.md @@ -0,0 +1,71 @@ +# 训练 FLUX LoRA + +以下文件将会被用于构建 FLUX 模型。 你可以从[huggingface](https://huggingface.co/black-forest-labs/FLUX.1-dev)或[modelscope](https://www.modelscope.cn/models/ai-modelscope/flux.1-dev)下载,也可以使用以下代码下载这些文件: + +```python +from diffsynth import download_models + +download_models(["FLUX.1-dev"]) +``` + +``` +models/FLUX/ +└── FLUX.1-dev + ├── ae.safetensors + ├── flux1-dev.safetensors + ├── text_encoder + │ └── model.safetensors + └── text_encoder_2 + ├── config.json + ├── model-00001-of-00002.safetensors + ├── model-00002-of-00002.safetensors + └── model.safetensors.index.json +``` + +使用以下命令启动训练任务: + +``` +CUDA_VISIBLE_DEVICES="0" python examples/train/flux/train_flux_lora.py \ + --pretrained_text_encoder_path models/FLUX/FLUX.1-dev/text_encoder/model.safetensors \ + --pretrained_text_encoder_2_path models/FLUX/FLUX.1-dev/text_encoder_2 \ + --pretrained_dit_path models/FLUX/FLUX.1-dev/flux1-dev.safetensors \ + --pretrained_vae_path 
models/FLUX/FLUX.1-dev/ae.safetensors \ + --dataset_path data/dog \ + --output_path ./models \ + --max_epochs 1 \ + --steps_per_epoch 500 \ + --height 1024 \ + --width 1024 \ + --center_crop \ + --precision "bf16" \ + --learning_rate 1e-4 \ + --lora_rank 4 \ + --lora_alpha 4 \ + --use_gradient_checkpointing +``` + +有关参数的更多信息,请使用 `python examples/train/flux/train_flux_lora.py -h` 查看详细信息。 + +训练完成后,使用 `model_manager.load_lora` 加载 LoRA 以进行推理。 + +```python +from diffsynth import ModelManager, FluxImagePipeline +import torch + +model_manager = ModelManager(torch_dtype=torch.bfloat16, device="cuda", + file_path_list=[ + "models/FLUX/FLUX.1-dev/text_encoder/model.safetensors", + "models/FLUX/FLUX.1-dev/text_encoder_2", + "models/FLUX/FLUX.1-dev/ae.safetensors", + "models/FLUX/FLUX.1-dev/flux1-dev.safetensors" + ]) +model_manager.load_lora("models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt", lora_alpha=1.0) +pipe = FluxImagePipeline.from_model_manager(model_manager) + +torch.manual_seed(0) +image = pipe( + prompt="a dog is jumping, flowers around the dog, the background is mountains and clouds", + num_inference_steps=30, embedded_guidance=3.5 +) +image.save("image_with_lora.jpg") +``` diff --git a/docs/source/finetune/train_hunyuan_dit_lora.md b/docs/source/finetune/train_hunyuan_dit_lora.md new file mode 100644 index 0000000..cbd050c --- /dev/null +++ b/docs/source/finetune/train_hunyuan_dit_lora.md @@ -0,0 +1,72 @@ +# 训练 Hunyuan-DiT LoRA + +构建 Hunyuan DiT 需要四个文件。你可以从 [HuggingFace](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT) 或 [ModelScope](https://www.modelscope.cn/models/modelscope/HunyuanDiT/summary) 下载这些文件,也可以使用以下代码下载: + + +```python +from diffsynth import download_models + +download_models(["HunyuanDiT"]) +``` + +``` +models/HunyuanDiT/ +├── Put Hunyuan DiT checkpoints here.txt +└── t2i + ├── clip_text_encoder + │ └── pytorch_model.bin + ├── model + │ └── pytorch_model_ema.pt + ├── mt5 + │ └── pytorch_model.bin + └── sdxl-vae-fp16-fix + └── diffusion_pytorch_model.bin +``` + +使用以下命令启动训练任务: + +```
+CUDA_VISIBLE_DEVICES="0" python examples/train/hunyuan_dit/train_hunyuan_dit_lora.py \ + --pretrained_path models/HunyuanDiT/t2i \ + --dataset_path data/dog \ + --output_path ./models \ + --max_epochs 1 \ + --steps_per_epoch 500 \ + --height 1024 \ + --width 1024 \ + --center_crop \ + --precision "16-mixed" \ + --learning_rate 1e-4 \ + --lora_rank 4 \ + --lora_alpha 4 \ + --use_gradient_checkpointing +``` + +有关参数的更多信息,请使用 `python examples/train/hunyuan_dit/train_hunyuan_dit_lora.py -h` 查看详细信息。 + +训练完成后,使用 `model_manager.load_lora` 加载 LoRA 以进行推理。 + + +```python +from diffsynth import ModelManager, HunyuanDiTImagePipeline +import torch + +model_manager = ModelManager(torch_dtype=torch.float16, device="cuda", + file_path_list=[ + "models/HunyuanDiT/t2i/clip_text_encoder/pytorch_model.bin", + "models/HunyuanDiT/t2i/model/pytorch_model_ema.pt", + "models/HunyuanDiT/t2i/mt5/pytorch_model.bin", + "models/HunyuanDiT/t2i/sdxl-vae-fp16-fix/diffusion_pytorch_model.bin" + ]) +model_manager.load_lora("models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt", lora_alpha=1.0) +pipe = HunyuanDiTImagePipeline.from_model_manager(model_manager) + +torch.manual_seed(0) +image = pipe( + prompt="一只小狗蹦蹦跳跳,周围是姹紫嫣红的鲜花,远处是山脉", + negative_prompt="", + cfg_scale=7.5, + num_inference_steps=100, width=1024, height=1024, +) +image.save("image_with_lora.jpg") +``` diff --git a/docs/source/finetune/train_kolors_lora.md b/docs/source/finetune/train_kolors_lora.md new file mode 100644 index 0000000..d7bab00 --- /dev/null +++ b/docs/source/finetune/train_kolors_lora.md @@ -0,0 +1,78 @@ +# 训练 Kolors LoRA + +以下文件将用于构建 Kolors。你可以从 [HuggingFace](https://huggingface.co/Kwai-Kolors/Kolors) 或 [ModelScope](https://modelscope.cn/models/Kwai-Kolors/Kolors) 下载 Kolors。由于精度溢出问题,我们需要下载额外的 VAE 模型(从 [HuggingFace](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix) 或 [ModelScope](https://modelscope.cn/models/AI-ModelScope/sdxl-vae-fp16-fix))。你可以使用以下代码下载这些文件: + + +```python +from diffsynth import 
download_models + +download_models(["Kolors", "SDXL-vae-fp16-fix"]) +``` + +``` +models +├── kolors +│ └── Kolors +│ ├── text_encoder +│ │ ├── config.json +│ │ ├── pytorch_model-00001-of-00007.bin +│ │ ├── pytorch_model-00002-of-00007.bin +│ │ ├── pytorch_model-00003-of-00007.bin +│ │ ├── pytorch_model-00004-of-00007.bin +│ │ ├── pytorch_model-00005-of-00007.bin +│ │ ├── pytorch_model-00006-of-00007.bin +│ │ ├── pytorch_model-00007-of-00007.bin +│ │ └── pytorch_model.bin.index.json +│ ├── unet +│ │ └── diffusion_pytorch_model.safetensors +│ └── vae +│ └── diffusion_pytorch_model.safetensors +└── sdxl-vae-fp16-fix + └── diffusion_pytorch_model.safetensors +``` + +使用下面的命令启动训练任务: + +``` +CUDA_VISIBLE_DEVICES="0" python examples/train/kolors/train_kolors_lora.py \ + --pretrained_unet_path models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors \ + --pretrained_text_encoder_path models/kolors/Kolors/text_encoder \ + --pretrained_fp16_vae_path models/sdxl-vae-fp16-fix/diffusion_pytorch_model.safetensors \ + --dataset_path data/dog \ + --output_path ./models \ + --max_epochs 1 \ + --steps_per_epoch 500 \ + --height 1024 \ + --width 1024 \ + --center_crop \ + --precision "16-mixed" \ + --learning_rate 1e-4 \ + --lora_rank 4 \ + --lora_alpha 4 \ + --use_gradient_checkpointing +``` + +有关参数的更多信息,请使用 `python examples/train/kolors/train_kolors_lora.py -h` 查看详细信息。 + +训练完成后,使用 `model_manager.load_lora` 加载 LoRA 以进行推理。 + + + +```python +from diffsynth import ModelManager, SDXLImagePipeline +import torch + +model_manager = ModelManager(torch_dtype=torch.float16, device="cuda", + file_path_list=["models/kolors/Kolors/text_encoder", "models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors", "models/sdxl-vae-fp16-fix/diffusion_pytorch_model.safetensors"]) +model_manager.load_lora("models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt", lora_alpha=1.0) +pipe = SDXLImagePipeline.from_model_manager(model_manager) + +torch.manual_seed(0) +image = pipe( + prompt="a dog is jumping, flowers around the dog, the background is mountains and clouds", + negative_prompt="bad 
quality, poor quality, doll, disfigured, jpg, toy, bad anatomy, missing limbs, missing fingers, 3d, cgi, extra tails", + cfg_scale=7.5, + num_inference_steps=100, width=1024, height=1024, +) +image.save("image_with_lora.jpg") +``` diff --git a/docs/source/finetune/train_sd3_lora.md b/docs/source/finetune/train_sd3_lora.md new file mode 100644 index 0000000..bb6f383 --- /dev/null +++ b/docs/source/finetune/train_sd3_lora.md @@ -0,0 +1,59 @@ +# 训练 Stable Diffusion 3 LoRA + +训练脚本只需要一个文件。你可以使用 [`sd3_medium_incl_clips.safetensors`](https://huggingface.co/stabilityai/stable-diffusion-3-medium/resolve/main/sd3_medium_incl_clips.safetensors)(没有 T5 Encoder)或 [`sd3_medium_incl_clips_t5xxlfp16.safetensors`](https://huggingface.co/stabilityai/stable-diffusion-3-medium/resolve/main/sd3_medium_incl_clips_t5xxlfp16.safetensors)(有 T5 Encoder)。请使用以下代码下载这些文件: + + +```python +from diffsynth import download_models + +download_models(["StableDiffusion3", "StableDiffusion3_without_T5"]) +``` + +``` +models/stable_diffusion_3/ +├── Put Stable Diffusion 3 checkpoints here.txt +├── sd3_medium_incl_clips.safetensors +└── sd3_medium_incl_clips_t5xxlfp16.safetensors +``` + +使用下面的命令启动训练任务: + +``` +CUDA_VISIBLE_DEVICES="0" python examples/train/stable_diffusion_3/train_sd3_lora.py \ + --pretrained_path models/stable_diffusion_3/sd3_medium_incl_clips.safetensors \ + --dataset_path data/dog \ + --output_path ./models \ + --max_epochs 1 \ + --steps_per_epoch 500 \ + --height 1024 \ + --width 1024 \ + --center_crop \ + --precision "16-mixed" \ + --learning_rate 1e-4 \ + --lora_rank 4 \ + --lora_alpha 4 \ + --use_gradient_checkpointing +``` + +有关参数的更多信息,请使用 `python examples/train/stable_diffusion_3/train_sd3_lora.py -h` 查看详细信息。 + +训练完成后,使用 `model_manager.load_lora` 加载 LoRA 以进行推理。 + +```python +from diffsynth import ModelManager, SD3ImagePipeline +import torch + +model_manager = ModelManager(torch_dtype=torch.float16, device="cuda", + 
+    file_path_list=["models/stable_diffusion_3/sd3_medium_incl_clips.safetensors"])
+model_manager.load_lora("models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt", lora_alpha=1.0)
+pipe = SD3ImagePipeline.from_model_manager(model_manager)
+
+torch.manual_seed(0)
+image = pipe(
+    prompt="a dog is jumping, flowers around the dog, the background is mountains and clouds",
+    negative_prompt="bad quality, poor quality, doll, disfigured, jpg, toy, bad anatomy, missing limbs, missing fingers, 3d, cgi, extra tails",
+    cfg_scale=7.5,
+    num_inference_steps=100, width=1024, height=1024,
+)
+image.save("image_with_lora.jpg")
+```
diff --git a/docs/source/finetune/train_sd_lora.md b/docs/source/finetune/train_sd_lora.md
new file mode 100644
index 0000000..dc792c7
--- /dev/null
+++ b/docs/source/finetune/train_sd_lora.md
@@ -0,0 +1,57 @@
+# 训练 Stable Diffusion LoRA
+
+训练脚本只需要一个文件。我们支持 [CivitAI](https://civitai.com/) 中的主流检查点。默认情况下,我们使用基础的 Stable Diffusion v1.5。你可以从 [HuggingFace](https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors) 或 [ModelScope](https://www.modelscope.cn/models/AI-ModelScope/stable-diffusion-v1-5/resolve/master/v1-5-pruned-emaonly.safetensors) 下载,也可以使用以下代码下载这个文件:
+
+```python
+from diffsynth import download_models
+
+download_models(["StableDiffusion_v15"])
+```
+
+```
+models/stable_diffusion
+├── Put Stable Diffusion checkpoints here.txt
+└── v1-5-pruned-emaonly.safetensors
+```
+
+使用以下命令启动训练任务:
+
+```
+CUDA_VISIBLE_DEVICES="0" python examples/train/stable_diffusion/train_sd_lora.py \
+  --pretrained_path models/stable_diffusion/v1-5-pruned-emaonly.safetensors \
+  --dataset_path data/dog \
+  --output_path ./models \
+  --max_epochs 1 \
+  --steps_per_epoch 500 \
+  --height 512 \
+  --width 512 \
+  --center_crop \
+  --precision "16-mixed" \
+  --learning_rate 1e-4 \
+  --lora_rank 4 \
+  --lora_alpha 4 \
+  --use_gradient_checkpointing
+```
+
+有关参数的更多信息,请使用 `python examples/train/stable_diffusion/train_sd_lora.py -h` 查看详细信息。
+
+训练完成后,使用 `model_manager.load_lora` 加载 LoRA 以进行推理。
+
+```python
+from diffsynth import ModelManager, SDImagePipeline
+import torch
+
+model_manager = ModelManager(torch_dtype=torch.float16, device="cuda",
+    file_path_list=["models/stable_diffusion/v1-5-pruned-emaonly.safetensors"])
+model_manager.load_lora("models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt", lora_alpha=1.0)
+pipe = SDImagePipeline.from_model_manager(model_manager)
+
+torch.manual_seed(0)
+image = pipe(
+    prompt="a dog is jumping, flowers around the dog, the background is mountains and clouds",
+    negative_prompt="bad quality, poor quality, doll, disfigured, jpg, toy, bad anatomy, missing limbs, missing fingers, 3d, cgi, extra tails",
+    cfg_scale=7.5,
+    num_inference_steps=100, width=512, height=512,
+)
+image.save("image_with_lora.jpg")
+```
diff --git a/docs/source/finetune/train_sdxl_lora.md b/docs/source/finetune/train_sdxl_lora.md
new file mode 100644
index 0000000..e51f092
--- /dev/null
+++ b/docs/source/finetune/train_sdxl_lora.md
@@ -0,0 +1,57 @@
+# 训练 Stable Diffusion XL LoRA
+
+训练脚本只需要一个文件。我们支持 [CivitAI](https://civitai.com/) 中的主流检查点。默认情况下,我们使用基础的 Stable Diffusion XL。你可以从 [HuggingFace](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors) 或 [ModelScope](https://www.modelscope.cn/models/AI-ModelScope/stable-diffusion-xl-base-1.0/resolve/master/sd_xl_base_1.0.safetensors) 下载,也可以使用以下代码下载这个文件:
+
+```python
+from diffsynth import download_models
+
+download_models(["StableDiffusionXL_v1"])
+```
+
+```
+models/stable_diffusion_xl
+├── Put Stable Diffusion XL checkpoints here.txt
+└── sd_xl_base_1.0.safetensors
+```
+
+我们观察到 Stable Diffusion XL 在 float16 精度下会出现数值精度溢出,因此我们建议用户使用 float32 精度训练。使用以下命令启动训练任务:
+
+```
+CUDA_VISIBLE_DEVICES="0" python examples/train/stable_diffusion_xl/train_sdxl_lora.py \
+  --pretrained_path models/stable_diffusion_xl/sd_xl_base_1.0.safetensors \
+  --dataset_path data/dog \
+  --output_path ./models \
+  --max_epochs 1 \
+  --steps_per_epoch 500 \
+  --height 1024 \
+  --width 1024 \
+  --center_crop \
+  --precision "32" \
+  --learning_rate 1e-4 \
+  --lora_rank 4 \
+  --lora_alpha 4 \
+  --use_gradient_checkpointing
+```
+
+有关参数的更多信息,请使用 `python examples/train/stable_diffusion_xl/train_sdxl_lora.py -h` 查看详细信息。
+
+训练完成后,使用 `model_manager.load_lora` 加载 LoRA 以进行推理。
+
+```python
+from diffsynth import ModelManager, SDXLImagePipeline
+import torch
+
+model_manager = ModelManager(torch_dtype=torch.float16, device="cuda",
+    file_path_list=["models/stable_diffusion_xl/sd_xl_base_1.0.safetensors"])
+model_manager.load_lora("models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt", lora_alpha=1.0)
+pipe = SDXLImagePipeline.from_model_manager(model_manager)
+
+torch.manual_seed(0)
+image = pipe(
+    prompt="a dog is jumping, flowers around the dog, the background is mountains and clouds",
+    negative_prompt="bad quality, poor quality, doll, disfigured, jpg, toy, bad anatomy, missing limbs, missing fingers, 3d, cgi, extra tails",
+    cfg_scale=7.5,
+    num_inference_steps=100, width=1024, height=1024,
+)
+image.save("image_with_lora.jpg")
+```
diff --git a/docs/source/index.rst b/docs/source/index.rst
index af5c933..82f1d74 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -6,28 +6,31 @@
 DiffSynth-Studio 文档
 ==============================
 
-Add your content using ``reStructuredText`` syntax. See the
-`reStructuredText `_
-documentation for details.
-
+欢迎来到 DiffSynth-Studio,我们旨在构建 Diffusion 模型的开源互联生态,在这里,你可以体验到 AIGC(AI Generated Content)技术魔法般的魅力!
 
 .. toctree::
    :maxdepth: 1
-   :caption: Contents:
-
-   GetStarted/A_simple_example.md
-   GetStarted/Download_models.md
-   GetStarted/ModelManager.md
-   GetStarted/Models.md
-   GetStarted/Pipelines.md
-   GetStarted/PromptProcessing.md
-   GetStarted/Schedulers.md
-   GetStarted/Fine-tuning.md
-   GetStarted/Extensions.md
-   GetStarted/WebUI.md
-
+   :caption: 快速开始
+
+   tutorial/ASimpleExample.md
+   tutorial/Installation.md
+   tutorial/DownloadModels.md
+   tutorial/Models.md
+   tutorial/Pipelines.md
+   tutorial/PromptProcessing.md
+   tutorial/Extensions.md
+   tutorial/Schedulers.md
 
 .. toctree::
    :maxdepth: 1
-   :caption: API Docs
+   :caption: 微调
+
+   finetune/overview.md
+   finetune/train_flux_lora.md
+   finetune/train_kolors_lora.md
+   finetune/train_sd3_lora.md
+   finetune/train_hunyuan_dit_lora.md
+   finetune/train_sdxl_lora.md
+   finetune/train_sd_lora.md
+
diff --git a/docs/source/tutorial/ASimpleExample.md b/docs/source/tutorial/ASimpleExample.md
new file mode 100644
index 0000000..8d47eb7
--- /dev/null
+++ b/docs/source/tutorial/ASimpleExample.md
@@ -0,0 +1,81 @@
+# 快速开始
+
+在这篇文档中,我们通过一段代码为你介绍如何快速上手使用 DiffSynth-Studio 进行创作。
+
+## 安装
+
+使用以下命令从 GitHub 克隆并安装 DiffSynth-Studio。更多信息请参考[安装](./Installation.md)。
+
+```shell
+git clone https://github.com/modelscope/DiffSynth-Studio.git
+cd DiffSynth-Studio
+pip install -e .
+```
+
+## 下载模型
+
+我们在 DiffSynth-Studio 中预置了一些主流 Diffusion 模型的下载链接,你可以直接使用 `download_models` 函数下载预置的模型文件。
+
+```python
+from diffsynth import download_models
+
+download_models(["FLUX.1-dev"])
+```
+
+我们支持从 [ModelScope](https://www.modelscope.cn/) 和 [HuggingFace](https://huggingface.co/) 下载模型,也支持下载非预置的模型,请参考[模型下载](./DownloadModels.md)。
+
+## 加载模型
+
+在 DiffSynth-Studio 中,模型由统一的 `ModelManager` 维护。以 FLUX.1-dev 模型为例,模型包括两个文本编码器、一个 DiT、一个 VAE,使用方式如下所示:
+
+```python
+import torch
+from diffsynth import ModelManager
+
+model_manager = ModelManager(torch_dtype=torch.bfloat16, device="cuda")
+model_manager.load_models([
+    "models/FLUX/FLUX.1-dev/text_encoder/model.safetensors",
+    "models/FLUX/FLUX.1-dev/text_encoder_2",
+    "models/FLUX/FLUX.1-dev/ae.safetensors",
+    "models/FLUX/FLUX.1-dev/flux1-dev.safetensors"
+])
+```
+
+你可以把所有想要加载的模型路径放入其中。对于 `.safetensors` 等格式的模型权重文件,`ModelManager` 在加载后会自动判断模型类型;对于文件夹格式的模型,`ModelManager` 会尝试解析其中的 `config.json` 文件,并调用 `transformers` 等第三方库中的对应模块。关于 DiffSynth-Studio 支持的模型,请参考[支持的模型](./Models.md)。
+
+## 构建 Pipeline
+
+DiffSynth-Studio 提供了多个推理 `Pipeline`,这些 `Pipeline` 可以直接通过 `ModelManager` 获取所需的模型并初始化。例如,FLUX.1-dev 模型的文生图 `Pipeline` 可以这样构建:
+
+```python
+pipe = FluxImagePipeline.from_model_manager(model_manager)
+```
+
+更多用于图像生成和视频生成的 `Pipeline` 详见[推理流水线](./Pipelines.md)。
+
+## 生成!
+
+写好你的提示词,交给 DiffSynth-Studio,启动生成任务吧!
+
+```python
+import torch
+from diffsynth import ModelManager, FluxImagePipeline
+
+model_manager = ModelManager(torch_dtype=torch.bfloat16, device="cuda")
+model_manager.load_models([
+    "models/FLUX/FLUX.1-dev/text_encoder/model.safetensors",
+    "models/FLUX/FLUX.1-dev/text_encoder_2",
+    "models/FLUX/FLUX.1-dev/ae.safetensors",
+    "models/FLUX/FLUX.1-dev/flux1-dev.safetensors"
+])
+pipe = FluxImagePipeline.from_model_manager(model_manager)
+
+torch.manual_seed(0)
+image = pipe(
+    prompt="In a forest, a wooden plank sign reading DiffSynth",
+    height=576, width=1024
+)
+image.save("image.jpg")
+```
+
+![image](https://github.com/user-attachments/assets/15a52a2b-2f18-46fe-810c-cb3ad2853919)
diff --git a/docs/source/tutorial/DownloadModels.md b/docs/source/tutorial/DownloadModels.md
new file mode 100644
index 0000000..f1a227f
--- /dev/null
+++ b/docs/source/tutorial/DownloadModels.md
@@ -0,0 +1,30 @@
+# 下载模型
+
+我们在 DiffSynth-Studio 中预置了一些主流 Diffusion 模型的下载链接,你可以轻松地下载并使用这些模型。
+
+## 下载预置模型
+
+你可以直接使用 `download_models` 函数下载预置的模型文件,其中模型 ID 可参考 [config file](/diffsynth/configs/model_config.py)。
+
+```python
+from diffsynth import download_models
+
+download_models(["FLUX.1-dev"])
+```
+
+对于 VSCode 用户,激活 Pylance 或其他 Python 语言服务后,在代码中输入 `""` 即可显示支持的所有模型 ID。
+
+![image](https://github.com/user-attachments/assets/2bbfec32-e015-45a7-98d9-57af13200b7c)
+
+## 下载非预置模型
+
+你可以选择 [ModelScope](https://modelscope.cn/models) 和 [HuggingFace](https://huggingface.co/models) 两个下载源中的模型。当然,你也可以通过浏览器等工具手动下载自己所需的模型。
+
+```python
+from diffsynth.models.downloader import download_from_huggingface, download_from_modelscope
+
+# From Modelscope (recommended)
+download_from_modelscope("Kwai-Kolors/Kolors", "vae/diffusion_pytorch_model.fp16.bin", "models/kolors/Kolors/vae")
+# From Huggingface
+download_from_huggingface("Kwai-Kolors/Kolors", "vae/diffusion_pytorch_model.fp16.safetensors", "models/kolors/Kolors/vae")
+```
diff --git a/docs/source/GetStarted/Extensions.md b/docs/source/tutorial/Extensions.md
similarity index 100%
rename from docs/source/GetStarted/Extensions.md
rename to docs/source/tutorial/Extensions.md
diff --git a/docs/source/GetStarted/Installation.md b/docs/source/tutorial/Installation.md
similarity index 68%
rename from docs/source/GetStarted/Installation.md
rename to docs/source/tutorial/Installation.md
index e008d3a..424fc09 100644
--- a/docs/source/GetStarted/Installation.md
+++ b/docs/source/tutorial/Installation.md
@@ -1,5 +1,7 @@
 # 安装
 
+目前,DiffSynth-Studio 支持从 GitHub 克隆安装或使用 pip 安装。我们建议用户从 GitHub 克隆安装,从而体验最新的功能。
+
 ## 从源码下载
 
 1. 克隆源码仓库:
diff --git a/docs/source/GetStarted/Models.md b/docs/source/tutorial/Models.md
similarity index 94%
rename from docs/source/GetStarted/Models.md
rename to docs/source/tutorial/Models.md
index cde502e..e8842ab 100644
--- a/docs/source/GetStarted/Models.md
+++ b/docs/source/tutorial/Models.md
@@ -2,6 +2,7 @@
 
 目前为止,DiffSynth Studio 支持的模型如下所示:
 
+* [CogVideo](https://huggingface.co/THUDM/CogVideoX-5b)
 * [FLUX](https://huggingface.co/black-forest-labs/FLUX.1-dev)
 * [ExVideo](https://huggingface.co/ECNU-CILab/ExVideo-SVD-128f-v1)
 * [Kolors](https://huggingface.co/Kwai-Kolors/Kolors)
diff --git a/docs/source/GetStarted/Pipelines.md b/docs/source/tutorial/Pipelines.md
similarity index 82%
rename from docs/source/GetStarted/Pipelines.md
rename to docs/source/tutorial/Pipelines.md
index 9d5b7de..21810a1 100644
--- a/docs/source/GetStarted/Pipelines.md
+++ b/docs/source/tutorial/Pipelines.md
@@ -1,27 +1,22 @@
-# Pipelines
+# 流水线
 
-So far, the following table lists our pipelines and the models supported by each pipeline.
+DiffSynth-Studio 中包括多个流水线,分为图像生成和视频生成两类。
 
-## Image Pipelines
-
-Pipelines for generating images from text descriptions. Each pipeline relies on specific encoder and decoder models.
+## 图像生成流水线
 
 | Pipeline | Models |
 |----------------------------|----------------------------------------------------------------|
-| HunyuanDiTImagePipeline | text_encoder: HunyuanDiTCLIPTextEncoder<br>text_encoder_t5: HunyuanDiTT5TextEncoder<br>dit: HunyuanDiT<br>vae_decoder: SDVAEDecoder<br>vae_encoder: SDVAEEncoder |
 | SDImagePipeline | text_encoder: SDTextEncoder<br>unet: SDUNet<br>vae_decoder: SDVAEDecoder<br>vae_encoder: SDVAEEncoder<br>controlnet: MultiControlNetManager<br>ipadapter_image_encoder: IpAdapterCLIPImageEmbedder<br>ipadapter: SDIpAdapter |
-| SD3ImagePipeline | text_encoder_1: SD3TextEncoder1<br>text_encoder_2: SD3TextEncoder2<br>text_encoder_3: SD3TextEncoder3<br>dit: SD3DiT<br>vae_decoder: SD3VAEDecoder<br>vae_encoder: SD3VAEEncoder |
 | SDXLImagePipeline | text_encoder: SDXLTextEncoder<br>text_encoder_2: SDXLTextEncoder2<br>text_encoder_kolors: ChatGLMModel<br>unet: SDXLUNet<br>vae_decoder: SDXLVAEDecoder<br>vae_encoder: SDXLVAEEncoder<br>controlnet: MultiControlNetManager<br>ipadapter_image_encoder: IpAdapterXLCLIPImageEmbedder<br>ipadapter: SDXLIpAdapter |
+| SD3ImagePipeline | text_encoder_1: SD3TextEncoder1<br>text_encoder_2: SD3TextEncoder2<br>text_encoder_3: SD3TextEncoder3<br>dit: SD3DiT<br>vae_decoder: SD3VAEDecoder<br>vae_encoder: SD3VAEEncoder |
+| HunyuanDiTImagePipeline | text_encoder: HunyuanDiTCLIPTextEncoder<br>text_encoder_t5: HunyuanDiTT5TextEncoder<br>dit: HunyuanDiT<br>vae_decoder: SDVAEDecoder<br>vae_encoder: SDVAEEncoder |
+| FluxImagePipeline | text_encoder_1: FluxTextEncoder1<br>text_encoder_2: FluxTextEncoder2<br>dit: FluxDiT<br>vae_decoder: FluxVAEDecoder<br>vae_encoder: FluxVAEEncoder |
 
-## Video Pipelines
-
-Pipelines for generating videos from text descriptions. In addition to the models required for image generation, they include models for handling motion modules.
+## 视频生成流水线
 
 | Pipeline | Models |
 |----------------------------|----------------------------------------------------------------|
 | SDVideoPipeline | text_encoder: SDTextEncoder<br>unet: SDUNet<br>vae_decoder: SDVAEDecoder<br>vae_encoder: SDVAEEncoder<br>controlnet: MultiControlNetManager<br>ipadapter_image_encoder: IpAdapterCLIPImageEmbedder<br>ipadapter: SDIpAdapter<br>motion_modules: SDMotionModel |
 | SDXLVideoPipeline | text_encoder: SDXLTextEncoder<br>text_encoder_2: SDXLTextEncoder2<br>text_encoder_kolors: ChatGLMModel<br>unet: SDXLUNet<br>vae_decoder: SDXLVAEDecoder<br>vae_encoder: SDXLVAEEncoder<br>ipadapter_image_encoder: IpAdapterXLCLIPImageEmbedder<br>ipadapter: SDXLIpAdapter<br>motion_modules: SDXLMotionModel |
 | SVDVideoPipeline | image_encoder: SVDImageEncoder<br>unet: SVDUNet<br>vae_encoder: SVDVAEEncoder<br>vae_decoder: SVDVAEDecoder |
-
-
-
+| CogVideoPipeline | text_encoder: FluxTextEncoder2<br>dit: CogDiT<br>vae_encoder: CogVAEEncoder<br>vae_decoder: CogVAEDecoder |
diff --git a/docs/source/GetStarted/PromptProcessing.md b/docs/source/tutorial/PromptProcessing.md
similarity index 98%
rename from docs/source/GetStarted/PromptProcessing.md
rename to docs/source/tutorial/PromptProcessing.md
index 2fd96ec..539aa5d 100644
--- a/docs/source/GetStarted/PromptProcessing.md
+++ b/docs/source/tutorial/PromptProcessing.md
@@ -1,4 +1,4 @@
-# 提示词(Prompt)处理
+# 提示词处理
 
 DiffSynth 内置了提示词处理功能,分为:
 
diff --git a/docs/source/GetStarted/Schedulers.md b/docs/source/tutorial/Schedulers.md
similarity index 100%
rename from docs/source/GetStarted/Schedulers.md
rename to docs/source/tutorial/Schedulers.md
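The pipeline tables in `docs/source/tutorial/Pipelines.md` above pair each `Pipeline` class with the component models it obtains from `ModelManager`. As a minimal illustrative sketch of that mapping (the `PIPELINE_COMPONENTS` registry and `required_components` helper below are hypothetical, mirroring a few table rows for demonstration; they are not part of the DiffSynth API, which wires components via `Pipeline.from_model_manager`):

```python
# Hypothetical registry mirroring a few rows of the pipeline tables above;
# component slot names are taken verbatim from the "Models" column.
PIPELINE_COMPONENTS = {
    "FluxImagePipeline": [
        "text_encoder_1", "text_encoder_2", "dit", "vae_decoder", "vae_encoder",
    ],
    "SVDVideoPipeline": [
        "image_encoder", "unet", "vae_encoder", "vae_decoder",
    ],
    "CogVideoPipeline": [
        "text_encoder", "dit", "vae_encoder", "vae_decoder",
    ],
}

def required_components(pipeline_name: str) -> list:
    """Return the component slots a pipeline fills, per the tables above."""
    return PIPELINE_COMPONENTS[pipeline_name]
```

In DiffSynth itself this lookup is implicit: `from_model_manager` pulls whichever loaded models match the pipeline's slots, which is why loading the right checkpoints into `ModelManager` is all a user has to do.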