DiffSynth-Studio 2.0 major update

2026-03-18 22:08:13 +00:00 · 2025-12-04 16:33:07 +08:00
parent afd101f345
commit 72af7122b3
758 changed files with 26462 additions and 2221398 deletions
--- a/examples/ArtAug/README.md
+++ b/examples/ArtAug/README.md
@@ -1,43 +0,0 @@
-# FLUX Aesthetics Enhancement LoRA
-
-## Introduction
-
-This is a LoRA model trained for FLUX.1-dev, which enhances the aesthetic quality of images generated by the model. The improvements include, but are not limited to: rich details, beautiful lighting and shadows, aesthetic composition, and clear visuals. This model does not require any trigger words.
-
-* Paper: https://arxiv.org/abs/2412.12888
-* Github: https://github.com/modelscope/DiffSynth-Studio
-* Model: [ModelScope](https://www.modelscope.cn/models/DiffSynth-Studio/ArtAug-lora-FLUX.1dev-v1), [HuggingFace](https://huggingface.co/ECNU-CILab/ArtAug-lora-FLUX.1dev-v1)
-* Demo: [ModelScope](https://modelscope.cn/aigc/imageGeneration?tab=advanced&versionId=7228&modelType=LoRA&sdVersion=FLUX_1&modelUrl=modelscope%3A%2F%2FDiffSynth-Studio%2FArtAug-lora-FLUX.1dev-v1%3Frevision%3Dv1.0), HuggingFace (Coming soon)
-
-## Methodology
-
-![workflow](https://github.com/user-attachments/assets/cee969af-d49f-4480-911c-bedc1c095f9b)
-
-The ArtAug project is inspired by reasoning approaches like GPT-o1, which rely on model interaction and self-correction. We developed a framework aimed at enhancing the capabilities of image generation models through interaction with image understanding models. The training process of ArtAug consists of the following steps:
-
-1. **Synthesis-Understanding Interaction**: After generating an image using the image generation model, we employ a multimodal large language model (Qwen2-VL-72B) to analyze the image content and provide suggestions for modifications, which then lead to the regeneration of a higher quality image.
-   
-2. **Data Generation and Filtering**: Interactive generation involves long inference times and sometimes produces poor image content. Therefore, we generate a large batch of image pairs offline, filter them, and use them for subsequent training.
-
-3. **Differential Training**: We apply differential training techniques to train a LoRA model, enabling it to learn the differences between images before and after enhancement, rather than directly training on the dataset of enhanced images.
-
-4. **Iterative Enhancement**: The trained LoRA model is fused into the base model, and the entire process is repeated multiple times with the fused model until the interaction algorithm no longer provides significant enhancements. The LoRA models produced in each iteration are combined to produce this final model.
-
-This model integrates the aesthetic understanding of Qwen2-VL-72B into FLUX.1[dev], leading to an improvement in the quality of generated images.
-
-## Usage
-
-Please see [./artaug_flux.py](./artaug_flux.py) for more details.
-
-Since this model is encapsulated in the universal FLUX LoRA format, it can be loaded by most LoRA loaders, allowing you to integrate this LoRA model into your own workflow.
-
-## Examples
-
-|FLUX.1-dev|FLUX.1-dev + ArtAug LoRA|
-|-|-|
-|![image_1_base](https://github.com/user-attachments/assets/e1d5c505-b423-45fe-be01-25c2758f5417)|![image_1_enhance](https://github.com/user-attachments/assets/335908e3-d0bd-41c2-9d99-d10528a2d719)|
-|![image_2_base](https://github.com/user-attachments/assets/7f38e8d4-3c62-492e-bd96-be60f0855037)|![image_2_enhance](https://github.com/user-attachments/assets/ae3a1daf-7a7c-44fd-bdbc-1d2a83bc3de3)|
-|![image_3_base](https://github.com/user-attachments/assets/e2ae4879-9202-45d6-9df7-fbcbd2093d19)|![image_3_enhance](https://github.com/user-attachments/assets/4df6e5b9-65de-408b-88c6-51db39aad801)|
-|![image_4_base](https://github.com/user-attachments/assets/dbc65387-60df-4a18-b1bb-45eaa5be5c1d)|![image_4_enhance](https://github.com/user-attachments/assets/fc19860d-3e28-468b-b013-8745255ac6db)|
-|![image_5_base](https://github.com/user-attachments/assets/bb65c1ba-c0c6-4d3b-b3ef-bdbbb5f03a48)|![image_5_enhance](https://github.com/user-attachments/assets/03570c62-9a0b-428f-8c86-6e01c1421202)|
-|![image_6_base](https://github.com/user-attachments/assets/18e9a4e7-2afd-4ca9-bc49-7736042c25dc)|![image_6_enhance](https://github.com/user-attachments/assets/aa73571f-098a-4e65-9eda-b9729ba379cd)|
--- a/examples/ArtAug/artaug_flux.py
+++ b/examples/ArtAug/artaug_flux.py
@@ -1,14 +0,0 @@
-import torch
-from diffsynth import ModelManager, FluxImagePipeline, download_customized_models
-
-lora_path = download_customized_models(
-    model_id="DiffSynth-Studio/ArtAug-lora-FLUX.1dev-v1",
-    origin_file_path="merged_lora.safetensors",
-    local_dir="models/lora"
-)[0]
-model_manager = ModelManager(torch_dtype=torch.bfloat16, device="cuda", model_id_list=["FLUX.1-dev"])
-model_manager.load_lora(lora_path, lora_alpha=1.0)
-pipe = FluxImagePipeline.from_model_manager(model_manager)
-
-image = pipe(prompt="a house", seed=0)
-image.save("image_artaug.jpg")
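
Note on `lora_alpha` in the removed script above: fusing a LoRA into a base model is conventionally a scaled low-rank update, W' = W + alpha * (B A). The pure-Python sketch below illustrates that arithmetic on toy matrices. It assumes the standard LoRA formulation; `matmul`, `merge_lora`, and the toy values are hypothetical, and the exact scaling that `ModelManager.load_lora` applies is not shown in this diff.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_up, lora_down, alpha=1.0):
    """Return weight + alpha * (lora_up @ lora_down) as a new matrix.

    Hypothetical helper: standard LoRA merge, not DiffSynth's implementation.
    """
    delta = matmul(lora_up, lora_down)
    return [
        [w + alpha * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy example: a 2x2 base weight with a rank-1 update.
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_up = [[1.0], [2.0]]    # shape (2, 1)
lora_down = [[0.5, -0.5]]   # shape (1, 2)

merged = merge_lora(weight, lora_up, lora_down, alpha=1.0)
unchanged = merge_lora(weight, lora_up, lora_down, alpha=0.0)
```

With `alpha=0.0` the base weight is returned unchanged, which is why `lora_alpha` acts as a strength control for the LoRA.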
--- a/examples/CogVideoX/README.md
+++ b/examples/CogVideoX/README.md
@@ -1,39 +0,0 @@
-# CogVideoX
-
-### Example: Text-to-Video using CogVideoX-5B (Experimental)
-
-See [cogvideo_text_to_video.py](cogvideo_text_to_video.py).
-
-First, we generate a video using the prompt "an astronaut riding a horse on Mars".
-
-https://github.com/user-attachments/assets/4c91c1cd-e4a0-471a-bd8d-24d761262941
-
-Then, we convert the astronaut to a robot.
-
-https://github.com/user-attachments/assets/225a00a4-2bc8-4740-8e86-a64b460a29ec
-
-Upscale the video using the model itself.
-
-https://github.com/user-attachments/assets/c02cb30c-de60-473c-8242-32c67b3155ad
-
-Make the video look smoother by interpolating frames.
-
-https://github.com/user-attachments/assets/f0e465b4-45df-4435-ab10-7a084ca2b0a0
-
-Here is another example.
-
-First, we generate a video using the prompt "a dog is running".
-
-https://github.com/user-attachments/assets/e3696297-99f5-4d0c-a5ca-1d1566db85b4
-
-Then, we add a blue collar to the dog.
-
-https://github.com/user-attachments/assets/7ff22be7-4390-4d33-ae6c-53f6f056e18d
-
-Upscale the video using the model itself.
-
-https://github.com/user-attachments/assets/a909c32c-0b7d-495c-a53c-d23a99a3d3e9
-
-Make the video look smoother by interpolating frames.
-
-https://github.com/user-attachments/assets/ea37c150-97a0-4858-8003-0c2e5eef3331
--- a/examples/CogVideoX/cogvideo_text_to_video.py
+++ b/examples/CogVideoX/cogvideo_text_to_video.py
@@ -1,73 +0,0 @@
-from diffsynth import ModelManager, save_video, VideoData, download_models, CogVideoPipeline
-from diffsynth.extensions.RIFE import RIFEInterpolater
-import torch, os
-os.environ["TOKENIZERS_PARALLELISM"] = "True"
-
-
-
-def text_to_video(model_manager, prompt, seed, output_path):
-    pipe = CogVideoPipeline.from_model_manager(model_manager)
-    torch.manual_seed(seed)
-    video = pipe(
-        prompt=prompt,
-        height=480, width=720,
-        cfg_scale=7.0, num_inference_steps=200
-    )
-    save_video(video, output_path, fps=8, quality=5)
-
-
-def edit_video(model_manager, prompt, seed, input_path, output_path):
-    pipe = CogVideoPipeline.from_model_manager(model_manager)
-    input_video = VideoData(video_file=input_path)
-    torch.manual_seed(seed)
-    video = pipe(
-        prompt=prompt,
-        height=480, width=720,
-        cfg_scale=7.0, num_inference_steps=200,
-        input_video=input_video, denoising_strength=0.7
-    )
-    save_video(video, output_path, fps=8, quality=5)
-
-
-def self_upscale(model_manager, prompt, seed, input_path, output_path):
-    pipe = CogVideoPipeline.from_model_manager(model_manager)
-    input_video = VideoData(video_file=input_path, height=480*2, width=720*2).raw_data()
-    torch.manual_seed(seed)
-    video = pipe(
-        prompt=prompt,
-        height=480*2, width=720*2,
-        cfg_scale=7.0, num_inference_steps=30,
-        input_video=input_video, denoising_strength=0.4, tiled=True
-    )
-    save_video(video, output_path, fps=8, quality=7)
-
-
-def interpolate_video(model_manager, input_path, output_path):
-    rife = RIFEInterpolater.from_model_manager(model_manager)
-    video = VideoData(video_file=input_path).raw_data()
-    video = rife.interpolate(video, num_iter=2)
-    save_video(video, output_path, fps=32, quality=5)
-
-
-
-download_models(["CogVideoX-5B", "RIFE"])
-
-model_manager = ModelManager(torch_dtype=torch.bfloat16)
-model_manager.load_models([
-    "models/CogVideo/CogVideoX-5b/text_encoder",
-    "models/CogVideo/CogVideoX-5b/transformer",
-    "models/CogVideo/CogVideoX-5b/vae/diffusion_pytorch_model.safetensors",
-    "models/RIFE/flownet.pkl",
-])
-
-# Example 1
-text_to_video(model_manager, "an astronaut riding a horse on Mars.", 0, "1_video_1.mp4")
-edit_video(model_manager, "a white robot riding a horse on Mars.", 1, "1_video_1.mp4", "1_video_2.mp4")
-self_upscale(model_manager, "a white robot riding a horse on Mars.", 2, "1_video_2.mp4", "1_video_3.mp4")
-interpolate_video(model_manager, "1_video_3.mp4", "1_video_4.mp4")
-
-# Example 2
-text_to_video(model_manager, "a dog is running.", 1, "2_video_1.mp4")
-edit_video(model_manager, "a dog with blue collar.", 2, "2_video_1.mp4", "2_video_2.mp4")
-self_upscale(model_manager, "a dog with blue collar.", 3, "2_video_2.mp4", "2_video_3.mp4")
-interpolate_video(model_manager, "2_video_3.mp4", "2_video_4.mp4")
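
Note on the frame rates in the removed script above: videos are saved at `fps=8` before interpolation and at `fps=32` after `rife.interpolate(video, num_iter=2)`, which is consistent with each pass doubling the frame rate. The sketch below works through that arithmetic. It assumes each pass inserts one frame between every adjacent pair of frames, which this diff does not confirm for `RIFEInterpolater`. Both helper names and the 49-frame input are hypothetical.

```python
def interpolated_fps(fps, num_iter):
    """Frame rate that keeps playback duration roughly constant
    when every interpolation pass doubles the frame density."""
    return fps * 2 ** num_iter


def interpolated_frame_count(num_frames, num_iter):
    """Frame count after num_iter passes, assuming each pass inserts
    one new frame between every adjacent pair (n -> 2n - 1)."""
    for _ in range(num_iter):
        num_frames = 2 * num_frames - 1
    return num_frames


fps_out = interpolated_fps(8, 2)               # matches fps=32 in interpolate_video
frames_out = interpolated_frame_count(49, 2)   # 49 -> 97 -> 193
```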
--- a/examples/ControlNet/README.md
+++ b/examples/ControlNet/README.md
@@ -1,91 +0,0 @@
-# ControlNet
-
-We provide extensive ControlNet support. Taking the FLUX model as an example, we support many different ControlNet models that can be freely combined, even if their structures differ. Additionally, ControlNet models are compatible with high-resolution refinement and partition control techniques, enabling very powerful controllable image generation.
-
-These examples are in [`flux_controlnet.py`](./flux_controlnet.py).
-
-## Canny/Depth/Normal: Structure Control
-
-Structural control is the most fundamental capability of the ControlNet model. By using Canny to extract edge information, or by utilizing depth maps and normal maps, we can extract the structure of an image, which can then serve as control information during the image generation process.
-
-Model link: https://modelscope.cn/models/InstantX/FLUX.1-dev-Controlnet-Union-alpha
-
-For example, if we generate an image of a cat and use a model like InstantX/FLUX.1-dev-Controlnet-Union-alpha that supports multiple control conditions, we can simultaneously enable both Canny and Depth controls to transform the environment into a twilight setting.
-
-|![image_5](https://github.com/user-attachments/assets/19d2abc4-36ae-4163-a8da-df5732d1a737)|![image_6](https://github.com/user-attachments/assets/28378271-3782-484c-bd51-3d3311dd85c6)|
-|-|-|
-
-The control strength of ControlNet for structure can be adjusted. For example, in the case below, when we move the girl from summer to winter, we can appropriately lower the control strength of ControlNet so that the model will adapt to the content of the image and change her into warm clothes.
-
-|![image_7](https://github.com/user-attachments/assets/a7b8555b-bfd9-4e92-aa77-16bca81b07e3)|![image_8](https://github.com/user-attachments/assets/a1bab36b-6cce-4f29-8233-4cb824b524a8)|
-|-|-|
-
-## Upscaler/Tile/Blur: High-Resolution Image Synthesis
-
-There are many ControlNet models that support high definition, such as:
-
-Model links: https://modelscope.cn/models/jasperai/Flux.1-dev-Controlnet-Upscaler, https://modelscope.cn/models/InstantX/FLUX.1-dev-Controlnet-Union-alpha, https://modelscope.cn/models/Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro
-
-These models can transform blurry, noisy low-quality images into clear ones. In DiffSynth-Studio, the native high-resolution patch processing technology supported by the framework can overcome the resolution limitations of the models, enabling image generation at resolutions of 2048 or even higher, significantly enhancing the capabilities of these models. In the example below, we can see that in the high-definition image enlarged to 2048 resolution, the cat's fur is rendered in exquisite detail, and the skin texture of the characters is delicate and realistic.
-
-|![image_1](https://github.com/user-attachments/assets/9038158a-118c-4ad7-ab01-22865f6a06fc)|![image_2](https://github.com/user-attachments/assets/88583a33-cd74-4cb9-8fd4-c6e14c0ada0c)|
-|-|-|
-
-|![image_3](https://github.com/user-attachments/assets/13061ecf-bb57-448a-82c6-7e4655c9cd85)|![image_4](https://github.com/user-attachments/assets/0b7ae80f-de58-4d1d-a49c-ad17e7631bdc)|
-|-|-|
-
-## Inpaint: Image Restoration
-
-The Inpaint ControlNet model can repaint specific areas in an image. For example, we can put sunglasses on a cat.
-
-Model link: https://modelscope.cn/models/alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta
-
-|![image_9](https://github.com/user-attachments/assets/babddad0-2d67-4624-b77a-c953250ebdab)|![mask_9](https://github.com/user-attachments/assets/d5bc2878-1817-457a-bdfa-200f955233d3)|![image_10](https://github.com/user-attachments/assets/e3197f2c-190b-4522-83ab-a2e0451b39f6)|
-|-|-|-|
-
-However, the cat's head pose has changed. If we want to preserve the original structural features, we can use the Canny, Depth, and Normal models. DiffSynth-Studio provides seamless support for ControlNet models with different structures. By using a Normal ControlNet, we can ensure that the structure of the image remains unchanged during local redrawing.
-
-Model link: https://modelscope.cn/models/jasperai/Flux.1-dev-Controlnet-Surface-Normals
-
-|![image_11](https://github.com/user-attachments/assets/c028e6fc-5125-4cba-b35a-b6211c2e6600)|![mask_11](https://github.com/user-attachments/assets/1928ee9a-7594-4c6e-9c71-5bd0b043d8f4)|![image_12](https://github.com/user-attachments/assets/97b3b9e1-f821-405e-971b-9e1c31a209aa)|
-|-|-|-|
-
-## MultiControlNet+MultiDiffusion: Fine-Grained Control
-
-DiffSynth-Studio not only supports the simultaneous activation of multiple ControlNet structures, but also allows for the partitioned control of content within an image using different prompts. Additionally, it supports the chunk processing of ultra-high-resolution large images, enabling us to achieve extremely detailed high-level control. Next, we will showcase the creative process behind a beautiful image.
-
-First, use the prompt "a beautiful Asian woman and a cat on a bed. The woman wears a dress" to generate a cat and a young girl.
-
-![image_13](https://github.com/user-attachments/assets/8da006e4-0e68-4fa5-b407-31ef5dbe8e5a)
-
-Then, enable Inpaint ControlNet and Canny ControlNet.
-
-Model links: https://modelscope.cn/models/alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta, https://modelscope.cn/models/InstantX/FLUX.1-dev-Controlnet-Union-alpha
-
-We control the image using two components.
-
-|Prompt: an orange cat, highly detailed|Prompt: a girl wearing a red camisole|
-|-|-|
-|![mask_13_1](https://github.com/user-attachments/assets/188530a0-913c-48db-a7f1-62f0384bfdc3)|![mask_13_2](https://github.com/user-attachments/assets/99c4d0d5-8cc3-47a0-8e56-ceb37db4dfdc)|
-
-Generate!
-
-![image_14](https://github.com/user-attachments/assets/f5b9d3dd-a690-4597-91a8-a019c6fc2523)
-
-The background is a bit blurry, so we use a deblurring LoRA for image-to-image generation.
-
-Model link: https://modelscope.cn/models/LiblibAI/FLUX.1-dev-LoRA-AntiBlur
-
-![image_15](https://github.com/user-attachments/assets/32ed2667-2260-4d80-aaa9-4435d6920a2a)
-
-The entire image is much clearer now. Next, let's use the high-definition model to increase the resolution to 4096*4096!
-
-Model link: https://modelscope.cn/models/jasperai/Flux.1-dev-Controlnet-Upscaler
-
-![image_17](https://github.com/user-attachments/assets/1a688a12-1544-4973-8aca-aa3a23cb34c1)
-
-Zoom in to see details.
-
-![image_17_cropped](https://github.com/user-attachments/assets/461a1fbc-9ffa-4da5-80fd-e1af9667c804)
-
-Enjoy!
--- a/examples/ControlNet/flux_controlnet.py
+++ b/examples/ControlNet/flux_controlnet.py
@@ -1,299 +0,0 @@
-from diffsynth import ModelManager, FluxImagePipeline, ControlNetConfigUnit, download_models, download_customized_models
-import torch
-from PIL import Image
-import numpy as np
-
-
-
-def example_1():
-    model_manager = ModelManager(torch_dtype=torch.bfloat16, model_id_list=["FLUX.1-dev", "jasperai/Flux.1-dev-Controlnet-Upscaler"])
-    pipe = FluxImagePipeline.from_model_manager(model_manager, controlnet_config_units=[
-        ControlNetConfigUnit(
-            processor_id="tile",
-            model_path="models/ControlNet/jasperai/Flux.1-dev-Controlnet-Upscaler/diffusion_pytorch_model.safetensors",
-            scale=0.7
-        ),
-    ])
-
-    image_1 = pipe(
-        prompt="a photo of a cat, highly detailed",
-        height=768, width=768,
-        seed=0
-    )
-    image_1.save("image_1.jpg")
-
-    image_2 = pipe(
-        prompt="a photo of a cat, highly detailed",
-        controlnet_image=image_1.resize((2048, 2048)),
-        input_image=image_1.resize((2048, 2048)), denoising_strength=0.99,
-        height=2048, width=2048, tiled=True,
-        seed=1
-    )
-    image_2.save("image_2.jpg")
-
-
-
-def example_2():
-    model_manager = ModelManager(torch_dtype=torch.bfloat16, model_id_list=["FLUX.1-dev", "jasperai/Flux.1-dev-Controlnet-Upscaler"])
-    pipe = FluxImagePipeline.from_model_manager(model_manager, controlnet_config_units=[
-        ControlNetConfigUnit(
-            processor_id="tile",
-            model_path="models/ControlNet/jasperai/Flux.1-dev-Controlnet-Upscaler/diffusion_pytorch_model.safetensors",
-            scale=0.7
-        ),
-    ])
-
-    image_1 = pipe(
-        prompt="a beautiful Chinese girl, delicate skin texture",
-        height=768, width=768,
-        seed=2
-    )
-    image_1.save("image_3.jpg")
-
-    image_2 = pipe(
-        prompt="a beautiful Chinese girl, delicate skin texture",
-        controlnet_image=image_1.resize((2048, 2048)),
-        input_image=image_1.resize((2048, 2048)), denoising_strength=0.99,
-        height=2048, width=2048, tiled=True,
-        seed=3
-    )
-    image_2.save("image_4.jpg")
-
-
-def example_3():
-    model_manager = ModelManager(torch_dtype=torch.bfloat16, model_id_list=["FLUX.1-dev", "InstantX/FLUX.1-dev-Controlnet-Union-alpha"])
-    pipe = FluxImagePipeline.from_model_manager(model_manager, controlnet_config_units=[
-        ControlNetConfigUnit(
-            processor_id="canny",
-            model_path="models/ControlNet/InstantX/FLUX.1-dev-Controlnet-Union-alpha/diffusion_pytorch_model.safetensors",
-            scale=0.3
-        ),
-        ControlNetConfigUnit(
-            processor_id="depth",
-            model_path="models/ControlNet/InstantX/FLUX.1-dev-Controlnet-Union-alpha/diffusion_pytorch_model.safetensors",
-            scale=0.3
-        ),
-    ])
-
-    image_1 = pipe(
-        prompt="a cat is running",
-        height=1024, width=1024,
-        seed=4
-    )
-    image_1.save("image_5.jpg")
-
-    image_2 = pipe(
-        prompt="sunshine, a cat is running",
-        controlnet_image=image_1,
-        height=1024, width=1024,
-        seed=5
-    )
-    image_2.save("image_6.jpg")
-
-
-def example_4():
-    model_manager = ModelManager(torch_dtype=torch.bfloat16, model_id_list=["FLUX.1-dev", "InstantX/FLUX.1-dev-Controlnet-Union-alpha"])
-    pipe = FluxImagePipeline.from_model_manager(model_manager, controlnet_config_units=[
-        ControlNetConfigUnit(
-            processor_id="canny",
-            model_path="models/ControlNet/InstantX/FLUX.1-dev-Controlnet-Union-alpha/diffusion_pytorch_model.safetensors",
-            scale=0.3
-        ),
-        ControlNetConfigUnit(
-            processor_id="depth",
-            model_path="models/ControlNet/InstantX/FLUX.1-dev-Controlnet-Union-alpha/diffusion_pytorch_model.safetensors",
-            scale=0.3
-        ),
-    ])
-
-    image_1 = pipe(
-        prompt="a beautiful Asian girl, full body, red dress, summer",
-        height=1024, width=1024,
-        seed=6
-    )
-    image_1.save("image_7.jpg")
-
-    image_2 = pipe(
-        prompt="a beautiful Asian girl, full body, red dress, winter",
-        controlnet_image=image_1,
-        height=1024, width=1024,
-        seed=7
-    )
-    image_2.save("image_8.jpg")
-
-
-
-def example_5():
-    model_manager = ModelManager(torch_dtype=torch.bfloat16, model_id_list=["FLUX.1-dev", "alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta"])
-    pipe = FluxImagePipeline.from_model_manager(model_manager, controlnet_config_units=[
-        ControlNetConfigUnit(
-            processor_id="inpaint",
-            model_path="models/ControlNet/alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta/diffusion_pytorch_model.safetensors",
-            scale=0.9
-        ),
-    ])
-
-    image_1 = pipe(
-        prompt="a cat sitting on a chair",
-        height=1024, width=1024,
-        seed=8
-    )
-    image_1.save("image_9.jpg")
-
-    mask = np.zeros((1024, 1024, 3), dtype=np.uint8)
-    mask[100:350, 350: -300] = 255
-    mask = Image.fromarray(mask)
-    mask.save("mask_9.jpg")
-
-    image_2 = pipe(
-        prompt="a cat sitting on a chair, wearing sunglasses",
-        controlnet_image=image_1, controlnet_inpaint_mask=mask,
-        height=1024, width=1024,
-        seed=9
-    )
-    image_2.save("image_10.jpg")
-
-
-
-def example_6():
-    model_manager = ModelManager(torch_dtype=torch.bfloat16, model_id_list=[
-        "FLUX.1-dev",
-        "jasperai/Flux.1-dev-Controlnet-Surface-Normals",
-        "alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta"
-    ])
-    pipe = FluxImagePipeline.from_model_manager(model_manager, controlnet_config_units=[
-        ControlNetConfigUnit(
-            processor_id="inpaint",
-            model_path="models/ControlNet/alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta/diffusion_pytorch_model.safetensors",
-            scale=0.9
-        ),
-        ControlNetConfigUnit(
-            processor_id="normal",
-            model_path="models/ControlNet/jasperai/Flux.1-dev-Controlnet-Surface-Normals/diffusion_pytorch_model.safetensors",
-            scale=0.6
-        ),
-    ])
-
-    image_1 = pipe(
-        prompt="a beautiful Asian woman looking at the sky, wearing a blue t-shirt.",
-        height=1024, width=1024,
-        seed=10
-    )
-    image_1.save("image_11.jpg")
-
-    mask = np.zeros((1024, 1024, 3), dtype=np.uint8)
-    mask[-400:, 10:-40] = 255
-    mask = Image.fromarray(mask)
-    mask.save("mask_11.jpg")
-
-    image_2 = pipe(
-        prompt="a beautiful Asian woman looking at the sky, wearing a yellow t-shirt.",
-        controlnet_image=image_1, controlnet_inpaint_mask=mask,
-        height=1024, width=1024,
-        seed=11
-    )
-    image_2.save("image_12.jpg")
-
-
-def example_7():
-    model_manager = ModelManager(torch_dtype=torch.bfloat16, model_id_list=[
-        "FLUX.1-dev",
-        "InstantX/FLUX.1-dev-Controlnet-Union-alpha",
-        "alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta",
-        "jasperai/Flux.1-dev-Controlnet-Upscaler",
-    ])
-    pipe = FluxImagePipeline.from_model_manager(model_manager, controlnet_config_units=[
-        ControlNetConfigUnit(
-            processor_id="inpaint",
-            model_path="models/ControlNet/alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta/diffusion_pytorch_model.safetensors",
-            scale=0.9
-        ),
-        ControlNetConfigUnit(
-            processor_id="canny",
-            model_path="models/ControlNet/InstantX/FLUX.1-dev-Controlnet-Union-alpha/diffusion_pytorch_model.safetensors",
-            scale=0.5
-        ),
-    ])
-
-    image_1 = pipe(
-        prompt="a beautiful Asian woman and a cat on a bed. The woman wears a dress.",
-        height=1024, width=1024,
-        seed=100
-    )
-    image_1.save("image_13.jpg")
-
-    mask_global = np.zeros((1024, 1024, 3), dtype=np.uint8)
-    mask_global = Image.fromarray(mask_global)
-    mask_global.save("mask_13_global.jpg")
-
-    mask_1 = np.zeros((1024, 1024, 3), dtype=np.uint8)
-    mask_1[300:-100, 30: 450] = 255
-    mask_1 = Image.fromarray(mask_1)
-    mask_1.save("mask_13_1.jpg")
-
-    mask_2 = np.zeros((1024, 1024, 3), dtype=np.uint8)
-    mask_2[500:-100, -400:] = 255
-    mask_2[-200:-100, -500:-400] = 255
-    mask_2 = Image.fromarray(mask_2)
-    mask_2.save("mask_13_2.jpg")
-
-    image_2 = pipe(
-        prompt="a beautiful Asian woman and a cat on a bed. The woman wears a dress.",
-        controlnet_image=image_1, controlnet_inpaint_mask=mask_global,
-        local_prompts=["an orange cat, highly detailed", "a girl wearing a red camisole"], masks=[mask_1, mask_2], mask_scales=[10.0, 10.0],
-        height=1024, width=1024,
-        seed=101
-    )
-    image_2.save("image_14.jpg")
-
-    model_manager.load_lora("models/lora/FLUX-dev-lora-AntiBlur.safetensors", lora_alpha=2)
-    image_3 = pipe(
-        prompt="a beautiful Asian woman wearing a red camisole and an orange cat on a bed. clear background.",
-        negative_prompt="blur, blurry",
-        input_image=image_2, denoising_strength=0.7,
-        height=1024, width=1024,
-        cfg_scale=2.0, num_inference_steps=50,
-        seed=102
-    )
-    image_3.save("image_15.jpg")
-
-    pipe = FluxImagePipeline.from_model_manager(model_manager, controlnet_config_units=[
-        ControlNetConfigUnit(
-            processor_id="tile",
-            model_path="models/ControlNet/jasperai/Flux.1-dev-Controlnet-Upscaler/diffusion_pytorch_model.safetensors",
-            scale=0.7
-        ),
-    ])
-    image_4 = pipe(
-        prompt="a beautiful Asian woman wearing a red camisole and an orange cat on a bed. highly detailed, delicate skin texture, clear background.",
-        controlnet_image=image_3.resize((2048, 2048)),
-        input_image=image_3.resize((2048, 2048)), denoising_strength=0.99,
-        height=2048, width=2048, tiled=True,
-        seed=103
-    )
-    image_4.save("image_16.jpg")
-
-    image_5 = pipe(
-        prompt="a beautiful Asian woman wearing a red camisole and an orange cat on a bed. highly detailed, delicate skin texture, clear background.",
-        controlnet_image=image_4.resize((4096, 4096)),
-        input_image=image_4.resize((4096, 4096)), denoising_strength=0.99,
-        height=4096, width=4096, tiled=True,
-        seed=104
-    )
-    image_5.save("image_17.jpg")
-
-
-
-download_models(["Annotators:Depth", "Annotators:Normal"])
-download_customized_models(
-    model_id="LiblibAI/FLUX.1-dev-LoRA-AntiBlur",
-    origin_file_path="FLUX-dev-lora-AntiBlur.safetensors",
-    local_dir="models/lora"
-)
-example_1()
-example_2()
-example_3()
-example_4()
-example_5()
-example_6()
-example_7()
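
Note on the inpainting masks in the removed script above: the masks are built with NumPy slices that mix positive and negative indices, such as `mask[100:350, 350: -300]`, so the region being repainted is not obvious at a glance. The standard-library sketch below resolves those slices against the 1024x1024 canvas into explicit pixel bounds. `mask_box` is a hypothetical helper that does not appear in the original script.

```python
def mask_box(rows, cols, height=1024, width=1024):
    """Resolve row/column slices into (top, bottom, left, right) pixel bounds.

    Bounds are half-open, as in Python slicing: rows top..bottom-1 and
    columns left..right-1 are set to 255 in the mask.
    """
    top, bottom, _ = rows.indices(height)
    left, right, _ = cols.indices(width)
    return top, bottom, left, right


# example_5: mask[100:350, 350:-300] (region where sunglasses are added)
sunglasses_box = mask_box(slice(100, 350), slice(350, -300))

# example_6: mask[-400:, 10:-40] (region where the t-shirt colour changes)
shirt_box = mask_box(slice(-400, None), slice(10, -40))

print(sunglasses_box)  # (100, 350, 350, 724)
print(shirt_box)       # (624, 1024, 10, 984)
```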
--- a/examples/ControlNet/flux_controlnet_quantization.py
+++ b/examples/ControlNet/flux_controlnet_quantization.py
@@ -1,447 +0,0 @@
-from diffsynth import ModelManager, FluxImagePipeline, ControlNetConfigUnit, download_models, download_customized_models
-import torch
-from PIL import Image
-import numpy as np
-
-
-
-def example_1():
-    download_models(["FLUX.1-dev", "jasperai/Flux.1-dev-Controlnet-Upscaler"])
-    model_manager = ModelManager(
-            torch_dtype=torch.bfloat16,
-            device="cpu" 
-        )
-    model_manager.load_models([
-        "models/FLUX/FLUX.1-dev/text_encoder/model.safetensors",
-        "models/FLUX/FLUX.1-dev/text_encoder_2",
-        "models/FLUX/FLUX.1-dev/ae.safetensors",
-        ])
-    model_manager.load_models(
-        ["models/FLUX/FLUX.1-dev/flux1-dev.safetensors"],
-        torch_dtype=torch.float8_e4m3fn 
-    )
-    model_manager.load_models(
-        ["models/ControlNet/jasperai/Flux.1-dev-Controlnet-Upscaler/diffusion_pytorch_model.safetensors"],
-        torch_dtype=torch.float8_e4m3fn 
-    )
-    pipe = FluxImagePipeline.from_model_manager(model_manager, controlnet_config_units=[
-        ControlNetConfigUnit(
-            processor_id="tile",
-            model_path="models/ControlNet/jasperai/Flux.1-dev-Controlnet-Upscaler/diffusion_pytorch_model.safetensors",
-            scale=0.7
-        ),
-    ],device="cuda")
-    pipe.enable_cpu_offload()
-    pipe.dit.quantize()
-    for model in pipe.controlnet.models:
-        model.quantize()
-
-    image_1 = pipe(
-        prompt="a photo of a cat, highly detailed",
-        height=768, width=768,
-        seed=0
-    )
-    image_1.save("image_1.jpg")
-
-    image_2 = pipe(
-        prompt="a photo of a cat, highly detailed",
-        controlnet_image=image_1.resize((2048, 2048)),
-        input_image=image_1.resize((2048, 2048)), denoising_strength=0.99,
-        height=2048, width=2048, tiled=True,
-        seed=1
-    )
-    image_2.save("image_2.jpg")
-
-
-
-def example_2():
-    download_models(["FLUX.1-dev", "jasperai/Flux.1-dev-Controlnet-Upscaler"])
-    model_manager = ModelManager(
-            torch_dtype=torch.bfloat16,
-            device="cpu" 
-        )
-    model_manager.load_models([
-        "models/FLUX/FLUX.1-dev/text_encoder/model.safetensors",
-        "models/FLUX/FLUX.1-dev/text_encoder_2",
-        "models/FLUX/FLUX.1-dev/ae.safetensors",
-        ])
-    model_manager.load_models(
-        ["models/FLUX/FLUX.1-dev/flux1-dev.safetensors"],
-        torch_dtype=torch.float8_e4m3fn 
-    )
-    model_manager.load_models(
-        ["models/ControlNet/jasperai/Flux.1-dev-Controlnet-Upscaler/diffusion_pytorch_model.safetensors"],
-        torch_dtype=torch.float8_e4m3fn 
-    )
-    pipe = FluxImagePipeline.from_model_manager(model_manager, controlnet_config_units=[
-        ControlNetConfigUnit(
-            processor_id="tile",
-            model_path="models/ControlNet/jasperai/Flux.1-dev-Controlnet-Upscaler/diffusion_pytorch_model.safetensors",
-            scale=0.7
-        ),
-    ],device="cuda")
-    pipe.enable_cpu_offload()
-    pipe.dit.quantize()
-    for model in pipe.controlnet.models:
-        model.quantize()
-    image_1 = pipe(
-        prompt="a beautiful Chinese girl, delicate skin texture",
-        height=768, width=768,
-        seed=2
-    )
-    image_1.save("image_3.jpg")
-
-    image_2 = pipe(
-        prompt="a beautiful Chinese girl, delicate skin texture",
-        controlnet_image=image_1.resize((2048, 2048)),
-        input_image=image_1.resize((2048, 2048)), denoising_strength=0.99,
-        height=2048, width=2048, tiled=True,
-        seed=3
-    )
-    image_2.save("image_4.jpg")
-
-
-def example_3():
-    download_models(["FLUX.1-dev", "InstantX/FLUX.1-dev-Controlnet-Union-alpha"])
-    model_manager = ModelManager(
-            torch_dtype=torch.bfloat16,
-            device="cpu" 
-        )
-    model_manager.load_models([
-        "models/FLUX/FLUX.1-dev/text_encoder/model.safetensors",
-        "models/FLUX/FLUX.1-dev/text_encoder_2",
-        "models/FLUX/FLUX.1-dev/ae.safetensors",
-        ])
-    model_manager.load_models(
-        ["models/FLUX/FLUX.1-dev/flux1-dev.safetensors"],
-        torch_dtype=torch.float8_e4m3fn 
-    )
-    model_manager.load_models(
-        ["models/ControlNet/InstantX/FLUX.1-dev-Controlnet-Union-alpha/diffusion_pytorch_model.safetensors"],
-        torch_dtype=torch.float8_e4m3fn 
-    )
-    pipe = FluxImagePipeline.from_model_manager(model_manager, controlnet_config_units=[
-        ControlNetConfigUnit(
-            processor_id="canny",
-            model_path="models/ControlNet/InstantX/FLUX.1-dev-Controlnet-Union-alpha/diffusion_pytorch_model.safetensors",
-            scale=0.3
-        ),
-        ControlNetConfigUnit(
-            processor_id="depth",
-            model_path="models/ControlNet/InstantX/FLUX.1-dev-Controlnet-Union-alpha/diffusion_pytorch_model.safetensors",
-            scale=0.3
-        ),
-    ],device="cuda")
-    pipe.enable_cpu_offload()
-    pipe.dit.quantize()
-    for model in pipe.controlnet.models:
-        model.quantize()
-    image_1 = pipe(
-        prompt="a cat is running",
-        height=1024, width=1024,
-        seed=4
-    )
-    image_1.save("image_5.jpg")
-
-    image_2 = pipe(
-        prompt="sunshine, a cat is running",
-        controlnet_image=image_1,
-        height=1024, width=1024,
-        seed=5
-    )
-    image_2.save("image_6.jpg")
-
-
-def example_4():
-    download_models(["FLUX.1-dev", "InstantX/FLUX.1-dev-Controlnet-Union-alpha"])
-    model_manager = ModelManager(
-            torch_dtype=torch.bfloat16,
-            device="cpu" 
-        )
-    model_manager.load_models([
-        "models/FLUX/FLUX.1-dev/text_encoder/model.safetensors",
-        "models/FLUX/FLUX.1-dev/text_encoder_2",
-        "models/FLUX/FLUX.1-dev/ae.safetensors",
-        ])
-    model_manager.load_models(
-        ["models/FLUX/FLUX.1-dev/flux1-dev.safetensors"],
-        torch_dtype=torch.float8_e4m3fn 
-    )
-    model_manager.load_models(
-        ["models/ControlNet/InstantX/FLUX.1-dev-Controlnet-Union-alpha/diffusion_pytorch_model.safetensors"],
-        torch_dtype=torch.float8_e4m3fn 
-    )
-    pipe = FluxImagePipeline.from_model_manager(model_manager, controlnet_config_units=[
-        ControlNetConfigUnit(
-            processor_id="canny",
-            model_path="models/ControlNet/InstantX/FLUX.1-dev-Controlnet-Union-alpha/diffusion_pytorch_model.safetensors",
-            scale=0.3
-        ),
-        ControlNetConfigUnit(
-            processor_id="depth",
-            model_path="models/ControlNet/InstantX/FLUX.1-dev-Controlnet-Union-alpha/diffusion_pytorch_model.safetensors",
-            scale=0.3
-        ),
-    ],device="cuda")
-    pipe.enable_cpu_offload()
-    pipe.dit.quantize()
-    for model in pipe.controlnet.models:
-        model.quantize()
-    image_1 = pipe(
-        prompt="a beautiful Asian girl, full body, red dress, summer",
-        height=1024, width=1024,
-        seed=6
-    )
-    image_1.save("image_7.jpg")
-
-    image_2 = pipe(
-        prompt="a beautiful Asian girl, full body, red dress, winter",
-        controlnet_image=image_1,
-        height=1024, width=1024,
-        seed=7
-    )
-    image_2.save("image_8.jpg")
-
-
-
-def example_5():
-    download_models(["FLUX.1-dev", "alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta"])
-    model_manager = ModelManager(
-            torch_dtype=torch.bfloat16,
-            device="cpu" 
-        )
-    model_manager.load_models([
-        "models/FLUX/FLUX.1-dev/text_encoder/model.safetensors",
-        "models/FLUX/FLUX.1-dev/text_encoder_2",
-        "models/FLUX/FLUX.1-dev/ae.safetensors",
-        ])
-    model_manager.load_models(
-        ["models/FLUX/FLUX.1-dev/flux1-dev.safetensors"],
-        torch_dtype=torch.float8_e4m3fn 
-    )
-    model_manager.load_models(
-        ["models/ControlNet/alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta/diffusion_pytorch_model.safetensors"],
-        torch_dtype=torch.float8_e4m3fn 
-    )
-    pipe = FluxImagePipeline.from_model_manager(model_manager, controlnet_config_units=[
-        ControlNetConfigUnit(
-            processor_id="inpaint",
-            model_path="models/ControlNet/alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta/diffusion_pytorch_model.safetensors",
-            scale=0.9
-        ),
-    ],device="cuda")
-    pipe.enable_cpu_offload()
-    pipe.dit.quantize()
-    for model in pipe.controlnet.models:
-        model.quantize()
-    image_1 = pipe(
-        prompt="a cat sitting on a chair",
-        height=1024, width=1024,
-        seed=8
-    )
-    image_1.save("image_9.jpg")
-
-    mask = np.zeros((1024, 1024, 3), dtype=np.uint8)
-    mask[100:350, 350: -300] = 255
-    mask = Image.fromarray(mask)
-    mask.save("mask_9.jpg")
-
-    image_2 = pipe(
-        prompt="a cat sitting on a chair, wearing sunglasses",
-        controlnet_image=image_1, controlnet_inpaint_mask=mask,
-        height=1024, width=1024,
-        seed=9
-    )
-    image_2.save("image_10.jpg")
-
-
-
-def example_6():
-    download_models([
-        "FLUX.1-dev",
-        "jasperai/Flux.1-dev-Controlnet-Surface-Normals",
-        "alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta"
-    ])
-    model_manager = ModelManager(
-            torch_dtype=torch.bfloat16,
-            device="cpu" 
-        )
-    model_manager.load_models([
-        "models/FLUX/FLUX.1-dev/text_encoder/model.safetensors",
-        "models/FLUX/FLUX.1-dev/text_encoder_2",
-        "models/FLUX/FLUX.1-dev/ae.safetensors",
-        ])
-    model_manager.load_models(
-        ["models/FLUX/FLUX.1-dev/flux1-dev.safetensors"],
-        torch_dtype=torch.float8_e4m3fn 
-    )
-    model_manager.load_models(
-        ["models/ControlNet/alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta/diffusion_pytorch_model.safetensors",
-         "models/ControlNet/jasperai/Flux.1-dev-Controlnet-Surface-Normals/diffusion_pytorch_model.safetensors"],
-        torch_dtype=torch.float8_e4m3fn 
-    )
-    pipe = FluxImagePipeline.from_model_manager(model_manager, controlnet_config_units=[
-        ControlNetConfigUnit(
-            processor_id="inpaint",
-            model_path="models/ControlNet/alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta/diffusion_pytorch_model.safetensors",
-            scale=0.9
-        ),
-        ControlNetConfigUnit(
-            processor_id="normal",
-            model_path="models/ControlNet/jasperai/Flux.1-dev-Controlnet-Surface-Normals/diffusion_pytorch_model.safetensors",
-            scale=0.6
-        ),
-    ],device="cuda")
-    pipe.enable_cpu_offload()
-    pipe.dit.quantize()
-    for model in pipe.controlnet.models:
-        model.quantize()
-    image_1 = pipe(
-        prompt="a beautiful Asian woman looking at the sky, wearing a blue t-shirt.",
-        height=1024, width=1024,
-        seed=10
-    )
-    image_1.save("image_11.jpg")
-
-    mask = np.zeros((1024, 1024, 3), dtype=np.uint8)
-    mask[-400:, 10:-40] = 255
-    mask = Image.fromarray(mask)
-    mask.save("mask_11.jpg")
-
-    image_2 = pipe(
-        prompt="a beautiful Asian woman looking at the sky, wearing a yellow t-shirt.",
-        controlnet_image=image_1, controlnet_inpaint_mask=mask,
-        height=1024, width=1024,
-        seed=11
-    )
-    image_2.save("image_12.jpg")
-
-
-def example_7():
-    download_models([
-        "FLUX.1-dev",
-        "InstantX/FLUX.1-dev-Controlnet-Union-alpha",
-        "alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta",
-        "jasperai/Flux.1-dev-Controlnet-Upscaler",
-    ])
-    model_manager = ModelManager(
-            torch_dtype=torch.bfloat16,
-            device="cpu"
-        )
-    model_manager.load_models([
-        "models/FLUX/FLUX.1-dev/text_encoder/model.safetensors",
-        "models/FLUX/FLUX.1-dev/text_encoder_2",
-        "models/FLUX/FLUX.1-dev/ae.safetensors",
-        ])
-    model_manager.load_models(
-        ["models/FLUX/FLUX.1-dev/flux1-dev.safetensors"],
-        torch_dtype=torch.float8_e4m3fn
-    )
-    model_manager.load_models(
-        ["models/ControlNet/alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta/diffusion_pytorch_model.safetensors",
-         "models/ControlNet/InstantX/FLUX.1-dev-Controlnet-Union-alpha/diffusion_pytorch_model.safetensors",
-         "models/ControlNet/jasperai/Flux.1-dev-Controlnet-Upscaler/diffusion_pytorch_model.safetensors"],
-        torch_dtype=torch.float8_e4m3fn 
-    )
-    pipe = FluxImagePipeline.from_model_manager(model_manager, controlnet_config_units=[
-        ControlNetConfigUnit(
-            processor_id="inpaint",
-            model_path="models/ControlNet/alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta/diffusion_pytorch_model.safetensors",
-            scale=0.9
-        ),
-        ControlNetConfigUnit(
-            processor_id="canny",
-            model_path="models/ControlNet/InstantX/FLUX.1-dev-Controlnet-Union-alpha/diffusion_pytorch_model.safetensors",
-            scale=0.5
-        ),
-    ],device="cuda")
-    pipe.enable_cpu_offload()
-    pipe.dit.quantize()
-    for model in pipe.controlnet.models:
-        model.quantize()
-    image_1 = pipe(
-        prompt="a beautiful Asian woman and a cat on a bed. The woman wears a dress.",
-        height=1024, width=1024,
-        seed=100
-    )
-    image_1.save("image_13.jpg")
-
-    mask_global = np.zeros((1024, 1024, 3), dtype=np.uint8)
-    mask_global = Image.fromarray(mask_global)
-    mask_global.save("mask_13_global.jpg")
-
-    mask_1 = np.zeros((1024, 1024, 3), dtype=np.uint8)
-    mask_1[300:-100, 30: 450] = 255
-    mask_1 = Image.fromarray(mask_1)
-    mask_1.save("mask_13_1.jpg")
-
-    mask_2 = np.zeros((1024, 1024, 3), dtype=np.uint8)
-    mask_2[500:-100, -400:] = 255
-    mask_2[-200:-100, -500:-400] = 255
-    mask_2 = Image.fromarray(mask_2)
-    mask_2.save("mask_13_2.jpg")
-
-    image_2 = pipe(
-        prompt="a beautiful Asian woman and a cat on a bed. The woman wears a dress.",
-        controlnet_image=image_1, controlnet_inpaint_mask=mask_global,
-        local_prompts=["an orange cat, highly detailed", "a girl wearing a red camisole"], masks=[mask_1, mask_2], mask_scales=[10.0, 10.0],
-        height=1024, width=1024,
-        seed=101
-    )
-    image_2.save("image_14.jpg")
-
-    model_manager.load_lora("models/lora/FLUX-dev-lora-AntiBlur.safetensors", lora_alpha=2)
-    image_3 = pipe(
-        prompt="a beautiful Asian woman wearing a red camisole and an orange cat on a bed. clear background.",
-        negative_prompt="blur, blurry",
-        input_image=image_2, denoising_strength=0.7,
-        height=1024, width=1024,
-        cfg_scale=2.0, num_inference_steps=50,
-        seed=102
-    )
-    image_3.save("image_15.jpg")
-
-    pipe = FluxImagePipeline.from_model_manager(model_manager, controlnet_config_units=[
-        ControlNetConfigUnit(
-            processor_id="tile",
-            model_path="models/ControlNet/jasperai/Flux.1-dev-Controlnet-Upscaler/diffusion_pytorch_model.safetensors",
-            scale=0.7
-        ),
-    ],device="cuda")
-    pipe.enable_cpu_offload()
-    pipe.dit.quantize()
-    for model in pipe.controlnet.models:
-        model.quantize()
-    image_4 = pipe(
-        prompt="a beautiful Asian woman wearing a red camisole and an orange cat on a bed. highly detailed, delicate skin texture, clear background.",
-        controlnet_image=image_3.resize((2048, 2048)),
-        input_image=image_3.resize((2048, 2048)), denoising_strength=0.99,
-        height=2048, width=2048, tiled=True,
-        seed=103
-    )
-    image_4.save("image_16.jpg")
-
-    image_5 = pipe(
-        prompt="a beautiful Asian woman wearing a red camisole and an orange cat on a bed. highly detailed, delicate skin texture, clear background.",
-        controlnet_image=image_4.resize((4096, 4096)),
-        input_image=image_4.resize((4096, 4096)), denoising_strength=0.99,
-        height=4096, width=4096, tiled=True,
-        seed=104
-    )
-    image_5.save("image_17.jpg")
-
-
-
-download_models(["Annotators:Depth", "Annotators:Normal"])
-download_customized_models(
-    model_id="LiblibAI/FLUX.1-dev-LoRA-AntiBlur",
-    origin_file_path="FLUX-dev-lora-AntiBlur.safetensors",
-    local_dir="models/lora"
-)
-example_1()
-example_2()
-example_3()
-example_4()
-example_5()
-example_6()
-example_7()
--- a/examples/Diffutoon/Diffutoon.ipynb
+++ b/examples/Diffutoon/Diffutoon.ipynb
@@ -1,512 +0,0 @@
-{
-  "cells": [
-    {
-      "cell_type": "markdown",
-      "metadata": {
-        "id": "8ObdI5jCB8xy"
-      },
-      "source": [
-        "# DiffSynth Studio\n",
-        "\n",
-        "Welcome to DiffSynth Studio! This is an example of Diffutoon."
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {
-        "id": "XSkKX7O2BwuM"
-      },
-      "source": [
-        "## Install"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {
-        "colab": {
-          "base_uri": "https://localhost:8080/"
-        },
-        "id": "msCpt0pLnT8W",
-        "outputId": "35d93b35-451b-4760-d1ee-ef7ff190916e"
-      },
-      "outputs": [],
-      "source": [
-        "!git clone https://github.com/Artiprocher/DiffSynth-Studio.git\n",
-        "!pip install -q transformers controlnet-aux==0.0.7 streamlit streamlit-drawable-canvas imageio imageio[ffmpeg] safetensors einops cupy-cuda12x\n",
-        "%cd /content/DiffSynth-Studio"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {
-        "id": "5eCu_rlKB3kK"
-      },
-      "source": [
-        "## Download Models"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {
-        "id": "9znMkpVj3qZ1"
-      },
-      "outputs": [],
-      "source": [
-        "import requests\n",
-        "\n",
-        "\n",
-        "def download_model(url, file_path):\n",
-        "  model_file = requests.get(url, allow_redirects=True)\n",
-        "  with open(file_path, \"wb\") as f:\n",
-        "    f.write(model_file.content)\n",
-        "\n",
-        "download_model(\"https://civitai.com/api/download/models/229575\", \"models/stable_diffusion/aingdiffusion_v12.safetensors\")\n",
-        "download_model(\"https://huggingface.co/guoyww/animatediff/resolve/main/mm_sd_v15_v2.ckpt\", \"models/AnimateDiff/mm_sd_v15_v2.ckpt\")\n",
-        "download_model(\"https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_lineart.pth\", \"models/ControlNet/control_v11p_sd15_lineart.pth\")\n",
-        "download_model(\"https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11f1e_sd15_tile.pth\", \"models/ControlNet/control_v11f1e_sd15_tile.pth\")\n",
-        "download_model(\"https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11f1p_sd15_depth.pth\", \"models/ControlNet/control_v11f1p_sd15_depth.pth\")\n",
-        "download_model(\"https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_softedge.pth\", \"models/ControlNet/control_v11p_sd15_softedge.pth\")\n",
-        "download_model(\"https://huggingface.co/lllyasviel/Annotators/resolve/main/dpt_hybrid-midas-501f0c75.pt\", \"models/Annotators/dpt_hybrid-midas-501f0c75.pt\")\n",
-        "download_model(\"https://huggingface.co/lllyasviel/Annotators/resolve/main/ControlNetHED.pth\", \"models/Annotators/ControlNetHED.pth\")\n",
-        "download_model(\"https://huggingface.co/lllyasviel/Annotators/resolve/main/sk_model.pth\", \"models/Annotators/sk_model.pth\")\n",
-        "download_model(\"https://huggingface.co/lllyasviel/Annotators/resolve/main/sk_model2.pth\", \"models/Annotators/sk_model2.pth\")\n",
-        "download_model(\"https://civitai.com/api/download/models/25820?type=Model&format=PickleTensor&size=full&fp=fp16\", \"models/textual_inversion/verybadimagenegative_v1.3.pt\")"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {
-        "id": "iwOq2lWtKVYS"
-      },
-      "source": [
-        "## Run Diffutoon"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {
-        "id": "tII_XRY-PJeo"
-      },
-      "source": [
-        "### Config Template"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {
-        "id": "vsd2alA3PrGe"
-      },
-      "outputs": [],
-      "source": [
-        "config_stage_1_template = {\n",
-        "    \"models\": {\n",
-        "        \"model_list\": [\n",
-        "            \"models/stable_diffusion/aingdiffusion_v12.safetensors\",\n",
-        "            \"models/ControlNet/control_v11p_sd15_softedge.pth\",\n",
-        "            \"models/ControlNet/control_v11f1p_sd15_depth.pth\"\n",
-        "        ],\n",
-        "        \"textual_inversion_folder\": \"models/textual_inversion\",\n",
-        "        \"device\": \"cuda\",\n",
-        "        \"lora_alphas\": [],\n",
-        "        \"controlnet_units\": [\n",
-        "            {\n",
-        "                \"processor_id\": \"softedge\",\n",
-        "                \"model_path\": \"models/ControlNet/control_v11p_sd15_softedge.pth\",\n",
-        "                \"scale\": 0.5\n",
-        "            },\n",
-        "            {\n",
-        "                \"processor_id\": \"depth\",\n",
-        "                \"model_path\": \"models/ControlNet/control_v11f1p_sd15_depth.pth\",\n",
-        "                \"scale\": 0.5\n",
-        "            }\n",
-        "        ]\n",
-        "    },\n",
-        "    \"data\": {\n",
-        "        \"input_frames\": {\n",
-        "            \"video_file\": \"/content/input_video.mp4\",\n",
-        "            \"image_folder\": None,\n",
-        "            \"height\": 512,\n",
-        "            \"width\": 512,\n",
-        "            \"start_frame_id\": 0,\n",
-        "            \"end_frame_id\": 30\n",
-        "        },\n",
-        "        \"controlnet_frames\": [\n",
-        "            {\n",
-        "                \"video_file\": \"/content/input_video.mp4\",\n",
-        "                \"image_folder\": None,\n",
-        "                \"height\": 512,\n",
-        "                \"width\": 512,\n",
-        "                \"start_frame_id\": 0,\n",
-        "                \"end_frame_id\": 30\n",
-        "            },\n",
-        "            {\n",
-        "                \"video_file\": \"/content/input_video.mp4\",\n",
-        "                \"image_folder\": None,\n",
-        "                \"height\": 512,\n",
-        "                \"width\": 512,\n",
-        "                \"start_frame_id\": 0,\n",
-        "                \"end_frame_id\": 30\n",
-        "            }\n",
-        "        ],\n",
-        "        \"output_folder\": \"data/examples/diffutoon_edit/color_video\",\n",
-        "        \"fps\": 25\n",
-        "    },\n",
-        "    \"smoother_configs\": [\n",
-        "        {\n",
-        "            \"processor_type\": \"FastBlend\",\n",
-        "            \"config\": {}\n",
-        "        }\n",
-        "    ],\n",
-        "    \"pipeline\": {\n",
-        "        \"seed\": 0,\n",
-        "        \"pipeline_inputs\": {\n",
-        "            \"prompt\": \"best quality, perfect anime illustration, orange clothes, night, a girl is dancing, smile, solo, black silk stockings\",\n",
-        "            \"negative_prompt\": \"verybadimagenegative_v1.3\",\n",
-        "            \"cfg_scale\": 7.0,\n",
-        "            \"clip_skip\": 1,\n",
-        "            \"denoising_strength\": 0.9,\n",
-        "            \"num_inference_steps\": 20,\n",
-        "            \"animatediff_batch_size\": 8,\n",
-        "            \"animatediff_stride\": 4,\n",
-        "            \"unet_batch_size\": 8,\n",
-        "            \"controlnet_batch_size\": 8,\n",
-        "            \"cross_frame_attention\": True,\n",
-        "            \"smoother_progress_ids\": [-1],\n",
-        "            # The following parameters will be overwritten. You don't need to modify them.\n",
-        "            \"input_frames\": [],\n",
-        "            \"num_frames\": 30,\n",
-        "            \"width\": 512,\n",
-        "            \"height\": 512,\n",
-        "            \"controlnet_frames\": []\n",
-        "        }\n",
-        "    }\n",
-        "}\n",
-        "\n",
-        "config_stage_2_template = {\n",
-        "    \"models\": {\n",
-        "        \"model_list\": [\n",
-        "            \"models/stable_diffusion/aingdiffusion_v12.safetensors\",\n",
-        "            \"models/AnimateDiff/mm_sd_v15_v2.ckpt\",\n",
-        "            \"models/ControlNet/control_v11f1e_sd15_tile.pth\",\n",
-        "            \"models/ControlNet/control_v11p_sd15_lineart.pth\"\n",
-        "        ],\n",
-        "        \"textual_inversion_folder\": \"models/textual_inversion\",\n",
-        "        \"device\": \"cuda\",\n",
-        "        \"lora_alphas\": [],\n",
-        "        \"controlnet_units\": [\n",
-        "            {\n",
-        "                \"processor_id\": \"tile\",\n",
-        "                \"model_path\": \"models/ControlNet/control_v11f1e_sd15_tile.pth\",\n",
-        "                \"scale\": 0.5\n",
-        "            },\n",
-        "            {\n",
-        "                \"processor_id\": \"lineart\",\n",
-        "                \"model_path\": \"models/ControlNet/control_v11p_sd15_lineart.pth\",\n",
-        "                \"scale\": 0.5\n",
-        "            }\n",
-        "        ]\n",
-        "    },\n",
-        "    \"data\": {\n",
-        "        \"input_frames\": {\n",
-        "            \"video_file\": \"/content/input_video.mp4\",\n",
-        "            \"image_folder\": None,\n",
-        "            \"height\": 1024,\n",
-        "            \"width\": 1024,\n",
-        "            \"start_frame_id\": 0,\n",
-        "            \"end_frame_id\": 30\n",
-        "        },\n",
-        "        \"controlnet_frames\": [\n",
-        "            {\n",
-        "                \"video_file\": \"/content/input_video.mp4\",\n",
-        "                \"image_folder\": None,\n",
-        "                \"height\": 1024,\n",
-        "                \"width\": 1024,\n",
-        "                \"start_frame_id\": 0,\n",
-        "                \"end_frame_id\": 30\n",
-        "            },\n",
-        "            {\n",
-        "                \"video_file\": \"/content/input_video.mp4\",\n",
-        "                \"image_folder\": None,\n",
-        "                \"height\": 1024,\n",
-        "                \"width\": 1024,\n",
-        "                \"start_frame_id\": 0,\n",
-        "                \"end_frame_id\": 30\n",
-        "            }\n",
-        "        ],\n",
-        "        \"output_folder\": \"/content/output\",\n",
-        "        \"fps\": 25\n",
-        "    },\n",
-        "    \"pipeline\": {\n",
-        "        \"seed\": 0,\n",
-        "        \"pipeline_inputs\": {\n",
-        "            \"prompt\": \"best quality, perfect anime illustration, light, a girl is dancing, smile, solo\",\n",
-        "            \"negative_prompt\": \"verybadimagenegative_v1.3\",\n",
-        "            \"cfg_scale\": 7.0,\n",
-        "            \"clip_skip\": 2,\n",
-        "            \"denoising_strength\": 1.0,\n",
-        "            \"num_inference_steps\": 10,\n",
-        "            \"animatediff_batch_size\": 16,\n",
-        "            \"animatediff_stride\": 8,\n",
-        "            \"unet_batch_size\": 1,\n",
-        "            \"controlnet_batch_size\": 1,\n",
-        "            \"cross_frame_attention\": False,\n",
-        "            # The following parameters will be overwritten. You don't need to modify them.\n",
-        "            \"input_frames\": [],\n",
-        "            \"num_frames\": 30,\n",
-        "            \"width\": 1536,\n",
-        "            \"height\": 1536,\n",
-        "            \"controlnet_frames\": []\n",
-        "        }\n",
-        "    }\n",
-        "}"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {
-        "id": "113QAmNHP6T_"
-      },
-      "source": [
-        "### Upload Input Video\n",
-        "\n",
-        "Before you run the following code, please upload your input video to `/content/input_video.mp4`."
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {
-        "id": "CyqAsj1o5U9B"
-      },
-      "source": [
-        "### Toon Shading\n",
-        "\n",
-        "Render your video in an anime style.\n",
-        "\n",
-        "We highly recommend using a higher resolution for better visual quality. The default resolution of Diffutoon is 1536x1536, which requires 22GB VRAM. If you don't have enough VRAM, 1024x1024 is also acceptable.\n"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {
-        "colab": {
-          "base_uri": "https://localhost:8080/"
-        },
-        "id": "761nbrgeKMvj",
-        "outputId": "c0d47d5f-16e9-4a65-e664-9bd5fc491111"
-      },
-      "outputs": [],
-      "source": [
-        "from diffsynth import SDVideoPipelineRunner\n",
-        "\n",
-        "\n",
-        "config = config_stage_2_template.copy()\n",
-        "config[\"data\"][\"input_frames\"] = {\n",
-        "    \"video_file\": \"/content/input_video.mp4\",\n",
-        "    \"image_folder\": None,\n",
-        "    \"height\": 1024,\n",
-        "    \"width\": 1024,\n",
-        "    \"start_frame_id\": 0,\n",
-        "    \"end_frame_id\": 30\n",
-        "}\n",
-        "config[\"data\"][\"controlnet_frames\"] = [config[\"data\"][\"input_frames\"], config[\"data\"][\"input_frames\"]]\n",
-        "config[\"data\"][\"output_folder\"] = \"/content/toon_video\"\n",
-        "config[\"data\"][\"fps\"] = 25\n",
-        "\n",
-        "runner = SDVideoPipelineRunner()\n",
-        "runner.run(config)"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {
-        "id": "9wujhGUmDIwY"
-      },
-      "source": [
-        "Let's see the video!"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {
-        "colab": {
-          "base_uri": "https://localhost:8080/",
-          "height": 420
-        },
-        "id": "TBNAigacAq6h",
-        "outputId": "8f57c3b4-982b-4643-f3dc-53c51bd85a4b"
-      },
-      "outputs": [],
-      "source": [
-        "from IPython.display import HTML\n",
-        "from base64 import b64encode\n",
-        "\n",
-        "mp4 = open(\"/content/toon_video/video.mp4\", \"rb\").read()\n",
-        "data_url = \"data:video/mp4;base64,\" + b64encode(mp4).decode()\n",
-        "HTML(\"\"\"\n",
-        "<video width=400 controls>\n",
-        "<source src=\"%s\" type=\"video/mp4\">\n",
-        "</video>\n",
-        "\"\"\" % data_url)"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {
-        "id": "48hQfX--5YGi"
-      },
-      "source": [
-        "### Toon Shading with Editing Signals"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {
-        "id": "bAQ9Zq-3-MH6"
-      },
-      "source": [
-        "In stage 1, input your prompt, and Diffutoon will generate the editing signals as a low-resolution color video."
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {
-        "colab": {
-          "base_uri": "https://localhost:8080/"
-        },
-        "id": "BtDzYgIq5bgg",
-        "outputId": "bb27b7b9-7979-4409-f476-f25f0a164ef4"
-      },
-      "outputs": [],
-      "source": [
-        "from diffsynth import SDVideoPipelineRunner\n",
-        "\n",
-        "\n",
-        "config_stage_1 = config_stage_1_template.copy()\n",
-        "config_stage_1[\"data\"][\"input_frames\"] = {\n",
-        "    \"video_file\": \"/content/input_video.mp4\",\n",
-        "    \"image_folder\": None,\n",
-        "    \"height\": 512,\n",
-        "    \"width\": 512,\n",
-        "    \"start_frame_id\": 0,\n",
-        "    \"end_frame_id\": 30\n",
-        "}\n",
-        "config_stage_1[\"data\"][\"controlnet_frames\"] = [config_stage_1[\"data\"][\"input_frames\"], config_stage_1[\"data\"][\"input_frames\"]]\n",
-        "config_stage_1[\"data\"][\"output_folder\"] = \"/content/color_video\"\n",
-        "config_stage_1[\"data\"][\"fps\"] = 25\n",
-        "config_stage_1[\"pipeline\"][\"pipeline_inputs\"][\"prompt\"] = \"best quality, perfect anime illustration, orange clothes, night, a girl is dancing, smile, solo, black silk stockings\"\n",
-        "\n",
-        "runner = SDVideoPipelineRunner()\n",
-        "runner.run(config_stage_1)"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {
-        "id": "D9_AWwhi-pA9"
-      },
-      "source": [
-        "In stage 2, Diffutoon will re-render the whole video according to the editing signals."
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {
-        "colab": {
-          "base_uri": "https://localhost:8080/"
-        },
-        "id": "JFysCk7y51i_",
-        "outputId": "475050d3-c72e-4e08-b55c-d59ed86b5497"
-      },
-      "outputs": [],
-      "source": [
-        "from diffsynth import SDVideoPipelineRunner\n",
-        "\n",
-        "\n",
-        "config_stage_2 = config_stage_2_template.copy()\n",
-        "config_stage_2[\"data\"][\"input_frames\"] = {\n",
-        "    \"video_file\": \"/content/input_video.mp4\",\n",
-        "    \"image_folder\": None,\n",
-        "    \"height\": 1024,\n",
-        "    \"width\": 1024,\n",
-        "    \"start_frame_id\": 0,\n",
-        "    \"end_frame_id\": 30\n",
-        "}\n",
-        "config_stage_2[\"data\"][\"controlnet_frames\"][0] = {\n",
-        "    \"video_file\": \"/content/color_video/video.mp4\",\n",
-        "    \"image_folder\": None,\n",
-        "    \"height\": config_stage_2[\"data\"][\"input_frames\"][\"height\"],\n",
-        "    \"width\": config_stage_2[\"data\"][\"input_frames\"][\"width\"],\n",
-        "    \"start_frame_id\": None,\n",
-        "    \"end_frame_id\": None\n",
-        "}\n",
-        "config_stage_2[\"data\"][\"controlnet_frames\"][1] = config_stage_2[\"data\"][\"input_frames\"]\n",
-        "config_stage_2[\"data\"][\"output_folder\"] = \"/content/edit_video\"\n",
-        "config_stage_2[\"data\"][\"fps\"] = 25\n",
-        "\n",
-        "runner = SDVideoPipelineRunner()\n",
-        "runner.run(config_stage_2)"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {
-        "id": "HIPrCAIS_Im0"
-      },
-      "source": [
-        "Let's see the video!"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {
-        "colab": {
-          "base_uri": "https://localhost:8080/",
-          "height": 420
-        },
-        "id": "Y2nz7rew-7VI",
-        "outputId": "fbcbadc6-4045-4aac-dfb0-80bacec003bf"
-      },
-      "outputs": [],
-      "source": [
-        "from IPython.display import HTML\n",
-        "from base64 import b64encode\n",
-        "\n",
-        "mp4 = open(\"/content/edit_video/video.mp4\", \"rb\").read()\n",
-        "data_url = \"data:video/mp4;base64,\" + b64encode(mp4).decode()\n",
-        "HTML(\"\"\"\n",
-        "<video width=400 controls>\n",
-        "<source src=\"%s\" type=\"video/mp4\">\n",
-        "</video>\n",
-        "\"\"\" % data_url)"
-      ]
-    }
-  ],
-  "metadata": {
-    "accelerator": "GPU",
-    "colab": {
-      "collapsed_sections": [
-        "tII_XRY-PJeo"
-      ],
-      "gpuType": "T4",
-      "provenance": [],
-      "toc_visible": true
-    },
-    "kernelspec": {
-      "display_name": "Python 3",
-      "name": "python3"
-    },
-    "language_info": {
-      "name": "python"
-    }
-  },
-  "nbformat": 4,
-  "nbformat_minor": 0
-}
--- a/examples/Diffutoon/README.md
+++ b/examples/Diffutoon/README.md
@@ -1,21 +0,0 @@
-# Diffutoon
-
-[Diffutoon](https://arxiv.org/abs/2401.16224) is a toon shading approach that is well suited to rendering high-resolution videos with rapid motion.
-
-## Example: Toon Shading (Diffutoon)
-
-Directly render realistic videos in a flat style. In this example, you can easily modify the parameters in the config dict. See [`diffutoon_toon_shading.py`](./diffutoon_toon_shading.py). We also provide [an example on Colab](https://colab.research.google.com/github/Artiprocher/DiffSynth-Studio/blob/main/examples/Diffutoon.ipynb).
-
-https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/b54c05c5-d747-4709-be5e-b39af82404dd
-
-## Example: Toon Shading with Editing Signals (Diffutoon)
-
-This example supports video editing signals. See [`diffutoon_toon_shading_with_editing_signals.py`](./diffutoon_toon_shading_with_editing_signals.py). The editing feature is also supported in the [Colab example](https://colab.research.google.com/github/Artiprocher/DiffSynth-Studio/blob/main/examples/Diffutoon/Diffutoon.ipynb).
-
-https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/20528af5-5100-474a-8cdc-440b9efdd86c
-
-## Example: Toon Shading (in native Python code)
-
-This example is provided for developers. If you don't want to use the config to manage parameters, you can see [`sd_toon_shading.py`](./sd_toon_shading.py) to learn how to use it in native Python code.
-
-https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/607c199b-6140-410b-a111-3e4ffb01142c
--- a/examples/Diffutoon/diffutoon_toon_shading.py
+++ b/examples/Diffutoon/diffutoon_toon_shading.py
@@ -1,100 +0,0 @@
-from diffsynth import SDVideoPipelineRunner, download_models
-
-
-# Download models (automatically)
-# `models/stable_diffusion/aingdiffusion_v12.safetensors`: [link](https://civitai.com/api/download/models/229575)
-# `models/AnimateDiff/mm_sd_v15_v2.ckpt`: [link](https://huggingface.co/guoyww/animatediff/resolve/main/mm_sd_v15_v2.ckpt)
-# `models/ControlNet/control_v11p_sd15_lineart.pth`: [link](https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_lineart.pth)
-# `models/ControlNet/control_v11f1e_sd15_tile.pth`: [link](https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11f1e_sd15_tile.pth)
-# `models/Annotators/sk_model.pth`: [link](https://huggingface.co/lllyasviel/Annotators/resolve/main/sk_model.pth)
-# `models/Annotators/sk_model2.pth`: [link](https://huggingface.co/lllyasviel/Annotators/resolve/main/sk_model2.pth)
-# `models/textual_inversion/verybadimagenegative_v1.3.pt`: [link](https://civitai.com/api/download/models/25820?type=Model&format=PickleTensor&size=full&fp=fp16)
-download_models([
-    "AingDiffusion_v12",
-    "AnimateDiff_v2",
-    "ControlNet_v11p_sd15_lineart",
-    "ControlNet_v11f1e_sd15_tile",
-    "TextualInversion_VeryBadImageNegative_v1.3"
-])
-# The original video in the example is https://www.bilibili.com/video/BV1iG411a7sQ/.
-
-config = {
-    "models": {
-        "model_list": [
-            "models/stable_diffusion/aingdiffusion_v12.safetensors",
-            "models/AnimateDiff/mm_sd_v15_v2.ckpt",
-            "models/ControlNet/control_v11f1e_sd15_tile.pth",
-            "models/ControlNet/control_v11p_sd15_lineart.pth"
-        ],
-        "textual_inversion_folder": "models/textual_inversion",
-        "device": "cuda",
-        "lora_alphas": [],
-        "controlnet_units": [
-            {
-                "processor_id": "tile",
-                "model_path": "models/ControlNet/control_v11f1e_sd15_tile.pth",
-                "scale": 0.5
-            },
-            {
-                "processor_id": "lineart",
-                "model_path": "models/ControlNet/control_v11p_sd15_lineart.pth",
-                "scale": 0.5
-            }
-        ]
-    },
-    "data": {
-        "input_frames": {
-            "video_file": "data/examples/diffutoon/input_video.mp4",
-            "image_folder": None,
-            "height": 1536,
-            "width": 1536,
-            "start_frame_id": 0,
-            "end_frame_id": 30
-        },
-        "controlnet_frames": [
-            {
-                "video_file": "data/examples/diffutoon/input_video.mp4",
-                "image_folder": None,
-                "height": 1536,
-                "width": 1536,
-                "start_frame_id": 0,
-                "end_frame_id": 30
-            },
-            {
-                "video_file": "data/examples/diffutoon/input_video.mp4",
-                "image_folder": None,
-                "height": 1536,
-                "width": 1536,
-                "start_frame_id": 0,
-                "end_frame_id": 30
-            }
-        ],
-        "output_folder": "output",
-        "fps": 30
-    },
-    "pipeline": {
-        "seed": 0,
-        "pipeline_inputs": {
-            "prompt": "best quality, perfect anime illustration, light, a girl is dancing, smile, solo",
-            "negative_prompt": "verybadimagenegative_v1.3",
-            "cfg_scale": 7.0,
-            "clip_skip": 2,
-            "denoising_strength": 1.0,
-            "num_inference_steps": 10,
-            "animatediff_batch_size": 16,
-            "animatediff_stride": 8,
-            "unet_batch_size": 1,
-            "controlnet_batch_size": 1,
-            "cross_frame_attention": False,
-            # The following parameters will be overwritten. You don't need to modify them.
-            "input_frames": [],
-            "num_frames": 30,
-            "width": 1536,
-            "height": 1536,
-            "controlnet_frames": []
-        }
-    }
-}
-
-runner = SDVideoPipelineRunner()
-runner.run(config)
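The config above repeats the resolution and frame range in `data.input_frames` and in every entry of `data.controlnet_frames`, so changing one copy without the others is an easy mistake. Below is a minimal sketch of a helper that keeps them in sync using plain dict manipulation. `set_clip` is a hypothetical name and is not part of `diffsynth`, and the stub config mirrors only the keys relevant here.

```python
import copy


def set_clip(config, height, width, start_frame_id, end_frame_id):
    """Return a copy of `config` with one resolution and frame range applied
    to the input video and to every ControlNet video.

    Hypothetical helper for illustration; not part of diffsynth.
    """
    new_config = copy.deepcopy(config)
    clip = {
        "height": height,
        "width": width,
        "start_frame_id": start_frame_id,
        "end_frame_id": end_frame_id,
    }
    new_config["data"]["input_frames"].update(clip)
    for frames in new_config["data"]["controlnet_frames"]:
        frames.update(clip)
    return new_config


# Stub with the same layout as the `data` section of the config above.
stub = {
    "data": {
        "input_frames": {"video_file": "input_video.mp4", "height": 1536, "width": 1536,
                         "start_frame_id": 0, "end_frame_id": 30},
        "controlnet_frames": [
            {"video_file": "input_video.mp4", "height": 1536, "width": 1536,
             "start_frame_id": 0, "end_frame_id": 30},
            {"video_file": "input_video.mp4", "height": 1536, "width": 1536,
             "start_frame_id": 0, "end_frame_id": 30},
        ],
    }
}

# A smaller, shorter clip for a quick preview run.
preview = set_clip(stub, height=512, width=512, start_frame_id=0, end_frame_id=8)
print(preview["data"]["input_frames"])
```

The script's own comment states that the `width`, `height` and `num_frames` entries under `pipeline_inputs` are overwritten, so the sketch leaves them alone.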
--- a/examples/Diffutoon/diffutoon_toon_shading_with_editing_signals.py
+++ b/examples/Diffutoon/diffutoon_toon_shading_with_editing_signals.py
@@ -1,204 +0,0 @@
-from diffsynth import SDVideoPipelineRunner, download_models
-import os
-
-
-# Download models (automatically)
-# `models/stable_diffusion/aingdiffusion_v12.safetensors`: [link](https://civitai.com/api/download/models/229575)
-# `models/AnimateDiff/mm_sd_v15_v2.ckpt`: [link](https://huggingface.co/guoyww/animatediff/resolve/main/mm_sd_v15_v2.ckpt)
-# `models/ControlNet/control_v11p_sd15_lineart.pth`: [link](https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_lineart.pth)
-# `models/ControlNet/control_v11f1e_sd15_tile.pth`: [link](https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11f1e_sd15_tile.pth)
-# `models/ControlNet/control_v11f1p_sd15_depth.pth`: [link](https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11f1p_sd15_depth.pth)
-# `models/ControlNet/control_v11p_sd15_softedge.pth`: [link](https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_softedge.pth)
-# `models/Annotators/dpt_hybrid-midas-501f0c75.pt`: [link](https://huggingface.co/lllyasviel/Annotators/resolve/main/dpt_hybrid-midas-501f0c75.pt)
-# `models/Annotators/ControlNetHED.pth`: [link](https://huggingface.co/lllyasviel/Annotators/resolve/main/ControlNetHED.pth)
-# `models/Annotators/sk_model.pth`: [link](https://huggingface.co/lllyasviel/Annotators/resolve/main/sk_model.pth)
-# `models/Annotators/sk_model2.pth`: [link](https://huggingface.co/lllyasviel/Annotators/resolve/main/sk_model2.pth)
-# `models/textual_inversion/verybadimagenegative_v1.3.pt`: [link](https://civitai.com/api/download/models/25820?type=Model&format=PickleTensor&size=full&fp=fp16)
-download_models([
-    "AingDiffusion_v12",
-    "AnimateDiff_v2",
-    "ControlNet_v11p_sd15_lineart",
-    "ControlNet_v11f1e_sd15_tile",
-    "ControlNet_v11f1p_sd15_depth",
-    "ControlNet_v11p_sd15_softedge",
-    "TextualInversion_VeryBadImageNegative_v1.3"
-])
-# The original video in the example is https://www.bilibili.com/video/BV1zu4y1s7Ec/.
-
-config_stage_1 = {
-    "models": {
-        "model_list": [
-            "models/stable_diffusion/aingdiffusion_v12.safetensors",
-            "models/ControlNet/control_v11p_sd15_softedge.pth",
-            "models/ControlNet/control_v11f1p_sd15_depth.pth"
-        ],
-        "textual_inversion_folder": "models/textual_inversion",
-        "device": "cuda",
-        "lora_alphas": [],
-        "controlnet_units": [
-            {
-                "processor_id": "softedge",
-                "model_path": "models/ControlNet/control_v11p_sd15_softedge.pth",
-                "scale": 0.5
-            },
-            {
-                "processor_id": "depth",
-                "model_path": "models/ControlNet/control_v11f1p_sd15_depth.pth",
-                "scale": 0.5
-            }
-        ]
-    },
-    "data": {
-        "input_frames": {
-            "video_file": "data/examples/diffutoon_edit/input_video.mp4",
-            "image_folder": None,
-            "height": 512,
-            "width": 512,
-            "start_frame_id": 0,
-            "end_frame_id": 30
-        },
-        "controlnet_frames": [
-            {
-                "video_file": "data/examples/diffutoon_edit/input_video.mp4",
-                "image_folder": None,
-                "height": 512,
-                "width": 512,
-                "start_frame_id": 0,
-                "end_frame_id": 30
-            },
-            {
-                "video_file": "data/examples/diffutoon_edit/input_video.mp4",
-                "image_folder": None,
-                "height": 512,
-                "width": 512,
-                "start_frame_id": 0,
-                "end_frame_id": 30
-            }
-        ],
-        "output_folder": "output/color_video",
-        "fps": 25
-    },
-    "smoother_configs": [
-        {
-            "processor_type": "FastBlend",
-            "config": {}
-        }
-    ],
-    "pipeline": {
-        "seed": 0,
-        "pipeline_inputs": {
-            "prompt": "best quality, perfect anime illustration, orange clothes, night, a girl is dancing, smile, solo, black silk stockings",
-            "negative_prompt": "verybadimagenegative_v1.3",
-            "cfg_scale": 7.0,
-            "clip_skip": 1,
-            "denoising_strength": 0.9,
-            "num_inference_steps": 20,
-            "animatediff_batch_size": 8,
-            "animatediff_stride": 4,
-            "unet_batch_size": 8,
-            "controlnet_batch_size": 8,
-            "cross_frame_attention": True,
-            "smoother_progress_ids": [-1],
-            # The following parameters will be overwritten. You don't need to modify them.
-            "input_frames": [],
-            "num_frames": 30,
-            "width": 512,
-            "height": 512,
-            "controlnet_frames": []
-        }
-    }
-}
-
-
-config_stage_2 = {
-    "models": {
-        "model_list": [
-            "models/stable_diffusion/aingdiffusion_v12.safetensors",
-            "models/AnimateDiff/mm_sd_v15_v2.ckpt",
-            "models/ControlNet/control_v11f1e_sd15_tile.pth",
-            "models/ControlNet/control_v11p_sd15_lineart.pth"
-        ],
-        "textual_inversion_folder": "models/textual_inversion",
-        "device": "cuda",
-        "lora_alphas": [],
-        "controlnet_units": [
-            {
-                "processor_id": "tile",
-                "model_path": "models/ControlNet/control_v11f1e_sd15_tile.pth",
-                "scale": 0.5
-            },
-            {
-                "processor_id": "lineart",
-                "model_path": "models/ControlNet/control_v11p_sd15_lineart.pth",
-                "scale": 0.5
-            }
-        ]
-    },
-    "data": {
-        "input_frames": {
-            "video_file": "data/examples/diffutoon_edit/input_video.mp4",
-            "image_folder": None,
-            "height": 1536,
-            "width": 1536,
-            "start_frame_id": 0,
-            "end_frame_id": 30
-        },
-        "controlnet_frames": [
-            {
-                "video_file": "data/examples/diffutoon_edit/input_video.mp4",
-                "image_folder": None,
-                "height": 1536,
-                "width": 1536,
-                "start_frame_id": 0,
-                "end_frame_id": 30
-            },
-            {
-                "video_file": "data/examples/diffutoon_edit/input_video.mp4",
-                "image_folder": None,
-                "height": 1536,
-                "width": 1536,
-                "start_frame_id": 0,
-                "end_frame_id": 30
-            }
-        ],
-        "output_folder": "output/edited_video",
-        "fps": 30
-    },
-    "pipeline": {
-        "seed": 0,
-        "pipeline_inputs": {
-            "prompt": "best quality, perfect anime illustration, light, a girl is dancing, smile, solo",
-            "negative_prompt": "verybadimagenegative_v1.3",
-            "cfg_scale": 7.0,
-            "clip_skip": 2,
-            "denoising_strength": 1.0,
-            "num_inference_steps": 10,
-            "animatediff_batch_size": 16,
-            "animatediff_stride": 8,
-            "unet_batch_size": 1,
-            "controlnet_batch_size": 1,
-            "cross_frame_attention": False,
-            # The following parameters will be overwritten. You don't need to modify them.
-            "input_frames": [],
-            "num_frames": 30,
-            "width": 1536,
-            "height": 1536,
-            "controlnet_frames": []
-        }
-    }
-}
-
-
-runner = SDVideoPipelineRunner()
-runner.run(config_stage_1)
-
-# Replace the color video with the synthesized video
-config_stage_2["data"]["controlnet_frames"][0] = {
-    "video_file": os.path.join(config_stage_1["data"]["output_folder"], "video.mp4"),
-    "image_folder": None,
-    "height": config_stage_2["data"]["input_frames"]["height"],
-    "width": config_stage_2["data"]["input_frames"]["width"],
-    "start_frame_id": None,
-    "end_frame_id": None
-}
-runner.run(config_stage_2)
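The `animatediff_batch_size` and `animatediff_stride` settings suggest that long videos are processed in overlapping windows of frames. Below is a minimal sketch of that reading, assuming a window starts every `stride` frames and spans up to `batch_size` frames. The actual scheduling inside `SDVideoPipelineRunner` is not shown in this diff and may differ, and `sliding_windows` is a hypothetical name.

```python
def sliding_windows(num_frames, batch_size, stride):
    """List (start, end) frame ranges with `end` exclusive.

    Assumed windowing scheme for illustration; not taken from diffsynth.
    """
    windows = []
    for start in range(0, num_frames, stride):
        end = min(start + batch_size, num_frames)
        windows.append((start, end))
        if end == num_frames:
            # The last frame is covered, so later windows would add nothing.
            break
    return windows


# Stage 2 above uses 30 frames, a batch size of 16 and a stride of 8.
stage_2_windows = sliding_windows(num_frames=30, batch_size=16, stride=8)
print(stage_2_windows)  # [(0, 16), (8, 24), (16, 30)]
```

Under this assumption every frame except those at the very start and end falls into two windows, which is where overlapping windows would get their temporal smoothing from.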
--- a/examples/Diffutoon/sd_toon_shading.py
+++ b/examples/Diffutoon/sd_toon_shading.py
@@ -1,65 +0,0 @@
-from diffsynth import ModelManager, SDVideoPipeline, ControlNetConfigUnit, VideoData, save_video, download_models
-import torch
-
-
-# Download models (automatically)
-# `models/stable_diffusion/flat2DAnimerge_v45Sharp.safetensors`: [link](https://civitai.com/api/download/models/266360?type=Model&format=SafeTensor&size=pruned&fp=fp16)
-# `models/AnimateDiff/mm_sd_v15_v2.ckpt`: [link](https://huggingface.co/guoyww/animatediff/resolve/main/mm_sd_v15_v2.ckpt)
-# `models/ControlNet/control_v11p_sd15_lineart.pth`: [link](https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_lineart.pth)
-# `models/ControlNet/control_v11f1e_sd15_tile.pth`: [link](https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11f1e_sd15_tile.pth)
-# `models/Annotators/sk_model.pth`: [link](https://huggingface.co/lllyasviel/Annotators/resolve/main/sk_model.pth)
-# `models/Annotators/sk_model2.pth`: [link](https://huggingface.co/lllyasviel/Annotators/resolve/main/sk_model2.pth)
-# `models/textual_inversion/verybadimagenegative_v1.3.pt`: [link](https://civitai.com/api/download/models/25820?type=Model&format=PickleTensor&size=full&fp=fp16)
-download_models([
-    "Flat2DAnimerge_v45Sharp",
-    "AnimateDiff_v2",
-    "ControlNet_v11p_sd15_lineart",
-    "ControlNet_v11f1e_sd15_tile",
-    "TextualInversion_VeryBadImageNegative_v1.3"
-])
-
-# Load models
-model_manager = ModelManager(torch_dtype=torch.float16, device="cuda")
-model_manager.load_models([
-    "models/stable_diffusion/flat2DAnimerge_v45Sharp.safetensors",
-    "models/AnimateDiff/mm_sd_v15_v2.ckpt",
-    "models/ControlNet/control_v11p_sd15_lineart.pth",
-    "models/ControlNet/control_v11f1e_sd15_tile.pth",
-])
-pipe = SDVideoPipeline.from_model_manager(
-    model_manager,
-    [
-        ControlNetConfigUnit(
-            processor_id="lineart",
-            model_path="models/ControlNet/control_v11p_sd15_lineart.pth",
-            scale=0.5
-        ),
-        ControlNetConfigUnit(
-            processor_id="tile",
-            model_path="models/ControlNet/control_v11f1e_sd15_tile.pth",
-            scale=0.5
-        )
-    ]
-)
-pipe.prompter.load_textual_inversions(["models/textual_inversion/verybadimagenegative_v1.3.pt"])
-
-# Load video (we only use 60 frames for quick testing)
-# The original video is here: https://www.bilibili.com/video/BV19w411A7YJ/
-video = VideoData(
-    video_file="data/examples/bilibili/BV19w411A7YJ.mp4",
-    height=1024, width=1024)
-input_video = [video[i] for i in range(40*60, 41*60)]
-
-# Toon shading (20G VRAM)
-torch.manual_seed(0)
-output_video = pipe(
-    prompt="best quality, perfect anime illustration, light, a girl is dancing, smile, solo",
-    negative_prompt="verybadimagenegative_v1.3",
-    cfg_scale=3, clip_skip=2,
-    controlnet_frames=input_video, num_frames=len(input_video),
-    num_inference_steps=10, height=1024, width=1024,
-    animatediff_batch_size=32, animatediff_stride=16,
-)
-
-# Save video
-save_video(output_video, "output_video.mp4", fps=60)
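The frame selection `range(40*60, 41*60)` in the script above picks 60 consecutive frames starting at frame 2400. Below is a small sketch of the arithmetic, assuming the source video runs at 60 fps (the script saves at `fps=60`, but the source frame rate is not stated). `frame_ids` is a hypothetical helper, not a `diffsynth` function.

```python
def frame_ids(start_second, end_second, fps):
    """Frame indices covering [start_second, end_second) at a given frame rate.

    Hypothetical helper for illustration only.
    """
    return range(start_second * fps, end_second * fps)


# Second 40 to second 41 at an assumed 60 fps.
ids = frame_ids(40, 41, fps=60)
print(ids[0], ids[-1], len(ids))  # 2400 2459 60
```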
--- a/examples/EntityControl/README.md
+++ b/examples/EntityControl/README.md
@@ -1,90 +0,0 @@
-# EliGen: Entity-Level Controlled Image Generation
-
-## Introduction
-
-We propose EliGen, a novel approach that leverages fine-grained entity-level information to enable precise and controllable text-to-image generation. EliGen excels in tasks such as entity-level controlled image generation and image inpainting, while its applicability is not limited to these areas. Additionally, it can be seamlessly integrated with existing community models, such as the IP-Adapter and In-Context LoRA.
-
-* Paper: [EliGen: Entity-Level Controlled Image Generation with Regional Attention](https://arxiv.org/abs/2501.01097)
-* Github: [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio)
-* Model: [ModelScope](https://www.modelscope.cn/models/DiffSynth-Studio/Eligen), [HuggingFace](https://huggingface.co/modelscope/EliGen)
-* Online Demo: [ModelScope EliGen Studio](https://www.modelscope.cn/studios/DiffSynth-Studio/EliGen)
-* Training Dataset: [EliGen Train Set](https://www.modelscope.cn/datasets/DiffSynth-Studio/EliGenTrainSet)
-
-
-## Methodology
-
-![regional-attention](https://github.com/user-attachments/assets/bef5ae2b-cc03-404e-b9c8-0c037ac66190)
-
-We introduce a regional attention mechanism within the DiT framework to effectively process the conditions of each entity. This mechanism enables the local prompt associated with each entity to semantically influence specific regions through regional attention. To further enhance the layout control capabilities of EliGen, we build an entity-annotated dataset and fine-tune the model using the LoRA framework.
-
-1. **Regional Attention**: Regional attention is shown in the figure above and can be easily applied to other text-to-image models. Its core principle is to transform the positional information of each entity into an attention mask, ensuring that the mechanism only affects the designated regions.
-   
-2. **Dataset with Entity Annotation**: To construct a dedicated entity control dataset, we start by randomly selecting captions from DiffusionDB and generating the corresponding source image using Flux. Next, we employ Qwen2-VL 72B, recognized for its advanced grounding capabilities among MLLMs, to randomly identify entities within the image. These entities are annotated with local prompts and bounding boxes for precise localization, forming the foundation of our dataset for further training.
-
-3. **Training**: We utilize LoRA (Low-Rank Adaptation) and DeepSpeed to fine-tune regional attention mechanisms using a curated dataset, enabling our EliGen model to achieve effective entity-level control.
-
-## Usage
-1. **Entity-Level Controlled Image Generation**
-   EliGen achieves effective entity-level control results. See [./entity_control.py](./entity_control.py) for usage.
-2. **Image Inpainting**
-   To apply EliGen to the image inpainting task, we propose an inpainting fusion pipeline that preserves the non-inpainted areas while enabling precise, entity-level modifications in the inpainting regions.
-   See [./entity_inpaint.py](./entity_inpaint.py) for usage.
-3. **Styled Entity Control**
-   EliGen can be seamlessly integrated with existing community models. We have provided an example of how to integrate it with the IP-Adapter. See [./entity_control_ipadapter.py](./entity_control_ipadapter.py) for usage.
-4. **Entity Transfer**
-   We have provided an example of how to integrate EliGen with In-Context LoRA, which achieves interesting entity transfer results. See [./entity_transfer.py](./entity_transfer.py) for usage.
-5. **Play with EliGen using UI**
-   Run the following command to try the interactive UI:
-   ```bash
-   python apps/gradio/entity_level_control.py
-   ```
-## Examples
-### Entity-Level Controlled Image Generation
-
-1. The effect of generating images with continuously changing entity positions.
-
-https://github.com/user-attachments/assets/54a048c8-b663-4262-8c40-43c87c266d4b
-
-2. The image generation effect of complex entity combinations, demonstrating the strong generalization of EliGen. See `example_1` to `example_6` in [./entity_control.py](./entity_control.py) for the generation prompts.
-
-|Entity Conditions|Generated Image|
-|-|-|
-|![eligen_example_1_mask_0](https://github.com/user-attachments/assets/68cbedc0-32aa-4a8e-99d2-306dbb4620de)|![eligen_example_1_0](https://github.com/user-attachments/assets/c678c4b1-aa19-41df-b612-adc01b8b2009)|
-|![eligen_example_2_mask_0](https://github.com/user-attachments/assets/1c6d9445-5022-4d91-ad2e-dc05321883d1)|![eligen_example_2_0](https://github.com/user-attachments/assets/86739945-cb07-4a49-b3b3-3bb65c90d14f)|
-|![eligen_example_3_mask_27](https://github.com/user-attachments/assets/5ca4440d-d1db-45dd-b03c-0affefbd9ac3)|![eligen_example_3_27](https://github.com/user-attachments/assets/9160c22a-89ac-4d52-be1d-17ba2d8a67eb)|
-|![eligen_example_4_mask_21](https://github.com/user-attachments/assets/26dfde2b-cc9a-4cb3-806a-7f7436d971a7)|![eligen_example_4_21](https://github.com/user-attachments/assets/1fff7346-6a8c-4eb6-986f-4ea848c6b363)|
-|![eligen_example_5_mask_0](https://github.com/user-attachments/assets/8ca94e5f-f896-451d-a700-bcdc23689adb)|![eligen_example_5_0](https://github.com/user-attachments/assets/881a9395-6cc2-43e9-89b4-30b8f5437e6d)|
-|![eligen_example_6_mask_8](https://github.com/user-attachments/assets/26c95abf-f2b1-44db-92c1-75d02c714c74)|![eligen_example_6_8](https://github.com/user-attachments/assets/8883abde-3fad-4a8b-ade0-ca5b977a290f)|
-
-3. Demonstration of the robustness of EliGen. The following examples are generated using the same prompt but different seeds. Refer to [./entity_control.py](./entity_control.py) `example_7` for the prompts.
-
-|Entity Conditions|Generated Image|
-|-|-|
-|![eligen_example_7_mask_5](https://github.com/user-attachments/assets/85630237-9d8b-41ea-9bd5-506652c61776)|![eligen_example_7_5](https://github.com/user-attachments/assets/d34b54d2-c59c-4c39-8ab4-c22f155283f1)|
-|![eligen_example_7_mask_5](https://github.com/user-attachments/assets/85630237-9d8b-41ea-9bd5-506652c61776)|![eligen_example_7_6](https://github.com/user-attachments/assets/4050a3bf-a089-4f4f-81e0-e3b391cf7ceb)|
-|![eligen_example_7_mask_5](https://github.com/user-attachments/assets/85630237-9d8b-41ea-9bd5-506652c61776)|![eligen_example_7_7](https://github.com/user-attachments/assets/682feb5e-a27a-4ae4-a800-018b4e0e504c)|
-|![eligen_example_7_mask_5](https://github.com/user-attachments/assets/85630237-9d8b-41ea-9bd5-506652c61776)|![eligen_example_7_8](https://github.com/user-attachments/assets/50266950-24b3-426a-ae74-c3ebadb853d9)|
-
-### Image Inpainting
-Demonstration of the inpainting mode of EliGen. See [./entity_inpaint.py](./entity_inpaint.py) for the generation prompts.
-|Inpainting Input|Inpainting Output|
-|-|-|
-|![inpaint_i1](https://github.com/user-attachments/assets/5ef499f3-3d8a-49cc-8ceb-86af7f5cb9f8)|![inpaint_o1](https://github.com/user-attachments/assets/88fc3bde-0984-4b3c-8ca9-d63de660855b)|
-|![inpaint_i2](https://github.com/user-attachments/assets/5f74c710-bf30-4db1-ae40-a1e1995ccef6)|![inpaint_o2](https://github.com/user-attachments/assets/7c3b4857-b774-47ea-b163-34d49e7c976d)|
-### Styled Entity Control
-Demonstration of styled entity control with EliGen and IP-Adapter. See [./entity_control_ipadapter.py](./entity_control_ipadapter.py) for the generation prompts.
-|Style Reference|Entity Control Variant 1|Entity Control Variant 2|Entity Control Variant 3|
-|-|-|-|-|
-|![image_1_base](https://github.com/user-attachments/assets/5e2dd3ab-37d3-4f58-8e02-ee2f9b238604)|![result1](https://github.com/user-attachments/assets/0f6711a2-572a-41b3-938a-95deff6d732d)|![result2](https://github.com/user-attachments/assets/ce2e66e5-1fdf-44e8-bca7-555d805a50b1)|![result3](https://github.com/user-attachments/assets/ad2da233-2f7c-4065-ab57-b2d84dc2c0e2)|
-
-We also provide a demo of styled entity control with EliGen and a style-specific LoRA; see [./styled_entity_control.py](./styled_entity_control.py) for details. Here is the visualization of EliGen with [Lego dreambooth lora](https://huggingface.co/merve/flux-lego-lora-dreambooth).
-|![image_1_base](https://github.com/user-attachments/assets/35fb60f5-48ef-4f22-95d8-f9e732a5f63f)|![result1](https://github.com/user-attachments/assets/441d700f-f0b1-40e0-8848-4db23520972c)|![result2](https://github.com/user-attachments/assets/c8fd4498-3c55-48ab-9abf-3a092a90c878)|![result3](https://github.com/user-attachments/assets/181ba2bb-62cf-41a8-9e3a-20ed8a7a672f)|
-|-|-|-|-|
-|![image_1_base](https://github.com/user-attachments/assets/70a3f578-8c7e-4b40-954d-8fc94d4f3ae9)|![result1](https://github.com/user-attachments/assets/65670717-6136-4594-84e5-2307fc20753d)|![result2](https://github.com/user-attachments/assets/5ec7a5bd-f2c9-4b2e-8a4e-d2655ec8036c)|![result3](https://github.com/user-attachments/assets/56f00192-9553-45a6-a971-511b9f5b1480)|
-
-### Entity Transfer
-Demonstration of entity transfer with EliGen and In-Context LoRA. See [./entity_transfer.py](./entity_transfer.py) for the generation prompts.
-
-|Entity to Transfer|Transfer Target Image|Transfer Example 1|Transfer Example 2|
-|-|-|-|-|
-|![source](https://github.com/user-attachments/assets/0d40ef22-0a09-420d-bd5a-bfb93120b60d)|![target](https://github.com/user-attachments/assets/f6c58ef2-54c1-4d86-8429-dad2eb0e0685)|![result1](https://github.com/user-attachments/assets/05eed2e3-097d-40af-8aae-1e0c75051f32)|![result2](https://github.com/user-attachments/assets/54314d16-244b-411e-8a91-96c500efa5f5)|
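The regional attention idea from the Methodology section of the README above, turning each entity's position into an attention mask, can be illustrated with a toy sketch. It assumes entities are given as bounding boxes on a small latent grid and that image tokens are laid out row by row. `boxes_to_attention_mask` is a hypothetical name, and EliGen's actual implementation (which receives mask images through `eligen_entity_masks`) may build its masks differently.

```python
def boxes_to_attention_mask(grid_h, grid_w, boxes):
    """Build a boolean mask of shape [num_image_tokens][num_entities].

    mask[t][e] is True when image token t (row-major order) lies inside box e.
    Each box is (top, left, bottom, right) in grid cells, bottom/right exclusive.
    Toy illustration only; not EliGen's implementation.
    """
    mask = []
    for row in range(grid_h):
        for col in range(grid_w):
            mask.append([
                top <= row < bottom and left <= col < right
                for (top, left, bottom, right) in boxes
            ])
    return mask


# Two entities on a 4x4 latent grid.
boxes = [(0, 0, 2, 2), (1, 2, 4, 4)]
mask = boxes_to_attention_mask(4, 4, boxes)

for entity_id in range(len(boxes)):
    covered = [t for t, token_mask in enumerate(mask) if token_mask[entity_id]]
    print(f"entity {entity_id} may influence image tokens {covered}")
```

In a real attention layer a mask like this would restrict which image tokens attend to the tokens of each entity's local prompt, so that a prompt such as "sailing boat" only affects the region assigned to it.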
--- a/examples/EntityControl/entity_control.py
+++ b/examples/EntityControl/entity_control.py
@@ -1,83 +0,0 @@
-from diffsynth import ModelManager, FluxImagePipeline, download_customized_models
-from modelscope import dataset_snapshot_download
-from examples.EntityControl.utils import visualize_masks
-from PIL import Image
-import torch
-
-def example(pipe, seeds, example_id, global_prompt, entity_prompts):
-    dataset_snapshot_download(dataset_id="DiffSynth-Studio/examples_in_diffsynth", local_dir="./", allow_file_pattern=f"data/examples/eligen/entity_control/example_{example_id}/*.png")
-    masks = [Image.open(f"./data/examples/eligen/entity_control/example_{example_id}/{i}.png").convert('RGB') for i in range(len(entity_prompts))]
-    negative_prompt = "worst quality, low quality, monochrome, zombie, interlocked fingers, Aissist, cleavage, nsfw,"
-    for seed in seeds:
-        # generate image
-        image = pipe(
-            prompt=global_prompt,
-            cfg_scale=3.0,
-            negative_prompt=negative_prompt,
-            num_inference_steps=50,
-            embedded_guidance=3.5,
-            seed=seed,
-            height=1024,
-            width=1024,
-            eligen_entity_prompts=entity_prompts,
-            eligen_entity_masks=masks,
-        )
-        image.save(f"eligen_example_{example_id}_{seed}.png")
-        visualize_masks(image, masks, entity_prompts, f"eligen_example_{example_id}_mask_{seed}.png")
-
-# download and load model
-model_manager = ModelManager(torch_dtype=torch.bfloat16, device="cuda", model_id_list=["FLUX.1-dev"])
-# set download_from_modelscope = False to download the model from HuggingFace instead
-download_from_modelscope = True
-if download_from_modelscope:
-    model_id = "DiffSynth-Studio/Eligen"
-    downloading_priority = ["ModelScope"]
-else:
-    model_id = "modelscope/EliGen"
-    downloading_priority = ["HuggingFace"]
-model_manager.load_lora(
-    download_customized_models(
-        model_id=model_id,
-        origin_file_path="model_bf16.safetensors",
-        local_dir="models/lora/entity_control",
-        downloading_priority=downloading_priority
-    ),
-    lora_alpha=1
-)
-pipe = FluxImagePipeline.from_model_manager(model_manager)
-
-# example 1
-global_prompt = "A breathtaking beauty of Raja Ampat by the late-night moonlight , one beautiful woman from behind wearing a pale blue long dress with soft glow, sitting at the top of a cliff looking towards the beach,pastell light colors, a group of small distant birds flying in far sky, a boat sailing on the sea, best quality, realistic, whimsical, fantastic, splash art, intricate detailed, hyperdetailed, maximalist style, photorealistic, concept art, sharp focus, harmony, serenity, tranquility, soft pastell colors,ambient occlusion, cozy ambient lighting, masterpiece, liiv1, linquivera, metix, mentixis, masterpiece, award winning, view from above\n"
-entity_prompts = ["cliff", "sea", "moon", "sailing boat", "a seated beautiful woman", "pale blue long dress with soft glow"]
-example(pipe, [0], 1, global_prompt, entity_prompts)
-
-# example 2
-global_prompt = "samurai girl wearing a kimono, she's holding a sword  glowing with red flame, her long hair is flowing in the wind, she is looking at a small bird perched on the back of her hand. ultra realist style. maximum image detail. maximum realistic render."
-entity_prompts = ["flowing hair", "sword glowing with red flame", "A cute bird", "blue belt"]
-example(pipe, [0], 2, global_prompt, entity_prompts)
-
-# example 3
-global_prompt = "Image of a neverending staircase up to a mysterious palace in the sky, The ancient palace stood majestically atop a mist-shrouded mountain, sunrise, two traditional monk walk in the stair looking at the sunrise, fog,see-through, best quality, whimsical, fantastic, splash art, intricate detailed, hyperdetailed, photorealistic, concept art, harmony, serenity, tranquility, ambient occlusion, halation, cozy ambient lighting, dynamic lighting,masterpiece, liiv1, linquivera, metix, mentixis, masterpiece, award winning,"
-entity_prompts = ["ancient palace", "stone staircase with railings", "a traditional monk", "a traditional monk"]
-example(pipe, [27], 3, global_prompt, entity_prompts)
-
-# example 4
-global_prompt = "A beautiful girl wearing shirt and shorts in the street,  holding a sign 'Entity Control'"
-entity_prompts = ["A beautiful girl", "sign 'Entity Control'", "shorts", "shirt"]
-example(pipe, [21], 4, global_prompt, entity_prompts)
-
-# example 5
-global_prompt = "A captivating, dramatic scene in a painting that exudes mystery and foreboding. A white sky, swirling blue clouds, and a crescent yellow moon illuminate a solitary woman standing near the water's edge. Her long dress flows in the wind, silhouetted against the eerie glow. The water mirrors the fiery sky and moonlight, amplifying the uneasy atmosphere."
-entity_prompts = ["crescent yellow moon", "a solitary woman", "water", "swirling blue clouds"]
-example(pipe, [0], 5, global_prompt, entity_prompts)
-
-# example 6
-global_prompt = "Snow White and the 6 Dwarfs."
-entity_prompts = ["Dwarf 1", "Dwarf 2", "Dwarf 3", "Snow White", "Dwarf 4", "Dwarf 5", "Dwarf 6"]
-example(pipe, [8], 6, global_prompt, entity_prompts)
-
-# example 7, same prompt with different seeds
-seeds = range(5, 9)
-global_prompt = "A beautiful woman wearing white dress, holding a mirror, with a warm light background;"
-entity_prompts = ["A beautiful woman", "mirror", "necklace", "glasses", "earring", "white dress", "jewelry headpiece"]
-example(pipe, seeds, 7, global_prompt, entity_prompts)
--- a/examples/EntityControl/entity_control_ipadapter.py
+++ b/examples/EntityControl/entity_control_ipadapter.py
@@ -1,46 +0,0 @@
-from diffsynth import ModelManager, FluxImagePipeline, download_customized_models
-from modelscope import dataset_snapshot_download
-from examples.EntityControl.utils import visualize_masks
-from PIL import Image
-import torch
-
-
-# download and load model
-model_manager = ModelManager(torch_dtype=torch.bfloat16, device="cuda", model_id_list=["FLUX.1-dev", "InstantX/FLUX.1-dev-IP-Adapter"])
-model_manager.load_lora(
-    download_customized_models(
-        model_id="DiffSynth-Studio/Eligen",
-        origin_file_path="model_bf16.safetensors",
-        local_dir="models/lora/entity_control"
-    ),
-    lora_alpha=1
-)
-pipe = FluxImagePipeline.from_model_manager(model_manager)
-
-# download and load mask images
-dataset_snapshot_download(dataset_id="DiffSynth-Studio/examples_in_diffsynth", local_dir="./", allow_file_pattern="data/examples/eligen/ipadapter/*")
-masks = [Image.open(f"./data/examples/eligen/ipadapter/ipadapter_mask_{i}.png") for i in range(1, 4)]
-
-entity_prompts = ['A girl', 'hat', 'sunset']
-global_prompt = "A girl wearing a hat, looking at the sunset"
-negative_prompt = "worst quality, low quality, monochrome, zombie, interlocked fingers, Aissist, cleavage, nsfw"
-reference_img = Image.open("./data/examples/eligen/ipadapter/ipadapter_image.png")
-
-# generate image
-image = pipe(
-    prompt=global_prompt,
-    cfg_scale=3.0,
-    negative_prompt=negative_prompt,
-    num_inference_steps=50,
-    embedded_guidance=3.5,
-    seed=4,
-    height=1024,
-    width=1024,
-    eligen_entity_prompts=entity_prompts,
-    eligen_entity_masks=masks,
-    enable_eligen_on_negative=False,
-    ipadapter_images=[reference_img],
-    ipadapter_scale=0.7
-)
-image.save("styled_entity_control.png")
-visualize_masks(image, masks, entity_prompts, "styled_entity_control_with_mask.png")
--- a/examples/EntityControl/entity_inpaint.py
+++ b/examples/EntityControl/entity_inpaint.py
@@ -1,45 +0,0 @@
-from diffsynth import ModelManager, FluxImagePipeline, download_customized_models
-from modelscope import dataset_snapshot_download
-from examples.EntityControl.utils import visualize_masks
-from PIL import Image
-import torch
-
-# download and load model
-model_manager = ModelManager(torch_dtype=torch.bfloat16, device="cuda", model_id_list=["FLUX.1-dev"])
-model_manager.load_lora(
-    download_customized_models(
-        model_id="DiffSynth-Studio/Eligen",
-        origin_file_path="model_bf16.safetensors",
-        local_dir="models/lora/entity_control"
-    ),
-    lora_alpha=1
-)
-pipe = FluxImagePipeline.from_model_manager(model_manager)
-
-# download and load mask images
-dataset_snapshot_download(dataset_id="DiffSynth-Studio/examples_in_diffsynth", local_dir="./", allow_file_pattern="data/examples/eligen/inpaint/*")
-masks = [Image.open(f"./data/examples/eligen/inpaint/inpaint_mask_{i}.png") for i in range(1, 3)]
-input_image = Image.open("./data/examples/eligen/inpaint/inpaint_image.jpg")
-
-entity_prompts = ["A person wear red shirt", "Airplane"]
-global_prompt = "A person walking on the path in front of a house; An airplane in the sky"
-negative_prompt = "worst quality, low quality, monochrome, zombie, interlocked fingers, Aissist, cleavage, nsfw, blur"
-
-# generate image
-image = pipe(
-    prompt=global_prompt,
-    input_image=input_image,
-    cfg_scale=3.0,
-    negative_prompt=negative_prompt,
-    num_inference_steps=50,
-    embedded_guidance=3.5,
-    seed=0,
-    height=1024,
-    width=1024,
-    eligen_entity_prompts=entity_prompts,
-    eligen_entity_masks=masks,
-    enable_eligen_on_negative=False,
-    enable_eligen_inpaint=True,
-)
-image.save("entity_inpaint.png")
-visualize_masks(image, masks, entity_prompts, "entity_inpaint_with_mask.png")
--- a/examples/EntityControl/entity_transfer.py
+++ b/examples/EntityControl/entity_transfer.py
@@ -1,84 +0,0 @@
-from diffsynth import ModelManager, FluxImagePipeline, download_customized_models
-from modelscope import dataset_snapshot_download
-from examples.EntityControl.utils import visualize_masks
-from PIL import Image
-import torch
-
-
-def build_pipeline():
-    model_manager = ModelManager(torch_dtype=torch.bfloat16, device="cuda", model_id_list=["FLUX.1-dev"])
-    model_manager.load_lora(
-        download_customized_models(
-            model_id="DiffSynth-Studio/Eligen",
-            origin_file_path="model_bf16.safetensors",
-            local_dir="models/lora/entity_control"
-        ),
-        lora_alpha=1
-    )
-    model_manager.load_lora(
-        download_customized_models(
-            model_id="iic/In-Context-LoRA",
-            origin_file_path="visual-identity-design.safetensors",
-            local_dir="models/lora/In-Context-LoRA"
-        ),
-        lora_alpha=1
-    )
-    pipe = FluxImagePipeline.from_model_manager(model_manager)
-    return pipe
-
-
-def generate(pipe: FluxImagePipeline, source_image, target_image, mask, height, width, prompt, entity_prompt, image_save_path, mask_save_path, seed=0):
-    input_mask = Image.new('RGB', (width * 2, height))
-    input_mask.paste(mask.resize((width, height), resample=Image.NEAREST).convert('RGB'), (width, 0))
-
-    input_image = Image.new('RGB', (width * 2, height))
-    input_image.paste(source_image.resize((width, height)).convert('RGB'), (0, 0))
-    input_image.paste(target_image.resize((width, height)).convert('RGB'), (width, 0))
-
-    image = pipe(
-        prompt=prompt,
-        input_image=input_image,
-        cfg_scale=3.0,
-        negative_prompt="",
-        num_inference_steps=50,
-        embedded_guidance=3.5,
-        seed=seed,
-        height=height,
-        width=width * 2,
-        eligen_entity_prompts=[entity_prompt],
-        eligen_entity_masks=[input_mask],
-        enable_eligen_on_negative=False,
-        enable_eligen_inpaint=True,
-    )
-    target_image = image.crop((width, 0, 2 * width, height))
-    target_image.save(image_save_path)
-    visualize_masks(target_image, [mask], [entity_prompt], mask_save_path)
-    return target_image
-
-
-pipe = build_pipeline()
-
-dataset_snapshot_download(dataset_id="DiffSynth-Studio/examples_in_diffsynth", local_dir="./", allow_file_pattern="data/examples/eligen/logo_transfer/*")
-
-prompt = "The two-panel image showcases the joyful identity, with the left panel showing a rabbit graphic; [LEFT] while the right panel translates the design onto a shopping tote with the rabbit logo in black, held by a person in a market setting, emphasizing the brand's approachable and eco-friendly vibe."
-logo_prompt = "a rabbit logo"
-
-logo_image = Image.open("data/examples/eligen/logo_transfer/source_image.png")
-target_image = Image.open("data/examples/eligen/logo_transfer/target_image.png")
-mask = Image.open("data/examples/eligen/logo_transfer/mask_1.png")
-generate(
-    pipe, logo_image, target_image, mask, 
-    height=1024, width=1024,
-    prompt=prompt, entity_prompt=logo_prompt,
-    image_save_path="entity_transfer_1.png",
-    mask_save_path="entity_transfer_with_mask_1.png"
-)
-
-mask = Image.open("data/examples/eligen/logo_transfer/mask_2.png")
-generate(
-    pipe, logo_image, target_image, mask, 
-    height=1024, width=1024,
-    prompt=prompt, entity_prompt=logo_prompt,
-    image_save_path="entity_transfer_2.png",
-    mask_save_path="entity_transfer_with_mask_2.png"
-)
--- a/examples/EntityControl/styled_entity_control.py
+++ b/examples/EntityControl/styled_entity_control.py
@@ -1,90 +0,0 @@
-from diffsynth import ModelManager, FluxImagePipeline, download_customized_models
-from modelscope import dataset_snapshot_download
-from examples.EntityControl.utils import visualize_masks
-from PIL import Image
-import torch
-
-def example(pipe, seeds, example_id, global_prompt, entity_prompts):
-    dataset_snapshot_download(dataset_id="DiffSynth-Studio/examples_in_diffsynth", local_dir="./", allow_file_pattern=f"data/examples/eligen/entity_control/example_{example_id}/*.png")
-    masks = [Image.open(f"./data/examples/eligen/entity_control/example_{example_id}/{i}.png").convert('RGB') for i in range(len(entity_prompts))]
-    negative_prompt = "worst quality, low quality, monochrome, zombie, interlocked fingers, Aissist, cleavage, nsfw,"
-    for seed in seeds:
-        # generate image
-        image = pipe(
-            prompt=global_prompt,
-            cfg_scale=3.0,
-            negative_prompt=negative_prompt,
-            num_inference_steps=50,
-            embedded_guidance=3.5,
-            seed=seed,
-            height=1024,
-            width=1024,
-            eligen_entity_prompts=entity_prompts,
-            eligen_entity_masks=masks,
-        )
-        image.save(f"styled_eligen_example_{example_id}_{seed}.png")
-        visualize_masks(image, masks, entity_prompts, f"styled_entity_control_example_{example_id}_mask_{seed}.png")
-
-# download and load model
-model_manager = ModelManager(torch_dtype=torch.bfloat16, device="cuda", model_id_list=["FLUX.1-dev"])
-model_manager.load_lora(
-    download_customized_models(
-        model_id="FluxLora/merve-flux-lego-lora-dreambooth",
-        origin_file_path="pytorch_lora_weights.safetensors",
-        local_dir="models/lora/merve-flux-lego-lora-dreambooth"
-    ),
-    lora_alpha=1
-)
-model_manager.load_lora(
-    download_customized_models(
-        model_id="DiffSynth-Studio/Eligen",
-        origin_file_path="model_bf16.safetensors",
-        local_dir="models/lora/entity_control"
-    ),
-    lora_alpha=1
-)
-pipe = FluxImagePipeline.from_model_manager(model_manager)
-
-# example 1
-trigger_word = "lego set in style of TOK, "
-global_prompt = "A breathtaking beauty of Raja Ampat by the late-night moonlight , one beautiful woman from behind wearing a pale blue long dress with soft glow, sitting at the top of a cliff looking towards the beach,pastell light colors, a group of small distant birds flying in far sky, a boat sailing on the sea, best quality, realistic, whimsical, fantastic, splash art, intricate detailed, hyperdetailed, maximalist style, photorealistic, concept art, sharp focus, harmony, serenity, tranquility, soft pastell colors,ambient occlusion, cozy ambient lighting, masterpiece, liiv1, linquivera, metix, mentixis, masterpiece, award winning, view from above\n"
-global_prompt = trigger_word + global_prompt
-entity_prompts = ["cliff", "sea", "moon", "sailing boat", "a seated beautiful woman", "pale blue long dress with soft glow"]
-example(pipe, [0], 1, global_prompt, entity_prompts)
-
-# example 2
-global_prompt = "samurai girl wearing a kimono, she's holding a sword  glowing with red flame, her long hair is flowing in the wind, she is looking at a small bird perched on the back of her hand. ultra realist style. maximum image detail. maximum realistic render."
-global_prompt = trigger_word + global_prompt
-entity_prompts = ["flowing hair", "sword glowing with red flame", "A cute bird", "blue belt"]
-example(pipe, [0], 2, global_prompt, entity_prompts)
-
-# example 3
-global_prompt = "Image of a neverending staircase up to a mysterious palace in the sky, The ancient palace stood majestically atop a mist-shrouded mountain, sunrise, two traditional monk walk in the stair looking at the sunrise, fog,see-through, best quality, whimsical, fantastic, splash art, intricate detailed, hyperdetailed, photorealistic, concept art, harmony, serenity, tranquility, ambient occlusion, halation, cozy ambient lighting, dynamic lighting,masterpiece, liiv1, linquivera, metix, mentixis, masterpiece, award winning,"
-global_prompt = trigger_word + global_prompt
-entity_prompts = ["ancient palace", "stone staircase with railings", "a traditional monk", "a traditional monk"]
-example(pipe, [27], 3, global_prompt, entity_prompts)
-
-# example 4
-global_prompt = "A beautiful girl wearing shirt and shorts in the street,  holding a sign 'Entity Control'"
-global_prompt = trigger_word + global_prompt
-entity_prompts = ["A beautiful girl", "sign 'Entity Control'", "shorts", "shirt"]
-example(pipe, [21], 4, global_prompt, entity_prompts)
-
-# example 5
-global_prompt = "A captivating, dramatic scene in a painting that exudes mystery and foreboding. A white sky, swirling blue clouds, and a crescent yellow moon illuminate a solitary woman standing near the water's edge. Her long dress flows in the wind, silhouetted against the eerie glow. The water mirrors the fiery sky and moonlight, amplifying the uneasy atmosphere."
-global_prompt = trigger_word + global_prompt
-entity_prompts = ["crescent yellow moon", "a solitary woman", "water", "swirling blue clouds"]
-example(pipe, [0], 5, global_prompt, entity_prompts)
-
-# example 6
-global_prompt = "Snow White and the 6 Dwarfs."
-global_prompt = trigger_word + global_prompt
-entity_prompts = ["Dwarf 1", "Dwarf 2", "Dwarf 3", "Snow White", "Dwarf 4", "Dwarf 5", "Dwarf 6"]
-example(pipe, [8], 6, global_prompt, entity_prompts)
-
-# example 7, same prompt with different seeds
-seeds = range(5, 9)
-global_prompt = "A beautiful woman wearing white dress, holding a mirror, with a warm light background;"
-global_prompt = trigger_word + global_prompt
-entity_prompts = ["A beautiful woman", "mirror", "necklace", "glasses", "earring", "white dress", "jewelry headpiece"]
-example(pipe, seeds, 7, global_prompt, entity_prompts)
--- a/examples/EntityControl/utils.py
+++ b/examples/EntityControl/utils.py
@@ -1,59 +0,0 @@
-from PIL import Image, ImageDraw, ImageFont
-import random
-
-def visualize_masks(image, masks, mask_prompts, output_path, font_size=35, use_random_colors=False):
-    # Create a blank image for overlays
-    overlay = Image.new('RGBA', image.size, (0, 0, 0, 0))
-    
-    colors = [
-        (165, 238, 173, 80),
-        (76, 102, 221, 80),
-        (221, 160, 77, 80),
-        (204, 93, 71, 80),
-        (145, 187, 149, 80),
-        (134, 141, 172, 80),
-        (157, 137, 109, 80),
-        (153, 104, 95, 80),
-        (165, 238, 173, 80),
-        (76, 102, 221, 80),
-        (221, 160, 77, 80),
-        (204, 93, 71, 80),
-        (145, 187, 149, 80),
-        (134, 141, 172, 80),
-        (157, 137, 109, 80),
-        (153, 104, 95, 80),
-    ]
-    # Generate random colors for each mask
-    if use_random_colors:
-        colors = [(random.randint(0, 255), random.randint(0, 255), random.randint(0, 255), 80) for _ in range(len(masks))]
-    
-    # Font settings
-    try:
-        font = ImageFont.truetype("arial", font_size)  # Adjust as needed
-    except IOError:
-        font = ImageFont.load_default(font_size)
-
-    # Overlay each mask onto the overlay image
-    for mask, mask_prompt, color in zip(masks, mask_prompts, colors):
-        # Convert mask to RGBA mode
-        mask_rgba = mask.convert('RGBA')
-        mask_data = mask_rgba.getdata()
-        new_data = [(color if item[:3] == (255, 255, 255) else (0, 0, 0, 0)) for item in mask_data]
-        mask_rgba.putdata(new_data)
-
-        # Draw the mask prompt text on the mask
-        draw = ImageDraw.Draw(mask_rgba)
-        mask_bbox = mask.getbbox() or (0, 0, 0, 0)  # Bounding box of the mask; fall back to the origin if the mask is empty
-        text_position = (mask_bbox[0] + 10, mask_bbox[1] + 10)  # Adjust text position based on mask position
-        draw.text(text_position, mask_prompt, fill=(255, 255, 255, 255), font=font)
-
-        # Alpha composite the overlay with this mask
-        overlay = Image.alpha_composite(overlay, mask_rgba)
-    
-    # Composite the overlay onto the original image
-    result = Image.alpha_composite(image.convert('RGBA'), overlay)
-    
-    # Save the resulting image
-    result.save(output_path)
-
-    return result
--- a/examples/ExVideo/ExVideo_cogvideox_test.py
+++ b/examples/ExVideo/ExVideo_cogvideox_test.py
@@ -1,21 +0,0 @@
-from diffsynth import ModelManager, CogVideoPipeline, save_video, download_models
-import torch
-
-
-download_models(["CogVideoX-5B", "ExVideo-CogVideoX-LoRA-129f-v1"])
-model_manager = ModelManager(torch_dtype=torch.bfloat16)
-model_manager.load_models([
-    "models/CogVideo/CogVideoX-5b/text_encoder",
-    "models/CogVideo/CogVideoX-5b/transformer",
-    "models/CogVideo/CogVideoX-5b/vae/diffusion_pytorch_model.safetensors",
-])
-model_manager.load_lora("models/lora/ExVideo-CogVideoX-LoRA-129f-v1.safetensors")
-pipe = CogVideoPipeline.from_model_manager(model_manager)
-
-torch.manual_seed(6)
-video = pipe(
-    prompt="an astronaut riding a horse on Mars.",
-    height=480, width=720, num_frames=129,
-    cfg_scale=7.0, num_inference_steps=100,
-)
-save_video(video, "video_with_lora.mp4", fps=8, quality=5)
--- a/examples/ExVideo/ExVideo_ema.py
+++ b/examples/ExVideo/ExVideo_ema.py
@@ -1,64 +0,0 @@
-import torch, os, argparse
-from safetensors.torch import save_file
-
-
-def load_pl_state_dict(file_path):
-    print(f"loading {file_path}")
-    state_dict = torch.load(file_path, map_location="cpu")
-    trainable_param_names = set(state_dict["trainable_param_names"])
-    if "module" in state_dict:
-        state_dict = state_dict["module"]
-    if "state_dict" in state_dict:
-        state_dict = state_dict["state_dict"]
-    state_dict_ = {}
-    for name, param in state_dict.items():
-        if name.startswith("_forward_module."):
-            name = name[len("_forward_module."):]
-        if name.startswith("unet."):
-            name = name[len("unet."):]
-        if name in trainable_param_names:
-            state_dict_[name] = param
-    return state_dict_
-
-
-def ckpt_to_epochs(ckpt_name):
-    return int(ckpt_name.split("=")[1].split("-")[0])
-
-
-def parse_args():
-    parser = argparse.ArgumentParser(description="Compute an exponential moving average (EMA) over saved checkpoints.")
-    parser.add_argument(
-        "--output_path",
-        type=str,
-        default="./",
-        help="Path to save the model.",
-    )
-    parser.add_argument(
-        "--gamma",
-        type=float,
-        default=0.9,
-        help="Gamma in EMA.",
-    )
-    args = parser.parse_args()
-    return args
-
-
-if __name__ == '__main__':
-    # args
-    args = parse_args() 
-    folder = args.output_path
-    gamma = args.gamma
-
-    # EMA
-    ckpt_list = sorted([(ckpt_to_epochs(ckpt_name), ckpt_name) for ckpt_name in os.listdir(folder) if os.path.isdir(f"{folder}/{ckpt_name}")])
-    state_dict_ema = None
-    for epochs, ckpt_name in ckpt_list:
-        state_dict = load_pl_state_dict(f"{folder}/{ckpt_name}/checkpoint/mp_rank_00_model_states.pt")
-        if state_dict_ema is None:
-            state_dict_ema = {name: param.float() for name, param in state_dict.items()}
-        else:
-            for name, param in state_dict.items():
-                state_dict_ema[name] = state_dict_ema[name] * gamma + param.float() * (1 - gamma)
-        save_path = ckpt_name.replace(".ckpt", "-ema.safetensors")
-        print(f"save to {folder}/{save_path}")
-        save_file(state_dict_ema, f"{folder}/{save_path}")
--- a/examples/ExVideo/ExVideo_svd_test.py
+++ b/examples/ExVideo/ExVideo_svd_test.py
@@ -1,114 +0,0 @@
-from diffsynth import save_video, ModelManager, SVDVideoPipeline, HunyuanDiTImagePipeline, download_models
-from diffsynth import ModelManager
-import torch, os
-
-# The models will be downloaded automatically.
-# You can also use the following urls to download them manually.
-
-# Download models (from Huggingface)
-#   Text-to-image model:
-#     `models/HunyuanDiT/t2i/clip_text_encoder/pytorch_model.bin`: [link](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/resolve/main/t2i/clip_text_encoder/pytorch_model.bin)
-#     `models/HunyuanDiT/t2i/mt5/pytorch_model.bin`: [link](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/resolve/main/t2i/mt5/pytorch_model.bin)
-#     `models/HunyuanDiT/t2i/model/pytorch_model_ema.pt`: [link](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/resolve/main/t2i/model/pytorch_model_ema.pt)
-#     `models/HunyuanDiT/t2i/sdxl-vae-fp16-fix/diffusion_pytorch_model.bin`: [link](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/resolve/main/t2i/sdxl-vae-fp16-fix/diffusion_pytorch_model.bin)
-#   Stable Video Diffusion model:
-#     `models/stable_video_diffusion/svd_xt.safetensors`: [link](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt/resolve/main/svd_xt.safetensors)
-#   ExVideo extension blocks:
-#     `models/stable_video_diffusion/model.fp16.safetensors`: [link](https://huggingface.co/ECNU-CILab/ExVideo-SVD-128f-v1/resolve/main/model.fp16.safetensors)
-
-# Download models (from Modelscope)
-#   Text-to-image model:
-#     `models/HunyuanDiT/t2i/clip_text_encoder/pytorch_model.bin`: [link](https://www.modelscope.cn/api/v1/models/modelscope/HunyuanDiT/repo?Revision=master&FilePath=t2i%2Fclip_text_encoder%2Fpytorch_model.bin)
-#     `models/HunyuanDiT/t2i/mt5/pytorch_model.bin`: [link](https://www.modelscope.cn/api/v1/models/modelscope/HunyuanDiT/repo?Revision=master&FilePath=t2i%2Fmt5%2Fpytorch_model.bin)
-#     `models/HunyuanDiT/t2i/model/pytorch_model_ema.pt`: [link](https://www.modelscope.cn/api/v1/models/modelscope/HunyuanDiT/repo?Revision=master&FilePath=t2i%2Fmodel%2Fpytorch_model_ema.pt)
-#     `models/HunyuanDiT/t2i/sdxl-vae-fp16-fix/diffusion_pytorch_model.bin`: [link](https://www.modelscope.cn/api/v1/models/modelscope/HunyuanDiT/repo?Revision=master&FilePath=t2i%2Fsdxl-vae-fp16-fix%2Fdiffusion_pytorch_model.bin)
-#   Stable Video Diffusion model:
-#     `models/stable_video_diffusion/svd_xt.safetensors`: [link](https://www.modelscope.cn/api/v1/models/AI-ModelScope/stable-video-diffusion-img2vid-xt/repo?Revision=master&FilePath=svd_xt.safetensors)
-#   ExVideo extension blocks:
-#     `models/stable_video_diffusion/model.fp16.safetensors`: [link](https://modelscope.cn/api/v1/models/ECNU-CILab/ExVideo-SVD-128f-v1/repo?Revision=master&FilePath=model.fp16.safetensors)
-
-
-def generate_image():
-    # Load models
-    os.environ["TOKENIZERS_PARALLELISM"] = "True"
-    download_models(["HunyuanDiT"])
-    model_manager = ModelManager(torch_dtype=torch.float16, device="cuda",
-                                 file_path_list=[
-                                     "models/HunyuanDiT/t2i/clip_text_encoder/pytorch_model.bin",
-                                     "models/HunyuanDiT/t2i/mt5/pytorch_model.bin",
-                                     "models/HunyuanDiT/t2i/model/pytorch_model_ema.pt",
-                                     "models/HunyuanDiT/t2i/sdxl-vae-fp16-fix/diffusion_pytorch_model.bin",
-                                 ])
-    pipe = HunyuanDiTImagePipeline.from_model_manager(model_manager)
-
-    # Generate an image
-    torch.manual_seed(0)
-    image = pipe(
-        prompt="bonfire, on the stone",
-        negative_prompt="错误的眼睛，糟糕的人脸，毁容，糟糕的艺术，变形，多余的肢体，模糊的颜色，模糊，重复，病态，残缺，",
-        num_inference_steps=50, height=1024, width=1024,
-    )
-    model_manager.to("cpu")
-    return image
-
-
-def generate_video(image):
-    # Load models
-    download_models(["stable-video-diffusion-img2vid-xt", "ExVideo-SVD-128f-v1"])
-    model_manager = ModelManager(torch_dtype=torch.float16, device="cuda",
-                                 file_path_list=[
-                                     "models/stable_video_diffusion/svd_xt.safetensors",
-                                     "models/stable_video_diffusion/model.fp16.safetensors",
-                                 ])
-    pipe = SVDVideoPipeline.from_model_manager(model_manager)
-
-    # Generate a video
-    torch.manual_seed(1)
-    video = pipe(
-        input_image=image.resize((512, 512)),
-        num_frames=128, fps=30, height=512, width=512,
-        motion_bucket_id=127,
-        num_inference_steps=50,
-        min_cfg_scale=2, max_cfg_scale=2, contrast_enhance_scale=1.2
-    )
-    model_manager.to("cpu")
-    return video
-
-
-def upscale_video(image, video):
-    # Load models
-    download_models(["stable-video-diffusion-img2vid-xt", "ExVideo-SVD-128f-v1"])
-    model_manager = ModelManager(torch_dtype=torch.float16, device="cuda",
-                                 file_path_list=[
-                                     "models/stable_video_diffusion/svd_xt.safetensors",
-                                     "models/stable_video_diffusion/model.fp16.safetensors",
-                                 ])
-    pipe = SVDVideoPipeline.from_model_manager(model_manager)
-
-    # Upscale the video
-    torch.manual_seed(2)
-    video = pipe(
-        input_image=image.resize((1024, 1024)),
-        input_video=[frame.resize((1024, 1024)) for frame in video], denoising_strength=0.5,
-        num_frames=128, fps=30, height=1024, width=1024,
-        motion_bucket_id=127,
-        num_inference_steps=25,
-        min_cfg_scale=2, max_cfg_scale=2, contrast_enhance_scale=1.2
-    )
-    model_manager.to("cpu")
-    return video
-
-
-# We use Hunyuan DiT to generate the first frame. 10GB VRAM is required.
-# If you want to use your own image,
-# please use `image = Image.open("your_image_file.png")` to replace the following code.
-image = generate_image()
-image.save("image.png")
-
-# Now, generate a video with resolution of 512. 20GB VRAM is required.
-video = generate_video(image)
-save_video(video, "video_512.mp4", fps=30)
-
-# Upscale the video. 52GB VRAM is required.
-video = upscale_video(image, video)
-save_video(video, "video_1024.mp4", fps=30)
--- a/examples/ExVideo/ExVideo_svd_train.py
+++ b/examples/ExVideo/ExVideo_svd_train.py
@@ -1,364 +0,0 @@
-import torch, json, os, imageio, argparse
-from torchvision.transforms import v2
-import numpy as np
-from einops import rearrange, repeat
-import lightning as pl
-from diffsynth import ModelManager, SVDImageEncoder, SVDUNet, SVDVAEEncoder, ContinuousODEScheduler, load_state_dict
-from diffsynth.pipelines.svd_video import SVDCLIPImageProcessor
-from diffsynth.models.svd_unet import TemporalAttentionBlock
-
-
-
-class TextVideoDataset(torch.utils.data.Dataset):
-    def __init__(self, base_path, metadata_path, steps_per_epoch=10000, training_shapes=[(128, 1, 128, 512, 512)]):
-        with open(metadata_path, "r") as f:
-            metadata = json.load(f)
-        self.path = [os.path.join(base_path, i["path"]) for i in metadata]
-        self.steps_per_epoch = steps_per_epoch
-        self.training_shapes = training_shapes
-
-        self.frame_process = []
-        for max_num_frames, interval, num_frames, height, width in training_shapes:
-            self.frame_process.append(v2.Compose([
-                v2.Resize(size=max(height, width), antialias=True),
-                v2.CenterCrop(size=(height, width)),
-                v2.Normalize(mean=[127.5, 127.5, 127.5], std=[127.5, 127.5, 127.5]),
-            ]))
-
-
-    def load_frames_using_imageio(self, file_path, max_num_frames, start_frame_id, interval, num_frames, frame_process):
-        reader = imageio.get_reader(file_path)
-        if reader.count_frames() < max_num_frames or reader.count_frames() - 1 < start_frame_id + (num_frames - 1) * interval:
-            reader.close()
-            return None
-        
-        frames = []
-        for frame_id in range(num_frames):
-            frame = reader.get_data(start_frame_id + frame_id * interval)
-            frame = torch.tensor(frame, dtype=torch.float32)
-            frame = rearrange(frame, "H W C -> 1 C H W")
-            frame = frame_process(frame)
-            frames.append(frame)
-        reader.close()
-
-        frames = torch.concat(frames, dim=0)
-        frames = rearrange(frames, "T C H W -> C T H W")
-
-        return frames
-
-
-    def load_video(self, file_path, training_shape_id):
-        data = {}
-        max_num_frames, interval, num_frames, height, width = self.training_shapes[training_shape_id]
-        frame_process = self.frame_process[training_shape_id]
-        start_frame_id = torch.randint(0, max_num_frames - (num_frames - 1) * interval, (1,))[0]
-        frames = self.load_frames_using_imageio(file_path, max_num_frames, start_frame_id, interval, num_frames, frame_process)
-        if frames is None:
-            return None
-        else:
-            data[f"frames_{training_shape_id}"] = frames
-        return data
-
-
-    def __getitem__(self, index):
-        video_data = {}
-        for training_shape_id in range(len(self.training_shapes)):
-            while True:
-                data_id = torch.randint(0, len(self.path), (1,))[0]
-                data_id = (data_id + index) % len(self.path) # For fixed seed.
-                video_file = self.path[data_id]
-                try:
-                    data = self.load_video(video_file, training_shape_id)
-                except Exception:
-                    data = None
-                if data is not None:
-                    break
-            video_data.update(data)
-        return video_data
-    
-
-    def __len__(self):
-        return self.steps_per_epoch
-
-
-
-class MotionBucketManager:
-    def __init__(self):
-        self.thresholds = [
-            0.000000000, 0.012205946, 0.015117834, 0.018080613, 0.020614484, 0.021959992, 0.024088068, 0.026323952, 
-            0.028277775, 0.029968588, 0.031836554, 0.033596724, 0.035121530, 0.037200287, 0.038914755, 0.040696491, 
-            0.042368013, 0.044265781, 0.046311017, 0.048243891, 0.050294187, 0.052142400, 0.053634230, 0.055612389, 
-            0.057594258, 0.059410289, 0.061283995, 0.063603796, 0.065192916, 0.067146860, 0.069066539, 0.070390493, 
-            0.072588451, 0.073959745, 0.075889029, 0.077695683, 0.079783581, 0.082162730, 0.084092639, 0.085958421, 
-            0.087700523, 0.089684933, 0.091688842, 0.093335517, 0.094987206, 0.096664011, 0.098314710, 0.100262381, 
-            0.101984538, 0.103404313, 0.105280340, 0.106974818, 0.109028399, 0.111164779, 0.113065213, 0.114362158, 
-            0.116407216, 0.118063427, 0.119524263, 0.121835820, 0.124242283, 0.126202747, 0.128989249, 0.131672353, 
-            0.133417681, 0.135567948, 0.137313649, 0.139189199, 0.140912935, 0.143525436, 0.145718485, 0.148315132, 
-            0.151039496, 0.153218940, 0.155252382, 0.157651082, 0.159966752, 0.162195817, 0.164811596, 0.167341709, 
-            0.170251891, 0.172651157, 0.175550997, 0.178372145, 0.181039348, 0.183565900, 0.186599866, 0.190071866, 
-            0.192574754, 0.195026234, 0.198099136, 0.200210452, 0.202522039, 0.205410406, 0.208610669, 0.211623028, 
-            0.214723110, 0.218520239, 0.222194016, 0.225363150, 0.229384825, 0.233422622, 0.237012610, 0.240735114, 
-            0.243622541, 0.247465774, 0.252190471, 0.257356376, 0.261856794, 0.266556412, 0.271076709, 0.277361482, 
-            0.281250387, 0.286582440, 0.291158527, 0.296712339, 0.303008437, 0.311793238, 0.318485111, 0.326999635, 
-            0.332138240, 0.341770738, 0.354188830, 0.365194678, 0.379234344, 0.401538879, 0.416078776, 0.440871328,
-        ]
-
-    def get_motion_score(self, frames):
-        score = frames.std(dim=2).mean(dim=[1, 2, 3]).tolist()
-        return score
-    
-    def get_bucket_id(self, motion_score):
-        for bucket_id in range(len(self.thresholds) - 1):
-            if self.thresholds[bucket_id + 1] > motion_score:
-                return bucket_id
-        return len(self.thresholds) - 1
-
-    def __call__(self, frames):
-        scores = self.get_motion_score(frames)
-        bucket_ids = [self.get_bucket_id(score) for score in scores]
-        return bucket_ids
-
-
-
-class LightningModel(pl.LightningModule):
-    def __init__(self, learning_rate=1e-5, svd_ckpt_path=None, add_positional_conv=128, contrast_enhance_scale=1.01):
-        super().__init__()
-        state_dict = load_state_dict(svd_ckpt_path)
-
-        self.image_encoder = SVDImageEncoder().to(dtype=torch.float16, device=self.device)
-        self.image_encoder.load_state_dict(SVDImageEncoder.state_dict_converter().from_civitai(state_dict))
-        self.image_encoder.eval()
-        self.image_encoder.requires_grad_(False)
-
-        self.unet = SVDUNet(add_positional_conv=add_positional_conv).to(dtype=torch.float16, device=self.device)
-        self.unet.load_state_dict(SVDUNet.state_dict_converter().from_civitai(state_dict, add_positional_conv=add_positional_conv), strict=False)
-        self.unet.train()
-        self.unet.requires_grad_(False)
-        for block in self.unet.blocks:
-            if isinstance(block, TemporalAttentionBlock):
-                block.requires_grad_(True)
-
-        self.vae_encoder = SVDVAEEncoder().to(dtype=torch.float16, device=self.device)
-        self.vae_encoder.load_state_dict(SVDVAEEncoder.state_dict_converter().from_civitai(state_dict))
-        self.vae_encoder.eval()
-        self.vae_encoder.requires_grad_(False)
-
-        self.noise_scheduler = ContinuousODEScheduler(num_inference_steps=1000)
-        self.learning_rate = learning_rate
-
-        self.motion_bucket_manager = MotionBucketManager()
-        self.contrast_enhance_scale = contrast_enhance_scale
-
-
-    def encode_image_with_clip(self, image):
-        image = SVDCLIPImageProcessor().resize_with_antialiasing(image, (224, 224))
-        image = (image + 1.0) / 2.0
-        mean = torch.tensor([0.48145466, 0.4578275, 0.40821073]).reshape(1, 3, 1, 1).to(device=self.device, dtype=self.dtype)
-        std = torch.tensor([0.26862954, 0.26130258, 0.27577711]).reshape(1, 3, 1, 1).to(device=self.device, dtype=self.dtype)
-        image = (image - mean) / std
-        image_emb = self.image_encoder(image)
-        return image_emb
-    
-
-    def encode_video_with_vae(self, video):
-        video = video.to(device=self.device, dtype=self.dtype)
-        video = video.unsqueeze(0)
-        latents = self.vae_encoder.encode_video(video)
-        latents = rearrange(latents[0], "C T H W -> T C H W")
-        return latents
-    
-
-    def tensor2video(self, frames):
-        frames = rearrange(frames, "C T H W -> T H W C")
-        frames = ((frames.float() + 1) * 127.5).clip(0, 255).cpu().numpy().astype(np.uint8)
-        return frames
-
-
-    def calculate_loss(self, frames):
-        with torch.no_grad():
-            # Call video encoder
-            latents = self.encode_video_with_vae(frames)
-            image_emb_vae = repeat(latents[0] / self.vae_encoder.scaling_factor, "C H W -> T C H W", T=frames.shape[1])
-            image_emb_clip = self.encode_image_with_clip(frames[:,0].unsqueeze(0))
-
-            # Call scheduler
-            timestep = torch.randint(0, len(self.noise_scheduler.timesteps), (1,))[0]
-            timestep = self.noise_scheduler.timesteps[timestep]
-            noise = torch.randn_like(latents)
-            noisy_latents = self.noise_scheduler.add_noise(latents, noise, timestep)
-
-            # Prepare positional id
-            fps = 30
-            motion_bucket_id = self.motion_bucket_manager(frames.unsqueeze(0))[0]
-            noise_aug_strength = 0
-            add_time_id = torch.tensor([[fps-1, motion_bucket_id, noise_aug_strength]], device=self.device)
-
-        # Calculate loss
-        latents_input = torch.cat([noisy_latents, image_emb_vae], dim=1)
-        model_pred = self.unet(latents_input, timestep, image_emb_clip, add_time_id, use_gradient_checkpointing=True)
-        latents_output = self.noise_scheduler.step(model_pred.float(), timestep, noisy_latents.float(), to_final=True)
-        loss = torch.nn.functional.mse_loss(latents_output, latents.float() * self.contrast_enhance_scale, reduction="mean")
-
-        # Re-weighting
-        reweighted_loss = loss * self.noise_scheduler.training_weight(timestep)
-        return loss, reweighted_loss
-    
-
-    def training_step(self, batch, batch_idx):
-        # Loss
-        frames = batch["frames_0"][0]
-        loss, reweighted_loss = self.calculate_loss(frames)
-
-        # Record log
-        self.log("train_loss", loss, prog_bar=True)
-        self.log("reweighted_train_loss", reweighted_loss, prog_bar=True)
-        return reweighted_loss
-
-
-    def configure_optimizers(self):
-        trainable_modules = []
-        for block in self.unet.blocks:
-            if isinstance(block, TemporalAttentionBlock):
-                trainable_modules += block.parameters()
-        optimizer = torch.optim.AdamW(trainable_modules, lr=self.learning_rate)
-        return optimizer
-    
-
-    def on_save_checkpoint(self, checkpoint):
-        trainable_param_names = list(filter(lambda named_param: named_param[1].requires_grad, self.unet.named_parameters()))
-        trainable_param_names = [named_param[0] for named_param in trainable_param_names]
-        checkpoint["trainable_param_names"] = trainable_param_names
-
-
-
-def parse_args():
-    parser = argparse.ArgumentParser(description="Simple example of a training script.")
-    parser.add_argument(
-        "--pretrained_path",
-        type=str,
-        default=None,
-        required=True,
-        help="Path to pretrained model. For example, `models/stable_video_diffusion/svd_xt.safetensors`.",
-    )
-    parser.add_argument(
-        "--resume_from_checkpoint",
-        type=str,
-        default=None,
-        required=False,
-        help="Path to checkpoint, in case your training program is stopped unexpectedly and you want to resume.",
-    )
-    parser.add_argument(
-        "--dataset_path",
-        type=str,
-        default=None,
-        required=True,
-        help="Path to the dataset directory, which must contain `metadata.json`.",
-    )
-    parser.add_argument(
-        "--output_path",
-        type=str,
-        default="./",
-        help="Path to save the model.",
-    )
-    parser.add_argument(
-        "--steps_per_epoch",
-        type=int,
-        default=500,
-        help="Number of steps per epoch.",
-    )
-    parser.add_argument(
-        "--num_frames",
-        type=int,
-        default=128,
-        help="Number of frames.",
-    )
-    parser.add_argument(
-        "--height",
-        type=int,
-        default=512,
-        help="Image height.",
-    )
-    parser.add_argument(
-        "--width",
-        type=int,
-        default=512,
-        help="Image width.",
-    )
-    parser.add_argument(
-        "--dataloader_num_workers",
-        type=int,
-        default=2,
-        help="Number of subprocesses to use for data loading. 0 means that the data will be loaded in the main process.",
-    )
-    parser.add_argument(
-        "--learning_rate",
-        type=float,
-        default=1e-5,
-        help="Learning rate.",
-    )
-    parser.add_argument(
-        "--accumulate_grad_batches",
-        type=int,
-        default=1,
-        help="The number of batches in gradient accumulation.",
-    )
-    parser.add_argument(
-        "--max_epochs",
-        type=int,
-        default=1,
-        help="Number of epochs.",
-    )
-    parser.add_argument(
-        "--contrast_enhance_scale",
-        type=float,
-        default=1.01,
-        help="Scale applied to the target latents in the loss, to avoid generating gray videos.",
-    )
-    args = parser.parse_args()
-    return args
-
-
-if __name__ == '__main__':
-    # args
-    args = parse_args()
-
-    # dataset and data loader
-    dataset = TextVideoDataset(
-        args.dataset_path,
-        os.path.join(args.dataset_path, "metadata.json"),
-        training_shapes=[(args.num_frames, 1, args.num_frames, args.height, args.width)],
-        steps_per_epoch=args.steps_per_epoch,
-    )
-    train_loader = torch.utils.data.DataLoader(
-        dataset,
-        shuffle=True,
-        # We don't support batch_size > 1,
-        # because sometimes our GPU cannot process even one video.
-        batch_size=1,
-        num_workers=args.dataloader_num_workers
-    )
-
-    # model
-    model = LightningModel(
-        learning_rate=args.learning_rate,
-        svd_ckpt_path=args.pretrained_path,
-        add_positional_conv=args.num_frames,
-        contrast_enhance_scale=args.contrast_enhance_scale
-    )
-
-    # train
-    trainer = pl.Trainer(
-        max_epochs=args.max_epochs,
-        accelerator="gpu",
-        devices="auto",
-        strategy="deepspeed_stage_2",
-        precision="16-mixed",
-        default_root_dir=args.output_path,
-        accumulate_grad_batches=args.accumulate_grad_batches,
-        callbacks=[pl.pytorch.callbacks.ModelCheckpoint(save_top_k=-1)]
-    )
-    trainer.fit(
-        model=model,
-        train_dataloaders=train_loader,
-        ckpt_path=args.resume_from_checkpoint
-    )
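
Review note on `MotionBucketManager.get_bucket_id` in the removed training script: the linear scan over `self.thresholds` can also be written as a binary search. The sketch below is a minimal standalone illustration, assuming the thresholds are non-empty and sorted in ascending order (the visible tail of the table is). `THRESHOLDS` is a short hypothetical stand-in for the real table, and `bucket_id_scan` / `bucket_id_bisect` are illustrative names, not part of DiffSynth-Studio.

```python
from bisect import bisect_right

# Hypothetical, shortened threshold table (the real one is much longer).
THRESHOLDS = [0.0, 0.1, 0.2, 0.3]


def bucket_id_scan(thresholds, motion_score):
    """Linear scan, mirroring the logic of the removed get_bucket_id."""
    for bucket_id in range(len(thresholds) - 1):
        if thresholds[bucket_id + 1] > motion_score:
            return bucket_id
    return len(thresholds) - 1


def bucket_id_bisect(thresholds, motion_score):
    """Binary-search variant; assumes non-empty, ascending thresholds."""
    # bisect_right gives the index of the first threshold > motion_score.
    # The bucket is the slot just before it, clamped to 0 for very low scores.
    return max(bisect_right(thresholds, motion_score) - 1, 0)


if __name__ == "__main__":
    for score in (-1.0, 0.05, 0.1, 0.25, 0.5):
        print(score, bucket_id_scan(THRESHOLDS, score), bucket_id_bisect(THRESHOLDS, score))
```

Both functions should agree for any score as long as the assumptions hold. The scan is perfectly adequate for a table of this size, so the binary search is mostly a readability choice.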
--- a/examples/ExVideo/README.md
+++ b/examples/ExVideo/README.md
@@ -1,89 +0,0 @@
-# ExVideo
-
-ExVideo is a post-tuning technique aimed at enhancing the capability of video generation models. We have extended Stable Video Diffusion to generate long videos of up to 128 frames.
-
-* [Project Page](https://ecnu-cilab.github.io/ExVideoProjectPage/)
-* [Technical report](https://arxiv.org/abs/2406.14130)
-* **[New]** Extended models (ExVideo-CogVideoX)
-    * [HuggingFace](https://huggingface.co/ECNU-CILab/ExVideo-CogVideoX-LoRA-129f-v1)
-    * [ModelScope](https://modelscope.cn/models/ECNU-CILab/ExVideo-CogVideoX-LoRA-129f-v1)
-* Extended models (ExVideo-SVD)
-    * [HuggingFace](https://huggingface.co/ECNU-CILab/ExVideo-SVD-128f-v1)
-    * [ModelScope](https://modelscope.cn/models/ECNU-CILab/ExVideo-SVD-128f-v1)
-
-## Example: Text-to-video via extended CogVideoX-5B
-
-Generate a video using CogVideoX-5B and our extension module. See [ExVideo_cogvideox_test.py](./ExVideo_cogvideox_test.py).
-
-https://github.com/user-attachments/assets/321ee04b-8c17-479e-8a95-8cbcf21f8d7e
-
-## Example: Text-to-video via extended Stable Video Diffusion
-
-Generate a video using a text-to-image model and our image-to-video model. See [ExVideo_svd_test.py](./ExVideo_svd_test.py).
-
-https://github.com/modelscope/DiffSynth-Studio/assets/35051019/d97f6aa9-8064-4b5b-9d49-ed6001bb9acc
-
-## Train
-
-* Step 1: Install additional packages
-
-```
-pip install lightning deepspeed
-```
-
-* Step 2: Download base model (from [HuggingFace](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt/resolve/main/svd_xt.safetensors) or [ModelScope](https://www.modelscope.cn/api/v1/models/AI-ModelScope/stable-video-diffusion-img2vid-xt/repo?Revision=master&FilePath=svd_xt.safetensors)) to `models/stable_video_diffusion/svd_xt.safetensors`.
-
-* Step 3: Prepare datasets
-
-```
-path/to/your/dataset
-├── metadata.json
-└── videos
-    ├── video_1.mp4
-    ├── video_2.mp4
-    └── video_3.mp4
-```
-
-where `metadata.json` contains
-
-```
-[
-    {
-        "path": "videos/video_1.mp4"
-    },
-    {
-        "path": "videos/video_2.mp4"
-    },
-    {
-        "path": "videos/video_3.mp4"
-    }
-]
-```
-
-* Step 4: Run
-
-```
-CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" python -u ExVideo_svd_train.py \
-  --pretrained_path "models/stable_video_diffusion/svd_xt.safetensors" \
-  --dataset_path "path/to/your/dataset" \
-  --output_path "path/to/save/models" \
-  --steps_per_epoch 8000 \
-  --num_frames 128 \
-  --height 512 \
-  --width 512 \
-  --dataloader_num_workers 2 \
-  --learning_rate 1e-5 \
-  --max_epochs 100
-```
-
-* Step 5: Post-process checkpoints
-
-Calculate Exponential Moving Average (EMA) and package it using `safetensors`.
-
-```
-python ExVideo_ema.py --output_path "path/to/save/models/lightning_logs/version_xx" --gamma 0.9
-```
-
-* Step 6: Enjoy your model
-
-The EMA model is at `path/to/save/models/lightning_logs/version_xx/checkpoints/epoch=xx-step=yyy-ema.safetensors`. Load it in [ExVideo_svd_test.py](./ExVideo_svd_test.py) and then enjoy your model.
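
Review note on Step 3 of the removed README: `metadata.json` is a plain list of `{"path": ...}` entries relative to the dataset root, so it can be generated rather than written by hand. The sketch below is one possible way to do that, assuming the videos live in a `videos/` subdirectory and use the `.mp4` extension as in the example layout. `build_metadata` and `write_metadata` are hypothetical helper names, not part of DiffSynth-Studio.

```python
import json
from pathlib import Path


def build_metadata(dataset_dir, video_subdir="videos", pattern="*.mp4"):
    """Return a list of {"path": ...} entries, relative to dataset_dir."""
    root = Path(dataset_dir)
    videos = sorted((root / video_subdir).glob(pattern))
    return [{"path": p.relative_to(root).as_posix()} for p in videos]


def write_metadata(dataset_dir):
    """Write metadata.json into dataset_dir and return its path."""
    entries = build_metadata(dataset_dir)
    out_path = Path(dataset_dir) / "metadata.json"
    out_path.write_text(json.dumps(entries, indent=4), encoding="utf-8")
    return out_path


if __name__ == "__main__":
    print(write_metadata("path/to/your/dataset"))
```

Sorting the file list keeps the output stable across runs, which makes the file easier to diff.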
--- a/examples/HunyuanVideo/README.md
+++ b/examples/HunyuanVideo/README.md
@@ -1,33 +0,0 @@
-# HunyuanVideo
-
-[HunyuanVideo](https://github.com/Tencent/HunyuanVideo) is a video generation model trained by Tencent. We provide advanced VRAM management for this model, with three configurations:
-
-|VRAM required|Example script|Frames|Resolution|Note|
-|-|-|-|-|-|
-|80G|[hunyuanvideo_80G.py](hunyuanvideo_80G.py)|129|720*1280|No VRAM management.|
-|24G|[hunyuanvideo_24G.py](hunyuanvideo_24G.py)|129|720*1280|The video is consistent with the original implementation, but it requires 5%~10% more time than [hunyuanvideo_80G.py](hunyuanvideo_80G.py).|
-|6G|[hunyuanvideo_6G.py](hunyuanvideo_6G.py)|129|512*384|The base model doesn't support low resolutions. We recommend using a LoRA ([example](https://civitai.com/models/1032126/walking-animation-hunyuan-video)) trained at low resolution.|
-
-[HunyuanVideo-I2V](https://github.com/Tencent/HunyuanVideo-I2V) is the image-to-video generation version of HunyuanVideo. We also provide advanced VRAM management for this model.
-|VRAM required|Example script|Frames|Resolution|Note|
-|-|-|-|-|-|
-|80G|[hunyuanvideo_i2v_80G.py](hunyuanvideo_i2v_80G.py)|129|720p|No VRAM management.|
-|24G|[hunyuanvideo_i2v_24G.py](hunyuanvideo_i2v_24G.py)|129|720p|The video is consistent with the original implementation, but it requires 5%~10% more time than [hunyuanvideo_i2v_80G.py](hunyuanvideo_i2v_80G.py).|
-
-## Gallery
-
-Video generated by [hunyuanvideo_80G.py](hunyuanvideo_80G.py) and [hunyuanvideo_24G.py](hunyuanvideo_24G.py):
-
-https://github.com/user-attachments/assets/48dd24bb-0cc6-40d2-88c3-10feed3267e9
-
-Video generated by [hunyuanvideo_6G.py](hunyuanvideo_6G.py) using [this LoRA](https://civitai.com/models/1032126/walking-animation-hunyuan-video):
-
-https://github.com/user-attachments/assets/2997f107-d02d-4ecb-89bb-5ce1a7f93817
-
-Video to video generated by [hunyuanvideo_v2v_6G.py](./hunyuanvideo_v2v_6G.py) using [this LoRA](https://civitai.com/models/1032126/walking-animation-hunyuan-video):
-
-https://github.com/user-attachments/assets/4b89e52e-ce42-434e-aa57-08f09dfa2b10
-
-Video generated by [hunyuanvideo_i2v_80G.py](hunyuanvideo_i2v_80G.py) and [hunyuanvideo_i2v_24G.py](hunyuanvideo_i2v_24G.py):
-
-https://github.com/user-attachments/assets/494f252a-c9af-440d-84ba-a8ddcdcc538a
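
Review note on the text-to-video table in the removed README: the mapping from available VRAM to example script can be written down as a small lookup. The sketch below is illustrative only, assuming the "VRAM required" column is a minimum in gigabytes. `T2V_SCRIPTS` and `pick_script` are hypothetical names, not part of DiffSynth-Studio.

```python
# (minimum VRAM in GB, example script), copied from the text-to-video table.
# Entries are ordered from the most to the least demanding configuration.
T2V_SCRIPTS = [
    (80, "hunyuanvideo_80G.py"),
    (24, "hunyuanvideo_24G.py"),
    (6, "hunyuanvideo_6G.py"),
]


def pick_script(vram_gb, table=T2V_SCRIPTS):
    """Return the most demanding script that fits in vram_gb, or None."""
    for required_gb, script in table:
        if vram_gb >= required_gb:
            return script
    return None


if __name__ == "__main__":
    for vram in (4, 8, 24, 48, 80):
        print(vram, pick_script(vram))
```

Note that the 6G configuration runs at 512*384 and, per the table, is intended for use with a LoRA trained at low resolution, so the choice is not only about memory.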
--- a/examples/HunyuanVideo/hunyuanvideo_24G.py
+++ b/examples/HunyuanVideo/hunyuanvideo_24G.py
@@ -1,42 +0,0 @@
-import torch
-torch.cuda.set_per_process_memory_fraction(1.0, 0)
-from diffsynth import ModelManager, HunyuanVideoPipeline, download_models, save_video
-
-
-download_models(["HunyuanVideo"])
-model_manager = ModelManager()
-
-# The DiT model is loaded in bfloat16.
-model_manager.load_models(
-    [
-        "models/HunyuanVideo/transformers/mp_rank_00_model_states.pt"
-    ],
-    torch_dtype=torch.bfloat16, # you can use torch_dtype=torch.float8_e4m3fn to enable quantization.
-    device="cpu"
-)
-
-# The other modules are loaded in float16.
-model_manager.load_models(
-    [
-        "models/HunyuanVideo/text_encoder/model.safetensors",
-        "models/HunyuanVideo/text_encoder_2",
-        "models/HunyuanVideo/vae/pytorch_model.pt",
-    ],
-    torch_dtype=torch.float16,
-    device="cpu"
-)
-
-# We support LoRA inference. You can use the following code to load your LoRA model.
-# model_manager.load_lora("models/lora/xxx.safetensors", lora_alpha=1.0)
-
-# The computation device is "cuda".
-pipe = HunyuanVideoPipeline.from_model_manager(
-    model_manager,
-    torch_dtype=torch.bfloat16,
-    device="cuda"
-)
-
-# Enjoy!
-prompt = "CG, masterpiece, best quality, solo, long hair, wavy hair, silver hair, blue eyes, blue dress, medium breasts, dress, underwater, air bubble, floating hair, refraction, portrait. The girl's flowing silver hair shimmers with every color of the rainbow and cascades down, merging with the floating flora around her."
-video = pipe(prompt, seed=0)
-save_video(video, "video_girl.mp4", fps=30, quality=6)
--- a/examples/HunyuanVideo/hunyuanvideo_6G.py
+++ b/examples/HunyuanVideo/hunyuanvideo_6G.py
@@ -1,52 +0,0 @@
-import torch
-torch.cuda.set_per_process_memory_fraction(1.0, 0)
-from diffsynth import ModelManager, HunyuanVideoPipeline, download_models, save_video, FlowMatchScheduler, download_customized_models
-
-
-download_models(["HunyuanVideo"])
-model_manager = ModelManager()
-
-# The DiT model is loaded in bfloat16.
-model_manager.load_models(
-    [
-        "models/HunyuanVideo/transformers/mp_rank_00_model_states.pt"
-    ],
-    torch_dtype=torch.bfloat16, # you can use torch_dtype=torch.float8_e4m3fn to enable quantization.
-    device="cpu"
-)
-
-# The other modules are loaded in float16.
-model_manager.load_models(
-    [
-        "models/HunyuanVideo/text_encoder/model.safetensors",
-        "models/HunyuanVideo/text_encoder_2",
-        "models/HunyuanVideo/vae/pytorch_model.pt",
-    ],
-    torch_dtype=torch.float16,
-    device="cpu"
-)
-
-# We support LoRA inference. You can use the following code to load your LoRA model.
-# Example LoRA: https://civitai.com/models/1032126/walking-animation-hunyuan-video
-download_customized_models(
-    model_id="AI-ModelScope/walking_animation_hunyuan_video",
-    origin_file_path="kxsr_walking_anim_v1-5.safetensors",
-    local_dir="models/lora"
-)[0]
-model_manager.load_lora("models/lora/kxsr_walking_anim_v1-5.safetensors", lora_alpha=1.0)
-
-# The computation device is "cuda".
-pipe = HunyuanVideoPipeline.from_model_manager(
-    model_manager,
-    torch_dtype=torch.bfloat16,
-    device="cuda"
-)
-# This LoRA requires shift=9.0.
-pipe.scheduler = FlowMatchScheduler(shift=9.0, sigma_min=0.0, extra_one_step=True)
-
-# Enjoy!
-for clothes_up in ["white t-shirt", "black t-shirt", "orange t-shirt"]:
-    for clothes_down in ["blue sports skirt", "red sports skirt", "white sports skirt"]:
-        prompt = f"kxsr, full body, no crop, A 3D-rendered CG animation video featuring a Gorgeous, mature, curvaceous, fair-skinned female girl with long silver hair and blue eyes. She wears a {clothes_up} and a {clothes_down}, walking offering a sense of fluid movement and vivid animation."
-        video = pipe(prompt, seed=0, height=512, width=384, num_frames=129, num_inference_steps=18, tile_size=(17, 16, 16), tile_stride=(12, 12, 12))
-        save_video(video, f"video-{clothes_up}-{clothes_down}.mp4", fps=30, quality=6)
--- a/examples/HunyuanVideo/hunyuanvideo_80G.py
+++ b/examples/HunyuanVideo/hunyuanvideo_80G.py
@@ -1,45 +0,0 @@
-import torch
-torch.cuda.set_per_process_memory_fraction(1.0, 0)
-from diffsynth import ModelManager, HunyuanVideoPipeline, download_models, save_video
-
-
-download_models(["HunyuanVideo"])
-model_manager = ModelManager()
-
-# The DiT model is loaded in bfloat16.
-model_manager.load_models(
-    [
-        "models/HunyuanVideo/transformers/mp_rank_00_model_states.pt"
-    ],
-    torch_dtype=torch.bfloat16, # you can use torch_dtype=torch.float8_e4m3fn to enable quantization.
-    device="cuda"
-)
-
-# The other modules are loaded in float16.
-model_manager.load_models(
-    [
-        "models/HunyuanVideo/text_encoder/model.safetensors",
-        "models/HunyuanVideo/text_encoder_2",
-        "models/HunyuanVideo/vae/pytorch_model.pt",
-    ],
-    torch_dtype=torch.float16,
-    device="cuda"
-)
-
-# We support LoRA inference. You can use the following code to load your LoRA model.
-# model_manager.load_lora("models/lora/xxx.safetensors", lora_alpha=1.0)
-
-# The computation device is "cuda".
-pipe = HunyuanVideoPipeline.from_model_manager(
-    model_manager,
-    torch_dtype=torch.bfloat16,
-    device="cuda",
-    enable_vram_management=False
-)
-# Even with enough VRAM, we still recommend enabling CPU offload.
-pipe.enable_cpu_offload()
-
-# Enjoy!
-prompt = "CG, masterpiece, best quality, solo, long hair, wavy hair, silver hair, blue eyes, blue dress, medium breasts, dress, underwater, air bubble, floating hair, refraction, portrait. The girl's flowing silver hair shimmers with every color of the rainbow and cascades down, merging with the floating flora around her."
-video = pipe(prompt, seed=0)
-save_video(video, "video.mp4", fps=30, quality=6)
--- a/examples/HunyuanVideo/hunyuanvideo_i2v_24G.py
+++ b/examples/HunyuanVideo/hunyuanvideo_i2v_24G.py
@@ -1,43 +0,0 @@
-import torch
-from diffsynth import ModelManager, HunyuanVideoPipeline, download_models, save_video
-from modelscope import dataset_snapshot_download
-from PIL import Image
-
-
-download_models(["HunyuanVideoI2V"])
-model_manager = ModelManager()
-
-# The DiT model is loaded in bfloat16.
-model_manager.load_models(
-    [
-        "models/HunyuanVideoI2V/transformers/mp_rank_00_model_states.pt"
-    ],
-    torch_dtype=torch.bfloat16,
-    device="cpu"
-)
-
-# The other modules are loaded in float16.
-model_manager.load_models(
-    [
-        "models/HunyuanVideoI2V/text_encoder/model.safetensors",
-        "models/HunyuanVideoI2V/text_encoder_2",
-        'models/HunyuanVideoI2V/vae/pytorch_model.pt'
-    ],
-    torch_dtype=torch.float16,
-    device="cpu"
-)
-# The computation device is "cuda".
-pipe = HunyuanVideoPipeline.from_model_manager(model_manager,
-                                               torch_dtype=torch.bfloat16,
-                                               device="cuda",
-                                               enable_vram_management=True)
-
-dataset_snapshot_download(dataset_id="DiffSynth-Studio/examples_in_diffsynth",
-                          local_dir="./",
-                          allow_file_pattern="data/examples/hunyuanvideo/*")
-
-i2v_resolution = "720p"
-prompt = "An Asian man with short hair in black tactical uniform and white clothes waves a firework stick."
-images = [Image.open("data/examples/hunyuanvideo/0.jpg").convert('RGB')]
-video = pipe(prompt, input_images=images, num_inference_steps=50, seed=0, i2v_resolution=i2v_resolution)
-save_video(video, f"video_{i2v_resolution}_low_vram.mp4", fps=30, quality=6)
--- a/examples/HunyuanVideo/hunyuanvideo_i2v_80G.py
+++ b/examples/HunyuanVideo/hunyuanvideo_i2v_80G.py
@@ -1,45 +0,0 @@
-import torch
-from diffsynth import ModelManager, HunyuanVideoPipeline, download_models, save_video
-from modelscope import dataset_snapshot_download
-from PIL import Image
-
-
-download_models(["HunyuanVideoI2V"])
-model_manager = ModelManager()
-
-# The DiT model is loaded in bfloat16.
-model_manager.load_models(
-    [
-        "models/HunyuanVideoI2V/transformers/mp_rank_00_model_states.pt"
-    ],
-    torch_dtype=torch.bfloat16,
-    device="cuda"
-)
-
-# The other modules are loaded in float16.
-model_manager.load_models(
-    [
-        "models/HunyuanVideoI2V/text_encoder/model.safetensors",
-        "models/HunyuanVideoI2V/text_encoder_2",
-        'models/HunyuanVideoI2V/vae/pytorch_model.pt'
-    ],
-    torch_dtype=torch.float16,
-    device="cuda"
-)
-# The computation device is "cuda".
-pipe = HunyuanVideoPipeline.from_model_manager(model_manager,
-                                               torch_dtype=torch.bfloat16,
-                                               device="cuda",
-                                               enable_vram_management=False)
-# Even with enough VRAM, we still recommend enabling CPU offload.
-pipe.enable_cpu_offload()
-
-dataset_snapshot_download(dataset_id="DiffSynth-Studio/examples_in_diffsynth",
-                          local_dir="./",
-                          allow_file_pattern="data/examples/hunyuanvideo/*")
-
-i2v_resolution = "720p"
-prompt = "An Asian man with short hair in black tactical uniform and white clothes waves a firework stick."
-images = [Image.open("data/examples/hunyuanvideo/0.jpg").convert('RGB')]
-video = pipe(prompt, input_images=images, num_inference_steps=50, seed=0, i2v_resolution=i2v_resolution)
-save_video(video, f"video_{i2v_resolution}.mp4", fps=30, quality=6)
--- a/examples/HunyuanVideo/hunyuanvideo_v2v_6G.py
+++ b/examples/HunyuanVideo/hunyuanvideo_v2v_6G.py
@@ -1,55 +0,0 @@
-import torch
-torch.cuda.set_per_process_memory_fraction(1.0, 0)
-from diffsynth import ModelManager, HunyuanVideoPipeline, download_models, save_video, FlowMatchScheduler, download_customized_models
-
-
-download_models(["HunyuanVideo"])
-model_manager = ModelManager()
-
-# The DiT model is loaded in bfloat16.
-model_manager.load_models(
-    [
-        "models/HunyuanVideo/transformers/mp_rank_00_model_states.pt"
-    ],
-    torch_dtype=torch.bfloat16, # you can use torch_dtype=torch.float8_e4m3fn to enable quantization.
-    device="cpu"
-)
-
-# The other modules are loaded in float16.
-model_manager.load_models(
-    [
-        "models/HunyuanVideo/text_encoder/model.safetensors",
-        "models/HunyuanVideo/text_encoder_2",
-        "models/HunyuanVideo/vae/pytorch_model.pt",
-    ],
-    torch_dtype=torch.float16,
-    device="cpu"
-)
-
-# We support LoRA inference. You can use the following code to load your LoRA model.
-# Example LoRA: https://civitai.com/models/1032126/walking-animation-hunyuan-video
-download_customized_models(
-    model_id="AI-ModelScope/walking_animation_hunyuan_video",
-    origin_file_path="kxsr_walking_anim_v1-5.safetensors",
-    local_dir="models/lora"
-)[0]
-model_manager.load_lora("models/lora/kxsr_walking_anim_v1-5.safetensors", lora_alpha=1.0)
-
-# The computation device is "cuda".
-pipe = HunyuanVideoPipeline.from_model_manager(
-    model_manager,
-    torch_dtype=torch.bfloat16,
-    device="cuda"
-)
-# This LoRA requires shift=9.0.
-pipe.scheduler = FlowMatchScheduler(shift=9.0, sigma_min=0.0, extra_one_step=True)
-
-# Text-to-video
-prompt = "kxsr, full body, no crop. A girl is walking. CG, masterpiece, best quality, solo, long hair, wavy hair, silver hair, blue eyes, blue dress, medium breasts, dress, underwater, air bubble, floating hair, refraction, portrait. The girl's flowing silver hair shimmers with every color of the rainbow and cascades down, merging with the floating flora around her."
-video = pipe(prompt, seed=1, height=512, width=384, num_frames=129, num_inference_steps=18, tile_size=(17, 16, 16), tile_stride=(12, 12, 12))
-save_video(video, "video.mp4", fps=30, quality=6)
-
-# Video-to-video
-prompt = "kxsr, full body, no crop. A girl is walking. CG, masterpiece, best quality, solo, long hair, wavy hair, silver hair, blue eyes, purple dress, medium breasts, dress, underwater, air bubble, floating hair, refraction, portrait. The girl's flowing silver hair shimmers with every color of the rainbow and cascades down, merging with the floating flora around her."
-video = pipe(prompt, seed=1, height=512, width=384, num_frames=129, num_inference_steps=18, tile_size=(17, 16, 16), tile_stride=(12, 12, 12), input_video=video, denoising_strength=0.85)
-save_video(video, "video_edited.mp4", fps=30, quality=6)
--- a/examples/InfiniteYou/README.md
+++ b/examples/InfiniteYou/README.md
@@ -1,7 +0,0 @@
-# InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity
-We support the identity-preserving feature of InfiniteYou. See [./infiniteyou.py](./infiniteyou.py) for an example. Results are shown below.
-
-|Identity Image|Generated Image|
-|-|-|
-|![man_id](https://github.com/user-attachments/assets/bbc38a91-966e-49e8-a0d7-c5467582ad1f)|![man](https://github.com/user-attachments/assets/0decd5e1-5f65-437c-98fa-90991b6f23c1)|
-|![woman_id](https://github.com/user-attachments/assets/b2894695-690e-465b-929c-61e5dc57feeb)|![woman](https://github.com/user-attachments/assets/67cc7496-c4d3-4de1-a8f1-9eb4991d95e8)|
--- a/examples/InfiniteYou/infiniteyou.py
+++ b/examples/InfiniteYou/infiniteyou.py
@@ -1,58 +0,0 @@
-import importlib.util
-import torch
-from diffsynth import ModelManager, FluxImagePipeline, download_models, ControlNetConfigUnit
-from modelscope import dataset_snapshot_download
-from PIL import Image
-
-if importlib.util.find_spec("facexlib") is None:
-    raise ImportError("You are using InfiniteYou. It depends on facexlib, which is not installed. Please install it with `pip install facexlib`.")
-if importlib.util.find_spec("insightface") is None:
-    raise ImportError("You are using InfiniteYou. It depends on insightface, which is not installed. Please install it with `pip install insightface`.")
-
-download_models(["InfiniteYou"])
-model_manager = ModelManager(torch_dtype=torch.bfloat16, device="cuda", model_id_list=["FLUX.1-dev"])
-model_manager.load_models([
-    [
-        "models/InfiniteYou/InfuseNetModel/diffusion_pytorch_model-00001-of-00002.safetensors",
-        "models/InfiniteYou/InfuseNetModel/diffusion_pytorch_model-00002-of-00002.safetensors"
-    ],
-    "models/InfiniteYou/image_proj_model.bin",
-])
-
-
-pipe = FluxImagePipeline.from_model_manager(
-    model_manager,
-    controlnet_config_units=[
-        ControlNetConfigUnit(
-            processor_id="none",
-            model_path=[
-                'models/InfiniteYou/InfuseNetModel/diffusion_pytorch_model-00001-of-00002.safetensors',
-                'models/InfiniteYou/InfuseNetModel/diffusion_pytorch_model-00002-of-00002.safetensors'
-            ],
-            scale=1.0
-        )
-    ]
-)
-dataset_snapshot_download(dataset_id="DiffSynth-Studio/examples_in_diffsynth", local_dir="./", allow_file_pattern="data/examples/infiniteyou/*")
-
-prompt = "A man, portrait, cinematic"
-id_image = "data/examples/infiniteyou/man.jpg"
-id_image = Image.open(id_image).convert('RGB')
-image = pipe(
-    prompt=prompt, seed=1,
-    infinityou_id_image=id_image, infinityou_guidance=1.0,
-    num_inference_steps=50, embedded_guidance=3.5,
-    height=1024, width=1024,
-)
-image.save("man.jpg")
-
-prompt = "A woman, portrait, cinematic"
-id_image = "data/examples/infiniteyou/woman.jpg"
-id_image = Image.open(id_image).convert('RGB')
-image = pipe(
-    prompt=prompt, seed=1,
-    infinityou_id_image=id_image, infinityou_guidance=1.0,
-    num_inference_steps=50, embedded_guidance=3.5,
-    height=1024, width=1024,
-)
-image.save("woman.jpg")
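
Review note on the optional-dependency checks at the top of the removed `infiniteyou.py`: the two near-identical `find_spec` checks could be folded into one helper that reports every missing package at once. The sketch below is one possible shape, assuming each import name matches its pip package name (true for `facexlib` and `insightface` according to the original messages, but not in general). `require_packages` is a hypothetical name, not part of DiffSynth-Studio.

```python
import importlib.util


def require_packages(feature, packages):
    """Raise ImportError listing every package in `packages` that is not installed."""
    # find_spec returns None for a top-level module that cannot be located.
    missing = [name for name in packages if importlib.util.find_spec(name) is None]
    if missing:
        raise ImportError(
            f"You are using {feature}. It depends on {', '.join(missing)}, "
            f"which could not be found. "
            f"Please install with `pip install {' '.join(missing)}`."
        )


if __name__ == "__main__":
    # Standard-library modules are always present, so this passes silently.
    require_packages("a demo feature", ["json", "math"])
```

Importing `importlib.util` explicitly matters here: a bare `import importlib` does not guarantee that the `util` submodule is loaded.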
--- a/examples/Ip-Adapter/README.md
+++ b/examples/Ip-Adapter/README.md
@@ -1,44 +0,0 @@
-# IP-Adapter
-
-IP-Adapter is an interesting model that can adopt the content or style of another image to generate a new image.
-
-## Example: Content Controlling in Stable Diffusion
-
-Based on Stable Diffusion, we can transfer the object to another scene. See [`sd_ipadapter.py`](./sd_ipadapter.py).
-
-|First, we generate a car. The prompt is "masterpiece, best quality, a car".|Next, utilizing IP-Adapter, we move the car to the road. The prompt is "masterpiece, best quality, a car running on the road".|
-|-|-|
-|![car](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/8530a2f0-f610-4269-a22c-ac6c2f21fc18)|![car_on_the_road](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/b8ccddb2-c423-46d8-bd1a-327fcc074a36)|
-
-## Example: Content and Style Controlling in Stable Diffusion XL
-
-The IP-Adapter model based on Stable Diffusion XL is more powerful. You can control either the content or the style. See [`sdxl_ipadapter.py`](./sdxl_ipadapter.py).
-
-* Content controlling (original usage of IP-Adapter)
-
-|First, we generate a rabbit.|Next, enable IP-Adapter and let the rabbit jump.|For comparison, disable IP-Adapter to see the generated image.|
-|-|-|-|
-|![rabbit](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/4b452634-ec57-414f-897a-f8c50c74a650)|![rabbit_to_jumping_rabbit](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/b93c5495-0b77-4d97-bcd3-3942858288f2)|![rabbit_to_jumping_rabbit_without_ipa](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/52f37195-65b3-4a38-8d9b-73df37311c15)|
-
-
-* Style controlling (InstantStyle)
-
-|First, we generate a rabbit.|Next, enable InstantStyle and convert the rabbit to a cat.|For comparison, disable IP-Adapter to see the generated image.|
-|-|-|-|
-|![rabbit](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/4b452634-ec57-414f-897a-f8c50c74a650)|![rabbit_to_cat](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/a006b281-f643-4ea9-b0da-712289c96059)|![rabbit_to_cat_without_ipa](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/189bd11e-7a10-4c09-8554-0eebde9150fd)|
-
-## Example: Image Fusing (Experimental)
-
-Since IP-Adapter can control the content based on more than one image, we can do something interesting. See [`sdxl_ipadapter_multi_reference.py`](sdxl_ipadapter_multi_reference.py).
-
-We have two Pokémon here:
-
-|Charizard|Pikachu|
-|-|-|
-|![](https://media.52poke.com/wiki/7/7e/006Charizard.png)|![](https://media.52poke.com/wiki/0/0d/025Pikachu.png)|
-
-Fuse!
-
-|Pikazard ???|
-|-|
-|![Pikazard](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/807cdb31-94f5-4cc2-a978-3c6a7ffedc5b)|
--- a/examples/Ip-Adapter/flux_ipadapter.py
+++ b/examples/Ip-Adapter/flux_ipadapter.py
@@ -1,38 +0,0 @@
-from diffsynth import ModelManager, download_models, FluxImagePipeline
-import torch
-
-# Download models (automatically)
-# `models/IpAdapter/InstantX/FLUX.1-dev-IP-Adapter/ip-adapter.bin`: [link](https://huggingface.co/InstantX/FLUX.1-dev-IP-Adapter/blob/main/ip-adapter.bin)
-# `models/IpAdapter/InstantX/FLUX.1-dev-IP-Adapter/image_encoder`: [link](https://huggingface.co/google/siglip-so400m-patch14-384)
-download_models(["InstantX/FLUX.1-dev-IP-Adapter", "FLUX.1-dev"])
-
-# Load models
-model_manager = ModelManager(torch_dtype=torch.bfloat16, device="cuda")
-model_manager.load_models([
-    "models/IpAdapter/InstantX/FLUX.1-dev-IP-Adapter/ip-adapter.bin",
-    "models/IpAdapter/InstantX/FLUX.1-dev-IP-Adapter/image_encoder",
-    "models/FLUX/FLUX.1-dev/text_encoder/model.safetensors",
-    "models/FLUX/FLUX.1-dev/text_encoder_2",
-    "models/FLUX/FLUX.1-dev/ae.safetensors",
-    "models/FLUX/FLUX.1-dev/flux1-dev.safetensors",
-])
-seed = 42
-pipe = FluxImagePipeline.from_model_manager(model_manager)
-torch.manual_seed(seed)
-origin_prompt = "a rabbit in a garden, colorful flowers"
-image = pipe(
-    prompt=origin_prompt,
-    cfg_scale=1.0, embedded_guidance=3.5,
-    height=1280, width=960, num_inference_steps=30
-)
-image.save("style image.jpg")
-
-torch.manual_seed(seed)
-image = pipe(
-    prompt="A piggy",
-    cfg_scale=1.0, embedded_guidance=3.5,
-    height=1280, width=960, num_inference_steps=30,
-    ipadapter_images=[image], ipadapter_scale=0.7
-)
-image.save("A piggy.jpg")
-
--- a/examples/Ip-Adapter/sd_ipadapter.py
+++ b/examples/Ip-Adapter/sd_ipadapter.py
@@ -1,38 +0,0 @@
-from diffsynth import ModelManager, SDImagePipeline, download_models
-import torch
-
-
-# Download models (automatically)
-# `models/stable_diffusion/aingdiffusion_v12.safetensors`: [link](https://civitai.com/api/download/models/229575?type=Model&format=SafeTensor&size=full&fp=fp16)
-# `models/IpAdapter/stable_diffusion/image_encoder/model.safetensors`: [link](https://huggingface.co/h94/IP-Adapter/resolve/main/models/image_encoder/model.safetensors)
-# `models/IpAdapter/stable_diffusion/ip-adapter_sd15.bin`: [link](https://huggingface.co/h94/IP-Adapter/resolve/main/models/ip-adapter_sd15.bin)
-# `models/textual_inversion/verybadimagenegative_v1.3.pt`: [link](https://civitai.com/api/download/models/25820?type=Model&format=PickleTensor&size=full&fp=fp16)
-download_models(["AingDiffusion_v12", "IP-Adapter-SD", "TextualInversion_VeryBadImageNegative_v1.3"])
-
-# Load models
-model_manager = ModelManager(torch_dtype=torch.float16, device="cuda")
-model_manager.load_models([
-    "models/stable_diffusion/aingdiffusion_v12.safetensors",
-    "models/IpAdapter/stable_diffusion/image_encoder/model.safetensors",
-    "models/IpAdapter/stable_diffusion/ip-adapter_sd15.bin"
-])
-pipe = SDImagePipeline.from_model_manager(model_manager)
-pipe.prompter.load_textual_inversions(["models/textual_inversion/verybadimagenegative_v1.3.pt"])
-
-torch.manual_seed(1)
-style_image = pipe(
-    prompt="masterpiece, best quality, a car",
-    negative_prompt="verybadimagenegative_v1.3",
-    cfg_scale=7, clip_skip=2,
-    height=512, width=512, num_inference_steps=50,
-)
-style_image.save("car.jpg")
-
-image = pipe(
-    prompt="masterpiece, best quality, a car running on the road",
-    negative_prompt="verybadimagenegative_v1.3",
-    cfg_scale=7, clip_skip=2,
-    height=512, width=512, num_inference_steps=50,
-    ipadapter_images=[style_image], ipadapter_scale=1.0
-)
-image.save("car_on_the_road.jpg")
--- a/examples/Ip-Adapter/sdxl_ipadapter.py
+++ b/examples/Ip-Adapter/sdxl_ipadapter.py
@@ -1,61 +0,0 @@
-from diffsynth import ModelManager, SDXLImagePipeline, download_models
-import torch
-
-
-# Download models (automatically)
-# `models/stable_diffusion_xl/sd_xl_base_1.0.safetensors`: [link](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors)
-# `models/IpAdapter/stable_diffusion_xl/image_encoder/model.safetensors`: [link](https://huggingface.co/h94/IP-Adapter/resolve/main/sdxl_models/image_encoder/model.safetensors)
-# `models/IpAdapter/stable_diffusion_xl/ip-adapter_sdxl.bin`: [link](https://huggingface.co/h94/IP-Adapter/resolve/main/sdxl_models/ip-adapter_sdxl.safetensors)
-download_models(["StableDiffusionXL_v1", "IP-Adapter-SDXL"])
-
-# Load models
-model_manager = ModelManager(torch_dtype=torch.float16, device="cuda")
-model_manager.load_models([
-    "models/stable_diffusion_xl/sd_xl_base_1.0.safetensors",
-    "models/IpAdapter/stable_diffusion_xl/image_encoder/model.safetensors",
-    "models/IpAdapter/stable_diffusion_xl/ip-adapter_sdxl.bin"
-])
-pipe = SDXLImagePipeline.from_model_manager(model_manager)
-
-torch.manual_seed(123456)
-style_image = pipe(
-    prompt="a rabbit in a garden, colorful flowers",
-    negative_prompt="anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured",
-    cfg_scale=5,
-    height=1024, width=1024, num_inference_steps=50,
-)
-style_image.save("rabbit.jpg")
-
-image = pipe(
-    prompt="a cat",
-    negative_prompt="",
-    cfg_scale=5,
-    height=1024, width=1024, num_inference_steps=50,
-    ipadapter_images=[style_image], ipadapter_use_instant_style=True
-)
-image.save("rabbit_to_cat.jpg")
-
-image = pipe(
-    prompt="a rabbit is jumping",
-    negative_prompt="",
-    cfg_scale=5,
-    height=1024, width=1024, num_inference_steps=50,
-    ipadapter_images=[style_image], ipadapter_use_instant_style=False, ipadapter_scale=0.5
-)
-image.save("rabbit_to_jumping_rabbit.jpg")
-
-image = pipe(
-    prompt="a cat",
-    negative_prompt="",
-    cfg_scale=5,
-    height=1024, width=1024, num_inference_steps=50,
-)
-image.save("rabbit_to_cat_without_ipa.jpg")
-
-image = pipe(
-    prompt="a rabbit is jumping",
-    negative_prompt="",
-    cfg_scale=5,
-    height=1024, width=1024, num_inference_steps=50,
-)
-image.save("rabbit_to_jumping_rabbit_without_ipa.jpg")
--- a/examples/Ip-Adapter/sdxl_ipadapter_multi_reference.py
+++ b/examples/Ip-Adapter/sdxl_ipadapter_multi_reference.py
@@ -1,34 +0,0 @@
-from diffsynth import ModelManager, SDXLImagePipeline, download_models
-import torch, requests
-from PIL import Image
-
-
-# Download models (automatically)
-# `models/stable_diffusion_xl/bluePencilXL_v200.safetensors`: [link](https://civitai.com/api/download/models/245614?type=Model&format=SafeTensor&size=pruned&fp=fp16)
-# `models/IpAdapter/stable_diffusion_xl/image_encoder/model.safetensors`: [link](https://huggingface.co/h94/IP-Adapter/resolve/main/sdxl_models/image_encoder/model.safetensors)
-# `models/IpAdapter/stable_diffusion_xl/ip-adapter_sdxl.bin`: [link](https://huggingface.co/h94/IP-Adapter/resolve/main/sdxl_models/ip-adapter_sdxl.safetensors)
-download_models(["BluePencilXL_v200", "IP-Adapter-SDXL"])
-
-# Load models
-model_manager = ModelManager(torch_dtype=torch.float16, device="cuda")
-model_manager.load_models([
-    "models/stable_diffusion_xl/bluePencilXL_v200.safetensors",
-    "models/IpAdapter/stable_diffusion_xl/image_encoder/model.safetensors",
-    "models/IpAdapter/stable_diffusion_xl/ip-adapter_sdxl.bin"
-])
-pipe = SDXLImagePipeline.from_model_manager(model_manager)
-
-image_1 = Image.open(requests.get("https://media.52poke.com/wiki/7/7e/006Charizard.png", stream=True).raw).convert("RGB").resize((1024, 1024))
-image_1.save("Charizard.jpg")
-image_2 = Image.open(requests.get("https://media.52poke.com/wiki/0/0d/025Pikachu.png", stream=True).raw).convert("RGB").resize((1024, 1024))
-image_2.save("Pikachu.jpg")
-
-torch.manual_seed(0)
-image = pipe(
-    prompt="a pokemon, maybe Charizard, maybe Pikachu",
-    negative_prompt="text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
-    cfg_scale=5,
-    height=1024, width=1024, num_inference_steps=50,
-    ipadapter_images=[image_1, image_2], ipadapter_use_instant_style=False, ipadapter_scale=0.7
-)
-image.save("Pikazard.jpg")
--- a/examples/TeaCache/README.md
+++ b/examples/TeaCache/README.md
@@ -1,34 +0,0 @@
-# TeaCache
-
-TeaCache ([Timestep Embedding Aware Cache](https://github.com/ali-vilab/TeaCache)) is a training-free caching approach that estimates and leverages the fluctuating differences among model outputs across timesteps, thereby accelerating the inference.
-
-## Examples
-
-### FLUX
-
-Script: [./flux_teacache.py](./flux_teacache.py)
-
-Model: FLUX.1-dev
-
-Steps: 50
-
-GPU: A100
-
-|TeaCache is disabled|tea_cache_l1_thresh=0.2|tea_cache_l1_thresh=0.8|
-|-|-|-|
-|23s|13s|5s|
-|![image_None](https://github.com/user-attachments/assets/2bf5187a-9693-44d3-9ebb-6c33cd15443f)|![image_0 2](https://github.com/user-attachments/assets/5532ba94-c7e2-446e-a9ba-1c68c0f63350)|![image_0 8](https://github.com/user-attachments/assets/d8cfdd74-8b45-4048-b1b7-ce480aa23fa1)|
-
-### Hunyuan Video
-
-Script: [./hunyuanvideo_teacache.py](./hunyuanvideo_teacache.py)
-
-Model: Hunyuan Video
-
-Steps: 30
-
-GPU: A100
-
-The following video was generated with TeaCache enabled. It is nearly identical to [the video generated without TeaCache](https://github.com/user-attachments/assets/48dd24bb-0cc6-40d2-88c3-10feed3267e9), but was generated twice as fast.
-
-https://github.com/user-attachments/assets/cd9801c5-88ce-4efc-b055-2c7737166f34
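Reviewer note: the removed README describes TeaCache as caching driven by how much the model's behaviour changes across timesteps, with `tea_cache_l1_thresh` trading quality for speed. The sketch below illustrates that idea only. It assumes a simplified rule (accumulate the relative L1 change of one scalar per step and rerun the model once the total reaches the threshold); `plan_cache_steps` and its scalar `signals` input are hypothetical, and this is neither the DiffSynth-Studio code nor the full published method.

```python
def plan_cache_steps(signals, l1_thresh):
    """Return one boolean per timestep: True where the full model would run.

    signals: one float per timestep, a hypothetical scalar stand-in for the
    quantity whose change is monitored between steps.
    """
    compute = []
    accumulated = 0.0
    previous = None
    for value in signals:
        if previous is None:
            run = True  # the first step always runs; nothing is cached yet
        else:
            # relative L1 change since the previous step
            accumulated += abs(value - previous) / (abs(previous) + 1e-8)
            run = accumulated >= l1_thresh
        if run:
            accumulated = 0.0  # the cache is refreshed, so start counting again
        compute.append(run)
        previous = value
    return compute


if __name__ == "__main__":
    print(plan_cache_steps([1.0, 1.05, 1.1, 1.5, 1.5], l1_thresh=0.2))
```

Under this rule a larger threshold skips more steps, which matches the direction of the timings in the table above (a higher `tea_cache_l1_thresh` gives a shorter run).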
--- a/examples/TeaCache/flux_teacache.py
+++ b/examples/TeaCache/flux_teacache.py
@@ -1,15 +0,0 @@
-import torch
-from diffsynth import ModelManager, FluxImagePipeline
-
-
-model_manager = ModelManager(torch_dtype=torch.bfloat16, device="cuda", model_id_list=["FLUX.1-dev"])
-pipe = FluxImagePipeline.from_model_manager(model_manager)
-
-prompt = "CG, masterpiece, best quality, solo, long hair, wavy hair, silver hair, blue eyes, blue dress, medium breasts, dress, underwater, air bubble, floating hair, refraction, portrait. The girl's flowing silver hair shimmers with every color of the rainbow and cascades down, merging with the floating flora around her."
-
-for tea_cache_l1_thresh in [None, 0.2, 0.4, 0.6, 0.8]:
-    image = pipe(
-        prompt=prompt, embedded_guidance=3.5, seed=0,
-        num_inference_steps=50, tea_cache_l1_thresh=tea_cache_l1_thresh
-    )
-    image.save(f"image_{tea_cache_l1_thresh}.png")
--- a/examples/TeaCache/hunyuanvideo_teacache.py
+++ b/examples/TeaCache/hunyuanvideo_teacache.py
@@ -1,42 +0,0 @@
-import torch
-torch.cuda.set_per_process_memory_fraction(1.0, 0)
-from diffsynth import ModelManager, HunyuanVideoPipeline, download_models, save_video
-
-
-download_models(["HunyuanVideo"])
-model_manager = ModelManager()
-
-# The DiT model is loaded in bfloat16.
-model_manager.load_models(
-    [
-        "models/HunyuanVideo/transformers/mp_rank_00_model_states.pt"
-    ],
-    torch_dtype=torch.bfloat16, # you can use torch_dtype=torch.float8_e4m3fn to enable quantization.
-    device="cpu"
-)
-
-# The other modules are loaded in float16.
-model_manager.load_models(
-    [
-        "models/HunyuanVideo/text_encoder/model.safetensors",
-        "models/HunyuanVideo/text_encoder_2",
-        "models/HunyuanVideo/vae/pytorch_model.pt",
-    ],
-    torch_dtype=torch.float16,
-    device="cpu"
-)
-
-# We support LoRA inference. You can use the following code to load your LoRA model.
-# model_manager.load_lora("models/lora/xxx.safetensors", lora_alpha=1.0)
-
-# The computation device is "cuda".
-pipe = HunyuanVideoPipeline.from_model_manager(
-    model_manager,
-    torch_dtype=torch.bfloat16,
-    device="cuda"
-)
-
-# Enjoy!
-prompt = "CG, masterpiece, best quality, solo, long hair, wavy hair, silver hair, blue eyes, blue dress, medium breasts, dress, underwater, air bubble, floating hair, refraction, portrait. The girl's flowing silver hair shimmers with every color of the rainbow and cascades down, merging with the floating flora around her."
-video = pipe(prompt, seed=0, tea_cache_l1_thresh=0.15)
-save_video(video, "video_girl.mp4", fps=30, quality=6)
--- a/examples/dev_tools/fix_path.py
+++ b/examples/dev_tools/fix_path.py
@@ -0,0 +1,43 @@
+import re, os
+
+
+def read_file(path):
+    with open(path, "r", encoding="utf-8-sig") as f:
+        context = f.read()
+    return context
+
+def get_files(files, path):
+    if os.path.isdir(path):
+        for folder in os.listdir(path):
+            get_files(files, os.path.join(path, folder))
+    elif path.endswith(".md"):
+        files.append(path)
+        
+def fix_path(doc_root_path):
+    files = []
+    get_files(files, doc_root_path)
+    file_map = {}
+    for file in files:
+        name = file.split("/")[-1]
+        file_map[name] = "/" + file
+
+    pattern = re.compile(r'\]\([^)]*\.md')
+    for file in files:
+        context = read_file(file)
+        matches = pattern.findall(context)
+        
+        edited = False
+        for match in matches:
+            name = match.split("/")[-1].replace("](", "")
+            target = "](" + file_map[name] if name in file_map else match  # leave unknown targets untouched
+            context = context.replace(match, target)
+            if target != match:
+                print(file, match, target)
+                edited = True
+        
+        if edited:
+            with open(file, "w", encoding="utf-8") as f:
+                f.write(context)
+
+fix_path("doc/zh")
+fix_path("doc/en")
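Reviewer note: the link rewriting in `fix_path.py` can also be expressed as a pure function over a string, which makes it testable without touching the file system. This is a sketch, not part of the commit: `rewrite_links` is a hypothetical name, and it assumes that links whose file name is missing from the map should be left unchanged.

```python
import re

# Same pattern as fix_path.py: a Markdown link target ending in ".md".
LINK_PATTERN = re.compile(r'\]\([^)]*\.md')


def rewrite_links(text, file_map):
    """Point every Markdown link ending in .md at its mapped path.

    file_map maps a bare file name (e.g. "a.md") to the path to link to.
    """
    def replace(match):
        link = match.group(0)
        name = link.split("/")[-1].replace("](", "")
        if name not in file_map:
            return link  # assumption: unknown targets are left untouched
        return "](" + file_map[name]

    return LINK_PATTERN.sub(replace, text)


if __name__ == "__main__":
    sample = "see [a](../x/a.md) and [b](https://e.com/b.md)"
    print(rewrite_links(sample, {"a.md": "/doc/en/a.md"}))
```

Using `re.sub` with a callback rewrites each occurrence where it stands, rather than calling `str.replace` on the whole text once per match.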
--- a/examples/dev_tools/unit_test.py
+++ b/examples/dev_tools/unit_test.py
@@ -0,0 +1,114 @@
+import os, shutil, multiprocessing, time
+NUM_GPUS = 7
+
+
+def script_is_processed(output_path, script):
+    return os.path.exists(os.path.join(output_path, script)) and "log.txt" in os.listdir(os.path.join(output_path, script))
+
+
+def filter_unprocessed_tasks(script_path):
+    tasks = []
+    output_path = os.path.join("data", script_path)
+    for script in sorted(os.listdir(script_path)):
+        if not script.endswith(".sh") and not script.endswith(".py"):
+            continue
+        if script_is_processed(output_path, script):
+            continue
+        tasks.append(script)
+    return tasks
+
+
+def run_inference(script_path):
+    tasks = filter_unprocessed_tasks(script_path)
+    output_path = os.path.join("data", script_path)
+    for script in tasks:
+        source_path = os.path.join(script_path, script)
+        target_path = os.path.join(output_path, script)
+        os.makedirs(target_path, exist_ok=True)
+        cmd = f"python {source_path} > {target_path}/log.txt 2>&1"
+        print(cmd, flush=True)
+        os.system(cmd)
+        for file_name in os.listdir("./"):
+            if file_name.endswith(".jpg") or file_name.endswith(".png") or file_name.endswith(".mp4"):
+                shutil.move(file_name, os.path.join(target_path, file_name))
+
+
+def run_tasks_on_single_GPU(script_path, tasks, gpu_id, num_gpu):
+    output_path = os.path.join("data", script_path)
+    for script_id, script in enumerate(tasks):
+        if script_id % num_gpu != gpu_id:
+            continue
+        source_path = os.path.join(script_path, script)
+        target_path = os.path.join(output_path, script)
+        os.makedirs(target_path, exist_ok=True)
+        if script.endswith(".sh"):
+            cmd = f"CUDA_VISIBLE_DEVICES={gpu_id} bash {source_path} > {target_path}/log.txt 2>&1"
+        elif script.endswith(".py"):
+            cmd = f"CUDA_VISIBLE_DEVICES={gpu_id} python {source_path} > {target_path}/log.txt 2>&1"
+        print(cmd, flush=True)
+        os.system(cmd)
+
+
+def run_train_multi_GPU(script_path):
+    tasks = filter_unprocessed_tasks(script_path)
+    output_path = os.path.join("data", script_path)
+    for script in tasks:
+        source_path = os.path.join(script_path, script)
+        target_path = os.path.join(output_path, script)
+        os.makedirs(target_path, exist_ok=True)
+        cmd = f"bash {source_path} > {target_path}/log.txt 2>&1"
+        print(cmd, flush=True)
+        os.system(cmd)
+        time.sleep(1)
+        
+
+def run_train_single_GPU(script_path):
+    tasks = filter_unprocessed_tasks(script_path)
+    processes = [multiprocessing.Process(target=run_tasks_on_single_GPU, args=(script_path, tasks, i, NUM_GPUS)) for i in range(NUM_GPUS)]
+    for p in processes:
+        p.start()
+    for p in processes:
+        p.join()
+
+
+def move_files(prefix, target_folder):
+    os.makedirs(target_folder, exist_ok=True)
+    os.system(f"cp -r {prefix}* {target_folder}")
+    os.system(f"rm -rf {prefix}*")
+
+
+def test_qwen_image():
+    run_inference("examples/qwen_image/model_inference")
+    run_inference("examples/qwen_image/model_inference_low_vram")
+    run_train_multi_GPU("examples/qwen_image/model_training/full")
+    run_inference("examples/qwen_image/model_training/validate_full")
+    run_train_single_GPU("examples/qwen_image/model_training/lora")
+    run_inference("examples/qwen_image/model_training/validate_lora")
+    
+
+def test_wan():
+    run_train_single_GPU("examples/wanvideo/model_inference")
+    move_files("video_", "data/output/model_inference")
+    run_train_single_GPU("examples/wanvideo/model_inference_low_vram")
+    move_files("video_", "data/output/model_inference_low_vram")
+    run_train_multi_GPU("examples/wanvideo/model_training/full")
+    run_train_single_GPU("examples/wanvideo/model_training/validate_full")
+    move_files("video_", "data/output/validate_full")
+    run_train_single_GPU("examples/wanvideo/model_training/lora")
+    run_train_single_GPU("examples/wanvideo/model_training/validate_lora")
+    move_files("video_", "data/output/validate_lora")
+
+
+def test_flux():
+    run_inference("examples/flux/model_inference")
+    run_inference("examples/flux/model_inference_low_vram")
+    run_train_multi_GPU("examples/flux/model_training/full")
+    run_inference("examples/flux/model_training/validate_full")
+    run_train_single_GPU("examples/flux/model_training/lora")
+    run_inference("examples/flux/model_training/validate_lora")
+
+
+if __name__ == "__main__":
+    test_qwen_image()
+    test_flux()
+    test_wan()
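Reviewer note: `run_tasks_on_single_GPU` distributes scripts round-robin, so each worker process only runs the tasks whose index satisfies `script_id % num_gpu == gpu_id`. The sketch below isolates that assignment as a pure function so the split can be inspected before launching anything; `shard_tasks` is a hypothetical helper, not part of the commit.

```python
def shard_tasks(tasks, num_gpus):
    """Assign task i to GPU i % num_gpus, the rule used in run_tasks_on_single_GPU."""
    shards = [[] for _ in range(num_gpus)]
    for index, task in enumerate(tasks):
        shards[index % num_gpus].append(task)
    return shards


if __name__ == "__main__":
    for gpu_id, shard in enumerate(shard_tasks(["a.py", "b.py", "c.sh", "d.py", "e.sh"], 2)):
        print(gpu_id, shard)
```

Because the assignment depends only on the position in the sorted task list, a long-running script does not delay other GPUs, but the load is balanced by count rather than by runtime.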
--- a/examples/diffsynth/README.md
+++ b/examples/diffsynth/README.md
@@ -1,7 +0,0 @@
-# DiffSynth
-
-DiffSynth is the initial version of our video synthesis framework. In this framework, you can apply video deflickering algorithms to the latent space of diffusion models. You can refer to the [original repo](https://github.com/alibaba/EasyNLP/tree/master/diffusion/DiffSynth) for more details.
-
-We provide an example for video stylization. In this pipeline, the rendered video is completely different from the original video, so a powerful deflickering algorithm is needed. We use FastBlend to implement the deflickering module. See [`sd_video_rerender.py`](./sd_video_rerender.py).
-
-https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/59fb2f7b-8de0-4481-b79f-0c3a7361a1ea
--- a/examples/diffsynth/sd_video_rerender.py
+++ b/examples/diffsynth/sd_video_rerender.py
@@ -1,64 +0,0 @@
-from diffsynth import ModelManager, SDVideoPipeline, ControlNetConfigUnit, VideoData, save_video, download_models
-from diffsynth.processors.FastBlend import FastBlendSmoother
-from diffsynth.processors.PILEditor import ContrastEditor, SharpnessEditor
-from diffsynth.processors.sequencial_processor import SequencialProcessor
-import torch
-
-
-# Download models (automatically)
-# `models/stable_diffusion/dreamshaper_8.safetensors`: [link](https://civitai.com/api/download/models/128713?type=Model&format=SafeTensor&size=pruned&fp=fp16)
-# `models/ControlNet/control_v11f1p_sd15_depth.pth`: [link](https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11f1p_sd15_depth.pth)
-# `models/ControlNet/control_v11p_sd15_softedge.pth`: [link](https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_softedge.pth)
-# `models/Annotators/dpt_hybrid-midas-501f0c75.pt`: [link](https://huggingface.co/lllyasviel/Annotators/resolve/main/dpt_hybrid-midas-501f0c75.pt)
-# `models/Annotators/ControlNetHED.pth`: [link](https://huggingface.co/lllyasviel/Annotators/resolve/main/ControlNetHED.pth)
-download_models([
-    "ControlNet_v11f1p_sd15_depth",
-    "ControlNet_v11p_sd15_softedge",
-    "DreamShaper_8"
-])
-
-# Load models
-model_manager = ModelManager(
-    torch_dtype=torch.float16, device="cuda",
-    file_path_list=[
-        "models/stable_diffusion/dreamshaper_8.safetensors",
-        "models/ControlNet/control_v11f1p_sd15_depth.pth",
-        "models/ControlNet/control_v11p_sd15_softedge.pth",
-    ]
-)
-pipe = SDVideoPipeline.from_model_manager(
-    model_manager,
-    [
-        ControlNetConfigUnit(
-            processor_id="depth",
-            model_path="models/ControlNet/control_v11f1p_sd15_depth.pth",
-            scale=0.5
-        ),
-        ControlNetConfigUnit(
-            processor_id="softedge",
-            model_path="models/ControlNet/control_v11p_sd15_softedge.pth",
-            scale=0.5
-        )
-    ]
-)
-smoother = SequencialProcessor([FastBlendSmoother(), ContrastEditor(rate=1.1), SharpnessEditor(rate=1.1)])
-
-# Load video
-# Original video: https://pixabay.com/videos/flow-rocks-water-fluent-stones-159627/
-video = VideoData(video_file="data/examples/pixabay100/159627 (1080p).mp4", height=512, width=768)
-input_video = [video[i] for i in range(128)]
-
-# Rerender
-torch.manual_seed(0)
-output_video = pipe(
-    prompt="winter, ice, snow, water, river",
-    negative_prompt="", cfg_scale=7,
-    input_frames=input_video, controlnet_frames=input_video, num_frames=len(input_video),
-    num_inference_steps=20, height=512, width=768,
-    animatediff_batch_size=8, animatediff_stride=4, unet_batch_size=8,
-    cross_frame_attention=True,
-    smoother=smoother, smoother_progress_ids=[4, 9, 14, 19]
-)
-
-# Save video
-save_video(output_video, "output_video.mp4", fps=30)
--- a/examples/flux/README.md
+++ b/examples/flux/README.md
@@ -1,395 +0,0 @@
-# FLUX
-
-[切换到中文](./README_zh.md)
-
-FLUX is a series of image generation models open-sourced by Black-Forest-Labs.
-
-**DiffSynth-Studio has introduced a new inference and training framework. If you need to use the old version, please click [here](https://github.com/modelscope/DiffSynth-Studio/tree/3edf3583b1f08944cee837b94d9f84d669c2729c).**
-
-## Installation
-
-Before using these models, please install DiffSynth-Studio from source code:
-
-```shell
-git clone https://github.com/modelscope/DiffSynth-Studio.git
-cd DiffSynth-Studio
-pip install -e .
-```
-
-## Quick Start
-
-You can quickly load the [black-forest-labs/FLUX.1-dev](https://www.modelscope.cn/models/black-forest-labs/FLUX.1-dev) model and run inference by executing the code below.
-
-```python
-import torch
-from diffsynth.pipelines.flux_image_new import FluxImagePipeline, ModelConfig
-
-pipe = FluxImagePipeline.from_pretrained(
-    torch_dtype=torch.bfloat16,
-    device="cuda",
-    model_configs=[
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors"),
-    ],
-)
-
-image = pipe(prompt="a cat", seed=0)
-image.save("image.jpg")
-```
-
-## Model Overview
-
-|Model ID|Extra Args|Inference|Low VRAM Inference|Full Training|Validation after Full Training|LoRA Training|Validation after LoRA Training|
-|-|-|-|-|-|-|-|-|
-|[FLUX.1-dev](https://www.modelscope.cn/models/black-forest-labs/FLUX.1-dev)||[code](./model_inference/FLUX.1-dev.py)|[code](./model_inference_low_vram/FLUX.1-dev.py)|[code](./model_training/full/FLUX.1-dev.sh)|[code](./model_training/validate_full/FLUX.1-dev.py)|[code](./model_training/lora/FLUX.1-dev.sh)|[code](./model_training/validate_lora/FLUX.1-dev.py)|
-|[FLUX.1-Krea-dev](https://www.modelscope.cn/models/black-forest-labs/FLUX.1-Krea-dev)||[code](./model_inference/FLUX.1-Krea-dev.py)|[code](./model_inference_low_vram/FLUX.1-Krea-dev.py)|[code](./model_training/full/FLUX.1-Krea-dev.sh)|[code](./model_training/validate_full/FLUX.1-Krea-dev.py)|[code](./model_training/lora/FLUX.1-Krea-dev.sh)|[code](./model_training/validate_lora/FLUX.1-Krea-dev.py)|
-|[FLUX.1-Kontext-dev](https://www.modelscope.cn/models/black-forest-labs/FLUX.1-Kontext-dev)|`kontext_images`|[code](./model_inference/FLUX.1-Kontext-dev.py)|[code](./model_inference_low_vram/FLUX.1-Kontext-dev.py)|[code](./model_training/full/FLUX.1-Kontext-dev.sh)|[code](./model_training/validate_full/FLUX.1-Kontext-dev.py)|[code](./model_training/lora/FLUX.1-Kontext-dev.sh)|[code](./model_training/validate_lora/FLUX.1-Kontext-dev.py)|
-|[FLUX.1-dev-Controlnet-Inpainting-Beta](https://www.modelscope.cn/models/alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta)|`controlnet_inputs`|[code](./model_inference/FLUX.1-dev-Controlnet-Inpainting-Beta.py)|[code](./model_inference_low_vram/FLUX.1-dev-Controlnet-Inpainting-Beta.py)|[code](./model_training/full/FLUX.1-dev-Controlnet-Inpainting-Beta.sh)|[code](./model_training/validate_full/FLUX.1-dev-Controlnet-Inpainting-Beta.py)|[code](./model_training/lora/FLUX.1-dev-Controlnet-Inpainting-Beta.sh)|[code](./model_training/validate_lora/FLUX.1-dev-Controlnet-Inpainting-Beta.py)|
-|[FLUX.1-dev-Controlnet-Union-alpha](https://www.modelscope.cn/models/InstantX/FLUX.1-dev-Controlnet-Union-alpha)|`controlnet_inputs`|[code](./model_inference/FLUX.1-dev-Controlnet-Union-alpha.py)|[code](./model_inference_low_vram/FLUX.1-dev-Controlnet-Union-alpha.py)|[code](./model_training/full/FLUX.1-dev-Controlnet-Union-alpha.sh)|[code](./model_training/validate_full/FLUX.1-dev-Controlnet-Union-alpha.py)|[code](./model_training/lora/FLUX.1-dev-Controlnet-Union-alpha.sh)|[code](./model_training/validate_lora/FLUX.1-dev-Controlnet-Union-alpha.py)|
-|[FLUX.1-dev-Controlnet-Upscaler](https://www.modelscope.cn/models/jasperai/Flux.1-dev-Controlnet-Upscaler)|`controlnet_inputs`|[code](./model_inference/FLUX.1-dev-Controlnet-Upscaler.py)|[code](./model_inference_low_vram/FLUX.1-dev-Controlnet-Upscaler.py)|[code](./model_training/full/FLUX.1-dev-Controlnet-Upscaler.sh)|[code](./model_training/validate_full/FLUX.1-dev-Controlnet-Upscaler.py)|[code](./model_training/lora/FLUX.1-dev-Controlnet-Upscaler.sh)|[code](./model_training/validate_lora/FLUX.1-dev-Controlnet-Upscaler.py)|
-|[FLUX.1-dev-IP-Adapter](https://www.modelscope.cn/models/InstantX/FLUX.1-dev-IP-Adapter)|`ipadapter_images`, `ipadapter_scale`|[code](./model_inference/FLUX.1-dev-IP-Adapter.py)|[code](./model_inference_low_vram/FLUX.1-dev-IP-Adapter.py)|[code](./model_training/full/FLUX.1-dev-IP-Adapter.sh)|[code](./model_training/validate_full/FLUX.1-dev-IP-Adapter.py)|[code](./model_training/lora/FLUX.1-dev-IP-Adapter.sh)|[code](./model_training/validate_lora/FLUX.1-dev-IP-Adapter.py)|
-|[FLUX.1-dev-InfiniteYou](https://www.modelscope.cn/models/ByteDance/InfiniteYou)|`infinityou_id_image`, `infinityou_guidance`, `controlnet_inputs`|[code](./model_inference/FLUX.1-dev-InfiniteYou.py)|[code](./model_inference_low_vram/FLUX.1-dev-InfiniteYou.py)|[code](./model_training/full/FLUX.1-dev-InfiniteYou.sh)|[code](./model_training/validate_full/FLUX.1-dev-InfiniteYou.py)|[code](./model_training/lora/FLUX.1-dev-InfiniteYou.sh)|[code](./model_training/validate_lora/FLUX.1-dev-InfiniteYou.py)|
-|[FLUX.1-dev-EliGen](https://www.modelscope.cn/models/DiffSynth-Studio/Eligen)|`eligen_entity_prompts`, `eligen_entity_masks`, `eligen_enable_on_negative`, `eligen_enable_inpaint`|[code](./model_inference/FLUX.1-dev-EliGen.py)|[code](./model_inference_low_vram/FLUX.1-dev-EliGen.py)|-|-|[code](./model_training/lora/FLUX.1-dev-EliGen.sh)|[code](./model_training/validate_lora/FLUX.1-dev-EliGen.py)|
-|[FLUX.1-dev-LoRA-Encoder](https://www.modelscope.cn/models/DiffSynth-Studio/LoRA-Encoder-FLUX.1-Dev)|`lora_encoder_inputs`, `lora_encoder_scale`|[code](./model_inference/FLUX.1-dev-LoRA-Encoder.py)|[code](./model_inference_low_vram/FLUX.1-dev-LoRA-Encoder.py)|[code](./model_training/full/FLUX.1-dev-LoRA-Encoder.sh)|[code](./model_training/validate_full/FLUX.1-dev-LoRA-Encoder.py)|-|-|
-|[FLUX.1-dev-LoRA-Fusion-Preview](https://modelscope.cn/models/DiffSynth-Studio/LoRAFusion-preview-FLUX.1-dev)||[code](./model_inference/FLUX.1-dev-LoRA-Fusion.py)|-|-|-|-|-|
-|[Step1X-Edit](https://www.modelscope.cn/models/stepfun-ai/Step1X-Edit)|`step1x_reference_image`|[code](./model_inference/Step1X-Edit.py)|[code](./model_inference_low_vram/Step1X-Edit.py)|[code](./model_training/full/Step1X-Edit.sh)|[code](./model_training/validate_full/Step1X-Edit.py)|[code](./model_training/lora/Step1X-Edit.sh)|[code](./model_training/validate_lora/Step1X-Edit.py)|
-|[FLEX.2-preview](https://www.modelscope.cn/models/ostris/Flex.2-preview)|`flex_inpaint_image`, `flex_inpaint_mask`, `flex_control_image`, `flex_control_strength`, `flex_control_stop`|[code](./model_inference/FLEX.2-preview.py)|[code](./model_inference_low_vram/FLEX.2-preview.py)|[code](./model_training/full/FLEX.2-preview.sh)|[code](./model_training/validate_full/FLEX.2-preview.py)|[code](./model_training/lora/FLEX.2-preview.sh)|[code](./model_training/validate_lora/FLEX.2-preview.py)|
-|[Nexus-Gen](https://www.modelscope.cn/models/DiffSynth-Studio/Nexus-GenV2)|`nexus_gen_reference_image`|[code](./model_inference/Nexus-Gen-Editing.py)|[code](./model_inference_low_vram/Nexus-Gen-Editing.py)|[code](./model_training/full/Nexus-Gen.sh)|[code](./model_training/validate_full/Nexus-Gen.py)|[code](./model_training/lora/Nexus-Gen.sh)|[code](./model_training/validate_lora/Nexus-Gen.py)|
-
-## Model Inference
-
-The following sections will help you understand our features and write inference code.
-
-<details>
-
-<summary>Load Model</summary>
-
-The model is loaded using `from_pretrained`:
-
-```python
-import torch
-from diffsynth.pipelines.flux_image_new import FluxImagePipeline, ModelConfig
-
-pipe = FluxImagePipeline.from_pretrained(
-    torch_dtype=torch.bfloat16,
-    device="cuda",
-    model_configs=[
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors"),
-    ],
-)
-```
-
-Here, `torch_dtype` and `device` set the computation precision and device. The `model_configs` can be used in different ways to specify model paths:
-
-* Download the model from [ModelScope](https://modelscope.cn/) and load it. In this case, fill in `model_id` and `origin_file_pattern`, for example:
-
-```python
-ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors")
-```
-
-* Load the model from a local file path. In this case, fill in `path`, for example:
-
-```python
-ModelConfig(path="models/black-forest-labs/FLUX.1-dev/flux1-dev.safetensors")
-```
-
-If a single model is split across multiple files, pass a list of paths, for example:
-
-```python
-ModelConfig(path=[
-    "models/xxx/diffusion_pytorch_model-00001-of-00003.safetensors",
-    "models/xxx/diffusion_pytorch_model-00002-of-00003.safetensors",
-    "models/xxx/diffusion_pytorch_model-00003-of-00003.safetensors",
-])
-```
-
-`ModelConfig` also accepts extra arguments that control model loading behavior:
-
-* `local_model_path`: Path to save downloaded models. Default is `"./models"`.
-* `skip_download`: Whether to skip downloading. Default is `False`. If your network cannot access [ModelScope](https://modelscope.cn/), download the required files manually and set this to `True`.
-
-</details>
-
-
-<details>
-
-<summary>VRAM Management</summary>
-
-DiffSynth-Studio provides fine-grained VRAM management for the FLUX model. This allows the model to run on devices with low VRAM. You can enable the offload feature using the code below. It moves some modules to CPU memory when GPU memory is limited.
-
-```python
-pipe = FluxImagePipeline.from_pretrained(
-    torch_dtype=torch.bfloat16,
-    device="cuda",
-    model_configs=[
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors", offload_device="cpu"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors", offload_device="cpu"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/", offload_device="cpu"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors", offload_device="cpu"),
-    ],
-)
-pipe.enable_vram_management()
-```
-
-FP8 quantization is also supported:
-
-```python
-pipe = FluxImagePipeline.from_pretrained(
-    torch_dtype=torch.bfloat16,
-    device="cuda",
-    model_configs=[
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors", offload_dtype=torch.float8_e4m3fn),
-    ],
-)
-pipe.enable_vram_management()
-```
-
-You can use FP8 quantization and offload at the same time:
-
-```python
-pipe = FluxImagePipeline.from_pretrained(
-    torch_dtype=torch.bfloat16,
-    device="cuda",
-    model_configs=[
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-    ],
-)
-pipe.enable_vram_management()
-```
-
-After enabling VRAM management, the framework automatically chooses a VRAM strategy based on the free GPU memory. Most FLUX models can run inference with as little as 8GB of VRAM. The `enable_vram_management` function accepts the following parameters for manual control of the VRAM strategy:
-
-* `vram_limit`: VRAM usage limit in GB. Defaults to all free VRAM on the device. This is not a hard limit: if the limit is too low for inference but more VRAM is actually available, the model runs with the smallest VRAM footprint it can. Setting it to 0 gives the theoretical minimum VRAM usage.
-* `vram_buffer`: VRAM buffer size in GB. Default is 0.5GB. A buffer is needed because large neural network layers can take more VRAM than expected while being loaded onto the GPU. In theory, the optimal value is the VRAM used by the largest layer in the model.
-* `num_persistent_param_in_dit`: Number of DiT parameters that stay resident in VRAM. Default is no limit. This parameter will be removed in the future; do not rely on it.
-
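As a rough worked example of sizing `vram_buffer`, the sketch below estimates the memory taken by the weight matrix of one linear layer. The layer shape (3072 x 12288) is a hypothetical value chosen for illustration, not a measured FLUX layer; the estimate ignores biases and activations, and it treats 1 GB as `1024**3` bytes, which is also an assumption.

```python
def layer_weight_gb(in_features, out_features, bytes_per_param):
    """Memory of one linear layer's weight matrix, in GB (1024**3 bytes)."""
    return in_features * out_features * bytes_per_param / 1024**3


# Hypothetical layer shape; bfloat16 stores 2 bytes per parameter, FP8 stores 1.
bf16_gb = layer_weight_gb(3072, 12288, bytes_per_param=2)
fp8_gb = layer_weight_gb(3072, 12288, bytes_per_param=1)
print(f"bf16: {bf16_gb:.4f} GB, fp8: {fp8_gb:.4f} GB")
```

Under these assumptions the layer needs well under the default 0.5GB buffer; a model whose largest layer is bigger would call for a correspondingly larger `vram_buffer`.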
-</details>
-
-
-<details>
-
-<summary>Inference Acceleration</summary>
-
-* TeaCache: Acceleration technique [TeaCache](https://github.com/ali-vilab/TeaCache). Please refer to the [example code](./acceleration/teacache.py).
-
-</details>
-
-<details>
-
-<summary>Input Parameters</summary>
-
-The pipeline supports the following input parameters during inference:
-
-* `prompt`: Text prompt describing what should appear in the image.
-* `negative_prompt`: Negative prompt describing what should not appear in the image. Default is `""`.
-* `cfg_scale`: Parameter for classifier-free guidance. Default is 1. Takes effect when set to a value greater than 1.
-* `embedded_guidance`: Built-in guidance parameter for FLUX-dev. Default is 3.5.
-* `t5_sequence_length`: Sequence length of text embeddings from the T5 model. Default is 512.
-* `input_image`: Input image used for image-to-image generation. Used together with `denoising_strength`.
-* `denoising_strength`: Denoising strength, range from 0 to 1. Default is 1. When close to 0, the output image is similar to the input. When close to 1, the output differs more from the input. Do not set it to values other than 1 if `input_image` is not provided.
-* `height`: Image height. Must be a multiple of 16.
-* `width`: Image width. Must be a multiple of 16.
-* `seed`: Random seed. Default is `None`, meaning fully random.
-* `rand_device`: Device for generating random Gaussian noise. Default is `"cpu"`. Setting it to `"cuda"` may lead to different results on different GPUs.
-* `sigma_shift`: Parameter from Rectified Flow theory. Default is 3. A larger value means the model spends more steps at the start of denoising. Increasing this can improve image quality, but may cause differences between generated images and training data due to inconsistency with training.
-* `num_inference_steps`: Number of inference steps. Default is 30.
-* `kontext_images`: Input images for the Kontext model.
-* `controlnet_inputs`: Inputs for the ControlNet model.
-* `ipadapter_images`: Input images for the IP-Adapter model.
-* `ipadapter_scale`: Control strength for the IP-Adapter model.
-* `eligen_entity_prompts`: Local prompts for the EliGen model.
-* `eligen_entity_masks`: Mask regions for local prompts in the EliGen model. Matches one-to-one with `eligen_entity_prompts`.
-* `eligen_enable_on_negative`: Whether to enable EliGen on the negative prompt side. Only works when `cfg_scale > 1`.
-* `eligen_enable_inpaint`: Whether to enable EliGen for local inpainting.
-* `infinityou_id_image`: Face image for the InfiniteYou model.
-* `infinityou_guidance`: Control strength for the InfiniteYou model.
-* `flex_inpaint_image`: Image for FLEX model's inpainting.
-* `flex_inpaint_mask`: Mask region for FLEX model's inpainting.
-* `flex_control_image`: Image for FLEX model's structural control.
-* `flex_control_strength`: Strength for FLEX model's structural control.
-* `flex_control_stop`: End point for FLEX model's structural control. 1 means enabled throughout, 0.5 means enabled in the first half, 0 means disabled.
-* `step1x_reference_image`: Input image for Step1x-Edit model's image editing.
-* `lora_encoder_inputs`: Inputs for LoRA encoder. Can be ModelConfig or local path.
-* `lora_encoder_scale`: Activation strength for LoRA encoder. Default is 1. Smaller values mean weaker LoRA activation.
-* `tea_cache_l1_thresh`: Threshold for TeaCache. Larger values mean faster speed but lower image quality. Note that after enabling TeaCache, inference speed is not uniform, so the remaining time shown in the progress bar will be inaccurate.
-* `tiled`: Whether to enable tiled VAE inference. Default is `False`. Setting to `True` reduces VRAM usage during VAE encoding/decoding, with slight error and slightly longer inference time.
-* `tile_size`: Tile size during VAE encoding/decoding. Default is 128. Only takes effect when `tiled=True`.
-* `tile_stride`: Tile stride during VAE encoding/decoding. Default is 64. Only takes effect when `tiled=True`. Must be less than or equal to `tile_size`.
-* `progress_bar_cmd`: Progress bar display. Default is `tqdm.tqdm`. Set to `lambda x:x` to disable the progress bar.
-
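Several of the parameters above come with constraints (multiples of 16, value ranges, `tile_stride` versus `tile_size`). The sketch below collects them into a pre-flight check that could run before calling the pipeline. `check_inputs` is a hypothetical helper written for this illustration; it is not part of DiffSynth-Studio and only encodes the rules stated in the list.

```python
def check_inputs(height, width, denoising_strength=1.0, input_image=None,
                 tiled=False, tile_size=128, tile_stride=64):
    """Return a list of problems found in the given pipeline arguments."""
    problems = []
    if height % 16 != 0:
        problems.append("height must be a multiple of 16")
    if width % 16 != 0:
        problems.append("width must be a multiple of 16")
    if not 0 <= denoising_strength <= 1:
        problems.append("denoising_strength must be between 0 and 1")
    if input_image is None and denoising_strength != 1:
        problems.append("denoising_strength must be 1 without input_image")
    if tiled and tile_stride > tile_size:
        problems.append("tile_stride must not exceed tile_size")
    return problems


# An empty list means the arguments satisfy every documented constraint.
print(check_inputs(height=1024, width=1024))
print(check_inputs(height=1024, width=1000))
```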
-</details>
-
-
-## Model Training
-
-Training for the FLUX series models is done using a unified script [`./model_training/train.py`](./model_training/train.py).
-
-<details>
-
-<summary>Script Parameters</summary>
-
-The script includes the following parameters:
-
-* Dataset
-  * `--dataset_base_path`: Root path of the dataset.
-  * `--dataset_metadata_path`: Path to the dataset metadata file.
-  * `--max_pixels`: Maximum pixel area. Default is 1024*1024. When dynamic resolution is enabled, any image with resolution higher than this will be downscaled.
-  * `--height`: Height of the image or video. Leave `height` and `width` empty to enable dynamic resolution.
-  * `--width`: Width of the image or video. Leave `height` and `width` empty to enable dynamic resolution.
-  * `--data_file_keys`: Data file keys in the metadata. Separate with commas.
-  * `--dataset_repeat`: Number of times the dataset repeats per epoch.
-  * `--dataset_num_workers`: Number of workers for data loading.
-* Model
-  * `--model_paths`: Paths to load models. In JSON format.
-  * `--model_id_with_origin_paths`: Model ID with original paths, e.g., black-forest-labs/FLUX.1-dev:flux1-dev.safetensors. Separate with commas.
-* Training
-  * `--learning_rate`: Learning rate.
-  * `--weight_decay`: Weight decay.
-  * `--num_epochs`: Number of epochs.
-  * `--output_path`: Save path.
-  * `--remove_prefix_in_ckpt`: Remove prefix in checkpoint.
-  * `--save_steps`: Interval, in steps, between checkpoint saves. If None, a checkpoint is saved every epoch.
-  * `--find_unused_parameters`: Whether to find unused parameters in DDP.
-* Trainable Modules
-  * `--trainable_models`: Models that can be trained, e.g., dit, vae, text_encoder.
-  * `--lora_base_model`: Which model to add LoRA to.
-  * `--lora_target_modules`: Which layers to add LoRA to.
-  * `--lora_rank`: Rank of LoRA.
-  * `--lora_checkpoint`: Path to the LoRA checkpoint. If provided, LoRA will be loaded from this checkpoint.
-* Extra Model Inputs
-  * `--extra_inputs`: Extra model inputs, separated by commas.
-* VRAM Management
-  * `--use_gradient_checkpointing`: Whether to enable gradient checkpointing.
-  * `--use_gradient_checkpointing_offload`: Whether to offload gradient checkpointing to CPU memory.
-  * `--gradient_accumulation_steps`: Number of gradient accumulation steps.
-* Others
-  * `--align_to_opensource_format`: Whether to align the FLUX DiT LoRA format with the open-source version. Only works for LoRA training.
-
-In addition, the training framework is built on [`accelerate`](https://huggingface.co/docs/accelerate/index). Run `accelerate config` before training to set GPU-related parameters. For some training scripts (e.g., full model training), we provide suggested `accelerate` config files. You can find them in the corresponding training scripts.
-
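To make the effect of `--max_pixels` under dynamic resolution concrete, the sketch below shrinks an image size until its area fits the limit. This is an illustration under stated assumptions, not the training script's actual code: the function name `fit_to_max_pixels` is hypothetical, and rounding down to a multiple of 16 is assumed from the inference-time constraint on `height` and `width`.

```python
import math


def fit_to_max_pixels(width, height, max_pixels=1024 * 1024, multiple=16):
    """Scale (width, height) down so that the area does not exceed max_pixels."""
    if width * height > max_pixels:
        # Uniform scale factor that brings the area down to max_pixels.
        scale = math.sqrt(max_pixels / (width * height))
        width, height = int(width * scale), int(height * scale)
    # Assumed rounding: snap both sides down to a multiple of 16.
    return width // multiple * multiple, height // multiple * multiple


print(fit_to_max_pixels(2048, 2048))
print(fit_to_max_pixels(800, 600))
```

Images already below the limit keep their aspect ratio and only receive the rounding step.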
-</details>
-
-
-<details>
-
-<summary>Step 1: Prepare Dataset</summary>
-
-A dataset contains a series of files. We suggest organizing your dataset like this:
-
-```
-data/example_image_dataset/
-├── metadata.csv
-├── image1.jpg
-└── image2.jpg
-```
-
-Here, `image1.jpg` and `image2.jpg` are training images, and `metadata.csv` is the metadata list, for example:
-
-```
-image,prompt
-image1.jpg,"a cat is sleeping"
-image2.jpg,"a dog is running"
-```
-
-We have built a sample image dataset to help you test. You can download it with the following command:
-
-```shell
-modelscope download --dataset DiffSynth-Studio/example_image_dataset --local_dir ./data/example_image_dataset
-```
-
-The dataset supports multiple image formats: `"jpg", "jpeg", "png", "webp"`.
-
-Image size can be controlled by script arguments `--height` and `--width`. When `--height` and `--width` are left empty, dynamic resolution is enabled. The model will train using each image's actual width and height from the dataset.
-
-**We strongly recommend training at a fixed resolution, because dynamic resolution can cause load-balancing issues in multi-GPU training.**
-
-When the model needs extra inputs, for example, `kontext_images` required by controllable models like [`black-forest-labs/FLUX.1-Kontext-dev`](https://modelscope.cn/models/black-forest-labs/FLUX.1-Kontext-dev), add the corresponding column to your dataset, for example:
-
-```
-image,prompt,kontext_images
-image1.jpg,"a cat is sleeping",image1_reference.jpg
-```
-
-If an extra input includes image files, you must specify the column name in the `--data_file_keys` argument. Add column names as needed, for example `--data_file_keys "image,kontext_images"`, and also enable `--extra_inputs "kontext_images"`.
-
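A metadata file with an extra image column can be produced with Python's standard `csv` module, which also takes care of quoting prompts that contain commas. The sketch below is only an illustration: `build_metadata` is a hypothetical helper, and the extension check simply mirrors the list of supported formats given above.

```python
import csv
import io

ALLOWED_EXTENSIONS = {"jpg", "jpeg", "png", "webp"}


def build_metadata(rows):
    """Serialize (image, prompt, kontext_image) rows into CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["image", "prompt", "kontext_images"])
    for image, prompt, reference in rows:
        # Reject files whose extension is not in the supported list.
        for name in (image, reference):
            extension = name.rsplit(".", 1)[-1].lower()
            if extension not in ALLOWED_EXTENSIONS:
                raise ValueError(f"unsupported image format: {name}")
        writer.writerow([image, prompt, reference])
    return buffer.getvalue()


metadata = build_metadata([("image1.jpg", "a cat is sleeping", "image1_reference.jpg")])
print(metadata)
```

Writing the returned text to `metadata.csv` in the dataset root gives a file that pairs with `--data_file_keys "image,kontext_images"` and `--extra_inputs "kontext_images"`.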
-</details>
-
-
-<details>
-
-<summary>Step 2: Load Model</summary>
-
-Similar to model loading during inference, you can configure which models to load directly using model IDs. For example, during inference we load the model with this setting:
-
-```python
-model_configs=[
-    ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors"),
-    ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors"),
-    ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/"),
-    ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors"),
-]
-```
-
-Then, during training, use the following parameter to load the same models:
-
-```shell
---model_id_with_origin_paths "black-forest-labs/FLUX.1-dev:flux1-dev.safetensors,black-forest-labs/FLUX.1-dev:text_encoder/model.safetensors,black-forest-labs/FLUX.1-dev:text_encoder_2/,black-forest-labs/FLUX.1-dev:ae.safetensors"
-```
-
-If you want to load models from local files, for example, during inference:
-
-```python
-model_configs=[
-    ModelConfig(path="models/black-forest-labs/FLUX.1-dev/flux1-dev.safetensors"),
-    ModelConfig(path="models/black-forest-labs/FLUX.1-dev/text_encoder/model.safetensors"),
-    ModelConfig(path="models/black-forest-labs/FLUX.1-dev/text_encoder_2/"),
-    ModelConfig(path="models/black-forest-labs/FLUX.1-dev/ae.safetensors"),
-]
-```
-
-Then during training, set it as:
-
-```shell
---model_paths '[
-    "models/black-forest-labs/FLUX.1-dev/flux1-dev.safetensors",
-    "models/black-forest-labs/FLUX.1-dev/text_encoder/model.safetensors",
-    "models/black-forest-labs/FLUX.1-dev/text_encoder_2/",
-    "models/black-forest-labs/FLUX.1-dev/ae.safetensors"
-]' \
-```
-
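The mapping between the inference-time configuration and the two training flags is mechanical, so it can be scripted. The sketch below is a hypothetical helper (`training_model_args` is not part of DiffSynth-Studio) that builds either argument from plain Python data, using the `model_id:origin_file_pattern` and JSON list formats described above.

```python
import json


def training_model_args(model_files=None, local_paths=None):
    """Build the model-loading arguments for the training script.

    model_files: list of (model_id, origin_file_pattern) pairs, or
    local_paths: list of local file or directory paths.
    """
    if model_files is not None:
        value = ",".join(f"{model_id}:{pattern}" for model_id, pattern in model_files)
        return ["--model_id_with_origin_paths", value]
    return ["--model_paths", json.dumps(local_paths)]


flux_files = [
    ("black-forest-labs/FLUX.1-dev", "flux1-dev.safetensors"),
    ("black-forest-labs/FLUX.1-dev", "text_encoder/model.safetensors"),
    ("black-forest-labs/FLUX.1-dev", "text_encoder_2/"),
    ("black-forest-labs/FLUX.1-dev", "ae.safetensors"),
]
args = training_model_args(model_files=flux_files)
print(args[0], args[1])
```

Returning a list of arguments rather than one string keeps the result usable with `subprocess.run` without extra shell quoting.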
-</details>
-
-
-<details>
-
-<summary>Step 3: Set Trainable Modules</summary>
-
-The training framework supports training base models or LoRA models. Here are some examples:
-
-* Full training of the DiT part: `--trainable_models dit`
-* Training a LoRA model on the DiT part: `--lora_base_model dit --lora_target_modules "a_to_qkv,b_to_qkv,ff_a.0,ff_a.2,ff_b.0,ff_b.2,a_to_out,b_to_out,proj_out,norm.linear,norm1_a.linear,norm1_b.linear,to_qkv_mlp" --lora_rank 32`
-
-Also, because the training script loads multiple modules (text encoder, dit, vae), you need to remove prefixes when saving model files. For example, when fully training the DiT part or training a LoRA model on the DiT part, set `--remove_prefix_in_ckpt pipe.dit.`
-
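The effect of `--remove_prefix_in_ckpt` can be pictured as a renaming of checkpoint keys. The sketch below illustrates the idea only and is not the framework's implementation: `strip_prefix` is a hypothetical function, the key names are made up, strings stand in for tensors, and leaving keys without the prefix unchanged is an assumption.

```python
def strip_prefix(state_dict, prefix):
    """Drop `prefix` from every key that starts with it."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


# Hypothetical checkpoint keys; the values stand in for tensors.
checkpoint = {
    "pipe.dit.blocks.0.a_to_qkv.weight": "tensor_a",
    "pipe.dit.blocks.0.a_to_out.weight": "tensor_b",
}
cleaned = strip_prefix(checkpoint, "pipe.dit.")
print(list(cleaned))
```

With the prefix removed, the key names match what the DiT module itself expects, so the saved file can be loaded without the surrounding pipeline wrapper.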
-</details>
-
-
-<details>
-
-<summary>Step 4: Start Training</summary>
-
-We have written training commands for each model. Please refer to the table at the beginning of this document.
-
-</details>
--- a/examples/flux/README_zh.md
+++ b/examples/flux/README_zh.md
@@ -1,396 +0,0 @@
-# FLUX
-
-[Switch to English](./README.md)
-
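-FLUX is a series of image generation models open-sourced by Black-Forest-Labs.
-
-**DiffSynth-Studio has introduced a new inference and training framework. If you need the old version, please click [here](https://github.com/modelscope/DiffSynth-Studio/tree/3edf3583b1f08944cee837b94d9f84d669c2729c).**
-
-## Installation
-
-Before using this model series, install DiffSynth-Studio from source.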
-
-```shell
-git clone https://github.com/modelscope/DiffSynth-Studio.git
-cd DiffSynth-Studio
-pip install -e .
-```
-
-## Quick Start
-
-Run the following code to quickly load the [black-forest-labs/FLUX.1-dev](https://www.modelscope.cn/models/black-forest-labs/FLUX.1-dev) model and run inference.
-
-```python
-import torch
-from diffsynth.pipelines.flux_image_new import FluxImagePipeline, ModelConfig
-
-pipe = FluxImagePipeline.from_pretrained(
-    torch_dtype=torch.bfloat16,
-    device="cuda",
-    model_configs=[
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors"),
-    ],
-)
-
-image = pipe(prompt="a cat", seed=0)
-image.save("image.jpg")
-```
-
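-## Model Overview
-
-|Model ID|Extra Parameters|Inference|Low VRAM Inference|Full Training|Validation After Full Training|LoRA Training|Validation After LoRA Training|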
-|-|-|-|-|-|-|-|-|
-|[FLUX.1-dev](https://www.modelscope.cn/models/black-forest-labs/FLUX.1-dev)||[code](./model_inference/FLUX.1-dev.py)|[code](./model_inference_low_vram/FLUX.1-dev.py)|[code](./model_training/full/FLUX.1-dev.sh)|[code](./model_training/validate_full/FLUX.1-dev.py)|[code](./model_training/lora/FLUX.1-dev.sh)|[code](./model_training/validate_lora/FLUX.1-dev.py)|
-|[FLUX.1-Krea-dev](https://www.modelscope.cn/models/black-forest-labs/FLUX.1-Krea-dev)||[code](./model_inference/FLUX.1-Krea-dev.py)|[code](./model_inference_low_vram/FLUX.1-Krea-dev.py)|[code](./model_training/full/FLUX.1-Krea-dev.sh)|[code](./model_training/validate_full/FLUX.1-Krea-dev.py)|[code](./model_training/lora/FLUX.1-Krea-dev.sh)|[code](./model_training/validate_lora/FLUX.1-Krea-dev.py)|
-|[FLUX.1-Kontext-dev](https://www.modelscope.cn/models/black-forest-labs/FLUX.1-Kontext-dev)|`kontext_images`|[code](./model_inference/FLUX.1-Kontext-dev.py)|[code](./model_inference_low_vram/FLUX.1-Kontext-dev.py)|[code](./model_training/full/FLUX.1-Kontext-dev.sh)|[code](./model_training/validate_full/FLUX.1-Kontext-dev.py)|[code](./model_training/lora/FLUX.1-Kontext-dev.sh)|[code](./model_training/validate_lora/FLUX.1-Kontext-dev.py)|
-|[FLUX.1-dev-Controlnet-Inpainting-Beta](https://www.modelscope.cn/models/alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta)|`controlnet_inputs`|[code](./model_inference/FLUX.1-dev-Controlnet-Inpainting-Beta.py)|[code](./model_inference_low_vram/FLUX.1-dev-Controlnet-Inpainting-Beta.py)|[code](./model_training/full/FLUX.1-dev-Controlnet-Inpainting-Beta.sh)|[code](./model_training/validate_full/FLUX.1-dev-Controlnet-Inpainting-Beta.py)|[code](./model_training/lora/FLUX.1-dev-Controlnet-Inpainting-Beta.sh)|[code](./model_training/validate_lora/FLUX.1-dev-Controlnet-Inpainting-Beta.py)|
-|[FLUX.1-dev-Controlnet-Union-alpha](https://www.modelscope.cn/models/InstantX/FLUX.1-dev-Controlnet-Union-alpha)|`controlnet_inputs`|[code](./model_inference/FLUX.1-dev-Controlnet-Union-alpha.py)|[code](./model_inference_low_vram/FLUX.1-dev-Controlnet-Union-alpha.py)|[code](./model_training/full/FLUX.1-dev-Controlnet-Union-alpha.sh)|[code](./model_training/validate_full/FLUX.1-dev-Controlnet-Union-alpha.py)|[code](./model_training/lora/FLUX.1-dev-Controlnet-Union-alpha.sh)|[code](./model_training/validate_lora/FLUX.1-dev-Controlnet-Union-alpha.py)|
-|[FLUX.1-dev-Controlnet-Upscaler](https://www.modelscope.cn/models/jasperai/Flux.1-dev-Controlnet-Upscaler)|`controlnet_inputs`|[code](./model_inference/FLUX.1-dev-Controlnet-Upscaler.py)|[code](./model_inference_low_vram/FLUX.1-dev-Controlnet-Upscaler.py)|[code](./model_training/full/FLUX.1-dev-Controlnet-Upscaler.sh)|[code](./model_training/validate_full/FLUX.1-dev-Controlnet-Upscaler.py)|[code](./model_training/lora/FLUX.1-dev-Controlnet-Upscaler.sh)|[code](./model_training/validate_lora/FLUX.1-dev-Controlnet-Upscaler.py)|
-|[FLUX.1-dev-IP-Adapter](https://www.modelscope.cn/models/InstantX/FLUX.1-dev-IP-Adapter)|`ipadapter_images`, `ipadapter_scale`|[code](./model_inference/FLUX.1-dev-IP-Adapter.py)|[code](./model_inference_low_vram/FLUX.1-dev-IP-Adapter.py)|[code](./model_training/full/FLUX.1-dev-IP-Adapter.sh)|[code](./model_training/validate_full/FLUX.1-dev-IP-Adapter.py)|[code](./model_training/lora/FLUX.1-dev-IP-Adapter.sh)|[code](./model_training/validate_lora/FLUX.1-dev-IP-Adapter.py)|
-|[FLUX.1-dev-InfiniteYou](https://www.modelscope.cn/models/ByteDance/InfiniteYou)|`infinityou_id_image`, `infinityou_guidance`, `controlnet_inputs`|[code](./model_inference/FLUX.1-dev-InfiniteYou.py)|[code](./model_inference_low_vram/FLUX.1-dev-InfiniteYou.py)|[code](./model_training/full/FLUX.1-dev-InfiniteYou.sh)|[code](./model_training/validate_full/FLUX.1-dev-InfiniteYou.py)|[code](./model_training/lora/FLUX.1-dev-InfiniteYou.sh)|[code](./model_training/validate_lora/FLUX.1-dev-InfiniteYou.py)|
-|[FLUX.1-dev-EliGen](https://www.modelscope.cn/models/DiffSynth-Studio/Eligen)|`eligen_entity_prompts`, `eligen_entity_masks`, `eligen_enable_on_negative`, `eligen_enable_inpaint`|[code](./model_inference/FLUX.1-dev-EliGen.py)|[code](./model_inference_low_vram/FLUX.1-dev-EliGen.py)|-|-|[code](./model_training/lora/FLUX.1-dev-EliGen.sh)|[code](./model_training/validate_lora/FLUX.1-dev-EliGen.py)|
-|[FLUX.1-dev-LoRA-Encoder](https://www.modelscope.cn/models/DiffSynth-Studio/LoRA-Encoder-FLUX.1-Dev)|`lora_encoder_inputs`, `lora_encoder_scale`|[code](./model_inference/FLUX.1-dev-LoRA-Encoder.py)|[code](./model_inference_low_vram/FLUX.1-dev-LoRA-Encoder.py)|[code](./model_training/full/FLUX.1-dev-LoRA-Encoder.sh)|[code](./model_training/validate_full/FLUX.1-dev-LoRA-Encoder.py)|-|-|
-|[FLUX.1-dev-LoRA-Fusion-Preview](https://modelscope.cn/models/DiffSynth-Studio/LoRAFusion-preview-FLUX.1-dev)||[code](./model_inference/FLUX.1-dev-LoRA-Fusion.py)|-|-|-|-|-|
-|[Step1X-Edit](https://www.modelscope.cn/models/stepfun-ai/Step1X-Edit)|`step1x_reference_image`|[code](./model_inference/Step1X-Edit.py)|[code](./model_inference_low_vram/Step1X-Edit.py)|[code](./model_training/full/Step1X-Edit.sh)|[code](./model_training/validate_full/Step1X-Edit.py)|[code](./model_training/lora/Step1X-Edit.sh)|[code](./model_training/validate_lora/Step1X-Edit.py)|
-|[FLEX.2-preview](https://www.modelscope.cn/models/ostris/Flex.2-preview)|`flex_inpaint_image`, `flex_inpaint_mask`, `flex_control_image`, `flex_control_strength`, `flex_control_stop`|[code](./model_inference/FLEX.2-preview.py)|[code](./model_inference_low_vram/FLEX.2-preview.py)|[code](./model_training/full/FLEX.2-preview.sh)|[code](./model_training/validate_full/FLEX.2-preview.py)|[code](./model_training/lora/FLEX.2-preview.sh)|[code](./model_training/validate_lora/FLEX.2-preview.py)|
-|[Nexus-Gen](https://www.modelscope.cn/models/DiffSynth-Studio/Nexus-GenV2)|`nexus_gen_reference_image`|[code](./model_inference/Nexus-Gen-Editing.py)|[code](./model_inference_low_vram/Nexus-Gen-Editing.py)|[code](./model_training/full/Nexus-Gen.sh)|[code](./model_training/validate_full/Nexus-Gen.py)|[code](./model_training/lora/Nexus-Gen.sh)|[code](./model_training/validate_lora/Nexus-Gen.py)|
-
-## Model Inference
-
-The following sections will help you understand our features and write inference code.
-
-<details>
-
-<summary>Load Model</summary>
-
-The model is loaded using `from_pretrained`:
-
-```python
-import torch
-from diffsynth.pipelines.flux_image_new import FluxImagePipeline, ModelConfig
-
-pipe = FluxImagePipeline.from_pretrained(
-    torch_dtype=torch.bfloat16,
-    device="cuda",
-    model_configs=[
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors"),
-    ],
-)
-```
-
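-Here, `torch_dtype` and `device` set the computation precision and device. `model_configs` can specify model paths in several ways:
-
-* Download the model from [ModelScope](https://modelscope.cn/) and load it. In this case, fill in `model_id` and `origin_file_pattern`, for example: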
-
-```python
-ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors")
-```
-
-* Load the model from a local file path. In this case, fill in `path`, for example:
-
-```python
-ModelConfig(path="models/black-forest-labs/FLUX.1-dev/flux1-dev.safetensors")
-```
-
-For a single model that loads from multiple files, use a list, for example:
-
-```python
-ModelConfig(path=[
-    "models/xxx/diffusion_pytorch_model-00001-of-00003.safetensors",
-    "models/xxx/diffusion_pytorch_model-00002-of-00003.safetensors",
-    "models/xxx/diffusion_pytorch_model-00003-of-00003.safetensors",
-])
-```
-
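-`ModelConfig` also accepts extra arguments that control model loading behavior:
-
-* `local_model_path`: Path to save downloaded models. Default is `"./models"`.
-* `skip_download`: Whether to skip downloading. Default is `False`. If your network cannot access [ModelScope](https://modelscope.cn/), download the required files manually and set this to `True`.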
-
-</details>
-
-
-<details>
-
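-<summary>VRAM Management</summary>
-
-DiffSynth-Studio provides fine-grained VRAM management for the FLUX model, which allows inference on devices with low VRAM. You can enable the offload feature with the code below; it moves some modules to CPU memory when GPU memory is limited.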
-
-```python
-pipe = FluxImagePipeline.from_pretrained(
-    torch_dtype=torch.bfloat16,
-    device="cuda",
-    model_configs=[
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors", offload_device="cpu"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors", offload_device="cpu"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/", offload_device="cpu"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors", offload_device="cpu"),
-    ],
-)
-pipe.enable_vram_management()
-```
-
-FP8 quantization is also supported:
-
-```python
-pipe = FluxImagePipeline.from_pretrained(
-    torch_dtype=torch.bfloat16,
-    device="cuda",
-    model_configs=[
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors", offload_dtype=torch.float8_e4m3fn),
-    ],
-)
-pipe.enable_vram_management()
-```
-
-You can use FP8 quantization and offload at the same time:
-
-```python
-pipe = FluxImagePipeline.from_pretrained(
-    torch_dtype=torch.bfloat16,
-    device="cuda",
-    model_configs=[
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-    ],
-)
-pipe.enable_vram_management()
-```
-
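-After enabling VRAM management, the framework automatically chooses a VRAM strategy based on the free GPU memory. Most FLUX models can run inference with as little as 8GB of VRAM. The `enable_vram_management` function accepts the following parameters for manual control of the VRAM strategy:
-
-* `vram_limit`: VRAM usage limit in GB. Defaults to all free VRAM on the device. This is not a hard limit: if the limit is too low for inference but more VRAM is actually available, the model runs with the smallest VRAM footprint it can. Setting it to 0 gives the theoretical minimum VRAM usage.
-* `vram_buffer`: VRAM buffer size in GB. Default is 0.5GB. A buffer is needed because large neural network layers can take more VRAM than expected while being loaded onto the GPU. In theory, the optimal value is the VRAM used by the largest layer in the model.
-* `num_persistent_param_in_dit`: Number of DiT parameters that stay resident in VRAM. Default is no limit. This parameter will be removed in the future; do not rely on it.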
-
-</details>
-
-
-<details>
-
-<summary>Inference Acceleration</summary>
-
-* TeaCache: Acceleration technique [TeaCache](https://github.com/ali-vilab/TeaCache). Please refer to the [example code](./acceleration/teacache.py).
-
-</details>
-
-<details>
-
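-<summary>Input Parameters</summary>
-
-The pipeline supports the following input parameters during inference:
-
-* `prompt`: Text prompt describing what should appear in the image.
-* `negative_prompt`: Negative prompt describing what should not appear in the image. Default is `""`.
-* `cfg_scale`: Parameter for classifier-free guidance. Default is 1. Takes effect when set to a value greater than 1.
-* `embedded_guidance`: Built-in guidance parameter for FLUX-dev. Default is 3.5.
-* `t5_sequence_length`: Sequence length of text embeddings from the T5 model. Default is 512.
-* `input_image`: Input image used for image-to-image generation. Used together with `denoising_strength`.
-* `denoising_strength`: Denoising strength, range from 0 to 1. Default is 1. When close to 0, the output image is similar to the input. When close to 1, the output differs more from the input. Do not set it to values other than 1 if `input_image` is not provided.
-* `height`: Image height. Must be a multiple of 16.
-* `width`: Image width. Must be a multiple of 16.
-* `seed`: Random seed. Default is `None`, meaning fully random.
-* `rand_device`: Device for generating random Gaussian noise. Default is `"cpu"`. Setting it to `"cuda"` leads to different results on different GPUs.
-* `sigma_shift`: Parameter from Rectified Flow theory. Default is 3. A larger value means the model spends more steps at the start of denoising. Increasing this can improve image quality, but may cause differences between generated images and training data due to inconsistency with training.
-* `num_inference_steps`: Number of inference steps. Default is 30.
-* `kontext_images`: Input images for the Kontext model.
-* `controlnet_inputs`: Inputs for the ControlNet model.
-* `ipadapter_images`: Input images for the IP-Adapter model.
-* `ipadapter_scale`: Control strength for the IP-Adapter model.
-* `eligen_entity_prompts`: Local prompts for the EliGen model.
-* `eligen_entity_masks`: Mask regions for local prompts in the EliGen model. Matches one-to-one with `eligen_entity_prompts`.
-* `eligen_enable_on_negative`: Whether to enable EliGen on the negative prompt side. Only works when `cfg_scale > 1`.
-* `eligen_enable_inpaint`: Whether to enable EliGen for local inpainting.
-* `infinityou_id_image`: Face image for the InfiniteYou model.
-* `infinityou_guidance`: Control strength for the InfiniteYou model.
-* `flex_inpaint_image`: Image for FLEX model's inpainting.
-* `flex_inpaint_mask`: Mask region for FLEX model's inpainting.
-* `flex_control_image`: Image for FLEX model's structural control.
-* `flex_control_strength`: Strength for FLEX model's structural control.
-* `flex_control_stop`: End point for FLEX model's structural control. 1 means enabled throughout, 0.5 means enabled in the first half, 0 means disabled.
-* `step1x_reference_image`: Input image for Step1x-Edit model's image editing.
-* `lora_encoder_inputs`: Inputs for LoRA encoder. Can be ModelConfig or local path.
-* `lora_encoder_scale`: Activation strength for LoRA encoder. Default is 1. Smaller values mean weaker LoRA activation.
-* `tea_cache_l1_thresh`: Threshold for TeaCache. Larger values mean faster speed but lower image quality. Note that after enabling TeaCache, inference speed is not uniform, so the remaining time shown in the progress bar will be inaccurate.
-* `tiled`: Whether to enable tiled VAE inference. Default is `False`. Setting to `True` significantly reduces VRAM usage during VAE encoding/decoding, at the cost of slight error and slightly longer inference time.
-* `tile_size`: Tile size during VAE encoding/decoding. Default is 128. Only takes effect when `tiled=True`.
-* `tile_stride`: Tile stride during VAE encoding/decoding. Default is 64. Only takes effect when `tiled=True`. Must be less than or equal to `tile_size`.
-* `progress_bar_cmd`: Progress bar display. Default is `tqdm.tqdm`. Set to `lambda x:x` to disable the progress bar.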
-
-</details>
-
-
-## Model Training
-
-Training for the FLUX series models is done using a unified script [`./model_training/train.py`](./model_training/train.py).
-
-<details>
-
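-<summary>Script Parameters</summary>
-
-The script includes the following parameters:
-
-* Dataset
-  * `--dataset_base_path`: Root path of the dataset.
-  * `--dataset_metadata_path`: Path to the dataset metadata file.
-  * `--max_pixels`: Maximum pixel area. Default is 1024*1024. When dynamic resolution is enabled, any image with resolution higher than this will be downscaled.
-  * `--height`: Height of the image or video. Leave `height` and `width` empty to enable dynamic resolution.
-  * `--width`: Width of the image or video. Leave `height` and `width` empty to enable dynamic resolution.
-  * `--data_file_keys`: Data file keys in the metadata. Separate with commas.
-  * `--dataset_repeat`: Number of times the dataset repeats per epoch.
-  * `--dataset_num_workers`: Number of worker processes for each DataLoader.
-* Model
-  * `--model_paths`: Paths to load models. In JSON format.
-  * `--model_id_with_origin_paths`: Model ID with original paths, e.g., black-forest-labs/FLUX.1-dev:flux1-dev.safetensors. Separate with commas.
-* Training
-  * `--learning_rate`: Learning rate.
-  * `--weight_decay`: Weight decay.
-  * `--num_epochs`: Number of epochs.
-  * `--output_path`: Save path.
-  * `--remove_prefix_in_ckpt`: Remove prefix in checkpoint.
-  * `--save_steps`: Interval, in steps, between checkpoint saves. If set to None, a checkpoint is saved every epoch.
-  * `--find_unused_parameters`: Whether there are unused parameters in DDP training.
-* Trainable Modules
-  * `--trainable_models`: Models that can be trained, e.g., dit, vae, text_encoder.
-  * `--lora_base_model`: Which model to add LoRA to.
-  * `--lora_target_modules`: Which layers to add LoRA to.
-  * `--lora_rank`: Rank of LoRA.
-  * `--lora_checkpoint`: Path to the LoRA checkpoint. If provided, LoRA will be loaded from this checkpoint.
-* Extra Model Inputs
-  * `--extra_inputs`: Extra model inputs, separated by commas.
-* VRAM Management
-  * `--use_gradient_checkpointing`: Whether to enable gradient checkpointing.
-  * `--use_gradient_checkpointing_offload`: Whether to offload gradient checkpointing to CPU memory.
-  * `--gradient_accumulation_steps`: Number of gradient accumulation steps.
-* Others
-  * `--align_to_opensource_format`: Whether to align the FLUX DiT LoRA format with the open-source version. Only works for LoRA training.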
-
-
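-In addition, the training framework is built on [`accelerate`](https://huggingface.co/docs/accelerate/index). Run `accelerate config` before training to set GPU-related parameters. For some training scripts (e.g., full model training), we provide suggested `accelerate` config files. You can find them in the corresponding training scripts.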
-
-</details>
-
-
-<details>
-
-<summary>Step 1: Prepare Dataset</summary>
-
-A dataset contains a series of files. We suggest organizing your dataset like this:
-
-```
-data/example_image_dataset/
-├── metadata.csv
-├── image1.jpg
-└── image2.jpg
-```
-
-Here, `image1.jpg` and `image2.jpg` are training images, and `metadata.csv` is the metadata list, for example:
-
-```
-image,prompt
-image1.jpg,"a cat is sleeping"
-image2.jpg,"a dog is running"
-```
-
-We have built a sample image dataset to help you test. You can download it with the following command:
-
-```shell
-modelscope download --dataset DiffSynth-Studio/example_image_dataset --local_dir ./data/example_image_dataset
-```
-
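-The dataset supports multiple image formats: `"jpg", "jpeg", "png", "webp"`.
-
-Image size can be controlled by the script arguments `--height` and `--width`. When `--height` and `--width` are left empty, dynamic resolution is enabled, and the model trains on each image's actual width and height from the dataset.
-
-**We strongly recommend training at a fixed resolution, because dynamic resolution can cause load-balancing issues in multi-GPU training.**
-
-When the model needs extra inputs, for example, `kontext_images` required by controllable models like [`black-forest-labs/FLUX.1-Kontext-dev`](https://modelscope.cn/models/black-forest-labs/FLUX.1-Kontext-dev), add the corresponding column to your dataset, for example: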
-
-```
-image,prompt,kontext_images
-image1.jpg,"a cat is sleeping",image1_reference.jpg
-```
-
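-If the extra inputs include image files, specify the column names to parse in the `--data_file_keys` parameter. Add column names to match the extra inputs, e.g. `--data_file_keys "image,kontext_images"`, and also enable `--extra_inputs "kontext_images"`.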
-
-</details>
-
-
-<details>
-
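-<summary>Step 2: Load the model</summary>
-
-As with model loading at inference time, the models to load can be configured directly by model ID. For example, at inference time we load the models with the following settings: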
-
-```python
-model_configs=[
-    ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors"),
-    ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors"),
-    ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/"),
-    ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors"),
-]
-```
-
-Then, at training time, pass the following parameter to load the corresponding models.
-
-```shell
--model_id_with_origin_paths "black-forest-labs/FLUX.1-dev:flux1-dev.safetensors,black-forest-labs/FLUX.1-dev:text_encoder/model.safetensors,black-forest-labs/FLUX.1-dev:text_encoder_2/,black-forest-labs/FLUX.1-dev:ae.safetensors"
-```
-
-If you want to load the models from local files, for example at inference time:
-
-```python
-model_configs=[
-    ModelConfig(path="models/black-forest-labs/FLUX.1-dev/flux1-dev.safetensors"),
-    ModelConfig(path="models/black-forest-labs/FLUX.1-dev/text_encoder/model.safetensors"),
-    ModelConfig(path="models/black-forest-labs/FLUX.1-dev/text_encoder_2/"),
-    ModelConfig(path="models/black-forest-labs/FLUX.1-dev/ae.safetensors"),
-]
-```
-
-then at training time set:
-
-```shell
--model_paths '[
-    "models/black-forest-labs/FLUX.1-dev/flux1-dev.safetensors",
-    "models/black-forest-labs/FLUX.1-dev/text_encoder/model.safetensors",
-    "models/black-forest-labs/FLUX.1-dev/text_encoder_2/",
-    "models/black-forest-labs/FLUX.1-dev/ae.safetensors"
-]' \
-```
-
-</details>
-
-
-<details>
-
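-<summary>Step 3: Set the trainable modules</summary>
-
-The training framework supports training the base model or a LoRA model. Here are a few examples:
-
-* Full training of the DiT: `--trainable_models dit`
-* Training a LoRA on the DiT: `--lora_base_model dit --lora_target_modules "a_to_qkv,b_to_qkv,ff_a.0,ff_a.2,ff_b.0,ff_b.2,a_to_out,b_to_out,proj_out,norm.linear,norm1_a.linear,norm1_b.linear,to_qkv_mlp" --lora_rank 32`
-
-In addition, because the training script loads several modules (text encoder, dit, vae), the prefix must be removed when saving model files. For example, when fully training the DiT or training a LoRA on the DiT, set `--remove_prefix_in_ckpt pipe.dit.`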
-
-</details>
-
-
-<details>
-
-<summary>Step 4: Launch the training program</summary>
-
-We have written a training command for each model; see the table at the beginning of this document.
-
-</details>
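Step 2 of the removed README maps inference-time `ModelConfig` entries to the `--model_id_with_origin_paths` training argument. A minimal sketch of that mapping in plain Python follows. The helper name `to_model_id_with_origin_paths` is hypothetical and not part of DiffSynth-Studio; it assumes only the comma-separated `model_id:origin_file_pattern` format shown in the README.

```python
def to_model_id_with_origin_paths(configs):
    """Join (model_id, origin_file_pattern) pairs into one argument value.

    Hypothetical helper for illustration only. The output format follows the
    `model_id:pattern,model_id:pattern` example given in the README.
    """
    parts = []
    for model_id, pattern in configs:
        # Commas separate entries, so they cannot appear inside an entry.
        if "," in model_id or "," in pattern:
            raise ValueError("commas are reserved as the entry separator")
        parts.append(f"{model_id}:{pattern}")
    return ",".join(parts)


# The four FLUX.1-dev components listed in the README's inference example.
flux_dev = [
    ("black-forest-labs/FLUX.1-dev", "flux1-dev.safetensors"),
    ("black-forest-labs/FLUX.1-dev", "text_encoder/model.safetensors"),
    ("black-forest-labs/FLUX.1-dev", "text_encoder_2/"),
    ("black-forest-labs/FLUX.1-dev", "ae.safetensors"),
]
arg_value = to_model_id_with_origin_paths(flux_dev)
print(f'--model_id_with_origin_paths "{arg_value}"')
```

Keeping the pairs in one list makes it easier to keep the inference and training configurations in sync.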
--- a/examples/flux/acceleration/teacache.py
+++ b/examples/flux/acceleration/teacache.py
@@ -1,24 +0,0 @@
-import torch
-from diffsynth.pipelines.flux_image_new import FluxImagePipeline, ModelConfig
-
-
-pipe = FluxImagePipeline.from_pretrained(
-    torch_dtype=torch.bfloat16,
-    device="cuda",
-    model_configs=[
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors"),
-    ],
-)
-
-
-prompt = "CG, masterpiece, best quality, solo, long hair, wavy hair, silver hair, blue eyes, blue dress, medium breasts, dress, underwater, air bubble, floating hair, refraction, portrait. The girl's flowing silver hair shimmers with every color of the rainbow and cascades down, merging with the floating flora around her."
-
-for tea_cache_l1_thresh in [None, 0.2, 0.4, 0.6, 0.8]:
-    image = pipe(
-        prompt=prompt, embedded_guidance=3.5, seed=0,
-        num_inference_steps=50, tea_cache_l1_thresh=tea_cache_l1_thresh
-    )
-    image.save(f"image_{tea_cache_l1_thresh}.png")
--- a/examples/flux/model_inference/FLEX.2-preview.py
+++ b/examples/flux/model_inference/FLEX.2-preview.py
@@ -1,6 +1,6 @@
 import torch
-from diffsynth.pipelines.flux_image_new import FluxImagePipeline, ModelConfig
-from diffsynth.controlnets.processors import Annotator
+from diffsynth.pipelines.flux_image import FluxImagePipeline, ModelConfig
+from diffsynth.utils.controlnet import Annotator
 import numpy as np
 from PIL import Image

@@ -11,7 +11,7 @@ pipe = FluxImagePipeline.from_pretrained(
    model_configs=[
        ModelConfig(model_id="ostris/Flex.2-preview", origin_file_pattern="Flex.2-preview.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/"),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/*.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors"),
    ],
 )
@@ -21,12 +21,12 @@ image = pipe(
    num_inference_steps=50, embedded_guidance=3.5,
    seed=0
 )
-image.save(f"image_1.jpg")
+image.save("image_1.jpg")

 mask = np.zeros((1024, 1024, 3), dtype=np.uint8)
 mask[200:400, 400:700] = 255
 mask = Image.fromarray(mask)
-mask.save(f"image_mask.jpg")
+mask.save("image_mask.jpg")

 inpaint_image = image

@@ -36,7 +36,7 @@ image = pipe(
    flex_inpaint_image=inpaint_image, flex_inpaint_mask=mask,
    seed=4
 )
-image.save(f"image_2_new.jpg")
+image.save("image_2.jpg")

 control_image = Annotator("canny")(image)
 control_image.save("image_control.jpg")
@@ -47,4 +47,4 @@ image = pipe(
    flex_control_image=control_image,
    seed=4
 )
-image.save(f"image_3_new.jpg")
+image.save("image_3.jpg")
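Three of the edits in this file recur across the FLUX example scripts in this commit: the `flux_image_new` module becomes `flux_image`, the `Annotator` import moves to `diffsynth.utils.controlnet`, and the `text_encoder_2/` pattern becomes `text_encoder_2/*.safetensors`. A minimal sketch of applying the same textual substitutions to a user script is given below. The function `migrate_source` is hypothetical, covers only these three renames, and does not handle other changes visible in this commit, such as `download_models` calls being replaced by `snapshot_download`.

```python
# (old, new) substring pairs taken from the hunks in this diff.
REPLACEMENTS = [
    ("diffsynth.pipelines.flux_image_new", "diffsynth.pipelines.flux_image"),
    ("diffsynth.controlnets.processors", "diffsynth.utils.controlnet"),
    (
        'origin_file_pattern="text_encoder_2/"',
        'origin_file_pattern="text_encoder_2/*.safetensors"',
    ),
]


def migrate_source(source: str) -> str:
    """Apply plain substring replacements; hypothetical helper, not a DiffSynth API."""
    for old, new in REPLACEMENTS:
        source = source.replace(old, new)
    return source


old_script = (
    "from diffsynth.pipelines.flux_image_new import FluxImagePipeline, ModelConfig\n"
    "from diffsynth.controlnets.processors import Annotator\n"
    'ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/")\n'
)
new_script = migrate_source(old_script)
print(new_script)
```

Because none of the new strings contains its old counterpart, running the function twice leaves the text unchanged.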
--- a/examples/flux/model_inference/FLUX.1-Kontext-dev.py
+++ b/examples/flux/model_inference/FLUX.1-Kontext-dev.py
@@ -1,5 +1,5 @@
 import torch
-from diffsynth.pipelines.flux_image_new import FluxImagePipeline, ModelConfig
+from diffsynth.pipelines.flux_image import FluxImagePipeline, ModelConfig
 from PIL import Image


@@ -9,7 +9,7 @@ pipe = FluxImagePipeline.from_pretrained(
    model_configs=[
        ModelConfig(model_id="black-forest-labs/FLUX.1-Kontext-dev", origin_file_pattern="flux1-kontext-dev.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/"),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/*.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors"),
    ],
 )
--- a/examples/flux/model_inference/FLUX.1-Krea-dev.py
+++ b/examples/flux/model_inference/FLUX.1-Krea-dev.py
@@ -1,5 +1,5 @@
 import torch
-from diffsynth.pipelines.flux_image_new import FluxImagePipeline, ModelConfig
+from diffsynth.pipelines.flux_image import FluxImagePipeline, ModelConfig


 pipe = FluxImagePipeline.from_pretrained(
@@ -8,7 +8,7 @@ pipe = FluxImagePipeline.from_pretrained(
    model_configs=[
        ModelConfig(model_id="black-forest-labs/FLUX.1-Krea-dev", origin_file_pattern="flux1-krea-dev.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/"),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/*.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors"),
    ],
 )
--- a/examples/flux/model_inference/FLUX.1-dev-AttriCtrl.py
+++ b/examples/flux/model_inference/FLUX.1-dev-AttriCtrl.py
@@ -1,5 +1,5 @@
 import torch
-from diffsynth.pipelines.flux_image_new import FluxImagePipeline, ModelConfig
+from diffsynth.pipelines.flux_image import FluxImagePipeline, ModelConfig


 pipe = FluxImagePipeline.from_pretrained(
@@ -8,7 +8,7 @@ pipe = FluxImagePipeline.from_pretrained(
    model_configs=[
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/"),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/*.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors"),
        ModelConfig(model_id="DiffSynth-Studio/AttriCtrl-FLUX.1-Dev", origin_file_pattern="models/brightness.safetensors")
    ],
--- a/examples/flux/model_inference/FLUX.1-dev-Controlnet-Inpainting-Beta.py
+++ b/examples/flux/model_inference/FLUX.1-dev-Controlnet-Inpainting-Beta.py
@@ -1,5 +1,5 @@
 import torch
-from diffsynth.pipelines.flux_image_new import FluxImagePipeline, ModelConfig, ControlNetInput
+from diffsynth.pipelines.flux_image import FluxImagePipeline, ModelConfig, ControlNetInput
 import numpy as np
 from PIL import Image

@@ -10,7 +10,7 @@ pipe = FluxImagePipeline.from_pretrained(
    model_configs=[
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/"),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/*.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors"),
        ModelConfig(model_id="alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta", origin_file_pattern="diffusion_pytorch_model.safetensors"),
    ],
--- a/examples/flux/model_inference/FLUX.1-dev-Controlnet-Union-alpha.py
+++ b/examples/flux/model_inference/FLUX.1-dev-Controlnet-Union-alpha.py
@@ -1,18 +1,18 @@
 import torch
-from diffsynth.pipelines.flux_image_new import FluxImagePipeline, ModelConfig, ControlNetInput
-from diffsynth.controlnets.processors import Annotator
-from diffsynth import download_models
+from diffsynth.pipelines.flux_image import FluxImagePipeline, ModelConfig, ControlNetInput
+from diffsynth.utils.controlnet import Annotator
+from modelscope import snapshot_download



-download_models(["Annotators:Depth"])
+snapshot_download("sd_lora/Annotators", allow_file_pattern="dpt_hybrid-midas-501f0c75.pt", local_dir="models/Annotators")
 pipe = FluxImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/"),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/*.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors"),
        ModelConfig(model_id="InstantX/FLUX.1-dev-Controlnet-Union-alpha", origin_file_pattern="diffusion_pytorch_model.safetensors"),
    ],
--- a/examples/flux/model_inference/FLUX.1-dev-Controlnet-Upscaler.py
+++ b/examples/flux/model_inference/FLUX.1-dev-Controlnet-Upscaler.py
@@ -1,5 +1,5 @@
 import torch
-from diffsynth.pipelines.flux_image_new import FluxImagePipeline, ModelConfig, ControlNetInput
+from diffsynth.pipelines.flux_image import FluxImagePipeline, ModelConfig, ControlNetInput


 pipe = FluxImagePipeline.from_pretrained(
@@ -8,7 +8,7 @@ pipe = FluxImagePipeline.from_pretrained(
    model_configs=[
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/"),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/*.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors"),
        ModelConfig(model_id="jasperai/Flux.1-dev-Controlnet-Upscaler", origin_file_pattern="diffusion_pytorch_model.safetensors"),
    ],
--- a/examples/flux/model_inference/FLUX.1-dev-EliGen.py
+++ b/examples/flux/model_inference/FLUX.1-dev-EliGen.py
@@ -1,8 +1,7 @@
 import random
 import torch
 from PIL import Image, ImageDraw, ImageFont
-from diffsynth import download_customized_models
-from diffsynth.pipelines.flux_image_new import FluxImagePipeline, ModelConfig
+from diffsynth.pipelines.flux_image import FluxImagePipeline, ModelConfig
 from modelscope import dataset_snapshot_download


@@ -91,24 +90,11 @@ pipe = FluxImagePipeline.from_pretrained(
    model_configs=[
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/"),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/*.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors"),
    ],
 )
-
-download_from_modelscope = True
-if download_from_modelscope:
-    model_id = "DiffSynth-Studio/Eligen"
-    downloading_priority = ["ModelScope"]
-else:
-    model_id = "modelscope/EliGen"
-    downloading_priority = ["HuggingFace"]
-EliGen_path = download_customized_models(
-    model_id=model_id,
-    origin_file_path="model_bf16.safetensors",
-    local_dir="models/lora/entity_control",
-    downloading_priority=downloading_priority)[0]
-pipe.load_lora(pipe.dit, EliGen_path, alpha=1)
+pipe.load_lora(pipe.dit, ModelConfig(model_id="DiffSynth-Studio/Eligen", origin_file_pattern="model_bf16.safetensors"), alpha=1)

 # example 1
 global_prompt = "A breathtaking beauty of Raja Ampat by the late-night moonlight , one beautiful woman from behind wearing a pale blue long dress with soft glow, sitting at the top of a cliff looking towards the beach,pastell light colors, a group of small distant birds flying in far sky, a boat sailing on the sea, best quality, realistic, whimsical, fantastic, splash art, intricate detailed, hyperdetailed, maximalist style, photorealistic, concept art, sharp focus, harmony, serenity, tranquility, soft pastell colors,ambient occlusion, cozy ambient lighting, masterpiece, liiv1, linquivera, metix, mentixis, masterpiece, award winning, view from above\n"
--- a/examples/flux/model_inference/FLUX.1-dev-IP-Adapter.py
+++ b/examples/flux/model_inference/FLUX.1-dev-IP-Adapter.py
@@ -1,5 +1,5 @@
 import torch
-from diffsynth.pipelines.flux_image_new import FluxImagePipeline, ModelConfig
+from diffsynth.pipelines.flux_image import FluxImagePipeline, ModelConfig


 pipe = FluxImagePipeline.from_pretrained(
@@ -8,10 +8,10 @@ pipe = FluxImagePipeline.from_pretrained(
    model_configs=[
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/"),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/*.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors"),
        ModelConfig(model_id="InstantX/FLUX.1-dev-IP-Adapter", origin_file_pattern="ip-adapter.bin"),
-        ModelConfig(model_id="google/siglip-so400m-patch14-384"),
+        ModelConfig(model_id="google/siglip-so400m-patch14-384", origin_file_pattern="model.safetensors"),
    ],
 )

--- a/examples/flux/model_inference/FLUX.1-dev-InfiniteYou.py
+++ b/examples/flux/model_inference/FLUX.1-dev-InfiniteYou.py
@@ -1,11 +1,13 @@
 import torch
-from diffsynth.pipelines.flux_image_new import FluxImagePipeline, ModelConfig, ControlNetInput
+from diffsynth.pipelines.flux_image import FluxImagePipeline, ModelConfig, ControlNetInput
 from modelscope import dataset_snapshot_download
 from modelscope import snapshot_download
 from PIL import Image
 import numpy as np

-
+# This model has additional requirements.
+# Please install the following packages.
+# pip install facexlib insightface onnxruntime
 snapshot_download(
    "ByteDance/InfiniteYou",
    allow_file_pattern="supports/insightface/models/antelopev2/*",
@@ -17,7 +19,7 @@ pipe = FluxImagePipeline.from_pretrained(
    model_configs=[
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/"),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/*.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors"),
        ModelConfig(model_id="ByteDance/InfiniteYou", origin_file_pattern="infu_flux_v1.0/aes_stage2/image_proj_model.bin"),
        ModelConfig(model_id="ByteDance/InfiniteYou", origin_file_pattern="infu_flux_v1.0/aes_stage2/InfuseNetModel/*.safetensors"),
--- a/examples/flux/model_inference/FLUX.1-dev-LoRA-Encoder.py
+++ b/examples/flux/model_inference/FLUX.1-dev-LoRA-Encoder.py
@@ -1,5 +1,5 @@
 import torch
-from diffsynth.pipelines.flux_image_new import FluxImagePipeline, ModelConfig
+from diffsynth.pipelines.flux_image import FluxImagePipeline, ModelConfig


 pipe = FluxImagePipeline.from_pretrained(
@@ -8,15 +8,13 @@ pipe = FluxImagePipeline.from_pretrained(
    model_configs=[
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/"),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/*.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors"),
        ModelConfig(model_id="DiffSynth-Studio/LoRA-Encoder-FLUX.1-Dev", origin_file_pattern="model.safetensors"),
    ],
 )
-pipe.enable_lora_magic()
-
 lora = ModelConfig(model_id="VoidOc/flux_animal_forest1", origin_file_pattern="20.safetensors")
-pipe.load_lora(pipe.dit, lora, hotload=True) # Use `pipe.clear_lora()` to drop the loaded LoRA.
+pipe.load_lora(pipe.dit, lora) # Use `pipe.clear_lora()` to drop the loaded LoRA.

 # Empty prompt can automatically activate LoRA capabilities.
 image = pipe(prompt="", seed=0, lora_encoder_inputs=lora)
--- a/examples/flux/model_inference/FLUX.1-dev-LoRA-Fusion.py
+++ b/examples/flux/model_inference/FLUX.1-dev-LoRA-Fusion.py
@@ -1,29 +1,38 @@
 import torch
-from diffsynth.pipelines.flux_image_new import FluxImagePipeline, ModelConfig
+from diffsynth.pipelines.flux_image import FluxImagePipeline, ModelConfig

-        
+
+vram_config = {
+    # Enable lora hotloading
+    "offload_dtype": torch.bfloat16,
+    "offload_device": "cuda",
+    "onload_dtype": torch.bfloat16,
+    "onload_device": "cuda",
+    "preparing_dtype": torch.bfloat16,
+    "preparing_device": "cuda",
+    "computation_dtype": torch.bfloat16,
+    "computation_device": "cuda",
+}
 pipe = FluxImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors"),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors", **vram_config),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/"),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/*.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors"),
        ModelConfig(model_id="DiffSynth-Studio/LoRAFusion-preview-FLUX.1-dev", origin_file_pattern="model.safetensors"),
    ],
 )
-pipe.enable_lora_magic()
+pipe.enable_lora_merger()

 pipe.load_lora(
    pipe.dit,
    ModelConfig(model_id="cancel13/cxsk", origin_file_pattern="30.safetensors"),
-    hotload=True,
 )
 pipe.load_lora(
    pipe.dit,
    ModelConfig(model_id="DiffSynth-Studio/ArtAug-lora-FLUX.1dev-v1", origin_file_pattern="merged_lora.safetensors"),
-    hotload=True,
 )
 image = pipe(prompt="a cat", seed=0)
 image.save("image_fused.jpg")
--- a/examples/flux/model_inference/FLUX.1-dev.py
+++ b/examples/flux/model_inference/FLUX.1-dev.py
@@ -1,5 +1,5 @@
 import torch
-from diffsynth.pipelines.flux_image_new import FluxImagePipeline, ModelConfig
+from diffsynth.pipelines.flux_image import FluxImagePipeline, ModelConfig


 pipe = FluxImagePipeline.from_pretrained(
@@ -8,7 +8,7 @@ pipe = FluxImagePipeline.from_pretrained(
    model_configs=[
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/"),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/*.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors"),
    ],
 )
--- a/examples/flux/model_inference/Nexus-Gen-Editing.py
+++ b/examples/flux/model_inference/Nexus-Gen-Editing.py
@@ -1,7 +1,7 @@
 import importlib
 import torch
 from PIL import Image
-from diffsynth.pipelines.flux_image_new import FluxImagePipeline, ModelConfig
+from diffsynth.pipelines.flux_image import FluxImagePipeline, ModelConfig
 from modelscope import dataset_snapshot_download


@@ -19,7 +19,7 @@ pipe = FluxImagePipeline.from_pretrained(
        ModelConfig(model_id="DiffSynth-Studio/Nexus-GenV2", origin_file_pattern="model*.safetensors"),
        ModelConfig(model_id="DiffSynth-Studio/Nexus-GenV2", origin_file_pattern="edit_decoder.bin"),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/"),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/*.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors"),
    ],
    nexus_gen_processor_config=ModelConfig(model_id="DiffSynth-Studio/Nexus-GenV2", origin_file_pattern="processor/"),
--- a/examples/flux/model_inference/Nexus-Gen-Generation.py
+++ b/examples/flux/model_inference/Nexus-Gen-Generation.py
@@ -1,6 +1,6 @@
 import importlib
 import torch
-from diffsynth.pipelines.flux_image_new import FluxImagePipeline, ModelConfig
+from diffsynth.pipelines.flux_image import FluxImagePipeline, ModelConfig


 if importlib.util.find_spec("transformers") is None:
@@ -17,7 +17,7 @@ pipe = FluxImagePipeline.from_pretrained(
        ModelConfig(model_id="DiffSynth-Studio/Nexus-GenV2", origin_file_pattern="model*.safetensors"),
        ModelConfig(model_id="DiffSynth-Studio/Nexus-GenV2", origin_file_pattern="generation_decoder.bin"),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/"),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/*.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors"),
    ],
    nexus_gen_processor_config=ModelConfig("DiffSynth-Studio/Nexus-GenV2", origin_file_pattern="processor"),
--- a/examples/flux/model_inference/Step1X-Edit.py
+++ b/examples/flux/model_inference/Step1X-Edit.py
@@ -1,5 +1,5 @@
 import torch
-from diffsynth.pipelines.flux_image_new import FluxImagePipeline, ModelConfig
+from diffsynth.pipelines.flux_image import FluxImagePipeline, ModelConfig
 from PIL import Image
 import numpy as np

@@ -8,7 +8,7 @@ pipe = FluxImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
-        ModelConfig(model_id="Qwen/Qwen2.5-VL-7B-Instruct"),
+        ModelConfig(model_id="Qwen/Qwen2.5-VL-7B-Instruct", origin_file_pattern="model-*.safetensors"),
        ModelConfig(model_id="stepfun-ai/Step1X-Edit", origin_file_pattern="step1x-edit-i1258.safetensors"),
        ModelConfig(model_id="stepfun-ai/Step1X-Edit", origin_file_pattern="vae.safetensors"),
    ],
--- a/examples/flux/model_inference_low_vram/FLEX.2-preview.py
+++ b/examples/flux/model_inference_low_vram/FLEX.2-preview.py
@@ -1,33 +1,43 @@
 import torch
-from diffsynth.pipelines.flux_image_new import FluxImagePipeline, ModelConfig
-from diffsynth.controlnets.processors import Annotator
+from diffsynth.pipelines.flux_image import FluxImagePipeline, ModelConfig
+from diffsynth.utils.controlnet import Annotator
 import numpy as np
 from PIL import Image


+vram_config = {
+    "offload_dtype": torch.float8_e4m3fn,
+    "offload_device": "cpu",
+    "onload_dtype": torch.float8_e4m3fn,
+    "onload_device": "cpu",
+    "preparing_dtype": torch.float8_e4m3fn,
+    "preparing_device": "cuda",
+    "computation_dtype": torch.bfloat16,
+    "computation_device": "cuda",
+}
 pipe = FluxImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
-        ModelConfig(model_id="ostris/Flex.2-preview", origin_file_pattern="Flex.2-preview.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
+        ModelConfig(model_id="ostris/Flex.2-preview", origin_file_pattern="Flex.2-preview.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/*.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors", **vram_config),
    ],
+    vram_limit=torch.cuda.mem_get_info("cuda")[1] / (1024 ** 3) - 0.5,
 )
-pipe.enable_vram_management()

 image = pipe(
    prompt="portrait of a beautiful Asian girl, long hair, red t-shirt, sunshine, beach",
    num_inference_steps=50, embedded_guidance=3.5,
    seed=0
 )
-image.save(f"image_1.jpg")
+image.save("image_1.jpg")

 mask = np.zeros((1024, 1024, 3), dtype=np.uint8)
 mask[200:400, 400:700] = 255
 mask = Image.fromarray(mask)
-mask.save(f"image_mask.jpg")
+mask.save("image_mask.jpg")

 inpaint_image = image

@@ -37,7 +47,7 @@ image = pipe(
    flex_inpaint_image=inpaint_image, flex_inpaint_mask=mask,
    seed=4
 )
-image.save(f"image_2_new.jpg")
+image.save("image_2.jpg")

 control_image = Annotator("canny")(image)
 control_image.save("image_control.jpg")
@@ -48,4 +58,4 @@ image = pipe(
    flex_control_image=control_image,
    seed=4
 )
-image.save(f"image_3_new.jpg")
+image.save("image_3.jpg")
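The `vram_limit` expression added in this file converts the device's total memory from bytes to GiB and subtracts 0.5. `torch.cuda.mem_get_info` returns a `(free, total)` tuple in bytes, so index `[1]` is the total. The sketch below reproduces only that arithmetic, so it runs without a GPU. The function name `vram_limit_gib` and the reading of 0.5 as headroom in GiB are assumptions, since the diff does not state which unit `vram_limit` expects.

```python
GIB = 1024 ** 3  # bytes per GiB


def vram_limit_gib(total_bytes: int, headroom_gib: float = 0.5) -> float:
    """Mirror `total / (1024 ** 3) - 0.5` from the example script.

    Hypothetical helper: `total_bytes` stands in for
    `torch.cuda.mem_get_info("cuda")[1]`.
    """
    return total_bytes / GIB - headroom_gib


# A card reporting exactly 24 GiB of total memory.
limit_24 = vram_limit_gib(24 * GIB)
print(limit_24)  # 23.5
```

Real devices rarely report an exact power-of-two total, so the computed limit is usually a non-round number.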
--- a/examples/flux/model_inference_low_vram/FLUX.1-Kontext-dev.py
+++ b/examples/flux/model_inference_low_vram/FLUX.1-Kontext-dev.py
@@ -1,19 +1,29 @@
 import torch
-from diffsynth.pipelines.flux_image_new import FluxImagePipeline, ModelConfig
+from diffsynth.pipelines.flux_image import FluxImagePipeline, ModelConfig
 from PIL import Image


+vram_config = {
+    "offload_dtype": torch.float8_e4m3fn,
+    "offload_device": "cpu",
+    "onload_dtype": torch.float8_e4m3fn,
+    "onload_device": "cpu",
+    "preparing_dtype": torch.float8_e4m3fn,
+    "preparing_device": "cuda",
+    "computation_dtype": torch.bfloat16,
+    "computation_device": "cuda",
+}
 pipe = FluxImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
-        ModelConfig(model_id="black-forest-labs/FLUX.1-Kontext-dev", origin_file_pattern="flux1-kontext-dev.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-Kontext-dev", origin_file_pattern="flux1-kontext-dev.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/*.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors", **vram_config),
    ],
+    vram_limit=torch.cuda.mem_get_info("cuda")[1] / (1024 ** 3) - 0.5,
 )
-pipe.enable_vram_management()

 image_1 = pipe(
    prompt="a beautiful Asian long-haired female college student.",
--- a/examples/flux/model_inference_low_vram/FLUX.1-Krea-dev.py
+++ b/examples/flux/model_inference_low_vram/FLUX.1-Krea-dev.py
@@ -1,18 +1,28 @@
 import torch
-from diffsynth.pipelines.flux_image_new import FluxImagePipeline, ModelConfig
+from diffsynth.pipelines.flux_image import FluxImagePipeline, ModelConfig


+vram_config = {
+    "offload_dtype": torch.float8_e4m3fn,
+    "offload_device": "cpu",
+    "onload_dtype": torch.float8_e4m3fn,
+    "onload_device": "cpu",
+    "preparing_dtype": torch.float8_e4m3fn,
+    "preparing_device": "cuda",
+    "computation_dtype": torch.bfloat16,
+    "computation_device": "cuda",
+}
 pipe = FluxImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
-        ModelConfig(model_id="black-forest-labs/FLUX.1-Krea-dev", origin_file_pattern="flux1-krea-dev.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-Krea-dev", origin_file_pattern="flux1-krea-dev.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/*.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors", **vram_config),
    ],
+    vram_limit=torch.cuda.mem_get_info("cuda")[1] / (1024 ** 3) - 0.5,
 )
-pipe.enable_vram_management()

 prompt = "An beautiful woman is riding a bicycle in a park, wearing a red dress"
 negative_prompt = "worst quality, low quality, monochrome, zombie, interlocked fingers, Aissist, cleavage, nsfw,"
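
Review note: every updated script in this directory passes `vram_limit=torch.cuda.mem_get_info("cuda")[1] / (1024 ** 3) - 0.5`. `torch.cuda.mem_get_info` returns a `(free, total)` tuple in bytes, so the expression takes the device's total memory, converts it to GiB, and subtracts 0.5 GiB of headroom. Below is a minimal sketch of the same arithmetic, kept free of CUDA so it runs anywhere. The helper name `vram_limit_gib` and the `headroom_gib` parameter are illustrative assumptions and are not part of DiffSynth-Studio.

```python
def vram_limit_gib(total_bytes: int, headroom_gib: float = 0.5) -> float:
    """Convert a total-memory figure in bytes to GiB and subtract a safety margin."""
    return total_bytes / (1024 ** 3) - headroom_gib


# Example with a hypothetical device reporting exactly 24 GiB of total memory.
limit = vram_limit_gib(24 * 1024 ** 3)
print(limit)
```

In the scripts, `total_bytes` corresponds to `torch.cuda.mem_get_info("cuda")[1]`. The result is the value handed to `vram_limit`.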
--- a/examples/flux/model_inference_low_vram/FLUX.1-dev-AttriCtrl.py
+++ b/examples/flux/model_inference_low_vram/FLUX.1-dev-AttriCtrl.py
@@ -1,19 +1,29 @@
 import torch
-from diffsynth.pipelines.flux_image_new import FluxImagePipeline, ModelConfig
+from diffsynth.pipelines.flux_image import FluxImagePipeline, ModelConfig


+vram_config = {
+    "offload_dtype": torch.float8_e4m3fn,
+    "offload_device": "cpu",
+    "onload_dtype": torch.float8_e4m3fn,
+    "onload_device": "cpu",
+    "preparing_dtype": torch.float8_e4m3fn,
+    "preparing_device": "cuda",
+    "computation_dtype": torch.bfloat16,
+    "computation_device": "cuda",
+}
 pipe = FluxImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="DiffSynth-Studio/AttriCtrl-FLUX.1-Dev", origin_file_pattern="models/brightness.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn)
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/*.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors", **vram_config),
+        ModelConfig(model_id="DiffSynth-Studio/AttriCtrl-FLUX.1-Dev", origin_file_pattern="models/brightness.safetensors", **vram_config)
    ],
+    vram_limit=torch.cuda.mem_get_info("cuda")[1] / (1024 ** 3) - 0.5,
 )
-pipe.enable_vram_management()

 for i in [0.1, 0.3, 0.5, 0.7, 0.9]:
    image = pipe(prompt="a cat on the beach", seed=2, value_controller_inputs=[i])
--- a/examples/flux/model_inference_low_vram/FLUX.1-dev-Controlnet-Inpainting-Beta.py
+++ b/examples/flux/model_inference_low_vram/FLUX.1-dev-Controlnet-Inpainting-Beta.py
@@ -1,21 +1,31 @@
 import torch
-from diffsynth.pipelines.flux_image_new import FluxImagePipeline, ModelConfig, ControlNetInput
+from diffsynth.pipelines.flux_image import FluxImagePipeline, ModelConfig, ControlNetInput
 import numpy as np
 from PIL import Image


+vram_config = {
+    "offload_dtype": torch.float8_e4m3fn,
+    "offload_device": "cpu",
+    "onload_dtype": torch.float8_e4m3fn,
+    "onload_device": "cpu",
+    "preparing_dtype": torch.float8_e4m3fn,
+    "preparing_device": "cuda",
+    "computation_dtype": torch.bfloat16,
+    "computation_device": "cuda",
+}
 pipe = FluxImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta", origin_file_pattern="diffusion_pytorch_model.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/*.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors", **vram_config),
+        ModelConfig(model_id="alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta", origin_file_pattern="diffusion_pytorch_model.safetensors", **vram_config),
    ],
+    vram_limit=torch.cuda.mem_get_info("cuda")[1] / (1024 ** 3) - 0.5,
 )
-pipe.enable_vram_management()

 image_1 = pipe(
    prompt="a cat sitting on a chair",
--- a/examples/flux/model_inference_low_vram/FLUX.1-dev-Controlnet-Union-alpha.py
+++ b/examples/flux/model_inference_low_vram/FLUX.1-dev-Controlnet-Union-alpha.py
@@ -1,23 +1,32 @@
 import torch
-from diffsynth.pipelines.flux_image_new import FluxImagePipeline, ModelConfig, ControlNetInput
-from diffsynth.controlnets.processors import Annotator
-from diffsynth import download_models
+from diffsynth.pipelines.flux_image import FluxImagePipeline, ModelConfig, ControlNetInput
+from diffsynth.utils.controlnet import Annotator
+from modelscope import snapshot_download


-
-download_models(["Annotators:Depth"])
+vram_config = {
+    "offload_dtype": torch.float8_e4m3fn,
+    "offload_device": "cpu",
+    "onload_dtype": torch.float8_e4m3fn,
+    "onload_device": "cpu",
+    "preparing_dtype": torch.float8_e4m3fn,
+    "preparing_device": "cuda",
+    "computation_dtype": torch.bfloat16,
+    "computation_device": "cuda",
+}
+snapshot_download("sd_lora/Annotators", allow_file_pattern="dpt_hybrid-midas-501f0c75.pt", local_dir="models/Annotators")
 pipe = FluxImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="InstantX/FLUX.1-dev-Controlnet-Union-alpha", origin_file_pattern="diffusion_pytorch_model.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/*.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors", **vram_config),
+        ModelConfig(model_id="InstantX/FLUX.1-dev-Controlnet-Union-alpha", origin_file_pattern="diffusion_pytorch_model.safetensors", **vram_config),
    ],
+    vram_limit=torch.cuda.mem_get_info("cuda")[1] / (1024 ** 3) - 0.5,
 )
-pipe.enable_vram_management()

 image_1 = pipe(
    prompt="a beautiful Asian girl, full body, red dress, summer",
--- a/examples/flux/model_inference_low_vram/FLUX.1-dev-Controlnet-Upscaler.py
+++ b/examples/flux/model_inference_low_vram/FLUX.1-dev-Controlnet-Upscaler.py
@@ -1,19 +1,29 @@
 import torch
-from diffsynth.pipelines.flux_image_new import FluxImagePipeline, ModelConfig, ControlNetInput
+from diffsynth.pipelines.flux_image import FluxImagePipeline, ModelConfig, ControlNetInput


+vram_config = {
+    "offload_dtype": torch.float8_e4m3fn,
+    "offload_device": "cpu",
+    "onload_dtype": torch.float8_e4m3fn,
+    "onload_device": "cpu",
+    "preparing_dtype": torch.float8_e4m3fn,
+    "preparing_device": "cuda",
+    "computation_dtype": torch.bfloat16,
+    "computation_device": "cuda",
+}
 pipe = FluxImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="jasperai/Flux.1-dev-Controlnet-Upscaler", origin_file_pattern="diffusion_pytorch_model.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/*.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors", **vram_config),
+        ModelConfig(model_id="jasperai/Flux.1-dev-Controlnet-Upscaler", origin_file_pattern="diffusion_pytorch_model.safetensors", **vram_config),
    ],
+    vram_limit=torch.cuda.mem_get_info("cuda")[1] / (1024 ** 3) - 0.5,
 )
-pipe.enable_vram_management()

 image_1 = pipe(
    prompt="a photo of a cat, highly detailed",
--- a/examples/flux/model_inference_low_vram/FLUX.1-dev-EliGen.py
+++ b/examples/flux/model_inference_low_vram/FLUX.1-dev-EliGen.py
@@ -1,11 +1,20 @@
 import random
 import torch
 from PIL import Image, ImageDraw, ImageFont
-from diffsynth import download_customized_models
-from diffsynth.pipelines.flux_image_new import FluxImagePipeline, ModelConfig
+from diffsynth.pipelines.flux_image import FluxImagePipeline, ModelConfig
 from modelscope import dataset_snapshot_download


+vram_config = {
+    "offload_dtype": torch.float8_e4m3fn,
+    "offload_device": "cpu",
+    "onload_dtype": torch.float8_e4m3fn,
+    "onload_device": "cpu",
+    "preparing_dtype": torch.float8_e4m3fn,
+    "preparing_device": "cuda",
+    "computation_dtype": torch.bfloat16,
+    "computation_device": "cuda",
+}
 def visualize_masks(image, masks, mask_prompts, output_path, font_size=35, use_random_colors=False):
    # Create a blank image for overlays
    overlay = Image.new('RGBA', image.size, (0, 0, 0, 0))
@@ -89,27 +98,14 @@ pipe = FluxImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/*.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors", **vram_config),
    ],
+    vram_limit=torch.cuda.mem_get_info("cuda")[1] / (1024 ** 3) - 0.5,
 )
-pipe.enable_vram_management()
-
-download_from_modelscope = True
-if download_from_modelscope:
-    model_id = "DiffSynth-Studio/Eligen"
-    downloading_priority = ["ModelScope"]
-else:
-    model_id = "modelscope/EliGen"
-    downloading_priority = ["HuggingFace"]
-EliGen_path = download_customized_models(
-    model_id=model_id,
-    origin_file_path="model_bf16.safetensors",
-    local_dir="models/lora/entity_control",
-    downloading_priority=downloading_priority)[0]
-pipe.load_lora(pipe.dit, EliGen_path, alpha=1)
+pipe.load_lora(pipe.dit, ModelConfig(model_id="DiffSynth-Studio/Eligen", origin_file_pattern="model_bf16.safetensors"), alpha=1)

 # example 1
 global_prompt = "A breathtaking beauty of Raja Ampat by the late-night moonlight , one beautiful woman from behind wearing a pale blue long dress with soft glow, sitting at the top of a cliff looking towards the beach,pastell light colors, a group of small distant birds flying in far sky, a boat sailing on the sea, best quality, realistic, whimsical, fantastic, splash art, intricate detailed, hyperdetailed, maximalist style, photorealistic, concept art, sharp focus, harmony, serenity, tranquility, soft pastell colors,ambient occlusion, cozy ambient lighting, masterpiece, liiv1, linquivera, metix, mentixis, masterpiece, award winning, view from above\n"
--- a/examples/flux/model_inference_low_vram/FLUX.1-dev-IP-Adapter.py
+++ b/examples/flux/model_inference_low_vram/FLUX.1-dev-IP-Adapter.py
@@ -1,20 +1,30 @@
 import torch
-from diffsynth.pipelines.flux_image_new import FluxImagePipeline, ModelConfig
+from diffsynth.pipelines.flux_image import FluxImagePipeline, ModelConfig


+vram_config = {
+    "offload_dtype": torch.float8_e4m3fn,
+    "offload_device": "cpu",
+    "onload_dtype": torch.float8_e4m3fn,
+    "onload_device": "cpu",
+    "preparing_dtype": torch.float8_e4m3fn,
+    "preparing_device": "cuda",
+    "computation_dtype": torch.bfloat16,
+    "computation_device": "cuda",
+}
 pipe = FluxImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="InstantX/FLUX.1-dev-IP-Adapter", origin_file_pattern="ip-adapter.bin", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="google/siglip-so400m-patch14-384", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/*.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors", **vram_config),
+        ModelConfig(model_id="InstantX/FLUX.1-dev-IP-Adapter", origin_file_pattern="ip-adapter.bin", **vram_config),
+        ModelConfig(model_id="google/siglip-so400m-patch14-384", origin_file_pattern="model.safetensors", **vram_config),
    ],
+    vram_limit=torch.cuda.mem_get_info("cuda")[1] / (1024 ** 3) - 0.5,
 )
-pipe.enable_vram_management()

 origin_prompt = "a rabbit in a garden, colorful flowers"
 image = pipe(prompt=origin_prompt, height=1280, width=960, seed=42)
--- a/examples/flux/model_inference_low_vram/FLUX.1-dev-InfiniteYou.py
+++ b/examples/flux/model_inference_low_vram/FLUX.1-dev-InfiniteYou.py
@@ -1,11 +1,24 @@
 import torch
-from diffsynth.pipelines.flux_image_new import FluxImagePipeline, ModelConfig, ControlNetInput
+from diffsynth.pipelines.flux_image import FluxImagePipeline, ModelConfig, ControlNetInput
 from modelscope import dataset_snapshot_download
 from modelscope import snapshot_download
 from PIL import Image
 import numpy as np


+# This model has additional requirements.
+# Please install the following packages.
+# pip install facexlib insightface onnxruntime
+vram_config = {
+    "offload_dtype": torch.float8_e4m3fn,
+    "offload_device": "cpu",
+    "onload_dtype": torch.float8_e4m3fn,
+    "onload_device": "cpu",
+    "preparing_dtype": torch.float8_e4m3fn,
+    "preparing_device": "cuda",
+    "computation_dtype": torch.bfloat16,
+    "computation_device": "cuda",
+}
 snapshot_download(
    "ByteDance/InfiniteYou",
    allow_file_pattern="supports/insightface/models/antelopev2/*",
@@ -15,15 +28,15 @@ pipe = FluxImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="ByteDance/InfiniteYou", origin_file_pattern="infu_flux_v1.0/aes_stage2/image_proj_model.bin", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="ByteDance/InfiniteYou", origin_file_pattern="infu_flux_v1.0/aes_stage2/InfuseNetModel/*.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/*.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors", **vram_config),
+        ModelConfig(model_id="ByteDance/InfiniteYou", origin_file_pattern="infu_flux_v1.0/aes_stage2/image_proj_model.bin", **vram_config),
+        ModelConfig(model_id="ByteDance/InfiniteYou", origin_file_pattern="infu_flux_v1.0/aes_stage2/InfuseNetModel/*.safetensors", **vram_config),
    ],
+    vram_limit=torch.cuda.mem_get_info("cuda")[1] / (1024 ** 3) - 0.5,
 )
-pipe.enable_vram_management()

 dataset_snapshot_download(
    dataset_id="DiffSynth-Studio/examples_in_diffsynth",
--- a/examples/flux/model_inference_low_vram/FLUX.1-dev-LoRA-Encoder.py
+++ b/examples/flux/model_inference_low_vram/FLUX.1-dev-LoRA-Encoder.py
@@ -1,23 +1,31 @@
 import torch
-from diffsynth.pipelines.flux_image_new import FluxImagePipeline, ModelConfig
+from diffsynth.pipelines.flux_image import FluxImagePipeline, ModelConfig


+vram_config = {
+    "offload_dtype": torch.float8_e4m3fn,
+    "offload_device": "cpu",
+    "onload_dtype": torch.float8_e4m3fn,
+    "onload_device": "cpu",
+    "preparing_dtype": torch.float8_e4m3fn,
+    "preparing_device": "cuda",
+    "computation_dtype": torch.bfloat16,
+    "computation_device": "cuda",
+}
 pipe = FluxImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="DiffSynth-Studio/LoRA-Encoder-FLUX.1-Dev", origin_file_pattern="model.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/*.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors", **vram_config),
+        ModelConfig(model_id="DiffSynth-Studio/LoRA-Encoder-FLUX.1-Dev", origin_file_pattern="model.safetensors", **vram_config),
    ],
+    vram_limit=torch.cuda.mem_get_info("cuda")[1] / (1024 ** 3) - 0.5,
 )
-pipe.enable_vram_management()
-pipe.enable_lora_magic()
-
 lora = ModelConfig(model_id="VoidOc/flux_animal_forest1", origin_file_pattern="20.safetensors")
-pipe.load_lora(pipe.dit, lora, hotload=True) # Use `pipe.clear_lora()` to drop the loaded LoRA.
+pipe.load_lora(pipe.dit, lora) # Use `pipe.clear_lora()` to drop the loaded LoRA.

 # Empty prompt can automatically activate LoRA capabilities.
 image = pipe(prompt="", seed=0, lora_encoder_inputs=lora)
--- a/examples/flux/model_inference_low_vram/FLUX.1-dev-LoRA-Fusion.py
+++ b/examples/flux/model_inference_low_vram/FLUX.1-dev-LoRA-Fusion.py
@@ -0,0 +1,38 @@
+import torch
+from diffsynth.pipelines.flux_image import FluxImagePipeline, ModelConfig
+
+
+vram_config = {
+    "offload_dtype": torch.float8_e4m3fn,
+    "offload_device": "cpu",
+    "onload_dtype": torch.float8_e4m3fn,
+    "onload_device": "cpu",
+    "preparing_dtype": torch.float8_e4m3fn,
+    "preparing_device": "cuda",
+    "computation_dtype": torch.bfloat16,
+    "computation_device": "cuda",
+}
+pipe = FluxImagePipeline.from_pretrained(
+    torch_dtype=torch.bfloat16,
+    device="cuda",
+    model_configs=[
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/*.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors", **vram_config),
+        ModelConfig(model_id="DiffSynth-Studio/LoRAFusion-preview-FLUX.1-dev", origin_file_pattern="model.safetensors", **vram_config),
+    ],
+    vram_limit=torch.cuda.mem_get_info("cuda")[1] / (1024 ** 3) - 0.5,
+)
+pipe.enable_lora_merger()
+
+pipe.load_lora(
+    pipe.dit,
+    ModelConfig(model_id="cancel13/cxsk", origin_file_pattern="30.safetensors"),
+)
+pipe.load_lora(
+    pipe.dit,
+    ModelConfig(model_id="DiffSynth-Studio/ArtAug-lora-FLUX.1dev-v1", origin_file_pattern="merged_lora.safetensors"),
+)
+image = pipe(prompt="a cat", seed=0)
+image.save("image_fused.jpg")
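
Review note: the same eight-key `vram_config` dictionary is repeated verbatim in each low-VRAM script. One possible way to keep the copies in sync is a small builder, sketched below. The function name `make_vram_config` and its parameters are hypothetical, not an API of DiffSynth-Studio. The keys mirror the ones in this diff, and the grouping into "storage" and "compute" values is only an observation about the values used here, not a statement about what each stage does internally.

```python
from typing import Any, Dict


def make_vram_config(
    storage_dtype: Any,
    compute_dtype: Any,
    storage_device: str = "cpu",
    compute_device: str = "cuda",
) -> Dict[str, Any]:
    """Build a dict with the same keys and value layout as the scripts' vram_config."""
    return {
        "offload_dtype": storage_dtype,
        "offload_device": storage_device,
        "onload_dtype": storage_dtype,
        "onload_device": storage_device,
        "preparing_dtype": storage_dtype,
        "preparing_device": compute_device,
        "computation_dtype": compute_dtype,
        "computation_device": compute_device,
    }


# Placeholder strings stand in for torch.float8_e4m3fn and torch.bfloat16
# so that the sketch runs without PyTorch installed.
vram_config = make_vram_config("float8_e4m3fn", "bfloat16")
print(sorted(vram_config))
```

With PyTorch available, the call would be `make_vram_config(torch.float8_e4m3fn, torch.bfloat16)`, and the result would be unpacked into each `ModelConfig(...)` with `**vram_config` exactly as the scripts do now.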
--- a/examples/flux/model_inference_low_vram/FLUX.1-dev-LoRAFusion.py
+++ b/examples/flux/model_inference_low_vram/FLUX.1-dev-LoRAFusion.py
@@ -1,35 +0,0 @@
-import torch
-from diffsynth.pipelines.flux_image_new import FluxImagePipeline, ModelConfig
-
-
-pipe = FluxImagePipeline.from_pretrained(
-    torch_dtype=torch.bfloat16,
-    device="cuda",
-    model_configs=[
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="DiffSynth-Studio/FLUX.1-dev-LoRAFusion", origin_file_pattern="model.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn)
-    ],
-)
-pipe.enable_vram_management()
-pipe.enable_lora_patcher()
-pipe.load_lora(
-    pipe.dit,
-    ModelConfig(model_id="yangyufeng/fgao", origin_file_pattern="30.safetensors"),
-    hotload=True
-)
-pipe.load_lora(
-    pipe.dit,
-    ModelConfig(model_id="bobooblue/LoRA-bling-mai", origin_file_pattern="10.safetensors"),
-    hotload=True
-)
-pipe.load_lora(
-    pipe.dit,
-    ModelConfig(model_id="JIETANGAB/E", origin_file_pattern="17.safetensors"),
-    hotload=True
-)
-
-image = pipe(prompt="This is a digital painting in a soft, ethereal style. a beautiful Asian girl Shine like a diamond. Everywhere is shining with bling bling luster.The background is a textured blue with visible brushstrokes, giving the image an impressionistic style reminiscent of Vincent van Gogh's work", seed=0)
-image.save("flux.jpg")
--- a/examples/flux/model_inference_low_vram/FLUX.1-dev.py
+++ b/examples/flux/model_inference_low_vram/FLUX.1-dev.py
@@ -1,18 +1,28 @@
 import torch
-from diffsynth.pipelines.flux_image_new import FluxImagePipeline, ModelConfig
+from diffsynth.pipelines.flux_image import FluxImagePipeline, ModelConfig


+vram_config = {
+    "offload_dtype": torch.float8_e4m3fn,
+    "offload_device": "cpu",
+    "onload_dtype": torch.float8_e4m3fn,
+    "onload_device": "cpu",
+    "preparing_dtype": torch.float8_e4m3fn,
+    "preparing_device": "cuda",
+    "computation_dtype": torch.bfloat16,
+    "computation_device": "cuda",
+}
 pipe = FluxImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/*.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors", **vram_config),
    ],
+    vram_limit=torch.cuda.mem_get_info("cuda")[1] / (1024 ** 3) - 0.5,
 )
-pipe.enable_vram_management()

 prompt = "CG, masterpiece, best quality, solo, long hair, wavy hair, silver hair, blue eyes, blue dress, medium breasts, dress, underwater, air bubble, floating hair, refraction, portrait. The girl's flowing silver hair shimmers with every color of the rainbow and cascades down, merging with the floating flora around her."
 negative_prompt = "worst quality, low quality, monochrome, zombie, interlocked fingers, Aissist, cleavage, nsfw,"
--- a/examples/flux/model_inference_low_vram/Nexus-Gen-Editing.py
+++ b/examples/flux/model_inference_low_vram/Nexus-Gen-Editing.py
@@ -1,7 +1,7 @@
 import importlib
 import torch
 from PIL import Image
-from diffsynth.pipelines.flux_image_new import FluxImagePipeline, ModelConfig
+from diffsynth.pipelines.flux_image import FluxImagePipeline, ModelConfig
 from modelscope import dataset_snapshot_download


@@ -12,19 +12,29 @@ else:
    assert transformers.__version__ == "4.49.0", "Nexus-GenV2 requires transformers==4.49.0, please install it with `pip install transformers==4.49.0`."


+vram_config = {
+    "offload_dtype": torch.float8_e4m3fn,
+    "offload_device": "cpu",
+    "onload_dtype": torch.float8_e4m3fn,
+    "onload_device": "cpu",
+    "preparing_dtype": torch.float8_e4m3fn,
+    "preparing_device": "cuda",
+    "computation_dtype": torch.bfloat16,
+    "computation_device": "cuda",
+}
 pipe = FluxImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
-        ModelConfig(model_id="DiffSynth-Studio/Nexus-GenV2", origin_file_pattern="model*.safetensors", offload_device="cpu"),
-        ModelConfig(model_id="DiffSynth-Studio/Nexus-GenV2", origin_file_pattern="edit_decoder.bin", offload_device="cpu"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors", offload_device="cpu"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/", offload_device="cpu"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors", offload_device="cpu"),
+        ModelConfig(model_id="DiffSynth-Studio/Nexus-GenV2", origin_file_pattern="model*.safetensors", **vram_config),
+        ModelConfig(model_id="DiffSynth-Studio/Nexus-GenV2", origin_file_pattern="edit_decoder.bin", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/*.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors", **vram_config),
    ],
    nexus_gen_processor_config=ModelConfig(model_id="DiffSynth-Studio/Nexus-GenV2", origin_file_pattern="processor/"),
+    vram_limit=torch.cuda.mem_get_info("cuda")[1] / (1024 ** 3) - 0.5,
 )
-pipe.enable_vram_management()

 dataset_snapshot_download(dataset_id="DiffSynth-Studio/examples_in_diffsynth", local_dir="./", allow_file_pattern=f"data/examples/nexusgen/cat.jpg")
 ref_image = Image.open("data/examples/nexusgen/cat.jpg").convert("RGB")
--- a/examples/flux/model_inference_low_vram/Nexus-Gen-Generation.py
+++ b/examples/flux/model_inference_low_vram/Nexus-Gen-Generation.py
@@ -1,6 +1,6 @@
 import importlib
 import torch
-from diffsynth.pipelines.flux_image_new import FluxImagePipeline, ModelConfig
+from diffsynth.pipelines.flux_image import FluxImagePipeline, ModelConfig


 if importlib.util.find_spec("transformers") is None:
@@ -10,19 +10,29 @@ else:
    assert transformers.__version__ == "4.49.0", "Nexus-GenV2 requires transformers==4.49.0, please install it with `pip install transformers==4.49.0`."


+vram_config = {
+    "offload_dtype": torch.float8_e4m3fn,
+    "offload_device": "cpu",
+    "onload_dtype": torch.float8_e4m3fn,
+    "onload_device": "cpu",
+    "preparing_dtype": torch.float8_e4m3fn,
+    "preparing_device": "cuda",
+    "computation_dtype": torch.bfloat16,
+    "computation_device": "cuda",
+}
 pipe = FluxImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
-        ModelConfig(model_id="DiffSynth-Studio/Nexus-GenV2", origin_file_pattern="model*.safetensors", offload_device="cpu"),
-        ModelConfig(model_id="DiffSynth-Studio/Nexus-GenV2", origin_file_pattern="generation_decoder.bin", offload_device="cpu"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors", offload_device="cpu"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/", offload_device="cpu"),
-        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors", offload_device="cpu"),
+        ModelConfig(model_id="DiffSynth-Studio/Nexus-GenV2", origin_file_pattern="model*.safetensors", **vram_config),
+        ModelConfig(model_id="DiffSynth-Studio/Nexus-GenV2", origin_file_pattern="generation_decoder.bin", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/*.safetensors", **vram_config),
+        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors", **vram_config),
    ],
-    nexus_gen_processor_config=ModelConfig(model_id="DiffSynth-Studio/Nexus-GenV2", origin_file_pattern="processor/"),
+    nexus_gen_processor_config=ModelConfig("DiffSynth-Studio/Nexus-GenV2", origin_file_pattern="processor"),
+    vram_limit=torch.cuda.mem_get_info("cuda")[1] / (1024 ** 3) - 0.5,
 )
-pipe.enable_vram_management()

 prompt = "一只可爱的猫咪"
 image = pipe(
--- a/examples/flux/model_inference_low_vram/Step1X-Edit.py
+++ b/examples/flux/model_inference_low_vram/Step1X-Edit.py
@@ -1,19 +1,29 @@
 import torch
-from diffsynth.pipelines.flux_image_new import FluxImagePipeline, ModelConfig
+from diffsynth.pipelines.flux_image import FluxImagePipeline, ModelConfig
 from PIL import Image
 import numpy as np


+vram_config = {
+    "offload_dtype": torch.float8_e4m3fn,
+    "offload_device": "cpu",
+    "onload_dtype": torch.float8_e4m3fn,
+    "onload_device": "cpu",
+    "preparing_dtype": torch.float8_e4m3fn,
+    "preparing_device": "cuda",
+    "computation_dtype": torch.bfloat16,
+    "computation_device": "cuda",
+}
 pipe = FluxImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
-        ModelConfig(model_id="Qwen/Qwen2.5-VL-7B-Instruct", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="stepfun-ai/Step1X-Edit", origin_file_pattern="step1x-edit-i1258.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
-        ModelConfig(model_id="stepfun-ai/Step1X-Edit", origin_file_pattern="vae.safetensors", offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
+        ModelConfig(model_id="Qwen/Qwen2.5-VL-7B-Instruct", origin_file_pattern="model-*.safetensors", **vram_config),
+        ModelConfig(model_id="stepfun-ai/Step1X-Edit", origin_file_pattern="step1x-edit-i1258.safetensors", **vram_config),
+        ModelConfig(model_id="stepfun-ai/Step1X-Edit", origin_file_pattern="vae.safetensors", **vram_config),
    ],
+    vram_limit=torch.cuda.mem_get_info("cuda")[1] / (1024 ** 3) - 0.5,
 )
-pipe.enable_vram_management()

 image = Image.fromarray(np.zeros((1248, 832, 3), dtype=np.uint8) + 255)
 image = pipe(
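
Review note on the `vram_limit` argument added in the three low-VRAM examples above: `torch.cuda.mem_get_info` returns a `(free, total)` tuple in bytes, so index `[1]` is the total device memory, and the expression converts it to GiB and leaves 0.5 GiB of headroom. The sketch below reproduces only that arithmetic in plain Python. The helper name `vram_limit_gib` and its `reserve_gib` parameter are hypothetical and not part of DiffSynth-Studio; how the pipeline consumes `vram_limit` is not shown in this diff.

```python
GIB = 1024 ** 3  # bytes per GiB, the same divisor used in the diff


def vram_limit_gib(total_bytes: int, reserve_gib: float = 0.5) -> float:
    """Convert total device memory (bytes) to a GiB limit, minus a reserve.

    Hypothetical helper: mirrors `total / (1024 ** 3) - 0.5` from the examples.
    """
    return total_bytes / GIB - reserve_gib


# Illustrative input: a device that reports exactly 24 GiB of total memory.
limit = vram_limit_gib(24 * GIB)
print(limit)  # 23.5
```

On a real machine the input would presumably come from `torch.cuda.mem_get_info("cuda")[1]`, as in the examples.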
--- a/examples/flux/model_training/full/FLEX.2-preview.sh
+++ b/examples/flux/model_training/full/FLEX.2-preview.sh
@@ -3,7 +3,7 @@ accelerate launch --config_file examples/flux/model_training/full/accelerate_con
  --dataset_metadata_path data/example_image_dataset/metadata.csv \
  --max_pixels 1048576 \
  --dataset_repeat 200 \
-  --model_id_with_origin_paths "ostris/Flex.2-preview:Flex.2-preview.safetensors,black-forest-labs/FLUX.1-dev:text_encoder/model.safetensors,black-forest-labs/FLUX.1-dev:text_encoder_2/,black-forest-labs/FLUX.1-dev:ae.safetensors" \
+  --model_id_with_origin_paths "ostris/Flex.2-preview:Flex.2-preview.safetensors,black-forest-labs/FLUX.1-dev:text_encoder/model.safetensors,black-forest-labs/FLUX.1-dev:text_encoder_2/*.safetensors,black-forest-labs/FLUX.1-dev:ae.safetensors" \
  --learning_rate 1e-5 \
  --num_epochs 1 \
  --remove_prefix_in_ckpt "pipe.dit." \
--- a/examples/flux/model_training/full/FLUX.1-Kontext-dev.sh
+++ b/examples/flux/model_training/full/FLUX.1-Kontext-dev.sh
@@ -4,7 +4,7 @@ accelerate launch --config_file examples/flux/model_training/full/accelerate_con
  --data_file_keys "image,kontext_images" \
  --max_pixels 1048576 \
  --dataset_repeat 400 \
-  --model_id_with_origin_paths "black-forest-labs/FLUX.1-Kontext-dev:flux1-kontext-dev.safetensors,black-forest-labs/FLUX.1-dev:text_encoder/model.safetensors,black-forest-labs/FLUX.1-dev:text_encoder_2/,black-forest-labs/FLUX.1-dev:ae.safetensors" \
+  --model_id_with_origin_paths "black-forest-labs/FLUX.1-Kontext-dev:flux1-kontext-dev.safetensors,black-forest-labs/FLUX.1-dev:text_encoder/model.safetensors,black-forest-labs/FLUX.1-dev:text_encoder_2/*.safetensors,black-forest-labs/FLUX.1-dev:ae.safetensors" \
  --learning_rate 1e-5 \
  --num_epochs 1 \
  --remove_prefix_in_ckpt "pipe.dit." \
--- a/examples/flux/model_training/full/FLUX.1-Krea-dev.sh
+++ b/examples/flux/model_training/full/FLUX.1-Krea-dev.sh
@@ -3,7 +3,7 @@ accelerate launch --config_file examples/flux/model_training/full/accelerate_con
  --dataset_metadata_path data/example_image_dataset/metadata.csv \
  --max_pixels 1048576 \
  --dataset_repeat 400 \
-  --model_id_with_origin_paths "black-forest-labs/FLUX.1-Krea-dev:flux1-krea-dev.safetensors,black-forest-labs/FLUX.1-dev:text_encoder/model.safetensors,black-forest-labs/FLUX.1-dev:text_encoder_2/,black-forest-labs/FLUX.1-dev:ae.safetensors" \
+  --model_id_with_origin_paths "black-forest-labs/FLUX.1-Krea-dev:flux1-krea-dev.safetensors,black-forest-labs/FLUX.1-dev:text_encoder/model.safetensors,black-forest-labs/FLUX.1-dev:text_encoder_2/*.safetensors,black-forest-labs/FLUX.1-dev:ae.safetensors" \
  --learning_rate 1e-5 \
  --num_epochs 1 \
  --remove_prefix_in_ckpt "pipe.dit." \
--- a/examples/flux/model_training/full/FLUX.1-dev-AttriCtrl.sh
+++ b/examples/flux/model_training/full/FLUX.1-dev-AttriCtrl.sh
@@ -4,7 +4,7 @@ accelerate launch examples/flux/model_training/train.py \
  --data_file_keys "image" \
  --max_pixels 1048576 \
  --dataset_repeat 100 \
-  --model_id_with_origin_paths "black-forest-labs/FLUX.1-dev:flux1-dev.safetensors,black-forest-labs/FLUX.1-dev:text_encoder/model.safetensors,black-forest-labs/FLUX.1-dev:text_encoder_2/,black-forest-labs/FLUX.1-dev:ae.safetensors,DiffSynth-Studio/AttriCtrl-FLUX.1-Dev:models/brightness.safetensors" \
+  --model_id_with_origin_paths "black-forest-labs/FLUX.1-dev:flux1-dev.safetensors,black-forest-labs/FLUX.1-dev:text_encoder/model.safetensors,black-forest-labs/FLUX.1-dev:text_encoder_2/*.safetensors,black-forest-labs/FLUX.1-dev:ae.safetensors,DiffSynth-Studio/AttriCtrl-FLUX.1-Dev:models/brightness.safetensors" \
  --learning_rate 1e-5 \
  --num_epochs 1 \
  --remove_prefix_in_ckpt "pipe.value_controller.encoders.0." \
--- a/examples/flux/model_training/full/FLUX.1-dev-Controlnet-Inpainting-Beta.sh
+++ b/examples/flux/model_training/full/FLUX.1-dev-Controlnet-Inpainting-Beta.sh
@@ -4,7 +4,7 @@ accelerate launch --config_file examples/flux/model_training/full/accelerate_con
  --data_file_keys "image,controlnet_image,controlnet_inpaint_mask" \
  --max_pixels 1048576 \
  --dataset_repeat 400 \
-  --model_id_with_origin_paths "black-forest-labs/FLUX.1-dev:flux1-dev.safetensors,black-forest-labs/FLUX.1-dev:text_encoder/model.safetensors,black-forest-labs/FLUX.1-dev:text_encoder_2/,black-forest-labs/FLUX.1-dev:ae.safetensors,alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta:diffusion_pytorch_model.safetensors" \
+  --model_id_with_origin_paths "black-forest-labs/FLUX.1-dev:flux1-dev.safetensors,black-forest-labs/FLUX.1-dev:text_encoder/model.safetensors,black-forest-labs/FLUX.1-dev:text_encoder_2/*.safetensors,black-forest-labs/FLUX.1-dev:ae.safetensors,alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta:diffusion_pytorch_model.safetensors" \
  --learning_rate 1e-5 \
  --num_epochs 1 \
  --remove_prefix_in_ckpt "pipe.controlnet.models.0." \
--- a/examples/flux/model_training/full/FLUX.1-dev-Controlnet-Union-alpha.sh
+++ b/examples/flux/model_training/full/FLUX.1-dev-Controlnet-Union-alpha.sh
@@ -4,7 +4,7 @@ accelerate launch --config_file examples/flux/model_training/full/accelerate_con
  --data_file_keys "image,controlnet_image" \
  --max_pixels 1048576 \
  --dataset_repeat 400 \
-  --model_id_with_origin_paths "black-forest-labs/FLUX.1-dev:flux1-dev.safetensors,black-forest-labs/FLUX.1-dev:text_encoder/model.safetensors,black-forest-labs/FLUX.1-dev:text_encoder_2/,black-forest-labs/FLUX.1-dev:ae.safetensors,InstantX/FLUX.1-dev-Controlnet-Union-alpha:diffusion_pytorch_model.safetensors" \
+  --model_id_with_origin_paths "black-forest-labs/FLUX.1-dev:flux1-dev.safetensors,black-forest-labs/FLUX.1-dev:text_encoder/model.safetensors,black-forest-labs/FLUX.1-dev:text_encoder_2/*.safetensors,black-forest-labs/FLUX.1-dev:ae.safetensors,InstantX/FLUX.1-dev-Controlnet-Union-alpha:diffusion_pytorch_model.safetensors" \
  --learning_rate 1e-5 \
  --num_epochs 1 \
  --remove_prefix_in_ckpt "pipe.controlnet.models.0." \
--- a/examples/flux/model_training/full/FLUX.1-dev-Controlnet-Upscaler.sh
+++ b/examples/flux/model_training/full/FLUX.1-dev-Controlnet-Upscaler.sh
@@ -4,7 +4,7 @@ accelerate launch --config_file examples/flux/model_training/full/accelerate_con
  --data_file_keys "image,controlnet_image" \
  --max_pixels 1048576 \
  --dataset_repeat 400 \
-  --model_id_with_origin_paths "black-forest-labs/FLUX.1-dev:flux1-dev.safetensors,black-forest-labs/FLUX.1-dev:text_encoder/model.safetensors,black-forest-labs/FLUX.1-dev:text_encoder_2/,black-forest-labs/FLUX.1-dev:ae.safetensors,jasperai/Flux.1-dev-Controlnet-Upscaler:diffusion_pytorch_model.safetensors" \
+  --model_id_with_origin_paths "black-forest-labs/FLUX.1-dev:flux1-dev.safetensors,black-forest-labs/FLUX.1-dev:text_encoder/model.safetensors,black-forest-labs/FLUX.1-dev:text_encoder_2/*.safetensors,black-forest-labs/FLUX.1-dev:ae.safetensors,jasperai/Flux.1-dev-Controlnet-Upscaler:diffusion_pytorch_model.safetensors" \
  --learning_rate 1e-5 \
  --num_epochs 1 \
  --remove_prefix_in_ckpt "pipe.controlnet.models.0." \
--- a/examples/flux/model_training/full/FLUX.1-dev-IP-Adapter.sh
+++ b/examples/flux/model_training/full/FLUX.1-dev-IP-Adapter.sh
@@ -4,7 +4,7 @@ accelerate launch examples/flux/model_training/train.py \
  --data_file_keys "image,ipadapter_images" \
  --max_pixels 1048576 \
  --dataset_repeat 100 \
-  --model_id_with_origin_paths "black-forest-labs/FLUX.1-dev:flux1-dev.safetensors,black-forest-labs/FLUX.1-dev:text_encoder/model.safetensors,black-forest-labs/FLUX.1-dev:text_encoder_2/,black-forest-labs/FLUX.1-dev:ae.safetensors,InstantX/FLUX.1-dev-IP-Adapter:ip-adapter.bin,google/siglip-so400m-patch14-384:" \
+  --model_id_with_origin_paths "black-forest-labs/FLUX.1-dev:flux1-dev.safetensors,black-forest-labs/FLUX.1-dev:text_encoder/model.safetensors,black-forest-labs/FLUX.1-dev:text_encoder_2/*.safetensors,black-forest-labs/FLUX.1-dev:ae.safetensors,InstantX/FLUX.1-dev-IP-Adapter:ip-adapter.bin,google/siglip-so400m-patch14-384:model.safetensors" \
  --learning_rate 1e-5 \
  --num_epochs 1 \
  --remove_prefix_in_ckpt "pipe.ipadapter." \
--- a/examples/flux/model_training/full/FLUX.1-dev-InfiniteYou.sh
+++ b/examples/flux/model_training/full/FLUX.1-dev-InfiniteYou.sh
@@ -4,7 +4,7 @@ accelerate launch --config_file examples/flux/model_training/full/accelerate_con
  --data_file_keys "image,controlnet_image,infinityou_id_image" \
  --max_pixels 1048576 \
  --dataset_repeat 400 \
-  --model_id_with_origin_paths "black-forest-labs/FLUX.1-dev:flux1-dev.safetensors,black-forest-labs/FLUX.1-dev:text_encoder/model.safetensors,black-forest-labs/FLUX.1-dev:text_encoder_2/,black-forest-labs/FLUX.1-dev:ae.safetensors,ByteDance/InfiniteYou:infu_flux_v1.0/aes_stage2/image_proj_model.bin,ByteDance/InfiniteYou:infu_flux_v1.0/aes_stage2/InfuseNetModel/*.safetensors" \
+  --model_id_with_origin_paths "black-forest-labs/FLUX.1-dev:flux1-dev.safetensors,black-forest-labs/FLUX.1-dev:text_encoder/model.safetensors,black-forest-labs/FLUX.1-dev:text_encoder_2/*.safetensors,black-forest-labs/FLUX.1-dev:ae.safetensors,ByteDance/InfiniteYou:infu_flux_v1.0/aes_stage2/image_proj_model.bin,ByteDance/InfiniteYou:infu_flux_v1.0/aes_stage2/InfuseNetModel/*.safetensors" \
  --learning_rate 1e-5 \
  --num_epochs 1 \
  --remove_prefix_in_ckpt "pipe." \
--- a/examples/flux/model_training/full/FLUX.1-dev-LoRA-Encoder.sh
+++ b/examples/flux/model_training/full/FLUX.1-dev-LoRA-Encoder.sh
@@ -4,7 +4,7 @@ accelerate launch examples/flux/model_training/train.py \
  --data_file_keys "image" \
  --max_pixels 1048576 \
  --dataset_repeat 100 \
-  --model_id_with_origin_paths "black-forest-labs/FLUX.1-dev:flux1-dev.safetensors,black-forest-labs/FLUX.1-dev:text_encoder/model.safetensors,black-forest-labs/FLUX.1-dev:text_encoder_2/,black-forest-labs/FLUX.1-dev:ae.safetensors,DiffSynth-Studio/LoRA-Encoder-FLUX.1-Dev:model.safetensors" \
+  --model_id_with_origin_paths "black-forest-labs/FLUX.1-dev:flux1-dev.safetensors,black-forest-labs/FLUX.1-dev:text_encoder/model.safetensors,black-forest-labs/FLUX.1-dev:text_encoder_2/*.safetensors,black-forest-labs/FLUX.1-dev:ae.safetensors,DiffSynth-Studio/LoRA-Encoder-FLUX.1-Dev:model.safetensors" \
  --learning_rate 1e-5 \
  --num_epochs 1 \
  --remove_prefix_in_ckpt "pipe.lora_encoder." \
--- a/examples/flux/model_training/full/FLUX.1-dev.sh
+++ b/examples/flux/model_training/full/FLUX.1-dev.sh
@@ -3,7 +3,7 @@ accelerate launch --config_file examples/flux/model_training/full/accelerate_con
  --dataset_metadata_path data/example_image_dataset/metadata.csv \
  --max_pixels 1048576 \
  --dataset_repeat 400 \
-  --model_id_with_origin_paths "black-forest-labs/FLUX.1-dev:flux1-dev.safetensors,black-forest-labs/FLUX.1-dev:text_encoder/model.safetensors,black-forest-labs/FLUX.1-dev:text_encoder_2/,black-forest-labs/FLUX.1-dev:ae.safetensors" \
+  --model_id_with_origin_paths "black-forest-labs/FLUX.1-dev:flux1-dev.safetensors,black-forest-labs/FLUX.1-dev:text_encoder/model.safetensors,black-forest-labs/FLUX.1-dev:text_encoder_2/*.safetensors,black-forest-labs/FLUX.1-dev:ae.safetensors" \
  --learning_rate 1e-5 \
  --num_epochs 1 \
  --remove_prefix_in_ckpt "pipe.dit." \
--- a/examples/flux/model_training/full/Nexus-Gen.sh
+++ b/examples/flux/model_training/full/Nexus-Gen.sh
@@ -4,7 +4,7 @@ accelerate launch --config_file examples/flux/model_training/full/accelerate_con
  --data_file_keys "image,nexus_gen_reference_image" \
  --max_pixels 262144 \
  --dataset_repeat 400 \
-  --model_id_with_origin_paths "DiffSynth-Studio/Nexus-GenV2:model*.safetensors,DiffSynth-Studio/Nexus-GenV2:edit_decoder.bin,black-forest-labs/FLUX.1-dev:text_encoder/model.safetensors,black-forest-labs/FLUX.1-dev:text_encoder_2/,black-forest-labs/FLUX.1-dev:ae.safetensors" \
+  --model_id_with_origin_paths "DiffSynth-Studio/Nexus-GenV2:model*.safetensors,DiffSynth-Studio/Nexus-GenV2:edit_decoder.bin,black-forest-labs/FLUX.1-dev:text_encoder/model.safetensors,black-forest-labs/FLUX.1-dev:text_encoder_2/*.safetensors,black-forest-labs/FLUX.1-dev:ae.safetensors" \
  --learning_rate 1e-5 \
  --num_epochs 1 \
  --remove_prefix_in_ckpt "pipe.dit." \
--- a/examples/flux/model_training/full/Step1X-Edit.sh
+++ b/examples/flux/model_training/full/Step1X-Edit.sh
@@ -4,7 +4,7 @@ accelerate launch --config_file examples/flux/model_training/full/accelerate_con
  --data_file_keys "image,step1x_reference_image" \
  --max_pixels 1048576 \
  --dataset_repeat 400 \
-  --model_id_with_origin_paths "Qwen/Qwen2.5-VL-7B-Instruct:,stepfun-ai/Step1X-Edit:step1x-edit-i1258.safetensors,stepfun-ai/Step1X-Edit:vae.safetensors" \
+  --model_id_with_origin_paths "Qwen/Qwen2.5-VL-7B-Instruct:model-*.safetensors,stepfun-ai/Step1X-Edit:step1x-edit-i1258.safetensors,stepfun-ai/Step1X-Edit:vae.safetensors" \
  --learning_rate 1e-5 \
  --num_epochs 1 \
  --remove_prefix_in_ckpt "pipe.dit." \
--- a/examples/flux/model_training/lora/FLEX.2-preview.sh
+++ b/examples/flux/model_training/lora/FLEX.2-preview.sh
@@ -3,7 +3,7 @@ accelerate launch examples/flux/model_training/train.py \
  --dataset_metadata_path data/example_image_dataset/metadata.csv \
  --max_pixels 1048576 \
  --dataset_repeat 50 \
-  --model_id_with_origin_paths "ostris/Flex.2-preview:Flex.2-preview.safetensors,black-forest-labs/FLUX.1-dev:text_encoder/model.safetensors,black-forest-labs/FLUX.1-dev:text_encoder_2/,black-forest-labs/FLUX.1-dev:ae.safetensors" \
+  --model_id_with_origin_paths "ostris/Flex.2-preview:Flex.2-preview.safetensors,black-forest-labs/FLUX.1-dev:text_encoder/model.safetensors,black-forest-labs/FLUX.1-dev:text_encoder_2/*.safetensors,black-forest-labs/FLUX.1-dev:ae.safetensors" \
  --learning_rate 1e-4 \
  --num_epochs 5 \
  --remove_prefix_in_ckpt "pipe.dit." \
--- a/examples/flux/model_training/lora/FLUX.1-Kontext-dev.sh
+++ b/examples/flux/model_training/lora/FLUX.1-Kontext-dev.sh
@@ -4,7 +4,7 @@ accelerate launch examples/flux/model_training/train.py \
  --data_file_keys "image,kontext_images" \
  --max_pixels 1048576 \
  --dataset_repeat 400 \
-  --model_id_with_origin_paths "black-forest-labs/FLUX.1-Kontext-dev:flux1-kontext-dev.safetensors,black-forest-labs/FLUX.1-dev:text_encoder/model.safetensors,black-forest-labs/FLUX.1-dev:text_encoder_2/,black-forest-labs/FLUX.1-dev:ae.safetensors" \
+  --model_id_with_origin_paths "black-forest-labs/FLUX.1-Kontext-dev:flux1-kontext-dev.safetensors,black-forest-labs/FLUX.1-dev:text_encoder/model.safetensors,black-forest-labs/FLUX.1-dev:text_encoder_2/*.safetensors,black-forest-labs/FLUX.1-dev:ae.safetensors" \
  --learning_rate 1e-4 \
  --num_epochs 5 \
  --remove_prefix_in_ckpt "pipe.dit." \
--- a/examples/flux/model_training/lora/FLUX.1-Krea-dev.sh
+++ b/examples/flux/model_training/lora/FLUX.1-Krea-dev.sh
@@ -3,7 +3,7 @@ accelerate launch examples/flux/model_training/train.py \
  --dataset_metadata_path data/example_image_dataset/metadata.csv \
  --max_pixels 1048576 \
  --dataset_repeat 50 \
-  --model_id_with_origin_paths "black-forest-labs/FLUX.1-Krea-dev:flux1-krea-dev.safetensors,black-forest-labs/FLUX.1-dev:text_encoder/model.safetensors,black-forest-labs/FLUX.1-dev:text_encoder_2/,black-forest-labs/FLUX.1-dev:ae.safetensors" \
+  --model_id_with_origin_paths "black-forest-labs/FLUX.1-Krea-dev:flux1-krea-dev.safetensors,black-forest-labs/FLUX.1-dev:text_encoder/model.safetensors,black-forest-labs/FLUX.1-dev:text_encoder_2/*.safetensors,black-forest-labs/FLUX.1-dev:ae.safetensors" \
  --learning_rate 1e-4 \
  --num_epochs 5 \
  --remove_prefix_in_ckpt "pipe.dit." \
--- a/examples/flux/model_training/lora/FLUX.1-dev-AttriCtrl.sh
+++ b/examples/flux/model_training/lora/FLUX.1-dev-AttriCtrl.sh
@@ -4,7 +4,7 @@ accelerate launch examples/flux/model_training/train.py \
  --data_file_keys "image" \
  --max_pixels 1048576 \
  --dataset_repeat 100 \
-  --model_id_with_origin_paths "black-forest-labs/FLUX.1-dev:flux1-dev.safetensors,black-forest-labs/FLUX.1-dev:text_encoder/model.safetensors,black-forest-labs/FLUX.1-dev:text_encoder_2/,black-forest-labs/FLUX.1-dev:ae.safetensors,DiffSynth-Studio/AttriCtrl-FLUX.1-Dev:models/brightness.safetensors" \
+  --model_id_with_origin_paths "black-forest-labs/FLUX.1-dev:flux1-dev.safetensors,black-forest-labs/FLUX.1-dev:text_encoder/model.safetensors,black-forest-labs/FLUX.1-dev:text_encoder_2/*.safetensors,black-forest-labs/FLUX.1-dev:ae.safetensors,DiffSynth-Studio/AttriCtrl-FLUX.1-Dev:models/brightness.safetensors" \
  --learning_rate 1e-4 \
  --num_epochs 5 \
  --remove_prefix_in_ckpt "pipe.dit." \
--- a/examples/flux/model_training/lora/FLUX.1-dev-Controlnet-Inpainting-Beta.sh
+++ b/examples/flux/model_training/lora/FLUX.1-dev-Controlnet-Inpainting-Beta.sh
@@ -4,7 +4,7 @@ accelerate launch examples/flux/model_training/train.py \
  --data_file_keys "image,controlnet_image,controlnet_inpaint_mask" \
  --max_pixels 1048576 \
  --dataset_repeat 100 \
-  --model_id_with_origin_paths "black-forest-labs/FLUX.1-dev:flux1-dev.safetensors,black-forest-labs/FLUX.1-dev:text_encoder/model.safetensors,black-forest-labs/FLUX.1-dev:text_encoder_2/,black-forest-labs/FLUX.1-dev:ae.safetensors,alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta:diffusion_pytorch_model.safetensors" \
+  --model_id_with_origin_paths "black-forest-labs/FLUX.1-dev:flux1-dev.safetensors,black-forest-labs/FLUX.1-dev:text_encoder/model.safetensors,black-forest-labs/FLUX.1-dev:text_encoder_2/*.safetensors,black-forest-labs/FLUX.1-dev:ae.safetensors,alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta:diffusion_pytorch_model.safetensors" \
  --learning_rate 1e-4 \
  --num_epochs 5 \
  --remove_prefix_in_ckpt "pipe.dit." \
--- a/examples/flux/model_training/lora/FLUX.1-dev-Controlnet-Union-alpha.sh
+++ b/examples/flux/model_training/lora/FLUX.1-dev-Controlnet-Union-alpha.sh
@@ -4,7 +4,7 @@ accelerate launch examples/flux/model_training/train.py \
  --data_file_keys "image,controlnet_image" \
  --max_pixels 1048576 \
  --dataset_repeat 100 \
-  --model_id_with_origin_paths "black-forest-labs/FLUX.1-dev:flux1-dev.safetensors,black-forest-labs/FLUX.1-dev:text_encoder/model.safetensors,black-forest-labs/FLUX.1-dev:text_encoder_2/,black-forest-labs/FLUX.1-dev:ae.safetensors,InstantX/FLUX.1-dev-Controlnet-Union-alpha:diffusion_pytorch_model.safetensors" \
+  --model_id_with_origin_paths "black-forest-labs/FLUX.1-dev:flux1-dev.safetensors,black-forest-labs/FLUX.1-dev:text_encoder/model.safetensors,black-forest-labs/FLUX.1-dev:text_encoder_2/*.safetensors,black-forest-labs/FLUX.1-dev:ae.safetensors,InstantX/FLUX.1-dev-Controlnet-Union-alpha:diffusion_pytorch_model.safetensors" \
  --learning_rate 1e-4 \
  --num_epochs 5 \
  --remove_prefix_in_ckpt "pipe.dit." \
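
Review note on the recurring `text_encoder_2/` to `text_encoder_2/*.safetensors` change: the new value selects only weight files rather than a whole directory. The sketch below assumes glob-style matching and uses the standard-library `fnmatch` module; the diff does not show how DiffSynth-Studio resolves `origin_file_pattern`, and the file names in the list are illustrative rather than taken from the repository.

```python
import fnmatch

# Illustrative file listing (assumed names, not read from the real model repo).
files = [
    "text_encoder/model.safetensors",
    "text_encoder_2/config.json",
    "text_encoder_2/model-00001-of-00002.safetensors",
    "text_encoder_2/model-00002-of-00002.safetensors",
    "text_encoder_2/model.safetensors.index.json",
]

# Glob-style pattern as written in the updated scripts.
pattern = "text_encoder_2/*.safetensors"

# Keep only paths that match the pattern in full; config and index files are excluded.
matched = fnmatch.filter(files, pattern)
for path in matched:
    print(path)
```

Under this assumption only the two sharded `.safetensors` files are selected, which is consistent with the `model-*.safetensors` pattern the commit adds for `Qwen/Qwen2.5-VL-7B-Instruct`.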