rearrange examples

2026-04-08 08:58:20 +00:00 · 2024-06-06 18:50:07 +08:00
parent f6de5eef4d
commit 4d4a095420
20 changed files with 140 additions and 45 deletions
--- a/README.md
+++ b/README.md
@@ -2,7 +2,43 @@

 ## Introduction

-DiffSynth is a new Diffusion engine. We have restructured architectures including Text Encoder, UNet, VAE, among others, maintaining compatibility with models from the open-source community while enhancing computational performance. This version is currently in its initial stage, supporting SD and SDXL architectures. In the future, we plan to develop more interesting features based on this new codebase.
+DiffSynth Studio is a diffusion engine. We have restructured architectures, including the Text Encoder, UNet, and VAE, maintaining compatibility with models from the open-source community while enhancing computational performance. We provide many interesting features. Enjoy the magic of diffusion models!
+
+## Roadmap
+
+* Aug 29, 2023. I propose DiffSynth, a video synthesis framework.
+    * [Project Page](https://ecnu-cilab.github.io/DiffSynth.github.io/).
+    * The source code is released in [EasyNLP](https://github.com/alibaba/EasyNLP/tree/master/diffusion/DiffSynth).
+    * The technical report (ECML PKDD 2024) is released on [arXiv](https://arxiv.org/abs/2308.03463).
+* Oct 1, 2023. I release an early version of this project, namely FastSDXL, as a first attempt at building a diffusion engine.
+    * The source code is released on [GitHub](https://github.com/Artiprocher/FastSDXL).
+    * FastSDXL includes a trainable OLSS scheduler for efficiency improvement.
+        * The original repo of OLSS is [here](https://github.com/alibaba/EasyNLP/tree/master/diffusion/olss_scheduler).
+        * The technical report (CIKM 2023) is released on [arXiv](https://arxiv.org/abs/2305.14677).
+        * A demo video is shown on [Bilibili](https://www.bilibili.com/video/BV1w8411y7uj).
+        * Since OLSS requires additional training, we don't implement it in this project.
+* Nov 15, 2023. I propose FastBlend, a powerful video deflickering algorithm.
+    * The sd-webui extension is released on [GitHub](https://github.com/Artiprocher/sd-webui-fastblend).
+    * Demo videos are shown on Bilibili, including three tasks.
+        * [Video deflickering](https://www.bilibili.com/video/BV1d94y1W7PE)
+        * [Video interpolation](https://www.bilibili.com/video/BV1Lw411m71p)
+        * [Image-driven video rendering](https://www.bilibili.com/video/BV1RB4y1Z7LF)
+    * The technical report is released on [arXiv](https://arxiv.org/abs/2311.09265).
+    * An unofficial ComfyUI extension developed by other users is released on [GitHub](https://github.com/AInseven/ComfyUI-fastblend).
+* Dec 8, 2023. I decide to develop a new project, aiming to unleash the potential of diffusion models, especially in video synthesis.
+* Jan 29, 2024. I propose Diffutoon, a fantastic solution for toon shading.
+    * [Project Page](https://ecnu-cilab.github.io/DiffutoonProjectPage/).
+    * The source code is released in this project.
+    * The technical report (IJCAI 2024) is released on [arXiv](https://arxiv.org/abs/2401.16224).
+* So far, DiffSynth Studio supports the following models:
+    * [Stable Diffusion](https://huggingface.co/runwayml/stable-diffusion-v1-5)
+    * [Stable Diffusion XL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)
+    * [ControlNet](https://github.com/lllyasviel/ControlNet)
+    * [AnimateDiff](https://github.com/guoyww/animatediff/)
+    * [IP-Adapter](https://github.com/tencent-ailab/IP-Adapter)
+    * [ESRGAN](https://github.com/xinntao/ESRGAN)
+    * [RIFE](https://github.com/hzwer/ECCV2022-RIFE)
+    * [Hunyuan-DiT](https://github.com/Tencent/HunyuanDiT)

 ## Installation

@@ -30,72 +66,46 @@ https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/93085557-73f3-4e

 ## Usage (in Python code)

-### Example 1: Stable Diffusion
+The Python examples are in [`examples`](./examples/). We provide an overview here.

-We can generate images with very high resolution. Please see `examples/sd_text_to_image.py` for more details.
+### Image Synthesis
+
+Generate high-resolution images by breaking the resolution limitation of diffusion models! [`examples/image_synthesis`](./examples/image_synthesis/)

 |512*512|1024*1024|2048*2048|4096*4096|
 |-|-|-|-|
 |![512](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/55f679e9-7445-4605-9315-302e93d11370)|![1024](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/6fc84611-8da6-4a1f-8fee-9a34eba3b4a5)|![2048](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/9087a73c-9164-4c58-b2a0-effc694143fb)|![4096](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/edee9e71-fc39-4d1c-9ca9-fa52002c67ac)|

-### Example 2: Stable Diffusion XL
-
-Generate images with Stable Diffusion XL. Please see `examples/sdxl_text_to_image.py` for more details.
-
 |1024*1024|2048*2048|
 |-|-|
 |![1024](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/67687748-e738-438c-aee5-96096f09ac90)|![2048](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/584186bc-9855-4140-878e-99541f9a757f)|

-### Example 3: Stable Diffusion XL Turbo
+### Toon Shading

-Generate images with Stable Diffusion XL Turbo. You can see `examples/sdxl_turbo.py` for more details, but we highly recommend you to use it in the WebUI.
-
-|"black car"|"red car"|
-|-|-|
-|![black_car](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/7fbfd803-68d4-44f3-8713-8c925fec47d0)|![black_car_to_red_car](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/aaf886e4-c33c-4fd8-98e2-29eef117ba00)|
-
-### Example 4: Toon Shading (Diffutoon)
-
-This example is implemented based on [Diffutoon](https://arxiv.org/abs/2401.16224). This approach is adept for rendering high-resoluton videos with rapid motion. You can easily modify the parameters in the config dict. See `examples/diffutoon_toon_shading.py`. We also provide [an example on Colab](https://colab.research.google.com/github/Artiprocher/DiffSynth-Studio/blob/main/examples/Diffutoon.ipynb).
+Render realistic videos in a flat style and enable video editing features. [`examples/Diffutoon`](./examples/Diffutoon/)

 https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/b54c05c5-d747-4709-be5e-b39af82404dd

-### Example 5: Toon Shading with Editing Signals (Diffutoon)
-
-This example is implemented based on [Diffutoon](https://arxiv.org/abs/2401.16224), supporting video editing signals. See `examples\diffutoon_toon_shading_with_editing_signals.py`. The editing feature is also supported in the [Colab example](https://colab.research.google.com/github/Artiprocher/DiffSynth-Studio/blob/main/examples/Diffutoon.ipynb).
-
 https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/20528af5-5100-474a-8cdc-440b9efdd86c

-### Example 6: Toon Shading (in native Python code)
+### Video Stylization

-This example is provided for developers. If you don't want to use the config to manage parameters, you can see `examples/sd_toon_shading.py` to learn how to use it in native Python code.
-
-https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/607c199b-6140-410b-a111-3e4ffb01142c
-
-### Example 7: Text to Video
-
-Given a prompt, DiffSynth Studio can generate a video using a Stable Diffusion model and an AnimateDiff model. We can break the limitation of number of frames! See `examples/sd_text_to_video.py`.
-
-https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/8f556355-4079-4445-9b48-e9da77699437
-
-### Example 8: Video Stylization
-
-We provide an example for video stylization. In this pipeline, the rendered video is completely different from the original video, thus we need a powerful deflickering algorithm. We use FastBlend to implement the deflickering module. Please see `examples/sd_video_rerender.py` for more details.
+Video stylization without video models. [`examples/diffsynth`](./examples/diffsynth/)

 https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/59fb2f7b-8de0-4481-b79f-0c3a7361a1ea

-### Example 9: Prompt Processing
+### Chinese Models

-If you are not native English user, we provide translation service for you. Our prompter can translate other language to English and refine it using "BeautifulPrompt" models. Please see `examples/sd_prompt_refining.py` for more details.
+Use Hunyuan-DiT to generate images with Chinese prompts. We also support LoRA fine-tuning of this model. [`examples/hunyuan_dit`](./examples/hunyuan_dit/)

-Prompt: "一个漂亮的女孩". The [translation model](https://huggingface.co/Helsinki-NLP/opus-mt-en-zh) will translate it to English.
+Prompt: 少女手捧鲜花，坐在公园的长椅上，夕阳的余晖洒在少女的脸庞，整个画面充满诗意的美感

-|seed=0|seed=1|seed=2|seed=3|
-|-|-|-|-|
-|![0_](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/ebb25ca8-7ce1-4d9e-8081-59a867c70c4d)|![1_](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/a7e79853-3c1a-471a-9c58-c209ec4b76dd)|![2_](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/a292b959-a121-481f-b79c-61cc3346f810)|![3_](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/1c19b54e-5a6f-4d48-960b-a7b2b149bb4c)|
+|1024x1024|2048x2048 (highres-fix)|
+|-|-|
+|![image_1024](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/2b6528cf-a229-46e9-b7dd-4a9475b07308)|![image_2048](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/11d264ec-966b-45c9-9804-74b60428b866)|

-Prompt: "一个漂亮的女孩". The [translation model](https://huggingface.co/Helsinki-NLP/opus-mt-en-zh) will translate it to English. Then the [refining model](https://huggingface.co/alibaba-pai/pai-bloom-1b1-text2prompt-sd) will refine the translated prompt for better visual quality.
+Prompt: 一只小狗蹦蹦跳跳，周围是姹紫嫣红的鲜花，远处是山脉

-|seed=0|seed=1|seed=2|seed=3|
-|-|-|-|-|
-|![0](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/778b1bd9-44e0-46ac-a99c-712b3fc9aaa4)|![1](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/c03479b8-2082-4c6e-8e1c-3582b98686f6)|![2](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/edb33d21-3288-4a55-96ca-a4bfe1b50b00)|![3](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/7848cfc1-cad5-4848-8373-41d24e98e584)|
+|Without LoRA|With LoRA|
+|-|-|
+|![image_without_lora](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/1aa21de5-a992-4b66-b14f-caa44e08876e)|![image_with_lora](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/83a0a41a-691f-4610-8e7b-d8e17c50a282)|
--- a/examples/Diffutoon/Diffutoon.ipynb
+++ b/examples/Diffutoon/Diffutoon.ipynb
--- a/examples/Diffutoon/README.md
+++ b/examples/Diffutoon/README.md
@@ -0,0 +1,21 @@
+# Diffutoon
+
+[Diffutoon](https://arxiv.org/abs/2401.16224) is a toon shading approach. It is well suited to rendering high-resolution videos with rapid motion.
+
+## Example: Toon Shading (Diffutoon)
+
+Directly render realistic videos in a flat style. In this example, you can easily modify the parameters in the config dict. See [`diffutoon_toon_shading.py`](./diffutoon_toon_shading.py). We also provide [an example on Colab](https://colab.research.google.com/github/Artiprocher/DiffSynth-Studio/blob/main/examples/Diffutoon/Diffutoon.ipynb).
+
+https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/b54c05c5-d747-4709-be5e-b39af82404dd
+
+## Example: Toon Shading with Editing Signals (Diffutoon)
+
+This example supports video editing signals. See [`diffutoon_toon_shading_with_editing_signals.py`](./diffutoon_toon_shading_with_editing_signals.py). The editing feature is also supported in the [Colab example](https://colab.research.google.com/github/Artiprocher/DiffSynth-Studio/blob/main/examples/Diffutoon/Diffutoon.ipynb).
+
+https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/20528af5-5100-474a-8cdc-440b9efdd86c
+
+## Example: Toon Shading (in native Python code)
+
+This example is provided for developers. If you don't want to use the config to manage parameters, see [`sd_toon_shading.py`](./sd_toon_shading.py) to learn how to use it in native Python code.
+
+https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/607c199b-6140-410b-a111-3e4ffb01142c
--- a/examples/Diffutoon/diffutoon_toon_shading.py
+++ b/examples/Diffutoon/diffutoon_toon_shading.py
--- a/examples/Diffutoon/diffutoon_toon_shading_with_editing_signals.py
+++ b/examples/Diffutoon/diffutoon_toon_shading_with_editing_signals.py
--- a/examples/Diffutoon/sd_toon_shading.py
+++ b/examples/Diffutoon/sd_toon_shading.py
--- a/examples/Ip-Adapter/README.md
+++ b/examples/Ip-Adapter/README.md
@@ -0,0 +1,3 @@
+# IP-Adapter
+
+The IP-Adapter features in DiffSynth Studio are not yet complete. Please stay tuned.
--- a/examples/Ip-Adapter/sdxl_ipadapter.py
+++ b/examples/Ip-Adapter/sdxl_ipadapter.py
--- a/examples/diffsynth/README.md
+++ b/examples/diffsynth/README.md
@@ -0,0 +1,7 @@
+# DiffSynth
+
+DiffSynth is the initial version of our video synthesis framework. In this framework, you can apply video deflickering algorithms to the latent space of diffusion models. You can refer to the [original repo](https://github.com/alibaba/EasyNLP/tree/master/diffusion/DiffSynth) for more details.
+
+We provide an example for video stylization. In this pipeline, the rendered video is completely different from the original video, so we need a powerful deflickering algorithm. We use FastBlend to implement the deflickering module. Please see [`sd_video_rerender.py`](./sd_video_rerender.py).
+
+https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/59fb2f7b-8de0-4481-b79f-0c3a7361a1ea
--- a/examples/diffsynth/sd_video_rerender.py
+++ b/examples/diffsynth/sd_video_rerender.py
--- a/examples/hunyuan_dit/README.md
+++ b/examples/hunyuan_dit/README.md
@@ -26,6 +26,8 @@ models/HunyuanDiT/

 The original resolution of Hunyuan DiT is 1024x1024. If you want to use larger resolutions, please use highres-fix.

+Hunyuan DiT is also supported in our UI.
+
 ```python
 from diffsynth import ModelManager, HunyuanDiTImagePipeline
 import torch
--- a/examples/image_synthesis/README.md
+++ b/examples/image_synthesis/README.md
@@ -0,0 +1,43 @@
+# Image Synthesis
+
+Image synthesis is the basic feature of DiffSynth Studio.
+
+### Example: Stable Diffusion
+
+We can generate images with very high resolution. Please see [`sd_text_to_image.py`](./sd_text_to_image.py) for more details.
+
+|512*512|1024*1024|2048*2048|4096*4096|
+|-|-|-|-|
+|![512](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/55f679e9-7445-4605-9315-302e93d11370)|![1024](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/6fc84611-8da6-4a1f-8fee-9a34eba3b4a5)|![2048](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/9087a73c-9164-4c58-b2a0-effc694143fb)|![4096](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/edee9e71-fc39-4d1c-9ca9-fa52002c67ac)|
+
+### Example: Stable Diffusion XL
+
+Generate images with Stable Diffusion XL. Please see [`sdxl_text_to_image.py`](./sdxl_text_to_image.py) for more details.
+
+|1024*1024|2048*2048|
+|-|-|
+|![1024](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/67687748-e738-438c-aee5-96096f09ac90)|![2048](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/584186bc-9855-4140-878e-99541f9a757f)|
+
+### Example: Stable Diffusion XL Turbo
+
+Generate images with Stable Diffusion XL Turbo. See [`sdxl_turbo.py`](./sdxl_turbo.py) for more details, but we highly recommend using it in the WebUI.
+
+|"black car"|"red car"|
+|-|-|
+|![black_car](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/7fbfd803-68d4-44f3-8713-8c925fec47d0)|![black_car_to_red_car](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/aaf886e4-c33c-4fd8-98e2-29eef117ba00)|
+
+### Example: Prompt Processing
+
+If you are not a native English speaker, we provide a translation service for you. Our prompter can translate prompts from other languages into English and refine them using "BeautifulPrompt" models. Please see [`sd_prompt_refining.py`](./sd_prompt_refining.py) for more details.
+
+Prompt: "一个漂亮的女孩". The [translation model](https://huggingface.co/Helsinki-NLP/opus-mt-en-zh) will translate it to English.
+
+|seed=0|seed=1|seed=2|seed=3|
+|-|-|-|-|
+|![0_](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/ebb25ca8-7ce1-4d9e-8081-59a867c70c4d)|![1_](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/a7e79853-3c1a-471a-9c58-c209ec4b76dd)|![2_](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/a292b959-a121-481f-b79c-61cc3346f810)|![3_](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/1c19b54e-5a6f-4d48-960b-a7b2b149bb4c)|
+
+Prompt: "一个漂亮的女孩". The [translation model](https://huggingface.co/Helsinki-NLP/opus-mt-en-zh) will translate it to English. Then the [refining model](https://huggingface.co/alibaba-pai/pai-bloom-1b1-text2prompt-sd) will refine the translated prompt for better visual quality.
+
+|seed=0|seed=1|seed=2|seed=3|
+|-|-|-|-|
+|![0](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/778b1bd9-44e0-46ac-a99c-712b3fc9aaa4)|![1](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/c03479b8-2082-4c6e-8e1c-3582b98686f6)|![2](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/edb33d21-3288-4a55-96ca-a4bfe1b50b00)|![3](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/7848cfc1-cad5-4848-8373-41d24e98e584)|
--- a/examples/image_synthesis/sd_prompt_refining.py
+++ b/examples/image_synthesis/sd_prompt_refining.py
--- a/examples/image_synthesis/sd_text_to_image.py
+++ b/examples/image_synthesis/sd_text_to_image.py
--- a/examples/image_synthesis/sdxl_text_to_image.py
+++ b/examples/image_synthesis/sdxl_text_to_image.py
--- a/examples/image_synthesis/sdxl_turbo.py
+++ b/examples/image_synthesis/sdxl_turbo.py
--- a/examples/video_synthesis/README.md
+++ b/examples/video_synthesis/README.md
@@ -0,0 +1,9 @@
+# Text to Video
+
+In DiffSynth Studio, we can use AnimateDiff and SVD to generate videos. However, these models usually produce poor results. We do not recommend using them until a more powerful video model emerges.
+
+### Example: Text to Video
+
+Generate a video using a Stable Diffusion model and an AnimateDiff model. We can break the limit on the number of frames! See [`sd_text_to_video.py`](./sd_text_to_video.py).
+
+https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/8f556355-4079-4445-9b48-e9da77699437
--- a/examples/video_synthesis/sd_text_to_video.py
+++ b/examples/video_synthesis/sd_text_to_video.py
--- a/examples/video_synthesis/sdxl_text_to_video.py
+++ b/examples/video_synthesis/sdxl_text_to_video.py
--- a/examples/video_synthesis/svd_text_to_video.py
+++ b/examples/video_synthesis/svd_text_to_video.py