DiffSynth Studio
Introduction
DiffSynth Studio is a diffusion engine. We have restructured model architectures, including the text encoder, UNet, and VAE, to maintain compatibility with models from the open-source community while improving computational performance. We provide many interesting features. Enjoy the magic of diffusion models!
DiffSynth Studio currently supports the following models:
- ExVideo
- Stable Video Diffusion
- Hunyuan-DiT
- RIFE
- ESRGAN
- IP-Adapter
- AnimateDiff
- ControlNet
- Stable Diffusion XL
- Stable Diffusion
News
- June 21, 2024. 🔥🔥🔥 We propose ExVideo, a post-tuning technique for enhancing the capability of video generation models. We have extended Stable Video Diffusion to generate long videos of up to 128 frames.
  - Project Page
  - Source code is released in this repo. See examples/ExVideo.
  - Models are released on HuggingFace and ModelScope.
  - Technical report is released on arXiv.
- June 13, 2024. DiffSynth Studio has been transferred to ModelScope. The developers have transitioned from "I" to "we". Of course, I will still participate in development and maintenance.
- Jan 29, 2024. We propose Diffutoon, a fantastic solution for toon shading.
  - Project Page
  - The source code is released in this project.
  - The technical report (IJCAI 2024) is released on arXiv.
- Dec 8, 2023. We decided to start a new project that aims to unlock the potential of diffusion models, especially in video synthesis. Development of this project has begun.
- Nov 15, 2023. We propose FastBlend, a powerful video deflickering algorithm.
- Oct 1, 2023. We released an early version of this project, named FastSDXL, as a first attempt at building a diffusion engine.
  - The source code is released on GitHub.
  - FastSDXL includes a trainable OLSS scheduler for improved efficiency.
- Aug 29, 2023. We propose DiffSynth, a video synthesis framework.
  - Project Page
  - The source code is released in EasyNLP.
  - The technical report (ECML PKDD 2024) is released on arXiv.
Installation
git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .
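After installing, it can be useful to confirm that the package is visible to the interpreter you intend to use. The following is a minimal sketch using only the standard library; the package name `diffsynth` is an assumption about what `pip install -e .` registers, and `is_installed` is a hypothetical helper, not part of this project.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the import system can locate the top-level package."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # "diffsynth" is an assumed package name; adjust it if the install differs.
    name = "diffsynth"
    status = "found" if is_installed(name) else "not found"
    print(f"{name}: {status}")
```

If the package is reported as not found, check that the `pip` used for installation belongs to the same Python environment.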
Usage (in Python code)
The Python examples are in the examples directory. We provide an overview here.
Long Video Synthesis
We trained an extended video synthesis model that can generate 128 frames. See examples/ExVideo.
https://github.com/modelscope/DiffSynth-Studio/assets/35051019/d97f6aa9-8064-4b5b-9d49-ed6001bb9acc
Image Synthesis
Generate high-resolution images beyond the usual resolution limits of diffusion models! See examples/image_synthesis.
| 512*512 | 1024*1024 | 2048*2048 | 4096*4096 |
|---|---|---|---|

| 1024*1024 | 2048*2048 |
|---|---|
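To give a sense of scale for the resolutions listed above, the sketch below computes the latent grid size for each one. It assumes a Stable Diffusion style VAE that downsamples each side by a factor of 8; the source does not state which model or technique produced these images, so treat the factor and the hypothetical `latent_size` helper as illustrative only.

```python
from typing import Tuple


def latent_size(width: int, height: int, factor: int = 8) -> Tuple[int, int]:
    """Return the latent grid size for an image, assuming a fixed VAE downsampling factor."""
    if width % factor or height % factor:
        raise ValueError("image dimensions must be divisible by the downsampling factor")
    return width // factor, height // factor


for side in (512, 1024, 2048, 4096):
    w, h = latent_size(side, side)
    # The number of latent positions grows with the square of the side length.
    print(f"{side}*{side} image -> {w}*{h} latent ({w * h} positions)")
```

Under this assumption, each doubling of the image side quadruples the number of latent positions, which is why very large images are costly to generate in a single pass.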
Toon Shading
Render realistic videos in a flat style, with support for video editing features. See examples/Diffutoon.
https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/b54c05c5-d747-4709-be5e-b39af82404dd
https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/20528af5-5100-474a-8cdc-440b9efdd86c
Video Stylization
Stylize videos without using video models. See examples/diffsynth.
https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/59fb2f7b-8de0-4481-b79f-0c3a7361a1ea
Chinese Models
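Use Hunyuan-DiT to generate images with Chinese prompts. We also support LoRA fine-tuning of this model. See examples/hunyuan_dit.
Prompt: 少女手捧鲜花,坐在公园的长椅上,夕阳的余晖洒在少女的脸庞,整个画面充满诗意的美感 (A girl holding flowers sits on a park bench; the afterglow of the setting sun falls on her face, and the whole scene has a poetic beauty.)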
| 1024x1024 | 2048x2048 (highres-fix) |
|---|---|
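Prompt: 一只小狗蹦蹦跳跳,周围是姹紫嫣红的鲜花,远处是山脉 (A puppy bounces along, surrounded by brilliantly colorful flowers, with mountains in the distance.)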
| Without LoRA | With LoRA |
|---|---|
Usage (in WebUI)
python -m streamlit run DiffSynth_Studio.py
https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/93085557-73f3-4eee-a205-9829591ef954
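If you prefer to start the WebUI from a Python script rather than the shell, the command above can be wrapped as follows. This is a minimal sketch: `webui_command` and `launch_webui` are hypothetical helper names, and the code assumes it is run from the repository root with Streamlit installed in the current environment.

```python
import shlex
import subprocess
import sys
from typing import List


def webui_command(script: str = "DiffSynth_Studio.py") -> List[str]:
    """Build the Streamlit launch command using the current interpreter."""
    return [sys.executable, "-m", "streamlit", "run", script]


def launch_webui(script: str = "DiffSynth_Studio.py") -> int:
    """Start the WebUI, block until it exits, and return the exit code."""
    return subprocess.run(webui_command(script)).returncode


# Show the command that would be executed; call launch_webui() to actually run it.
print(shlex.join(webui_command()))
```

Using `sys.executable` keeps the launch tied to the same Python environment in which the project was installed.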