Files
DiffSynth-Studio/README.md
2024-10-10 18:26:37 +08:00

10 KiB

DiffSynth Studio

PyPI license open issues GitHub pull-requests GitHub latest commit

modelscope%2FDiffSynth-Studio | Trendshift

Introduction

DiffSynth Studio is a Diffusion engine. We have restructured architectures including Text Encoder, UNet, VAE, among others, maintaining compatibility with models from the open-source community while enhancing computational performance. We provide many interesting features. Enjoy the magic of Diffusion models!

Until now, DiffSynth Studio has supported the following models:

News

  • August 22, 2024. CogVideoX-5B is supported in this project. See here. We provide several interesting features for this text-to-video model, including

    • Text to video
    • Video editing
    • Self-upscaling
    • Video interpolation
  • August 22, 2024. We have implemented an interesting painter that supports all text-to-image models. Now you can create stunning images using the painter, with assistance from AI!

  • August 21, 2024. FLUX is supported in DiffSynth-Studio.

    • Enable CFG and highres-fix to improve visual quality. See here
    • LoRA, ControlNet, and additional models will be available soon.
  • June 21, 2024. 🔥🔥🔥 We propose ExVideo, a post-tuning technique aimed at enhancing the capability of video generation models. We have extended Stable Video Diffusion to achieve the generation of long videos up to 128 frames.

  • June 13, 2024. DiffSynth Studio is transferred to ModelScope. The developers have transitioned from "I" to "we". Of course, I will still participate in development and maintenance.

  • Jan 29, 2024. We propose Diffutoon, a fantastic solution for toon shading.

    • Project Page
    • The source codes are released in this project.
    • The technical report (IJCAI 2024) is released on arXiv.
  • Dec 8, 2023. We decide to develop a new Project, aiming to release the potential of diffusion models, especially in video synthesis. The development of this project is started.

  • Nov 15, 2023. We propose FastBlend, a powerful video deflickering algorithm.

  • Oct 1, 2023. We release an early version of this project, namely FastSDXL. A try for building a diffusion engine.

    • The source codes are released on GitHub.
    • FastSDXL includes a trainable OLSS scheduler for efficiency improvement.
      • The original repo of OLSS is here.
      • The technical report (CIKM 2023) is released on arXiv.
      • A demo video is shown on Bilibili.
      • Since OLSS requires additional training, we don't implement it in this project.
  • Aug 29, 2023. We propose DiffSynth, a video synthesis framework.

Installation

Install from source code (recommended):

git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .

Or install from pypi:

pip install diffsynth

Usage (in Python code)

The Python examples are in examples. We provide an overview here.

Download Models

Download the pre-set models. Model IDs can be found in config file.

from diffsynth import download_models

download_models(["FLUX.1-dev", "Kolors"])

Download your own models.

from diffsynth.models.downloader import download_from_huggingface, download_from_modelscope

# From Modelscope (recommended)
download_from_modelscope("Kwai-Kolors/Kolors", "vae/diffusion_pytorch_model.fp16.bin", "models/kolors/Kolors/vae")
# From Huggingface
download_from_huggingface("Kwai-Kolors/Kolors", "vae/diffusion_pytorch_model.fp16.safetensors", "models/kolors/Kolors/vae")

Video Synthesis

Text-to-video using CogVideoX-5B

CogVideoX-5B is released by ZhiPu. We provide an improved pipeline, supporting text-to-video, video editing, self-upscaling and video interpolation. examples/video_synthesis

The video on the left is generated using the original text-to-video pipeline, while the video on the right is the result after editing and frame interpolation.

https://github.com/user-attachments/assets/26b044c1-4a60-44a4-842f-627ff289d006

Long Video Synthesis

We trained extended video synthesis models, which can generate 128 frames. examples/ExVideo

https://github.com/modelscope/DiffSynth-Studio/assets/35051019/d97f6aa9-8064-4b5b-9d49-ed6001bb9acc

https://github.com/user-attachments/assets/321ee04b-8c17-479e-8a95-8cbcf21f8d7e

Toon Shading

Render realistic videos in a flatten style and enable video editing features. examples/Diffutoon

https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/b54c05c5-d747-4709-be5e-b39af82404dd

https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/20528af5-5100-474a-8cdc-440b9efdd86c

Video Stylization

Video stylization without video models. examples/diffsynth

https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/59fb2f7b-8de0-4481-b79f-0c3a7361a1ea

Image Synthesis

Generate high-resolution images, by breaking the limitation of diffusion models! examples/image_synthesis.

LoRA fine-tuning is supported in examples/train.

FLUX Stable Diffusion 3
image_1024_cfg image_1024
Kolors Hunyuan-DiT
image_1024 image_1024
Stable Diffusion Stable Diffusion XL
1024 1024

Usage (in WebUI)

Create stunning images using the painter, with assistance from AI!

https://github.com/user-attachments/assets/95265d21-cdd6-4125-a7cb-9fbcf6ceb7b0

This video is not rendered in real-time.

Before launching the WebUI, please download models to the folder ./models. See here.

  • Gradio version
pip install gradio
python apps/gradio/DiffSynth_Studio.py

20240822102002

  • Streamlit version
pip install streamlit streamlit-drawable-canvas
python -m streamlit run apps/streamlit/DiffSynth_Studio.py

https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/93085557-73f3-4eee-a205-9829591ef954