
DiffSynth Studio

Introduction

DiffSynth Studio is a Diffusion engine. We have restructured architectures, including the Text Encoder, UNet, and VAE, maintaining compatibility with models from the open-source community while improving computational performance. We provide many interesting features. Enjoy the magic of diffusion models!

Roadmap

  • Aug 29, 2023. I propose DiffSynth, a video synthesis framework.
  • Oct 1, 2023. I release an early version of this project, namely FastSDXL, an attempt at building a diffusion engine.
    • The source codes are released on GitHub.
    • FastSDXL includes a trainable OLSS scheduler for efficiency improvement.
      • The original repo of OLSS is here.
      • The technical report (CIKM 2023) is released on arXiv.
      • A demo video is shown on Bilibili.
      • Since OLSS requires additional training, we don't implement it in this project.
  • Nov 15, 2023. I propose FastBlend, a powerful video deflickering algorithm.
  • Dec 8, 2023. I decide to develop a new project, aiming to unleash the potential of diffusion models, especially in video synthesis.
  • Jan 29, 2024. I propose Diffutoon, a fantastic solution for toon shading.
    • Project Page.
    • The source codes are released in this project.
    • The technical report (IJCAI 2024) is released on arXiv.
  • To date, DiffSynth Studio supports the following models:

Installation

Create Python environment:

conda env create -f environment.yml

We find that conda sometimes fails to install cupy correctly; if so, please install it manually. See this document for more details.
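Since a failed cupy install may only surface later at runtime, a quick way to verify it after setting up the environment (a generic check, not part of the project itself):

```python
import importlib.util

def check_cupy():
    """Report whether cupy is importable, so a broken conda install is caught early."""
    if importlib.util.find_spec("cupy") is None:
        return "cupy not found: please install it manually, matching your CUDA version"
    return "cupy is available"

print(check_cupy())
```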

Enter the Python environment:

conda activate DiffSynthStudio

Usage (in WebUI)

python -m streamlit run DiffSynth_Studio.py

https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/93085557-73f3-4eee-a205-9829591ef954

Usage (in Python code)

The Python examples are in examples. We provide an overview here.

Image Synthesis

Generate high-resolution images by breaking the resolution limitations of diffusion models! examples/image_synthesis

Example outputs at 512×512, 1024×1024, 2048×2048, and 4096×4096.
Additional examples at 1024×1024 and 2048×2048.
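Very large canvases are commonly processed in overlapping tiles. The sketch below only illustrates that general idea, computing the coordinates of overlapping windows over a large image; it is not DiffSynth Studio's actual implementation, and all names are hypothetical.

```python
def tile_boxes(width, height, tile=1024, overlap=128):
    """Return (left, top, right, bottom) boxes covering a canvas with overlapping tiles."""
    stride = tile - overlap
    xs = list(range(0, max(width - tile, 0) + 1, stride))
    ys = list(range(0, max(height - tile, 0) + 1, stride))
    # ensure the final row/column of tiles reaches the canvas edge
    if xs[-1] + tile < width:
        xs.append(width - tile)
    if ys[-1] + tile < height:
        ys.append(height - tile)
    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in ys for x in xs]

boxes = tile_boxes(2048, 2048)  # 3x3 grid of overlapping 1024-pixel tiles
```

Each tile would be processed independently and the overlapping regions blended, which is one common way diffusion pipelines sidestep the native resolution limit of the base model.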

Toon Shading

Render realistic videos in a flat, cartoon-like style, with video editing features enabled. examples/Diffutoon

https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/b54c05c5-d747-4709-be5e-b39af82404dd

https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/20528af5-5100-474a-8cdc-440b9efdd86c

Video Stylization

Video stylization without video models. examples/diffsynth

https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/59fb2f7b-8de0-4481-b79f-0c3a7361a1ea

Chinese Models

Use Hunyuan-DiT to generate images with Chinese prompts. We also support LoRA fine-tuning of this model. examples/hunyuan_dit

Prompt: 少女手捧鲜花,坐在公园的长椅上,夕阳的余晖洒在少女的脸庞,整个画面充满诗意的美感 (A young girl holding flowers sits on a park bench; the afterglow of the sunset falls on her face, and the whole scene is filled with poetic beauty.)

Example outputs at 1024×1024 and 2048×2048 (with highres-fix).

Prompt: 一只小狗蹦蹦跳跳,周围是姹紫嫣红的鲜花,远处是山脉 (A little dog bounces around, surrounded by brightly colored flowers, with mountains in the distance.)

Comparison: image generated without LoRA vs. with LoRA.
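LoRA fine-tuning, as supported for Hunyuan-DiT above, learns a low-rank update to a weight matrix rather than retraining it fully: W' = W + (alpha / r) * B @ A. Below is a minimal numpy sketch of the merge step only; the function name and scaling convention are illustrative assumptions, not DiffSynth Studio's API.

```python
import numpy as np

def merge_lora(W, A, B, alpha=16.0):
    """Merge a LoRA update into a base weight: W' = W + (alpha / r) * B @ A.

    W: (out, in) base weight; B: (out, r) up-projection; A: (r, in)
    down-projection; r is the LoRA rank.
    """
    r = A.shape[0]
    return W + (alpha / r) * (B @ A)

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
A = rng.standard_normal((4, 64))   # rank-4 down-projection
B = np.zeros((64, 4))              # B starts at zero, so the merged weight equals W
```

Because B is initialized to zero, a freshly initialized LoRA leaves the base model's behavior unchanged; training then moves only the small A and B matrices.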
License

Apache-2.0