diff --git a/README.md b/README.md
index 7624597..a57fa45 100644
--- a/README.md
+++ b/README.md
@@ -15,6 +15,7 @@ DiffSynth Studio is a Diffusion engine. We have restructured architectures inclu
 Until now, DiffSynth Studio has supported the following models:
 
+* [FLUX](https://huggingface.co/black-forest-labs/FLUX.1-dev)
 * [ExVideo](https://huggingface.co/ECNU-CILab/ExVideo-SVD-128f-v1)
 * [Kolors](https://huggingface.co/Kwai-Kolors/Kolors)
 * [Stable Diffusion 3](https://huggingface.co/stabilityai/stable-diffusion-3-medium)
@@ -30,6 +31,8 @@ Until now, DiffSynth Studio has supported the following models:
 
 ## News
 
+- **August 22, 2024.** We have implemented an interesting painter that supports all text-to-image models. Now you can create stunning images using the painter, with assistance from AI!
+  - Use it in our [WebUI](#usage-in-webui).
 - **June 21, 2024.** 🔥🔥🔥 We propose ExVideo, a post-tuning technique aimed at enhancing the capability of video generation models. We have extended Stable Video Diffusion to achieve the generation of long videos up to 128 frames.
   - [Project Page](https://ecnu-cilab.github.io/ExVideoProjectPage/)
@@ -90,27 +93,16 @@ pip install diffsynth
 
 The Python examples are in [`examples`](./examples/). We provide an overview here.
 
-### Long Video Synthesis
+### Video Synthesis
+
+#### Long Video Synthesis
 
 We trained an extended video synthesis model, which can generate 128 frames. [`examples/ExVideo`](./examples/ExVideo/)
 
 https://github.com/modelscope/DiffSynth-Studio/assets/35051019/d97f6aa9-8064-4b5b-9d49-ed6001bb9acc
 
-### Image Synthesis
-
-Generate high-resolution images, by breaking the limitation of diffusion models! [`examples/image_synthesis`](./examples/image_synthesis/).
-
-LoRA fine-tuning is supported in [`examples/train`](./examples/train/).
-
-|Model|Example|
-|-|-|
-|Stable Diffusion|![1024](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/6fc84611-8da6-4a1f-8fee-9a34eba3b4a5)|
-|Stable Diffusion XL|![1024](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/67687748-e738-438c-aee5-96096f09ac90)|
-|Stable Diffusion 3|![image_1024](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/4df346db-6f91-420a-b4c1-26e205376098)|
-|Kolors|![image_1024](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/53ef6f41-da11-4701-8665-9f64392607bf)|
-|Hunyuan-DiT|![image_1024](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/60b022c8-df3f-4541-95ab-bf39f2fa8bb5)|
-
-### Toon Shading
+#### Toon Shading
 
 Render realistic videos in a flatten style and enable video editing features. [`examples/Diffutoon`](./examples/Diffutoon/)
 
@@ -118,16 +110,50 @@ https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/b54c05c5-d747-47
 
 https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/20528af5-5100-474a-8cdc-440b9efdd86c
 
-### Video Stylization
+#### Video Stylization
 
 Video stylization without video models. [`examples/diffsynth`](./examples/diffsynth/)
 
 https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/59fb2f7b-8de0-4481-b79f-0c3a7361a1ea
 
+#### Image Synthesis
+
+Generate high-resolution images by breaking the limitations of diffusion models! [`examples/image_synthesis`](./examples/image_synthesis/)
+
+LoRA fine-tuning is supported in [`examples/train`](./examples/train/).
+
+|FLUX|Stable Diffusion 3|
+|-|-|
+|![image_1024_cfg](https://github.com/user-attachments/assets/6af5b106-0673-4e58-9213-cd9157eef4c0)|![image_1024](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/4df346db-6f91-420a-b4c1-26e205376098)|
+
+|Kolors|Hunyuan-DiT|
+|-|-|
+|![image_1024](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/53ef6f41-da11-4701-8665-9f64392607bf)|![image_1024](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/60b022c8-df3f-4541-95ab-bf39f2fa8bb5)|
+
+|Stable Diffusion|Stable Diffusion XL|
+|-|-|
+|![1024](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/6fc84611-8da6-4a1f-8fee-9a34eba3b4a5)|![1024](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/67687748-e738-438c-aee5-96096f09ac90)|
+
 ## Usage (in WebUI)
 
+Create stunning images using the painter, with assistance from AI!
+
+https://github.com/user-attachments/assets/95265d21-cdd6-4125-a7cb-9fbcf6ceb7b0
+
+**This video is not rendered in real-time.**
+
+* `Gradio` version
+
 ```
-python -m streamlit run DiffSynth_Studio.py
+python apps/gradio/DiffSynth_Studio.py
+```
+
+![20240822102002](https://github.com/user-attachments/assets/59613157-de51-4109-99b3-97cbffd88076)
+
+* `Streamlit` version
+
+```
+python -m streamlit run apps/streamlit/DiffSynth_Studio.py
 ```
 
 https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/93085557-73f3-4eee-a205-9829591ef954
diff --git a/examples/image_synthesis/README.md b/examples/image_synthesis/README.md
index 2c751d0..04ec0e7 100644
--- a/examples/image_synthesis/README.md
+++ b/examples/image_synthesis/README.md
@@ -6,9 +6,11 @@ Image synthesis is the base feature of DiffSynth Studio. We can generate images
 
 Example script: [`flux_text_to_image.py`](./flux_text_to_image.py)
 
+The original version of FLUX doesn't support classifier-free guidance; however, we believe this guidance mechanism is an important feature for synthesizing beautiful images. You can enable it with the `cfg_scale` parameter; the extra guidance scale introduced by FLUX is controlled by `embedded_guidance`.
+
 |1024*1024 (original)|1024*1024 (classifier-free guidance)|2048*2048 (highres-fix)|
 |-|-|-|
-|![image_1024](https://github.com/user-attachments/assets/d8e66872-8739-43e4-8c2b-eda9daba0450)|![image_1024_cfg](https://github.com/user-attachments/assets/1073c70d-018f-47e4-9342-bc580b4c7c59)|![image_2048_highres](https://github.com/user-attachments/assets/8719c1a8-b341-48c1-a085-364c3a7d25f0)|
+|![image_1024](https://github.com/user-attachments/assets/ce01327f-068f-45f5-aba9-0fa45eb26199)|![image_1024_cfg](https://github.com/user-attachments/assets/6af5b106-0673-4e58-9213-cd9157eef4c0)|![image_2048_highres](https://github.com/user-attachments/assets/a4bb776f-d9f0-4450-968c-c5d090a3ab4c)|
 
 ### Example: Stable Diffusion
diff --git a/examples/image_synthesis/flux_text_to_image.py b/examples/image_synthesis/flux_text_to_image.py
index 775c684..a2e5199 100644
--- a/examples/image_synthesis/flux_text_to_image.py
+++ b/examples/image_synthesis/flux_text_to_image.py
@@ -12,14 +12,14 @@ model_manager.load_models([
 ])
 pipe = FluxImagePipeline.from_model_manager(model_manager)
 
-prompt = "A captivating fantasy magic woman portrait set in the deep sea. The woman, with blue spaghetti strap silk dress, swims in the sea. Her flowing silver hair shimmers with every color of the rainbow and cascades down, merging with the floating flora around her. Smooth, delicate and fair skin."
-negative_prompt = "worst quality, low quality, monochrome, zombie, interlocked fingers, Aissist, dim, fuzzy, depth of Field, nsfw,"
+prompt = "CG. Full body. A captivating fantasy magic woman portrait in the deep sea. The woman, with blue spaghetti strap silk dress, swims in the sea. Her flowing silver hair shimmers with every color of the rainbow and cascades down, merging with the floating flora around her. Smooth, delicate and fair skin."
+negative_prompt = "dark, worst quality, low quality, monochrome, zombie, interlocked fingers, Aissist, dim, fuzzy, depth of Field, nsfw,"
 
 # Disable classifier-free guidance (consistent with the original implementation of FLUX.1)
 torch.manual_seed(6)
 image = pipe(
     prompt=prompt,
-    num_inference_steps=30,
+    num_inference_steps=30, embedded_guidance=3.5
 )
 image.save("image_1024.jpg")
@@ -27,7 +27,7 @@ image.save("image_1024.jpg")
 torch.manual_seed(6)
 image = pipe(
     prompt=prompt, negative_prompt=negative_prompt,
-    num_inference_steps=30, cfg_scale=2.0
+    num_inference_steps=30, cfg_scale=2.0, embedded_guidance=3.5
 )
 image.save("image_1024_cfg.jpg")
@@ -35,7 +35,7 @@ image.save("image_1024_cfg.jpg")
 torch.manual_seed(7)
 image = pipe(
     prompt=prompt,
-    num_inference_steps=30,
+    num_inference_steps=30, embedded_guidance=3.5,
     input_image=image.resize((2048, 2048)), height=2048, width=2048, denoising_strength=0.6, tiled=True
 )
 image.save("image_2048_highres.jpg")
diff --git a/mask.jpg b/mask.jpg
deleted file mode 100644
index b003d9a..0000000
Binary files a/mask.jpg and /dev/null differ
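
For context on what `cfg_scale` switches on in the patch above: classifier-free guidance combines the denoiser's predictions for the positive and negative prompts by extrapolating from the negative toward the positive. A minimal sketch with scalar stand-ins for the latent tensors (the function name is illustrative, not DiffSynth's actual API; `embedded_guidance` is separate, being the distilled guidance scale fed into the FLUX model itself):

```python
def classifier_free_guidance(noise_pred_posi, noise_pred_nega, cfg_scale):
    # Extrapolate from the negative-prompt prediction toward the
    # positive-prompt prediction. cfg_scale == 1.0 reduces to the plain
    # conditional prediction, i.e. guidance disabled.
    return noise_pred_nega + cfg_scale * (noise_pred_posi - noise_pred_nega)

print(classifier_free_guidance(1.0, 0.0, 2.0))  # 2.0
print(classifier_free_guidance(1.0, 0.0, 1.0))  # 1.0
```

With `cfg_scale=2.0` as in the patched script, the guided prediction overshoots the conditional one, which is why the negative prompt can actively push content out of the image.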