Add files via upload

Revise again
yrk111222
2024-10-22 09:56:03 +08:00
committed by GitHub
parent 157ba2e426
commit f6e676cdf9
46 changed files with 2525 additions and 0 deletions

@@ -0,0 +1,85 @@
# Quick Start
This document shows how to get started with DiffSynth-Studio through a short code example.
## Installation
Use the following command to clone and install DiffSynth-Studio from GitHub. For more information, please refer to [Installation](./Installation.md).
```shell
git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .
```
## One-click Run!
Running the following code downloads the model, loads it, and generates an image.
```python
import torch
from diffsynth import ModelManager, FluxImagePipeline

# Download and load the FLUX.1-dev weights onto the GPU.
model_manager = ModelManager(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_id_list=["FLUX.1-dev"]
)
pipe = FluxImagePipeline.from_model_manager(model_manager)

# Fix the random seed so the result is reproducible.
torch.manual_seed(0)
image = pipe(
    prompt="In a forest, a wooden plank sign reading DiffSynth",
    height=576, width=1024,
)
image.save("image.jpg")
```
![image](https://github.com/user-attachments/assets/15a52a2b-2f18-46fe-810c-cb3ad2853919)
This example shows the two key modules in DiffSynth: `ModelManager` and `Pipeline`. The following sections describe each in detail.
## Downloading and Loading Models
`ModelManager` is responsible for downloading and loading models, which can be done in one step with the following code.
```python
import torch
from diffsynth import ModelManager
model_manager = ModelManager(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_id_list=["FLUX.1-dev"]
)
```
The same steps can also be performed one at a time; the following code is equivalent to the above.
```python
import torch
from diffsynth import download_models, ModelManager
download_models(["FLUX.1-dev"])
model_manager = ModelManager(torch_dtype=torch.bfloat16, device="cuda")
model_manager.load_models([
    "models/FLUX/FLUX.1-dev/text_encoder/model.safetensors",
    "models/FLUX/FLUX.1-dev/text_encoder_2",
    "models/FLUX/FLUX.1-dev/ae.safetensors",
    "models/FLUX/FLUX.1-dev/flux1-dev.safetensors"
])
```
Models can be downloaded from [ModelScope](https://www.modelscope.cn/) and [HuggingFace](https://huggingface.co/), and non-preset models can be downloaded as well. For more information about model downloading, please refer to [Model Download](./DownloadModels.md).
When loading models, pass all the model paths you want to load to `load_models`. For model weight files in formats such as `.safetensors`, `ModelManager` automatically determines the model type after loading. For models stored as folders, `ModelManager` tries to parse the `config.json` file inside and call the corresponding module in a third-party library such as `transformers`. For the models supported by DiffSynth-Studio, please refer to [Supported Models](./Models.md).
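The file-versus-folder distinction described above can be sketched roughly as follows. This is only an illustration, not DiffSynth-Studio's actual loading code: `describe_model_path` is a hypothetical helper introduced here, and it assumes the folder's `config.json` follows the common convention of listing class names under an `architectures` key.

```python
import json
import os
import tempfile


def describe_model_path(path):
    """Hypothetical helper: classify a model path as a folder or a single file."""
    if os.path.isdir(path):
        # Folder-format model: look for a config.json describing the architecture.
        config_path = os.path.join(path, "config.json")
        if os.path.isfile(config_path):
            with open(config_path, "r", encoding="utf-8") as f:
                config = json.load(f)
            # Assumed convention: class names are listed under "architectures".
            return "folder", config.get("architectures", [])
        return "folder", []
    # Single weight file: its type would be inferred from the loaded weights.
    return "file", []


# Build a throwaway folder-format "model" to exercise the helper.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "config.json"), "w", encoding="utf-8") as f:
    json.dump({"architectures": ["T5EncoderModel"]}, f)

print(describe_model_path(demo_dir))
print(describe_model_path("ae.safetensors"))
```

A real loader would go on to instantiate the class named in the config; the sketch stops at the classification step.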
## Building Pipeline
DiffSynth-Studio provides multiple inference `Pipeline`s. Each one can fetch the models it needs from a `ModelManager` and initialize itself. For example, the text-to-image `Pipeline` for the FLUX.1-dev model can be constructed as follows:
```python
pipe = FluxImagePipeline.from_model_manager(model_manager)
```
For the other image and video generation `Pipeline`s, see [Inference Pipelines](./Pipelines.md).
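The `from_model_manager` pattern can be pictured with a toy version. Everything in this sketch is hypothetical (`ToyModelManager`, `ToyPipeline`, the `fetch` method, and the component names are placeholders chosen for illustration); it only shows the idea of a pipeline collecting its components from a shared registry, not how DiffSynth-Studio implements it.

```python
class ToyModelManager:
    """Hypothetical registry of loaded models, keyed by component name."""

    def __init__(self):
        self._models = {}

    def add(self, name, model):
        self._models[name] = model

    def fetch(self, name):
        # Return None when a component was not loaded.
        return self._models.get(name)


class ToyPipeline:
    """Hypothetical pipeline that pulls its components from a manager."""

    component_names = ("text_encoder", "dit", "vae_decoder")

    def __init__(self, components):
        self.components = components

    @classmethod
    def from_model_manager(cls, manager):
        components = {name: manager.fetch(name) for name in cls.component_names}
        missing = [name for name, model in components.items() if model is None]
        if missing:
            raise ValueError(f"Missing components: {missing}")
        return cls(components)


manager = ToyModelManager()
for name in ("text_encoder", "dit", "vae_decoder"):
    manager.add(name, object())  # placeholders instead of real networks

pipe = ToyPipeline.from_model_manager(manager)
print(sorted(pipe.components))
```

Keeping model loading in the manager and model use in the pipeline means several pipelines could share one set of loaded weights.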

@@ -0,0 +1,34 @@
# Download Models
DiffSynth-Studio ships with preset download links for a number of mainstream diffusion models, which you can download and use directly.
## Download Preset Models
Use the `download_models` function to download preset model files. The available model IDs are listed in the [config file](/diffsynth/configs/model_config.py).
```python
from diffsynth import download_models
download_models(["FLUX.1-dev"])
```
In VS Code with Pylance or another Python language service enabled, typing `""` in the code displays all supported model IDs.
![image](https://github.com/user-attachments/assets/2bbfec32-e015-45a7-98d9-57af13200b7c)
## Download Non-Preset Models
Non-preset models can be downloaded from two sources: [ModelScope](https://modelscope.cn/models) and [HuggingFace](https://huggingface.co/models). You can also download the models you need manually with a browser or other tools.
```python
from diffsynth import download_customized_models
download_customized_models(
    model_id="Kwai-Kolors/Kolors",
    origin_file_path="vae/diffusion_pytorch_model.fp16.bin",
    local_dir="models/kolors/Kolors/vae",
    downloading_priority=["ModelScope", "HuggingFace"]
)
```
This snippet downloads the file `vae/diffusion_pytorch_model.fp16.bin` from the model repository with ID `Kwai-Kolors/Kolors` in the [model library](https://modelscope.cn/models/Kwai-Kolors/Kolors) to the local path `models/kolors/Kolors/vae`. Following the order in `downloading_priority`, `ModelScope` is tried first.
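One plausible reading of a priority list is "try each source in order and fall back on failure". The sketch below illustrates that idea only. It is an assumption about the behavior, not the code of `download_customized_models`: `download_with_priority` and the two stand-in downloader functions are hypothetical.

```python
def download_with_priority(downloaders, priority):
    """Try each source in priority order and return the first that succeeds.

    `downloaders` maps a source name to a zero-argument callable that returns
    a local file path or raises an exception on failure (hypothetical interface).
    """
    errors = {}
    for source in priority:
        try:
            return source, downloaders[source]()
        except Exception as exc:  # record the failure and try the next source
            errors[source] = exc
    raise RuntimeError(f"All sources failed: {errors}")


def failing_modelscope():
    # Stand-in that simulates an unreachable source.
    raise ConnectionError("simulated network failure")


def working_huggingface():
    # Stand-in that pretends the file was saved locally.
    return "models/kolors/Kolors/vae/diffusion_pytorch_model.fp16.bin"


source, path = download_with_priority(
    {"ModelScope": failing_modelscope, "HuggingFace": working_huggingface},
    priority=["ModelScope", "HuggingFace"],
)
print(source, path)
```

In this simulated run the first source fails, so the file comes from the second entry in the list.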

@@ -0,0 +1,49 @@
# Extension Features
This document introduces several techniques related to diffusion models that are implemented in DiffSynth and are useful for image and video processing.
- **[RIFE](https://github.com/hzwer/ECCV2022-RIFE)**: RIFE is a frame interpolation method based on real-time intermediate flow estimation. It uses a model with an IFNet structure that can quickly estimate intermediate flows end-to-end. RIFE does not rely on pre-trained optical flow models and supports frame interpolation at arbitrary time steps, processing through time-encoded inputs.
In this code snippet, we use the RIFE model to double the frame rate of a video.
```python
from diffsynth import VideoData, ModelManager, save_video
from diffsynth.extensions.RIFE import RIFEInterpolater
model_manager = ModelManager(model_id_list=["RIFE"])
rife = RIFEInterpolater.from_model_manager(model_manager)
video = VideoData("input_video.mp4", height=512, width=768).raw_data()
video = rife.interpolate(video)
save_video(video, "output_video.mp4", fps=60)
```
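To make the frame-count arithmetic behind "doubling the frame rate" concrete, here is a minimal sketch that inserts one intermediate frame between each pair of neighbors. It uses plain linear blending as a stand-in, not RIFE's learned flow estimation, and `double_frames` is a hypothetical helper that is not part of DiffSynth.

```python
def double_frames(frames):
    """Insert one blended frame between each pair of consecutive frames.

    Frames are plain lists of pixel values here; linear blending is only a
    placeholder for a learned interpolator such as RIFE.
    """
    if not frames:
        return []
    output = [frames[0]]
    for previous, current in zip(frames, frames[1:]):
        middle = [(a + b) / 2 for a, b in zip(previous, current)]
        output.extend([middle, current])
    return output


frames = [[0.0, 0.0], [2.0, 4.0], [4.0, 8.0]]
doubled = double_frames(frames)
print(len(frames), "->", len(doubled))  # 3 -> 5
print(doubled[1])  # [1.0, 2.0]
```

With one new frame per gap, `n` input frames become `2n - 1` output frames, so saving at twice the original frame rate keeps the duration roughly unchanged.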
- **[ESRGAN](https://github.com/xinntao/ESRGAN)**: ESRGAN is an image super-resolution model that can achieve a fourfold increase in resolution. This method significantly enhances the realism of generated images by optimizing network architecture, adversarial loss, and perceptual loss.
In this code snippet, we use the ESRGAN model to quadruple the resolution of an image.
```python
from PIL import Image
from diffsynth import ModelManager
from diffsynth.extensions.ESRGAN import ESRGAN
model_manager = ModelManager(model_id_list=["ESRGAN_x4"])
esrgan = ESRGAN.from_model_manager(model_manager)
image = Image.open("input_image.jpg")
image = esrgan.upscale(image)
image.save("output_image.jpg")
```
- **[FastBlend](https://arxiv.org/abs/2311.09265)**: FastBlend is a model-free video de-flickering algorithm. Flicker often occurs in style videos processed frame by frame using image generation models. FastBlend can eliminate flicker in style videos based on the motion features in the original video (guide video).
In this code snippet, we use FastBlend to remove the flicker effect from a style video.
```python
from diffsynth import VideoData, save_video
from diffsynth.extensions.FastBlend import FastBlendSmoother
fastblend = FastBlendSmoother()
guide_video = VideoData("guide_video.mp4", height=512, width=768).raw_data()
style_video = VideoData("style_video.mp4", height=512, width=768).raw_data()
output_video = fastblend(style_video, original_frames=guide_video)
save_video(output_video, "output_video.mp4", fps=30)
```

@@ -0,0 +1,26 @@
# Installation
DiffSynth-Studio can currently be installed either by cloning the repository from GitHub or with pip. We recommend cloning from GitHub to get the latest features.
## From Source
1. Clone the source repository:
```bash
git clone https://github.com/modelscope/DiffSynth-Studio.git
```
2. Navigate to the project directory and install:
```bash
cd DiffSynth-Studio
pip install -e .
```
## From PyPI
Install directly via PyPI:
```bash
pip install diffsynth
```
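After either installation method, a quick way to confirm that the package is visible to your Python environment is to query its installed version. This is a minimal sketch using only the standard library; `installed_version` is a helper name introduced here, and it assumes the distribution is registered under the name `diffsynth`, as in the pip command above.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string when DiffSynth-Studio is installed, otherwise None.
print(installed_version("diffsynth"))
```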

@@ -0,0 +1,18 @@
# Models
DiffSynth-Studio currently supports the following models:
* [CogVideoX](https://huggingface.co/THUDM/CogVideoX-5b)
* [FLUX](https://huggingface.co/black-forest-labs/FLUX.1-dev)
* [ExVideo](https://huggingface.co/ECNU-CILab/ExVideo-SVD-128f-v1)
* [Kolors](https://huggingface.co/Kwai-Kolors/Kolors)
* [Stable Diffusion 3](https://huggingface.co/stabilityai/stable-diffusion-3-medium)
* [Stable Video Diffusion](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt)
* [Hunyuan-DiT](https://github.com/Tencent/HunyuanDiT)
* [RIFE](https://github.com/hzwer/ECCV2022-RIFE)
* [ESRGAN](https://github.com/xinntao/ESRGAN)
* [IP-Adapter](https://github.com/tencent-ailab/IP-Adapter)
* [AnimateDiff](https://github.com/guoyww/animatediff/)
* [ControlNet](https://github.com/lllyasviel/ControlNet)
* [Stable Diffusion XL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)
* [Stable Diffusion](https://huggingface.co/runwayml/stable-diffusion-v1-5)

@@ -0,0 +1,22 @@
# Pipelines
DiffSynth-Studio includes multiple pipelines, categorized into two types: image generation and video generation.
## Image Pipelines
| Pipeline | Models |
|----------------------------|----------------------------------------------------------------|
| SDImagePipeline | text_encoder: SDTextEncoder<br>unet: SDUNet<br>vae_decoder: SDVAEDecoder<br>vae_encoder: SDVAEEncoder<br>controlnet: MultiControlNetManager<br>ipadapter_image_encoder: IpAdapterCLIPImageEmbedder<br>ipadapter: SDIpAdapter |
| SDXLImagePipeline | text_encoder: SDXLTextEncoder<br>text_encoder_2: SDXLTextEncoder2<br>text_encoder_kolors: ChatGLMModel<br>unet: SDXLUNet<br>vae_decoder: SDXLVAEDecoder<br>vae_encoder: SDXLVAEEncoder<br>controlnet: MultiControlNetManager<br>ipadapter_image_encoder: IpAdapterXLCLIPImageEmbedder<br>ipadapter: SDXLIpAdapter |
| SD3ImagePipeline | text_encoder_1: SD3TextEncoder1<br>text_encoder_2: SD3TextEncoder2<br>text_encoder_3: SD3TextEncoder3<br>dit: SD3DiT<br>vae_decoder: SD3VAEDecoder<br>vae_encoder: SD3VAEEncoder |
| HunyuanDiTImagePipeline | text_encoder: HunyuanDiTCLIPTextEncoder<br>text_encoder_t5: HunyuanDiTT5TextEncoder<br>dit: HunyuanDiT<br>vae_decoder: SDVAEDecoder<br>vae_encoder: SDVAEEncoder |
| FluxImagePipeline | text_encoder_1: FluxTextEncoder1<br>text_encoder_2: FluxTextEncoder2<br>dit: FluxDiT<br>vae_decoder: FluxVAEDecoder<br>vae_encoder: FluxVAEEncoder |
## Video Pipelines
| Pipeline | Models |
|----------------------------|----------------------------------------------------------------|
| SDVideoPipeline | text_encoder: SDTextEncoder<br>unet: SDUNet<br>vae_decoder: SDVAEDecoder<br>vae_encoder: SDVAEEncoder<br>controlnet: MultiControlNetManager<br>ipadapter_image_encoder: IpAdapterCLIPImageEmbedder<br>ipadapter: SDIpAdapter<br>motion_modules: SDMotionModel |
| SDXLVideoPipeline | text_encoder: SDXLTextEncoder<br>text_encoder_2: SDXLTextEncoder2<br>text_encoder_kolors: ChatGLMModel<br>unet: SDXLUNet<br>vae_decoder: SDXLVAEDecoder<br>vae_encoder: SDXLVAEEncoder<br>ipadapter_image_encoder: IpAdapterXLCLIPImageEmbedder<br>ipadapter: SDXLIpAdapter<br>motion_modules: SDXLMotionModel |
| SVDVideoPipeline | image_encoder: SVDImageEncoder<br>unet: SVDUNet<br>vae_encoder: SVDVAEEncoder<br>vae_decoder: SVDVAEDecoder |
| CogVideoPipeline | text_encoder: FluxTextEncoder2<br>dit: CogDiT<br>vae_encoder: CogVAEEncoder<br>vae_decoder: CogVAEDecoder |

@@ -0,0 +1,35 @@
# Prompt Processing
DiffSynth includes prompt processing functionality, which is divided into:
- **Prompt Refiners (`prompt_refiner_classes`)**: Refine prompts, translate them from Chinese to English, or do both. The available classes are:
    - **English Prompt Refinement**: `BeautifulPrompt`, using the model [pai-bloom-1b1-text2prompt-sd](https://modelscope.cn/models/AI-ModelScope/pai-bloom-1b1-text2prompt-sd).
    - **Prompt Translation from Chinese to English**: `Translator`, using the model [opus-mt-zh-en](https://modelscope.cn/models/moxying/opus-mt-zh-en).
    - **Prompt Translation and Refinement**: `QwenPrompt`, using the model [Qwen2-1.5B-Instruct](https://modelscope.cn/models/qwen/Qwen2-1.5B-Instruct).
- **Prompt Extenders (`prompt_extender_classes`)**: Expand prompts based on Omost's prompt partition control. The available class is:
    - **Prompt Partition Expansion**: `OmostPromter`.
## Usage Instructions
### Prompt Refiners
When loading the model pipeline, you can specify the desired prompt refiners with the `prompt_refiner_classes` parameter. For a complete example, see [sd_prompt_refining.py](examples/image_synthesis/sd_prompt_refining.py).
The available values for `prompt_refiner_classes` are `Translator`, `BeautifulPrompt`, and `QwenPrompt`.
```python
pipe = SDXLImagePipeline.from_model_manager(model_manager, prompt_refiner_classes=[Translator, BeautifulPrompt])
```
### Prompt Extenders
When loading the model pipeline, you can specify the desired prompt extender with the `prompt_extender_classes` parameter. For a complete example, see [omost_flux_text_to_image.py](examples/image_synthesis/omost_flux_text_to_image.py).
```python
pipe = FluxImagePipeline.from_model_manager(model_manager, prompt_extender_classes=[OmostPromter])
```
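Because the refiners are passed as a list of classes, a natural mental model is that each class is instantiated and applied to the prompt in list order. The sketch below illustrates that model with stand-in classes. It is an assumption about the behavior, not DiffSynth's implementation: `UppercaseRefiner`, `SuffixRefiner`, and `apply_refiners` are hypothetical names.

```python
class UppercaseRefiner:
    """Hypothetical stand-in for a prompt refiner; not a DiffSynth class."""

    def __call__(self, prompt):
        return prompt.upper()


class SuffixRefiner:
    """Hypothetical stand-in that appends quality keywords to the prompt."""

    def __init__(self, suffix=", highly detailed"):
        self.suffix = suffix

    def __call__(self, prompt):
        return prompt + self.suffix


def apply_refiners(prompt, refiner_classes):
    """Instantiate each class and apply the refiners in list order."""
    for refiner_class in refiner_classes:
        prompt = refiner_class()(prompt)
    return prompt


print(apply_refiners("a cat", [UppercaseRefiner, SuffixRefiner]))
print(apply_refiners("a cat", [SuffixRefiner, UppercaseRefiner]))
```

Under this model the order of the list matters, which is why a translator would sensibly be placed before a refiner that expects English input.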

@@ -0,0 +1,11 @@
# Schedulers
Schedulers control the entire denoising (or sampling) process of the model. When loading the Pipeline, DiffSynth automatically selects the most suitable schedulers for the current Pipeline, **requiring no additional configuration**.
The supported schedulers are:
- **EnhancedDDIMScheduler**: Extends the denoising process introduced in Denoising Diffusion Probabilistic Models (DDPM) with the non-Markovian sampling process of Denoising Diffusion Implicit Models (DDIM).
- **FlowMatchScheduler**: Implements the flow matching sampling method introduced in [Stable Diffusion 3](https://arxiv.org/abs/2403.03206).
- **ContinuousODEScheduler**: A scheduler based on Ordinary Differential Equations (ODE).
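To give a feel for what a flow matching sampler does, here is a generic Euler-integration sketch on a toy problem. It is not the code of `FlowMatchScheduler`: the function names, the uniform sigma schedule, and the exact-velocity "model" are assumptions made for illustration. It assumes the rectified-flow interpolation `x_sigma = (1 - sigma) * data + sigma * noise`, whose velocity is `noise - data`.

```python
def flow_match_sample(noise, predict_velocity, num_steps=10):
    """Integrate from sigma=1 (pure noise) to sigma=0 (data) with Euler steps."""
    sigmas = [1.0 - i / num_steps for i in range(num_steps + 1)]
    sample = list(noise)
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        velocity = predict_velocity(sample, sigma)
        # Euler update: move along the predicted velocity by the sigma change.
        sample = [x + v * (sigma_next - sigma) for x, v in zip(sample, velocity)]
    return sample


# Toy setup: the exact velocity is (noise - data). A real pipeline would
# call a neural network here instead of using the known answer.
data = [1.0, -2.0, 0.5]
noise = [0.3, 0.7, -1.1]


def exact_velocity(sample, sigma):
    return [n - d for n, d in zip(noise, data)]


result = flow_match_sample(noise, exact_velocity, num_steps=10)
print([round(x, 6) for x in result])
```

With a perfect velocity prediction the loop walks from the noise back to the data, up to floating-point error; in practice the network's prediction is approximate, and the number of steps trades quality against speed.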