Add files via upload

Revise once more
2026-03-19 14:58:12 +00:00 · 2024-10-22 09:56:03 +08:00
parent 157ba2e426
commit f6e676cdf9
46 changed files with 2525 additions and 0 deletions
--- a/docs/source_en/tutorial/ASimpleExample.md
+++ b/docs/source_en/tutorial/ASimpleExample.md
@@ -0,0 +1,85 @@
+# Quick Start
+
+This document shows how to get started with DiffSynth-Studio quickly through a short code example.
+
+## Installation
+
+Use the following command to clone and install DiffSynth-Studio from GitHub. For more information, please refer to [Installation](./Installation.md).
+
+```shell
+git clone https://github.com/modelscope/DiffSynth-Studio.git
+cd DiffSynth-Studio
+pip install -e .
+```
+
+## One-click Run!
+
+Running the following code downloads the model, loads it, and generates an image.
+
+```python
+import torch
+from diffsynth import ModelManager, FluxImagePipeline
+
+model_manager = ModelManager(
+    torch_dtype=torch.bfloat16,
+    device="cuda",
+    model_id_list=["FLUX.1-dev"]
+)
+pipe = FluxImagePipeline.from_model_manager(model_manager)
+
+torch.manual_seed(0)
+image = pipe(
+    prompt="In a forest, a wooden plank sign reading DiffSynth",
+    height=576, width=1024,
+)
+image.save("image.jpg")
+```
+
+![image](https://github.com/user-attachments/assets/15a52a2b-2f18-46fe-810c-cb3ad2853919)
+
+From this example, we can see that there are two key modules in DiffSynth: `ModelManager` and `Pipeline`. We will introduce them in detail next.
+
+## Downloading and Loading Models
+
+`ModelManager` is responsible for downloading and loading models, which can be done in one step with the following code.
+
+```python
+import torch
+from diffsynth import ModelManager
+
+model_manager = ModelManager(
+    torch_dtype=torch.bfloat16,
+    device="cuda",
+    model_id_list=["FLUX.1-dev"]
+)
+```
+
+These steps can also be performed one at a time. The following code is equivalent to the code above.
+
+```python
+import torch
+from diffsynth import download_models, ModelManager
+
+download_models(["FLUX.1-dev"])
+model_manager = ModelManager(torch_dtype=torch.bfloat16, device="cuda")
+model_manager.load_models([
+    "models/FLUX/FLUX.1-dev/text_encoder/model.safetensors",
+    "models/FLUX/FLUX.1-dev/text_encoder_2",
+    "models/FLUX/FLUX.1-dev/ae.safetensors",
+    "models/FLUX/FLUX.1-dev/flux1-dev.safetensors"
+])
+```
+
+Models can be downloaded from [ModelScope](https://www.modelscope.cn/) and [HuggingFace](https://huggingface.co/), and non-preset models can be downloaded as well. For more information about model downloading, please refer to [Model Download](./DownloadModels.md).
+
+When loading models, pass every model path you want to load to `load_models`. For single weight files in formats such as `.safetensors`, `ModelManager` determines the model type automatically after loading; for folder-format models, `ModelManager` tries to parse the `config.json` file inside the folder and to call the corresponding module in a third-party library such as `transformers`. For the models supported by DiffSynth-Studio, please refer to [Supported Models](./Models.md).
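+
+The distinction between weight files and folder-format models can be illustrated with a minimal sketch. This is not `ModelManager`'s actual implementation: `describe_model_path` is a hypothetical helper, and reading the `architectures` key assumes a `transformers`-style `config.json`. The demo files are created in a temporary directory purely for illustration.
+
+```python
+import json
+import tempfile
+from pathlib import Path
+
+
+def describe_model_path(path):
+    """Classify a model path as a single weight file or a folder-format model (hypothetical helper)."""
+    p = Path(path)
+    if p.is_dir():
+        config_file = p / "config.json"
+        if config_file.is_file():
+            config = json.loads(config_file.read_text())
+            # Assumption: transformers-style configs list class names under "architectures".
+            return ("folder", config.get("architectures", []))
+        return ("folder", [])
+    # Single weight file: the model type has to be inferred from the weights themselves.
+    return ("file", p.suffix)
+
+
+# Build a tiny fake model directory to run the helper on.
+with tempfile.TemporaryDirectory() as root:
+    folder = Path(root) / "text_encoder_2"
+    folder.mkdir()
+    (folder / "config.json").write_text(json.dumps({"architectures": ["T5EncoderModel"]}))
+    weights = Path(root) / "ae.safetensors"
+    weights.write_bytes(b"")
+    folder_result = describe_model_path(folder)
+    file_result = describe_model_path(weights)
+
+print(folder_result)
+print(file_result)
+```
+
+A real loader would go on to inspect the tensors of a weight file to decide which model class to build; the sketch stops at the point where the two kinds of path diverge.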
+
+## Building Pipeline
+
+DiffSynth-Studio provides multiple inference `Pipeline`s. A `Pipeline` fetches the models it needs directly from `ModelManager` and initializes itself. For example, the text-to-image `Pipeline` for the FLUX.1-dev model can be constructed as follows:
+
+```python
+pipe = FluxImagePipeline.from_model_manager(model_manager)
+```
+
+For more `Pipeline`s used for image generation and video generation, see [Inference Pipelines](./Pipelines.md).
--- a/docs/source_en/tutorial/DownloadModels.md
+++ b/docs/source_en/tutorial/DownloadModels.md
@@ -0,0 +1,34 @@
+# Download Models
+
+DiffSynth-Studio ships with preset download links for some mainstream diffusion models, which you can download and use directly.
+
+## Download Preset Models
+
+Use the `download_models` function to download preset model files. The available model IDs are listed in the [config file](/diffsynth/configs/model_config.py).
+
+```python
+from diffsynth import download_models
+
+download_models(["FLUX.1-dev"])
+```
+
+For VSCode users with Pylance or another Python language service enabled, typing `""` in the code displays all supported model IDs.
+
+![image](https://github.com/user-attachments/assets/2bbfec32-e015-45a7-98d9-57af13200b7c)
+
+## Download Non-Preset Models
+
+You can select models from two download sources: [ModelScope](https://modelscope.cn/models) and [HuggingFace](https://huggingface.co/models). Of course, you can also manually download the models you need through browsers or other tools.
+
+```python
+from diffsynth import download_customized_models
+
+download_customized_models(
+    model_id="Kwai-Kolors/Kolors",
+    origin_file_path="vae/diffusion_pytorch_model.fp16.bin",
+    local_dir="models/kolors/Kolors/vae",
+    downloading_priority=["ModelScope", "HuggingFace"]
+)
+```
+
+This snippet downloads the file `vae/diffusion_pytorch_model.fp16.bin` from the model repository with ID `Kwai-Kolors/Kolors` (see the [model library](https://modelscope.cn/models/Kwai-Kolors/Kolors)) to the local path `models/kolors/Kolors/vae`. Following the download priority, `ModelScope` is tried first.
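+
+The priority list can be read as "try each source in order". The following is a minimal sketch of that idea, not the code behind `download_customized_models`: `download_with_priority`, `failing_source`, and `working_source` are hypothetical stand-ins, and falling back to the next source after a failure is an assumption about the intended behavior.
+
+```python
+def download_with_priority(downloaders, priority):
+    """Try each source in order and return (source, result) from the first that succeeds."""
+    errors = {}
+    for source in priority:
+        try:
+            return source, downloaders[source]()
+        except Exception as error:
+            # Remember why this source failed, then move on to the next one.
+            errors[source] = error
+    raise RuntimeError(f"All sources failed: {errors}")
+
+
+def failing_source():
+    # Hypothetical stand-in for a source that cannot be reached.
+    raise ConnectionError("source unreachable")
+
+
+def working_source():
+    # Hypothetical stand-in that pretends the file was saved locally.
+    return "models/kolors/Kolors/vae/diffusion_pytorch_model.fp16.bin"
+
+
+downloaders = {"ModelScope": failing_source, "HuggingFace": working_source}
+source, local_path = download_with_priority(downloaders, ["ModelScope", "HuggingFace"])
+print(source, local_path)
+```
+
+Here the first source fails, so the second one is used; reversing the priority list would make `HuggingFace` the first choice.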
--- a/docs/source_en/tutorial/Extensions.md
+++ b/docs/source_en/tutorial/Extensions.md
@@ -0,0 +1,49 @@
+# Extension Features
+
+This document introduces several technologies related to diffusion models that are implemented in DiffSynth and have significant application potential in image and video processing.
+
+- **[RIFE](https://github.com/hzwer/ECCV2022-RIFE)**: RIFE is a frame interpolation method based on real-time intermediate flow estimation. It uses a model with an IFNet structure that can quickly estimate intermediate flows end-to-end. RIFE does not rely on pre-trained optical flow models and supports frame interpolation at arbitrary time steps, processing through time-encoded inputs.
+
+    In this code snippet, we use the RIFE model to double the frame rate of a video.
+
+    ```python
+    from diffsynth import VideoData, ModelManager, save_video
+    from diffsynth.extensions.RIFE import RIFEInterpolater
+
+    model_manager = ModelManager(model_id_list=["RIFE"])
+    rife = RIFEInterpolater.from_model_manager(model_manager)
+    video = VideoData("input_video.mp4", height=512, width=768).raw_data()
+    video = rife.interpolate(video)
+    save_video(video, "output_video.mp4", fps=60)
+    ```
+
+- **[ESRGAN](https://github.com/xinntao/ESRGAN)**: ESRGAN is an image super-resolution model that can achieve a fourfold increase in resolution. This method significantly enhances the realism of generated images by optimizing network architecture, adversarial loss, and perceptual loss.
+
+    In this code snippet, we use the ESRGAN model to quadruple the resolution of an image.
+
+    ```python
+    from PIL import Image
+    from diffsynth import ModelManager
+    from diffsynth.extensions.ESRGAN import ESRGAN
+
+    model_manager = ModelManager(model_id_list=["ESRGAN_x4"])
+    esrgan = ESRGAN.from_model_manager(model_manager)
+    image = Image.open("input_image.jpg")
+    image = esrgan.upscale(image)
+    image.save("output_image.jpg")
+    ```
+
+- **[FastBlend](https://arxiv.org/abs/2311.09265)**: FastBlend is a model-free video de-flickering algorithm. Flicker often occurs in style videos processed frame by frame using image generation models. FastBlend can eliminate flicker in style videos based on the motion features in the original video (guide video).
+
+    In this code snippet, we use FastBlend to remove the flicker effect from a style video.
+
+    ```python
+    from diffsynth import VideoData, save_video
+    from diffsynth.extensions.FastBlend import FastBlendSmoother
+
+    fastblend = FastBlendSmoother()
+    guide_video = VideoData("guide_video.mp4", height=512, width=768).raw_data()
+    style_video = VideoData("style_video.mp4", height=512, width=768).raw_data()
+    output_video = fastblend(style_video, original_frames=guide_video)
+    save_video(output_video, "output_video.mp4", fps=30)
+    ```
--- a/docs/source_en/tutorial/Installation.md
+++ b/docs/source_en/tutorial/Installation.md
@@ -0,0 +1,26 @@
+# Installation
+
+DiffSynth-Studio can currently be installed by cloning from GitHub or with pip. We recommend cloning from GitHub to get the latest features.
+
+## From Source
+
+1. Clone the source repository:
+
+    ```bash
+    git clone https://github.com/modelscope/DiffSynth-Studio.git
+    ```
+
+2. Navigate to the project directory and install:
+
+    ```bash
+    cd DiffSynth-Studio
+    pip install -e .
+    ```
+
+## From PyPI
+
+Install directly via PyPI:
+
+```bash
+pip install diffsynth
+```
--- a/docs/source_en/tutorial/Models.md
+++ b/docs/source_en/tutorial/Models.md
@@ -0,0 +1,18 @@
+# Models
+
+DiffSynth-Studio currently supports the following models:
+
+* [CogVideoX](https://huggingface.co/THUDM/CogVideoX-5b)
+* [FLUX](https://huggingface.co/black-forest-labs/FLUX.1-dev)
+* [ExVideo](https://huggingface.co/ECNU-CILab/ExVideo-SVD-128f-v1)
+* [Kolors](https://huggingface.co/Kwai-Kolors/Kolors)
+* [Stable Diffusion 3](https://huggingface.co/stabilityai/stable-diffusion-3-medium)
+* [Stable Video Diffusion](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt)
+* [Hunyuan-DiT](https://github.com/Tencent/HunyuanDiT)
+* [RIFE](https://github.com/hzwer/ECCV2022-RIFE)
+* [ESRGAN](https://github.com/xinntao/ESRGAN)
+* [IP-Adapter](https://github.com/tencent-ailab/IP-Adapter)
+* [AnimateDiff](https://github.com/guoyww/animatediff/)
+* [ControlNet](https://github.com/lllyasviel/ControlNet)
+* [Stable Diffusion XL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)
+* [Stable Diffusion](https://huggingface.co/runwayml/stable-diffusion-v1-5)
--- a/docs/source_en/tutorial/Pipelines.md
+++ b/docs/source_en/tutorial/Pipelines.md
@@ -0,0 +1,22 @@
+# Pipelines
+
+DiffSynth-Studio includes multiple pipelines, categorized into two types: image generation and video generation.
+
+## Image Pipelines
+
+| Pipeline                   | Models                                                     |
+|----------------------------|----------------------------------------------------------------|
+| SDImagePipeline             | text_encoder: SDTextEncoder<br>unet: SDUNet<br>vae_decoder: SDVAEDecoder<br>vae_encoder: SDVAEEncoder<br>controlnet: MultiControlNetManager<br>ipadapter_image_encoder: IpAdapterCLIPImageEmbedder<br>ipadapter: SDIpAdapter |
+| SDXLImagePipeline           | text_encoder: SDXLTextEncoder<br>text_encoder_2: SDXLTextEncoder2<br>text_encoder_kolors: ChatGLMModel<br>unet: SDXLUNet<br>vae_decoder: SDXLVAEDecoder<br>vae_encoder: SDXLVAEEncoder<br>controlnet: MultiControlNetManager<br>ipadapter_image_encoder: IpAdapterXLCLIPImageEmbedder<br>ipadapter: SDXLIpAdapter |
+| SD3ImagePipeline            | text_encoder_1: SD3TextEncoder1<br>text_encoder_2: SD3TextEncoder2<br>text_encoder_3: SD3TextEncoder3<br>dit: SD3DiT<br>vae_decoder: SD3VAEDecoder<br>vae_encoder: SD3VAEEncoder |
+| HunyuanDiTImagePipeline     | text_encoder: HunyuanDiTCLIPTextEncoder<br>text_encoder_t5: HunyuanDiTT5TextEncoder<br>dit: HunyuanDiT<br>vae_decoder: SDVAEDecoder<br>vae_encoder: SDVAEEncoder |
+| FluxImagePipeline     | text_encoder_1: FluxTextEncoder1<br>text_encoder_2: FluxTextEncoder2<br>dit: FluxDiT<br>vae_decoder: FluxVAEDecoder<br>vae_encoder: FluxVAEEncoder |
+
+## Video Pipelines
+
+| Pipeline                   | Models                                                     |
+|----------------------------|----------------------------------------------------------------|
+| SDVideoPipeline            | text_encoder: SDTextEncoder<br>unet: SDUNet<br>vae_decoder: SDVAEDecoder<br>vae_encoder: SDVAEEncoder<br>controlnet: MultiControlNetManager<br>ipadapter_image_encoder: IpAdapterCLIPImageEmbedder<br>ipadapter: SDIpAdapter<br>motion_modules: SDMotionModel |
+| SDXLVideoPipeline          | text_encoder: SDXLTextEncoder<br>text_encoder_2: SDXLTextEncoder2<br>text_encoder_kolors: ChatGLMModel<br>unet: SDXLUNet<br>vae_decoder: SDXLVAEDecoder<br>vae_encoder: SDXLVAEEncoder<br>ipadapter_image_encoder: IpAdapterXLCLIPImageEmbedder<br>ipadapter: SDXLIpAdapter<br>motion_modules: SDXLMotionModel |
+| SVDVideoPipeline           | image_encoder: SVDImageEncoder<br>unet: SVDUNet<br>vae_encoder: SVDVAEEncoder<br>vae_decoder: SVDVAEDecoder |
+| CogVideoPipeline           | text_encoder: FluxTextEncoder2<br>dit: CogDiT<br>vae_encoder: CogVAEEncoder<br>vae_decoder: CogVAEDecoder |
--- a/docs/source_en/tutorial/PromptProcessing.md
+++ b/docs/source_en/tutorial/PromptProcessing.md
@@ -0,0 +1,35 @@
+# Prompt Processing
+
+DiffSynth includes prompt processing functionality, which is divided into:
+
+- **Prompt Refiners (`prompt_refiner_classes`)**: Refine prompts, translate prompts from Chinese to English, or do both. The available parameters are:
+
+    - **English Prompt Refinement**: 'BeautifulPrompt', using the model [pai-bloom-1b1-text2prompt-sd](https://modelscope.cn/models/AI-ModelScope/pai-bloom-1b1-text2prompt-sd).
+
+    - **Prompt Translation from Chinese to English**: 'Translator', using the model [opus-mt-zh-en](https://modelscope.cn/models/moxying/opus-mt-zh-en).
+
+    - **Prompt Translation and Refinement**: 'QwenPrompt', using the model [Qwen2-1.5B-Instruct](https://modelscope.cn/models/qwen/Qwen2-1.5B-Instruct).
+
+- **Prompt Extenders (`prompt_extender_classes`)**: Expand prompts with partition control, based on Omost. The available parameter is:
+
+    - **Prompt Partition Expansion**: 'OmostPromter'.
+
+## Usage Instructions
+
+### Prompt Refiners
+
+When loading the model pipeline, you can specify the desired prompt refiner functionality using the `prompt_refiner_classes` parameter. For example code, refer to [sd_prompt_refining.py](examples/image_synthesis/sd_prompt_refining.py).
+
+Available `prompt_refiner_classes` parameters include: Translator, BeautifulPrompt, QwenPrompt.
+
+```python
+pipe = SDXLImagePipeline.from_model_manager(model_manager, prompt_refiner_classes=[Translator, BeautifulPrompt])
+```
+
+### Prompt Extenders
+
+When loading the model pipeline, you can specify the desired prompt extender using the `prompt_extender_classes` parameter. For example code, refer to [omost_flux_text_to_image.py](examples/image_synthesis/omost_flux_text_to_image.py).
+
+```python
+pipe = FluxImagePipeline.from_model_manager(model_manager, prompt_extender_classes=[OmostPromter])
+```
--- a/docs/source_en/tutorial/Schedulers.md
+++ b/docs/source_en/tutorial/Schedulers.md
@@ -0,0 +1,11 @@
+# Schedulers
+
+Schedulers control the entire denoising (or sampling) process of the model. When loading the Pipeline, DiffSynth automatically selects the most suitable schedulers for the current Pipeline, **requiring no additional configuration**.
+
+The supported schedulers are:
+
+- **EnhancedDDIMScheduler**: Extends the denoising process of Denoising Diffusion Probabilistic Models (DDPM) to a non-Markovian process, following Denoising Diffusion Implicit Models (DDIM).
+
+- **FlowMatchScheduler**: Implements the flow matching sampling method introduced in [Stable Diffusion 3](https://arxiv.org/abs/2403.03206).
+
+- **ContinuousODEScheduler**: A scheduler based on Ordinary Differential Equations (ODE).
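+
+To make the role of a scheduler concrete, here is a minimal sketch of flow matching sampling with Euler steps on a single scalar. It is not `FlowMatchScheduler`'s implementation: `euler_flow_sample` is a hypothetical helper, the noise levels are an evenly spaced assumption, and the "model" is replaced by the exact velocity of the straight path `x = (1 - sigma) * clean + sigma * noise`.
+
+```python
+def euler_flow_sample(x, velocity_fn, sigmas):
+    """Integrate dx/dsigma = v(x, sigma) from sigmas[0] down to sigmas[-1] with Euler steps."""
+    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
+        x = x + velocity_fn(x, sigma) * (sigma_next - sigma)
+    return x
+
+
+clean, noise = 2.0, -1.0
+num_steps = 4
+# Evenly spaced noise levels from 1.0 (pure noise) down to 0.0 (clean sample).
+sigmas = [1.0 - i / num_steps for i in range(num_steps + 1)]
+
+# Toy "model": the exact velocity of the straight path, which is constant (noise - clean).
+sample = euler_flow_sample(noise, lambda x, sigma: noise - clean, sigmas)
+print(sample)
+```
+
+Because the toy velocity is exact, the sampler moves from the noise value back to the clean value; in a real pipeline the velocity comes from the denoising network and the scheduler decides the noise levels and the update rule.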