mirror of https://github.com/modelscope/DiffSynth-Studio.git synced 2026-03-19 14:58:12 +00:00

Files

yrk111222 f6e676cdf9 Add files via upload

再改一次

2024-10-22 09:56:03 +08:00

3.3 KiB

Raw Blame History

When Image Models Meet AnimateDiff—Model Combination Technology

We have already witnessed the powerful image generation capabilities of the Stable Diffusion model and its ecosystem models. Now, we introduce a new module: AnimateDiff, which allows us to transfer the capabilities of image models to videos. In this article, we showcase an anime-style video rendering solution built on DiffSynth-Studio: Diffutoon.

Download Models

The following examples will use many models, so let's download them first.

An anime-style Stable Diffusion architecture model
Two ControlNet models
A Textual Inversion model
An AnimateDiff model

from diffsynth import download_models

download_models([
    "AingDiffusion_v12",
    "AnimateDiff_v2",
    "ControlNet_v11p_sd15_lineart",
    "ControlNet_v11f1e_sd15_tile",
    "TextualInversion_VeryBadImageNegative_v1.3"
])

Download Video

You can choose any video you like. We use this video as a demonstration. You can download this video file with the following command, but please note, do not use it for commercial purposes without obtaining the commercial copyright from the original video creator.

modelscope download --dataset Artiprocher/examples_in_diffsynth data/examples/diffutoon/input_video.mp4 --local_dir ./

Generate Anime

from diffsynth import ModelManager, SDVideoPipeline, ControlNetConfigUnit, VideoData, save_video
import torch

# Load models
model_manager = ModelManager(torch_dtype=torch.float16, device="cuda")
model_manager.load_models([
    "models/stable_diffusion/aingdiffusion_v12.safetensors",
    "models/AnimateDiff/mm_sd_v15_v2.ckpt",
    "models/ControlNet/control_v11p_sd15_lineart.pth",
    "models/ControlNet/control_v11f1e_sd15_tile.pth",
])

# Build pipeline
pipe = SDVideoPipeline.from_model_manager(
    model_manager,
    [
        ControlNetConfigUnit(
            processor_id="tile",
            model_path="models/ControlNet/control_v11f1e_sd15_tile.pth",
            scale=0.5
        ),
        ControlNetConfigUnit(
            processor_id="lineart",
            model_path="models/ControlNet/control_v11p_sd15_lineart.pth",
            scale=0.5
        )
    ]
)
pipe.prompter.load_textual_inversions(["models/textual_inversion/verybadimagenegative_v1.3.pt"])

# Load video
video = VideoData(
    video_file="data/examples/diffutoon/input_video.mp4",
    height=1536, width=1536
)
input_video = [video[i] for i in range(30)]

# Generate
torch.manual_seed(0)
output_video = pipe(
    prompt="best quality, perfect anime illustration, light, a girl is dancing, smile, solo",
    negative_prompt="verybadimagenegative_v1.3",
    cfg_scale=7, clip_skip=2,
    input_frames=input_video, denoising_strength=1.0,
    controlnet_frames=input_video, num_frames=len(input_video),
    num_inference_steps=10, height=1536, width=1536,
    animatediff_batch_size=16, animatediff_stride=8,
)

# Save video
save_video(output_video, "output_video.mp4", fps=30)

3.3 KiB Raw Blame History

When Image Models Meet AnimateDiff—Model Combination Technology

Download Models

Download Video

Generate Anime

Effect Display

3.3 KiB

Raw Blame History