When Image Models Meet AnimateDiff—Model Combination Technology

We have already seen the powerful image generation capabilities of the Stable Diffusion model and the models in its ecosystem. Now we introduce a new module, AnimateDiff, which lets us transfer the capabilities of image models to video. In this article, we showcase Diffutoon, an anime-style video rendering solution built on DiffSynth-Studio.

Download Models

The following example uses several models, so let's download them first:

  • An anime-style Stable Diffusion architecture model
  • Two ControlNet models
  • A Textual Inversion model
  • An AnimateDiff model

from diffsynth import download_models

# Download the checkpoints; the generation script below loads them from the models/ directory
download_models([
    "AingDiffusion_v12",
    "AnimateDiff_v2",
    "ControlNet_v11p_sd15_lineart",
    "ControlNet_v11f1e_sd15_tile",
    "TextualInversion_VeryBadImageNegative_v1.3"
])
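
The generation script later in this article loads these checkpoints from the models/ directory, so a quick existence check before rendering can save time. Below is a minimal sketch; the paths are simply the ones referenced later in this article.

import os

# Paths expected by the generation script below
expected_files = [
    "models/stable_diffusion/aingdiffusion_v12.safetensors",
    "models/AnimateDiff/mm_sd_v15_v2.ckpt",
    "models/ControlNet/control_v11p_sd15_lineart.pth",
    "models/ControlNet/control_v11f1e_sd15_tile.pth",
    "models/textual_inversion/verybadimagenegative_v1.3.pt",
]
for path in expected_files:
    print(("found    " if os.path.exists(path) else "missing  ") + path)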

Download Video

You can use any video you like; we use this video as a demonstration. You can download the video file with the following command, but please note that it must not be used for commercial purposes without obtaining permission from the original creator. The command saves the file to data/examples/diffutoon/input_video.mp4, which is the path used by the script below.

modelscope download --dataset Artiprocher/examples_in_diffsynth data/examples/diffutoon/input_video.mp4 --local_dir ./

Generate the Anime-Style Video

The script below loads the downloaded models, builds a video pipeline with the two ControlNets and the AnimateDiff motion module, and renders the first 30 frames of the input video in an anime style.

from diffsynth import ModelManager, SDVideoPipeline, ControlNetConfigUnit, VideoData, save_video
import torch

# Load models
model_manager = ModelManager(torch_dtype=torch.float16, device="cuda")
model_manager.load_models([
    "models/stable_diffusion/aingdiffusion_v12.safetensors",
    "models/AnimateDiff/mm_sd_v15_v2.ckpt",
    "models/ControlNet/control_v11p_sd15_lineart.pth",
    "models/ControlNet/control_v11f1e_sd15_tile.pth",
])

# Build pipeline
pipe = SDVideoPipeline.from_model_manager(
    model_manager,
    [
        ControlNetConfigUnit(
            processor_id="tile",
            model_path="models/ControlNet/control_v11f1e_sd15_tile.pth",
            scale=0.5
        ),
        ControlNetConfigUnit(
            processor_id="lineart",
            model_path="models/ControlNet/control_v11p_sd15_lineart.pth",
            scale=0.5
        )
    ]
)
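# Load the Textual Inversion embedding; it is triggered by the negative prompt during generation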
pipe.prompter.load_textual_inversions(["models/textual_inversion/verybadimagenegative_v1.3.pt"])

# Load the input video and decode frames at 1536x1536, matching the generation resolution
video = VideoData(
    video_file="data/examples/diffutoon/input_video.mp4",
    height=1536, width=1536
)
input_video = [video[i] for i in range(30)]  # use the first 30 frames as input

# Generate
torch.manual_seed(0)
output_video = pipe(
    prompt="best quality, perfect anime illustration, light, a girl is dancing, smile, solo",
    # the negative prompt triggers the Textual Inversion embedding loaded above
    negative_prompt="verybadimagenegative_v1.3",
    cfg_scale=7, clip_skip=2,
    # denoising_strength=1.0 re-renders every frame from noise; the input frames
    # guide the result through the two ControlNets
    input_frames=input_video, denoising_strength=1.0,
    controlnet_frames=input_video, num_frames=len(input_video),
    num_inference_steps=10, height=1536, width=1536,
    # AnimateDiff denoises the frames in overlapping sliding windows
    # (16 frames per window, stride 8) to keep the video temporally consistent
    animatediff_batch_size=16, animatediff_stride=8,
)

# Save video
save_video(output_video, "output_video.mp4", fps=30)
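
The full run above renders 30 frames at 1536×1536, which requires a large amount of GPU memory. For a quick trial, the same pipeline can be called with a lower resolution and fewer frames; the sketch below reuses the objects from the script above, and the smaller values are illustrative assumptions rather than tuned settings.

# Optional quick test: fewer frames at a lower resolution, reusing the pipeline above
video_preview = VideoData(
    video_file="data/examples/diffutoon/input_video.mp4",
    height=1024, width=1024
)
preview_frames = [video_preview[i] for i in range(16)]

torch.manual_seed(0)
preview = pipe(
    prompt="best quality, perfect anime illustration, light, a girl is dancing, smile, solo",
    negative_prompt="verybadimagenegative_v1.3",
    cfg_scale=7, clip_skip=2,
    input_frames=preview_frames, denoising_strength=1.0,
    controlnet_frames=preview_frames, num_frames=len(preview_frames),
    num_inference_steps=10, height=1024, width=1024,
    animatediff_batch_size=16, animatediff_stride=8,
)
save_video(preview, "output_video_preview.mp4", fps=30)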

Results