mirror of
https://github.com/modelscope/DiffSynth-Studio.git
synced 2026-03-19 14:58:12 +00:00
3.3 KiB
3.3 KiB
When Image Models Meet AnimateDiff—Model Combination Technology
We have already witnessed the powerful image generation capabilities of the Stable Diffusion model and its ecosystem models. Now, we introduce a new module: AnimateDiff, which allows us to transfer the capabilities of image models to videos. In this article, we showcase an anime-style video rendering solution built on DiffSynth-Studio: Diffutoon.
Download Models
The following examples will use many models, so let's download them first.
- An anime-style Stable Diffusion architecture model
- Two ControlNet models
- A Textual Inversion model
- An AnimateDiff model
from diffsynth import download_models
download_models([
"AingDiffusion_v12",
"AnimateDiff_v2",
"ControlNet_v11p_sd15_lineart",
"ControlNet_v11f1e_sd15_tile",
"TextualInversion_VeryBadImageNegative_v1.3"
])
Download Video
You can choose any video you like. We use this video as a demonstration. You can download this video file with the following command, but please note, do not use it for commercial purposes without obtaining the commercial copyright from the original video creator.
modelscope download --dataset Artiprocher/examples_in_diffsynth data/examples/diffutoon/input_video.mp4 --local_dir ./
Generate Anime
from diffsynth import ModelManager, SDVideoPipeline, ControlNetConfigUnit, VideoData, save_video
import torch
# Load models
model_manager = ModelManager(torch_dtype=torch.float16, device="cuda")
model_manager.load_models([
"models/stable_diffusion/aingdiffusion_v12.safetensors",
"models/AnimateDiff/mm_sd_v15_v2.ckpt",
"models/ControlNet/control_v11p_sd15_lineart.pth",
"models/ControlNet/control_v11f1e_sd15_tile.pth",
])
# Build pipeline
pipe = SDVideoPipeline.from_model_manager(
model_manager,
[
ControlNetConfigUnit(
processor_id="tile",
model_path="models/ControlNet/control_v11f1e_sd15_tile.pth",
scale=0.5
),
ControlNetConfigUnit(
processor_id="lineart",
model_path="models/ControlNet/control_v11p_sd15_lineart.pth",
scale=0.5
)
]
)
pipe.prompter.load_textual_inversions(["models/textual_inversion/verybadimagenegative_v1.3.pt"])
# Load video
video = VideoData(
video_file="data/examples/diffutoon/input_video.mp4",
height=1536, width=1536
)
input_video = [video[i] for i in range(30)]
# Generate
torch.manual_seed(0)
output_video = pipe(
prompt="best quality, perfect anime illustration, light, a girl is dancing, smile, solo",
negative_prompt="verybadimagenegative_v1.3",
cfg_scale=7, clip_skip=2,
input_frames=input_video, denoising_strength=1.0,
controlnet_frames=input_video, num_frames=len(input_video),
num_inference_steps=10, height=1536, width=1536,
animatediff_batch_size=16, animatediff_stride=8,
)
# Save video
save_video(output_video, "output_video.mp4", fps=30)