diff --git a/docs/source/index.rst b/docs/source/index.rst index 51f2f80..69b7276 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -30,6 +30,23 @@ DiffSynth-Studio 文档 creating/ToonShading.md creating/PromptRefine.md +.. toctree:: + :maxdepth: 1 + :caption: 模型列表 + + model/StableDiffusion.md + model/StableDiffusionXL.md + model/ControlNet.md + model/AnimateDiff.md + model/IPAdapter.md + model/HunyuanDiT.md + model/Kolors.md + model/StableDiffusion3.md + model/StableVideoDiffusion.md + model/ExVideo.md + model/FLUX.md + model/CogVideo.md + .. toctree:: :maxdepth: 1 :caption: 微调 diff --git a/docs/source/model/AnimateDiff.md b/docs/source/model/AnimateDiff.md new file mode 100644 index 0000000..09a40a5 --- /dev/null +++ b/docs/source/model/AnimateDiff.md @@ -0,0 +1 @@ +# AnimateDiff \ No newline at end of file diff --git a/docs/source/model/CogVideo.md b/docs/source/model/CogVideo.md new file mode 100644 index 0000000..9e41820 --- /dev/null +++ b/docs/source/model/CogVideo.md @@ -0,0 +1 @@ +# CogVideoX \ No newline at end of file diff --git a/docs/source/model/ControlNet.md b/docs/source/model/ControlNet.md new file mode 100644 index 0000000..2ca22b8 --- /dev/null +++ b/docs/source/model/ControlNet.md @@ -0,0 +1 @@ +# ControlNet \ No newline at end of file diff --git a/docs/source/model/ExVideo.md b/docs/source/model/ExVideo.md new file mode 100644 index 0000000..a8d79af --- /dev/null +++ b/docs/source/model/ExVideo.md @@ -0,0 +1,97 @@ +# ExVideo + +## 相关链接 + +* 论文:[ExVideo: Extending Video Diffusion Models via Parameter-Efficient Post-Tuning](https://arxiv.org/abs/2406.14130) +* 模型 + * ExVideo-CogVideoX + * [HuggingFace](https://huggingface.co/ECNU-CILab/ExVideo-CogVideoX-LoRA-129f-v1) + * [ModelScope](https://modelscope.cn/models/ECNU-CILab/ExVideo-CogVideoX-LoRA-129f-v1) + * ExVideo-SVD + * [HuggingFace](https://huggingface.co/ECNU-CILab/ExVideo-SVD-128f-v1) + * [ModelScope](https://modelscope.cn/models/ECNU-CILab/ExVideo-SVD-128f-v1) + +## 模型介绍 + +ExVideo 是一种视频生成模型的后训练(post-training)方法,旨在增强视频生成模型的能力,使其能够生成更长的视频。目前,ExVideo 已经发布了两个版本,分别将 Stable Video Diffusion 扩展到 128 帧、将 CogVideoX-5B 扩展到 129 帧。 + +在基于 Stable Video Diffusion 的 ExVideo 扩展模块中,静态的位置编码被替换为了可训练的参数矩阵,并在时序模块中添加了额外的单位卷积(Identidy 3D Convolution),在保留预训练模型本身能力的前提下,使其能够捕获更长时间尺度上的信息,从而生成更长视频。而在基于 CogVideoX-5B 的 ExVideo 扩展模块中,由于模型基础架构为 DiT,为保证计算效率,扩展模块采用 LoRA 的形式构建。 + +![](https://github.com/user-attachments/assets/94aa31ba-3ee3-4421-9713-83333a165660) + +为了在有限的计算资源上实现长视频的训练,ExVideo 做了很多工程优化,包括: + +* Parameter freezing:冻结除了扩展模块以外的所有参数 +* Mixed precision:扩展模块部分以全精度维护,其他部分以 BFloat16 精度维护 +* Gradient checkpointing:在前向传播时丢弃中间变量,并反向传播时重新计算 +* Flash attention:在所有注意力机制上启用加速过的注意力实现 +* Shard optimizer states and gradients:基于 DeepSpeed 把部分参数分拆到多个 GPU 上 + +Stable Video Diffusion + ExVideo 的生成效果: + + + +CogVideoX-5B + ExVideo 的生成效果: + + + +## 代码样例 + +ExVideo-SVD + +```python +from diffsynth import save_video, ModelManager, SVDVideoPipeline +import torch, requests +from PIL import Image + + +# Load models +model_manager = ModelManager(torch_dtype=torch.float16, device="cuda", + model_id_list=["stable-video-diffusion-img2vid-xt", "ExVideo-SVD-128f-v1"]) +pipe = SVDVideoPipeline.from_model_manager(model_manager) + +# Generate a video +torch.manual_seed(0) +image = Image.open(requests.get("https://www.modelscope.cn/api/v1/studio/ECNU-CILab/ExVideo-SVD-128f-v1/repo?Revision=master&FilePath=images%2F0.png", stream=True).raw) +image.save("image.png") +video = pipe( + input_image=image.resize((512, 512)), + num_frames=128, fps=30, height=512, width=512, + motion_bucket_id=127, + num_inference_steps=50, + min_cfg_scale=2, max_cfg_scale=2, contrast_enhance_scale=1.2 +) +save_video(video, "video.mp4", fps=30) +``` + +ExVideo-CogVideoX + +```python +from diffsynth import ModelManager, CogVideoPipeline, save_video, download_models +import torch + + +download_models(["CogVideoX-5B", "ExVideo-CogVideoX-LoRA-129f-v1"]) +model_manager = ModelManager(torch_dtype=torch.bfloat16) +model_manager.load_models([ + "models/CogVideo/CogVideoX-5b/text_encoder", + "models/CogVideo/CogVideoX-5b/transformer", + "models/CogVideo/CogVideoX-5b/vae/diffusion_pytorch_model.safetensors", +]) +model_manager.load_lora("models/lora/ExVideo-CogVideoX-LoRA-129f-v1.safetensors") +pipe = CogVideoPipeline.from_model_manager(model_manager) + +torch.manual_seed(6) +video = pipe( + prompt="an astronaut riding a horse on Mars.", + height=480, width=720, num_frames=129, + cfg_scale=7.0, num_inference_steps=100, +) +save_video(video, "video_with_lora.mp4", fps=8, quality=5) +``` \ No newline at end of file diff --git a/docs/source/model/FLUX.md b/docs/source/model/FLUX.md new file mode 100644 index 0000000..1255682 --- /dev/null +++ b/docs/source/model/FLUX.md @@ -0,0 +1 @@ +# FLUX \ No newline at end of file diff --git a/docs/source/model/HunyuanDiT.md b/docs/source/model/HunyuanDiT.md new file mode 100644 index 0000000..4df2413 --- /dev/null +++ b/docs/source/model/HunyuanDiT.md @@ -0,0 +1 @@ +# Hunyuan-DiT \ No newline at end of file diff --git a/docs/source/model/IPAdapter.md b/docs/source/model/IPAdapter.md new file mode 100644 index 0000000..58e1d8b --- /dev/null +++ b/docs/source/model/IPAdapter.md @@ -0,0 +1 @@ +# IP-Adapter \ No newline at end of file diff --git a/docs/source/model/Kolors.md b/docs/source/model/Kolors.md new file mode 100644 index 0000000..e7fa7a9 --- /dev/null +++ b/docs/source/model/Kolors.md @@ -0,0 +1 @@ +# Kolors \ No newline at end of file diff --git a/docs/source/model/StableDiffusion.md b/docs/source/model/StableDiffusion.md new file mode 100644 index 0000000..460225c --- /dev/null +++ b/docs/source/model/StableDiffusion.md @@ -0,0 +1 @@ +# Stable Diffusion \ No newline at end of file diff --git a/docs/source/model/StableDiffusion3.md b/docs/source/model/StableDiffusion3.md new file mode 100644 index 0000000..e867b34 --- /dev/null +++ b/docs/source/model/StableDiffusion3.md @@ -0,0 +1 @@ +# Stable Diffusion 3 \ No newline at end of file diff --git a/docs/source/model/StableDiffusionXL.md b/docs/source/model/StableDiffusionXL.md new file mode 100644 index 0000000..1889325 --- /dev/null +++ b/docs/source/model/StableDiffusionXL.md @@ -0,0 +1 @@ +# Stable Diffusion XL \ No newline at end of file diff --git a/docs/source/model/StableVideoDiffusion.md b/docs/source/model/StableVideoDiffusion.md new file mode 100644 index 0000000..afe86c4 --- /dev/null +++ b/docs/source/model/StableVideoDiffusion.md @@ -0,0 +1 @@ +# Stable Video Diffusion \ No newline at end of file