mirror of
https://github.com/modelscope/DiffSynth-Studio.git
synced 2026-03-18 22:08:13 +00:00
139 lines
7.6 KiB
Markdown
139 lines
7.6 KiB
Markdown
# Anima
|
|
|
|
Anima is an image generation model trained and open-sourced by CircleStone Labs and Comfy Org.
|
|
|
|
## Installation
|
|
|
|
Before using this project for model inference and training, please install DiffSynth-Studio first.
|
|
|
|
```shell
|
|
git clone https://github.com/modelscope/DiffSynth-Studio.git
|
|
cd DiffSynth-Studio
|
|
pip install -e .
|
|
```
|
|
|
|
For more installation information, please refer to [Install Dependencies](../Pipeline_Usage/Setup.md).
|
|
|
|
## Quick Start
|
|
|
|
The following code demonstrates how to quickly load the [circlestone-labs/Anima](https://www.modelscope.cn/models/circlestone-labs/Anima) model for inference. VRAM management is enabled by default, allowing the framework to automatically control model parameter loading based on available VRAM. Minimum 8GB VRAM required.
|
|
|
|
```python
|
|
from diffsynth.pipelines.anima_image import AnimaImagePipeline, ModelConfig
|
|
import torch
|
|
|
|
vram_config = {
|
|
"offload_dtype": "disk",
|
|
"offload_device": "disk",
|
|
"onload_dtype": "disk",
|
|
"onload_device": "disk",
|
|
"preparing_dtype": torch.bfloat16,
|
|
"preparing_device": "cuda",
|
|
"computation_dtype": torch.bfloat16,
|
|
"computation_device": "cuda",
|
|
}
|
|
pipe = AnimaImagePipeline.from_pretrained(
|
|
torch_dtype=torch.bfloat16,
|
|
device="cuda",
|
|
model_configs=[
|
|
ModelConfig(model_id="circlestone-labs/Anima", origin_file_pattern="split_files/diffusion_models/anima-preview.safetensors", **vram_config),
|
|
ModelConfig(model_id="circlestone-labs/Anima", origin_file_pattern="split_files/text_encoders/qwen_3_06b_base.safetensors", **vram_config),
|
|
ModelConfig(model_id="circlestone-labs/Anima", origin_file_pattern="split_files/vae/qwen_image_vae.safetensors", **vram_config),
|
|
],
|
|
tokenizer_config=ModelConfig(model_id="Qwen/Qwen3-0.6B", origin_file_pattern="./"),
|
|
tokenizer_t5xxl_config=ModelConfig(model_id="stabilityai/stable-diffusion-3.5-large", origin_file_pattern="tokenizer_3/"),
|
|
vram_limit=torch.cuda.mem_get_info("cuda")[1] / (1024 ** 3) - 0.5,
|
|
)
|
|
prompt = "Masterpiece, best quality, solo, long hair, wavy hair, silver hair, blue eyes, blue dress, medium breasts, dress, underwater, air bubble, floating hair, refraction, portrait."
|
|
negative_prompt = "worst quality, low quality, monochrome, zombie, interlocked fingers, Aissist, cleavage, nsfw,"
|
|
image = pipe(prompt, seed=0, num_inference_steps=50)
|
|
image.save("image.jpg")
|
|
```
|
|
|
|
## Model Overview
|
|
|
|
|Model ID|Inference|Low VRAM Inference|Full Training|Validation after Full Training|LoRA Training|Validation after LoRA Training|
|
|
|-|-|-|-|-|-|-|
|
|
|[circlestone-labs/Anima](https://www.modelscope.cn/models/circlestone-labs/Anima)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/anima/model_inference/anima-preview.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/anima/model_inference_low_vram/anima-preview.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/anima/model_training/full/anima-preview.sh)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/anima/model_training/validate_full/anima-preview.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/anima/model_training/lora/anima-preview.sh)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/anima/model_training/validate_lora/anima-preview.py)|
|
|
|
|
Special training scripts:
|
|
|
|
* Differential LoRA Training: [doc](../Training/Differential_LoRA.md)
|
|
* FP8 Precision Training: [doc](../Training/FP8_Precision.md)
|
|
* Two-Stage Split Training: [doc](../Training/Split_Training.md)
|
|
* End-to-End Direct Distillation: [doc](../Training/Direct_Distill.md)
|
|
|
|
## Model Inference
|
|
|
|
Models are loaded through `AnimaImagePipeline.from_pretrained`, see [Model Inference](../Pipeline_Usage/Model_Inference.md#loading-models) for details.
|
|
|
|
Input parameters for `AnimaImagePipeline` inference include:
|
|
|
|
* `prompt`: Text description of the desired image content.
|
|
* `negative_prompt`: Content to exclude from the generated image (default: `""`).
|
|
* `cfg_scale`: Classifier-free guidance parameter (default: 4.0).
|
|
* `input_image`: Input image for image-to-image generation (default: `None`).
|
|
* `denoising_strength`: Controls similarity to input image (default: 1.0).
|
|
* `height`: Image height (must be multiple of 16, default: 1024).
|
|
* `width`: Image width (must be multiple of 16, default: 1024).
|
|
* `seed`: Random seed (default: `None`).
|
|
* `rand_device`: Device for random noise generation (default: `"cpu"`).
|
|
* `num_inference_steps`: Inference steps (default: 30).
|
|
* `sigma_shift`: Scheduler sigma offset (default: `None`).
|
|
* `progress_bar_cmd`: Progress bar implementation (default: `tqdm.tqdm`).
|
|
|
|
For VRAM constraints, enable [VRAM Management](../Pipeline_Usage/VRAM_management.md). Recommended low-VRAM configurations are provided in the "Model Overview" table above.
|
|
|
|
## Model Training
|
|
|
|
Anima models are trained through [`examples/anima/model_training/train.py`](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/anima/model_training/train.py) with parameters including:
|
|
|
|
* General Training Parameters
|
|
* Dataset Configuration
|
|
* `--dataset_base_path`: Dataset root directory.
|
|
* `--dataset_metadata_path`: Metadata file path.
|
|
* `--dataset_repeat`: Dataset repetition per epoch.
|
|
* `--dataset_num_workers`: Dataloader worker count.
|
|
* `--data_file_keys`: Metadata fields to load (comma-separated).
|
|
* Model Loading
|
|
* `--model_paths`: Model paths (JSON format).
|
|
* `--model_id_with_origin_paths`: Model IDs with origin paths (e.g., `"anima-team/anima-1B:text_encoder/*.safetensors"`).
|
|
* `--extra_inputs`: Additional pipeline inputs (e.g., `controlnet_inputs` for ControlNet).
|
|
* `--fp8_models`: FP8-formatted models (same format as `--model_paths`).
|
|
* Training Configuration
|
|
* `--learning_rate`: Learning rate.
|
|
* `--num_epochs`: Training epochs.
|
|
* `--trainable_models`: Trainable components (e.g., `dit`, `vae`, `text_encoder`).
|
|
* `--find_unused_parameters`: Handle unused parameters in DDP training.
|
|
* `--weight_decay`: Weight decay value.
|
|
* `--task`: Training task (default: `sft`).
|
|
* Output Configuration
|
|
* `--output_path`: Model output directory.
|
|
* `--remove_prefix_in_ckpt`: Remove state dict prefixes.
|
|
* `--save_steps`: Model saving interval.
|
|
* LoRA Configuration
|
|
* `--lora_base_model`: Target model for LoRA.
|
|
* `--lora_target_modules`: Target modules for LoRA.
|
|
* `--lora_rank`: LoRA rank.
|
|
* `--lora_checkpoint`: LoRA checkpoint path.
|
|
* `--preset_lora_path`: Preloaded LoRA checkpoint path.
|
|
* `--preset_lora_model`: Model to merge LoRA with (e.g., `dit`).
|
|
* Gradient Configuration
|
|
* `--use_gradient_checkpointing`: Enable gradient checkpointing.
|
|
* `--use_gradient_checkpointing_offload`: Offload checkpointing to CPU.
|
|
* `--gradient_accumulation_steps`: Gradient accumulation steps.
|
|
* Image Resolution
|
|
* `--height`: Image height (empty for dynamic resolution).
|
|
* `--width`: Image width (empty for dynamic resolution).
|
|
* `--max_pixels`: Maximum pixel area for dynamic resolution.
|
|
* Anima-Specific Parameters
|
|
* `--tokenizer_path`: Tokenizer path for text-to-image models.
|
|
* `--tokenizer_t5xxl_path`: T5-XXL tokenizer path.
|
|
|
|
We provide a sample image dataset for testing:
|
|
|
|
```shell
|
|
modelscope download --dataset DiffSynth-Studio/diffsynth_example_dataset --local_dir ./data/diffsynth_example_dataset
|
|
```
|
|
|
|
For training script details, refer to [Model Training](../Pipeline_Usage/Model_Training.md). For advanced training techniques, see [Training Framework Documentation](https://github.com/modelscope/DiffSynth-Studio/tree/main/docs/zh/Training/). |