DiffSynth-Studio/docs/en/Model_Details/Anima.md

# Anima

Anima is an image generation model trained and open-sourced by CircleStone Labs and Comfy Org.

## Installation

Before using this project for model inference and training, please install DiffSynth-Studio first.

```shell
git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .
```

For more installation information, please refer to [Install Dependencies](../Pipeline_Usage/Setup.md).

## Quick Start

The following code demonstrates how to quickly load the [circlestone-labs/Anima](https://www.modelscope.cn/models/circlestone-labs/Anima) model for inference. VRAM management is enabled by default, allowing the framework to automatically control model parameter loading based on available VRAM. Minimum 8GB VRAM required.

```python
from diffsynth.pipelines.anima_image import AnimaImagePipeline, ModelConfig
import torch

vram_config = {
    "offload_dtype": "disk",
    "offload_device": "disk",
    "onload_dtype": "disk",
    "onload_device": "disk",
    "preparing_dtype": torch.bfloat16,
    "preparing_device": "cuda",
    "computation_dtype": torch.bfloat16,
    "computation_device": "cuda",
}
pipe = AnimaImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="circlestone-labs/Anima", origin_file_pattern="split_files/diffusion_models/anima-preview.safetensors", **vram_config),
        ModelConfig(model_id="circlestone-labs/Anima", origin_file_pattern="split_files/text_encoders/qwen_3_06b_base.safetensors", **vram_config),
        ModelConfig(model_id="circlestone-labs/Anima", origin_file_pattern="split_files/vae/qwen_image_vae.safetensors", **vram_config),
    ],
    tokenizer_config=ModelConfig(model_id="Qwen/Qwen3-0.6B", origin_file_pattern="./"),
    tokenizer_t5xxl_config=ModelConfig(model_id="stabilityai/stable-diffusion-3.5-large", origin_file_pattern="tokenizer_3/"),
    vram_limit=torch.cuda.mem_get_info("cuda")[1] / (1024 ** 3) - 0.5,
)
prompt = "Masterpiece, best quality, solo, long hair, wavy hair, silver hair, blue eyes, blue dress, medium breasts, dress, underwater, air bubble, floating hair, refraction, portrait."
negative_prompt = "worst quality, low quality, monochrome, zombie, interlocked fingers, Aissist, cleavage, nsfw,"
image = pipe(prompt, seed=0, num_inference_steps=50)
image.save("image.jpg")
```

## Model Overview

|Model ID|Inference|Low VRAM Inference|Full Training|Validation after Full Training|LoRA Training|Validation after LoRA Training|
|-|-|-|-|-|-|-|
|[circlestone-labs/Anima](https://www.modelscope.cn/models/circlestone-labs/Anima)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/anima/model_inference/anima-preview.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/anima/model_inference_low_vram/anima-preview.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/anima/model_training/full/anima-preview.sh)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/anima/model_training/validate_full/anima-preview.py)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/anima/model_training/lora/anima-preview.sh)|[code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/anima/model_training/validate_lora/anima-preview.py)|

Special training scripts:

* Differential LoRA Training: [doc](../Training/Differential_LoRA.md)
* FP8 Precision Training: [doc](../Training/FP8_Precision.md)
* Two-Stage Split Training: [doc](../Training/Split_Training.md)
* End-to-End Direct Distillation: [doc](../Training/Direct_Distill.md)

## Model Inference

Models are loaded through `AnimaImagePipeline.from_pretrained`, see [Model Inference](../Pipeline_Usage/Model_Inference.md#loading-models) for details.

Input parameters for `AnimaImagePipeline` inference include:

* `prompt`: Text description of the desired image content.
* `negative_prompt`: Content to exclude from the generated image (default: `""`).
* `cfg_scale`: Classifier-free guidance parameter (default: 4.0).
* `input_image`: Input image for image-to-image generation (default: `None`).
* `denoising_strength`: Controls similarity to input image (default: 1.0).
* `height`: Image height (must be multiple of 16, default: 1024).
* `width`: Image width (must be multiple of 16, default: 1024).
* `seed`: Random seed (default: `None`).
* `rand_device`: Device for random noise generation (default: `"cpu"`).
* `num_inference_steps`: Inference steps (default: 30).
* `sigma_shift`: Scheduler sigma offset (default: `None`).
* `progress_bar_cmd`: Progress bar implementation (default: `tqdm.tqdm`).

For VRAM constraints, enable [VRAM Management](../Pipeline_Usage/VRAM_management.md). Recommended low-VRAM configurations are provided in the "Model Overview" table above.

## Model Training

Anima models are trained through [`examples/anima/model_training/train.py`](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/anima/model_training/train.py) with parameters including:

* General Training Parameters
    * Dataset Configuration
        * `--dataset_base_path`: Dataset root directory.
        * `--dataset_metadata_path`: Metadata file path.
        * `--dataset_repeat`: Dataset repetition per epoch.
        * `--dataset_num_workers`: Dataloader worker count.
        * `--data_file_keys`: Metadata fields to load (comma-separated).
    * Model Loading
        * `--model_paths`: Model paths (JSON format).
        * `--model_id_with_origin_paths`: Model IDs with origin paths (e.g., `"anima-team/anima-1B:text_encoder/*.safetensors"`).
        * `--extra_inputs`: Additional pipeline inputs (e.g., `controlnet_inputs` for ControlNet).
        * `--fp8_models`: FP8-formatted models (same format as `--model_paths`).
    * Training Configuration
        * `--learning_rate`: Learning rate.
        * `--num_epochs`: Training epochs.
        * `--trainable_models`: Trainable components (e.g., `dit`, `vae`, `text_encoder`).
        * `--find_unused_parameters`: Handle unused parameters in DDP training.
        * `--weight_decay`: Weight decay value.
        * `--task`: Training task (default: `sft`).
    * Output Configuration
        * `--output_path`: Model output directory.
        * `--remove_prefix_in_ckpt`: Remove state dict prefixes.
        * `--save_steps`: Model saving interval.
    * LoRA Configuration
        * `--lora_base_model`: Target model for LoRA.
        * `--lora_target_modules`: Target modules for LoRA.
        * `--lora_rank`: LoRA rank.
        * `--lora_checkpoint`: LoRA checkpoint path.
        * `--preset_lora_path`: Preloaded LoRA checkpoint path.
        * `--preset_lora_model`: Model to merge LoRA with (e.g., `dit`).
    * Gradient Configuration
        * `--use_gradient_checkpointing`: Enable gradient checkpointing.
        * `--use_gradient_checkpointing_offload`: Offload checkpointing to CPU.
        * `--gradient_accumulation_steps`: Gradient accumulation steps.
    * Image Resolution
        * `--height`: Image height (empty for dynamic resolution).
        * `--width`: Image width (empty for dynamic resolution).
        * `--max_pixels`: Maximum pixel area for dynamic resolution.
* Anima-Specific Parameters
    * `--tokenizer_path`: Tokenizer path for text-to-image models.
    * `--tokenizer_t5xxl_path`: T5-XXL tokenizer path.

We provide a sample image dataset for testing:

```shell
modelscope download --dataset DiffSynth-Studio/diffsynth_example_dataset --local_dir ./data/diffsynth_example_dataset
```

For training script details, refer to [Model Training](../Pipeline_Usage/Model_Training.md). For advanced training techniques, see [Training Framework Documentation](https://github.com/modelscope/DiffSynth-Studio/tree/main/docs/zh/Training/).