Wan 2.2 Animate
Wan Animate is a unified approach for character animation and character replacement. With one image of a character and one reference video, it can animate the character to follow the performer’s motion and expressions. It can also place the animated character back into the reference video to replace the original person, while matching scene lighting and color tone for consistent integration. The design separates reference conditions from regions to be generated, making the inputs clear and flexible.
The method builds on Wan and uses skeleton signals for body motion together with facial features drawn from source images to reenact expressions. An auxiliary Relighting LoRA helps apply environmental lighting and color tone without drifting away from the character’s appearance. This allows high controllability of motion and expression while maintaining identity and scene consistency.
Two operation modes are included. Move Mode animates the input image by following the motion in the video. Mix Mode replaces the person in the video with the character from the image. Inputs are bounded by practical limits on size, resolution, duration, and aspect ratio to keep processing reliable.
Quick Overview
| Item | Details |
|---|---|
| Primary Topic | Wan Animate: character animation and replacement |
| Modes | Move Mode, Mix Mode |
| Key Signals | Skeleton motion, facial features from source images |
| Lighting | Relighting LoRA for scene lighting and color tone |
| Quality Variants | wan-pro: 720p at 25 fps; wan-std: 720p at 15 fps |
| Typical Inputs | One reference image and one reference video |
| Video Constraints | Up to 200 MB; mp4/avi/mov |
| Image Constraints | Up to 5 MB; jpg/png/jpeg/webp/bmp |
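As a quick pre-flight step, the sketch below checks a local image and video against the size and format limits listed above before submitting a job. The constants mirror the table; the file names are placeholders.

```python
# Pre-submission checks mirroring the limits in the table above.
# The file names are placeholders; point them at your own inputs.
import os

VIDEO_MAX_MB, VIDEO_EXTS = 200, {".mp4", ".avi", ".mov"}
IMAGE_MAX_MB, IMAGE_EXTS = 5, {".jpg", ".jpeg", ".png", ".webp", ".bmp"}

def check_file(path: str, max_mb: int, allowed_exts: set) -> None:
    ext = os.path.splitext(path)[1].lower()
    if ext not in allowed_exts:
        raise ValueError(f"{path}: {ext} is not one of {sorted(allowed_exts)}")
    size_mb = os.path.getsize(path) / (1024 * 1024)
    if size_mb > max_mb:
        raise ValueError(f"{path}: {size_mb:.1f} MB exceeds the {max_mb} MB limit")

check_file("character.png", IMAGE_MAX_MB, IMAGE_EXTS)    # reference image
check_file("performance.mp4", VIDEO_MAX_MB, VIDEO_EXTS)  # reference video
```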
Demo
The embedded demo is provided for illustration. It may enforce input limits for file size, duration, and resolution.
How Wan Animate Works
The input format separates what is used as guidance from what is generated. The image supplies identity and facial cues. The video supplies motion and timing. A skeleton signal is aligned with the performer to guide pose and body movement. Facial reenactment is driven by features extracted from the source image so that the character keeps consistent appearance while following expressions from the video.
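To make the split concrete, here is a minimal data-structure sketch of how guidance can be kept separate from what is generated. None of the names below come from the Wan Animate codebase; they only illustrate which input carries which role.

```python
# Illustrative only: the image carries identity/facial cues, the video carries
# motion and timing, and the skeleton/face tracks are derived guidance signals.
from dataclasses import dataclass, field

@dataclass
class GuidanceInputs:
    character_image: str   # identity and facial cues
    reference_video: str   # motion and timing
    skeleton_track: list = field(default_factory=list)  # per-frame joints aligned to the performer
    face_features: list = field(default_factory=list)   # per-frame features driving expression reenactment

@dataclass
class GenerationRequest:
    guidance: GuidanceInputs
    mode: str                        # "move" or "mix"
    resolution: tuple = (1280, 720)  # typical 720p target
    fps: int = 25                    # 25 for wan-pro, 15 for wan-std

request = GenerationRequest(
    guidance=GuidanceInputs("character.png", "performance.mp4"),
    mode="move",
)
```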
For character replacement, an auxiliary Relighting LoRA adjusts the character’s lighting to the scene. The goal is to keep the character stable while matching the overall color tone and illumination, so the output looks coherent with the background footage. This balance helps preserve identity without introducing mismatched highlights or shadows.
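The LoRA mechanism itself is generic: a small low-rank update added on top of frozen base weights, which is why a relighting adapter can adjust lighting and color tone without retraining or overwriting the base model. The sketch below shows that generic weight merge; it is not the Wan Animate implementation.

```python
# Generic LoRA merge: W' = W + scale * (up @ down). The rank is small, so the
# adapter is cheap to store and can be applied or removed without touching W.
import torch

def merge_lora(base_weight: torch.Tensor,
               lora_down: torch.Tensor,   # shape (rank, in_features)
               lora_up: torch.Tensor,     # shape (out_features, rank)
               scale: float = 1.0) -> torch.Tensor:
    return base_weight + scale * (lora_up @ lora_down)

base = torch.randn(64, 64)        # stand-in for one projection matrix
down = 0.01 * torch.randn(4, 64)  # rank-4 factors
up = 0.01 * torch.randn(64, 4)
adapted = merge_lora(base, down, up, scale=0.8)
```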
The same structure supports Move Mode and Mix Mode, so the setup stays consistent across tasks. The system is designed for predictable control over motion, expression, and appearance, with quality profiles that target typical 720p outputs at either 25 or 15 frames per second.
Move Mode
Move Mode animates the character image using motion extracted from the reference video. This is suitable when you want to create a character video driven by a performer. The character follows body pose and timing guided by a spatially aligned skeleton. Facial signals help replicate expressions in a stable way so that the identity remains consistent.
Input guidelines:
- Prepare a clear character image with the character centered and unobstructed.
- Use a reference video showing the motion you want to transfer. Keep the performer visible and framed well.
- Follow limits on size, duration, and aspect ratio to reduce processing errors.
Typical results track the video’s timing closely. Output quality depends on the image quality, the range of poses covered in the reference video, and the expression range. If the performer is partially occluded or moves off-frame, pose estimation may produce incomplete guidance, so keep the performer well framed.
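Before committing to a long run, it can help to estimate how complete the pose guidance will be. The sketch below is an assumed pre-check, not part of Wan Animate: it measures the fraction of joints a 2D pose estimator detected across frames, using fake data in place of real estimator output.

```python
# Assumed pre-check: low coverage usually means occlusion or off-frame motion,
# which leads to incomplete skeleton guidance.
from typing import Optional

Joint = Optional[tuple]  # (x, y) if detected, None if missing

def pose_coverage(frames_joints: list) -> float:
    """Fraction of joints detected across all frames (1.0 = fully visible)."""
    total = detected = 0
    for joints in frames_joints:
        for j in joints:
            total += 1
            detected += j is not None
    return detected / total if total else 0.0

# Example: 3 frames with 4 joints each; one joint is missing in the last frame.
fake_track = [
    [(0.5, 0.2), (0.4, 0.5), (0.6, 0.5), (0.5, 0.9)],
    [(0.5, 0.2), (0.4, 0.5), (0.6, 0.5), (0.5, 0.9)],
    [(0.5, 0.2), (0.4, 0.5), (0.6, 0.5), None],
]
if pose_coverage(fake_track) < 0.95:
    print("Warning: performer partially occluded; expect incomplete guidance.")
```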
Mix Mode
Mix Mode replaces the person in the video with the animated character from the image. The system preserves motion and expressions while adjusting lighting and color tone to the target scene. The Relighting LoRA promotes stable appearance under different illumination so that the character stays consistent across frames.
Input guidelines:
- Pick a reference video where the person is well lit and mostly visible.
- Use a character image with clean features and appropriate pose relative to the scene.
- If the target scene has strong color casts or fast lighting changes, test a short clip first.
The produced video keeps the background and timing of the original footage. Pose and facial reenactment follow the performer, while the character’s style and identity come from the input image.
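The relighting itself happens inside the model, but the underlying intuition of matching a character’s color tone to the target scene can be shown with simple per-channel statistics. The toy sketch below is only that intuition, not the Relighting LoRA.

```python
# Toy color-tone matching: shift the character crop so its per-channel
# mean/std match the scene frame. The real relighting is learned, not statistical.
import numpy as np

def match_color_tone(character: np.ndarray, scene: np.ndarray) -> np.ndarray:
    """character, scene: float arrays of shape (H, W, 3) with values in [0, 1]."""
    out = character.copy()
    for c in range(3):
        src_mean, src_std = character[..., c].mean(), character[..., c].std() + 1e-6
        dst_mean, dst_std = scene[..., c].mean(), scene[..., c].std() + 1e-6
        out[..., c] = (out[..., c] - src_mean) / src_std * dst_std + dst_mean
    return np.clip(out, 0.0, 1.0)

character_crop = np.random.rand(64, 64, 3).astype(np.float32)
scene_frame = 0.6 * np.random.rand(128, 128, 3).astype(np.float32)  # darker scene
toned = match_color_tone(character_crop, scene_frame)
```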
Installation Overview
You can install Wan Animate locally using a Python environment with common deep learning dependencies. A typical setup uses a recent CUDA toolkit, a stable PyTorch release, and a package list defined by the repository. Prepare a GPU with sufficient memory for 720p generation and set the quality profile according to your time and hardware constraints.
- Create and activate a fresh environment.
- Install PyTorch and dependencies appropriate for your CUDA version.
- Install Wan packages and any extra video I/O tools.
- Download model weights as instructed by the repository.
- Run a quick test with a short video and a small image.
For a detailed walkthrough, see the Installation page on this site. It includes notes on quality settings (wan-pro and wan-std), I/O constraints, and basic command examples.
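After installing, a short environment sanity check can save time before the first full run. The sketch below uses only standard PyTorch calls (it is not a Wan Animate script) to confirm that CUDA is visible and to report available GPU memory.

```python
# Environment sanity check: 720p generation needs a CUDA GPU with enough VRAM,
# so confirm the basics before downloading weights or launching a long job.
import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, {props.total_memory / 1024**3:.1f} GiB VRAM")
```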
Practical Tips
- Keep the subject centered and visible across the video to help skeleton guidance.
- Use images with clean features and avoid heavy occlusion around the face.
- For complex lighting, test with wan-std first, then switch to wan-pro for final output.
- Match the aspect ratio of the source video to the intended output whenever possible.
- Trim reference videos to 2–10 seconds for quicker iteration before longer runs; a trimming sketch follows this list.
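One way to do that trimming from Python is shown below. It assumes the moviepy package (1.x API) and placeholder file names; any video editor or ffmpeg works equally well.

```python
# Cut the first five seconds of a reference video for a fast test run.
# Assumes moviepy 1.x; the file names are placeholders.
from moviepy.editor import VideoFileClip

with VideoFileClip("performance.mp4") as clip:
    clip.subclip(0, 5).write_videofile("performance_5s.mp4", audio=False)
```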
Summary
Wan Animate combines motion control, expression reenactment, and lighting adjustment into a single approach for both character animation and replacement. The input design keeps guidance separate from generation, so the same principles apply across tasks. The quality profiles help you balance time and output smoothness, and the input limits steer projects toward reliable runs.