Have you ever looked at a photo and imagined it moving—maybe even starring in its own short film? Now you can turn that daydream into reality, no animation degree required! Welcome to the world of Wan2.1 VACE, where the magic of AI transforms your favorite images into lively, expressive videos with just a few clicks. Whether you’re a digital artist, a social media enthusiast, or simply someone who loves to experiment with new tech, this guide will show you how easy it is to animate images using Reference-to-Video and Control with Reference workflows.

What is VACE?

Source: VACE GitHub

VACE stands for Video All-in-one Creation and Editing, and it’s a special set of tools built into the Wan2.1 AI model. What makes VACE so useful is that it lets you do everything from creating brand-new videos to editing existing ones - all in one place. With VACE, you can turn a simple picture or even just some text into a video, change backgrounds or characters, make people or objects move in new ways, and even improve or extend video scenes.


Source: Wan AI

But in this guide, we'll dive into the Reference-to-Video and Control with Reference workflow.

What is Wan VACE Reference-to-Video?

Imagine you have a favorite picture or drawing, and you want to see it come to life as a video. With VACE Reference-to-Video, it’s as easy as clicking a button! You upload your reference image along with a control video that provides the movement, and VACE uses them as a guide to create a brand-new video that matches your picture: same style, same colors, and the same mood. Whether you want to animate a character, change a background, or just try something fun, VACE makes it happen.


Ready to make your pictures dance, dazzle, and truly come alive? Let’s get started—your images are about to steal the spotlight!

How to Run Wan2.1 VACE Reference-to-Video and Control with Reference

Installation guide

Verified to work on ThinkDiffusion Build: May 19, 2025

ComfyUI v0.3.34, using the Wan 14B T2V and 14B VACE models

Note: We specify the build date because ComfyUI and custom node versions updated after this date may change the behavior or outputs of the workflow.

💡
Download the workflow and drag & drop it into your ComfyUI window, whether locally or on ThinkDiffusion. If you're using ThinkDiffusion, the minimum requirement is the Turbo 24GB machine, but we recommend the Ultra 48GB machine.
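
If you prefer to queue the workflow programmatically instead of dragging it into the browser, you can post the API-format JSON to a running ComfyUI instance. This is a minimal sketch, assuming ComfyUI is reachable at the default 127.0.0.1:8188 and that you exported the workflow with "Save (API Format)"; the file name here is hypothetical.

```python
import json
import urllib.request

# Hypothetical file name: export your workflow with "Save (API Format)" in ComfyUI first.
WORKFLOW_FILE = "wan21_vace_reference_to_video_api.json"
COMFYUI_URL = "http://127.0.0.1:8188/prompt"  # default local ComfyUI address

with open(WORKFLOW_FILE, "r", encoding="utf-8") as f:
    workflow = json.load(f)

# ComfyUI's /prompt endpoint expects the node graph under the "prompt" key.
payload = json.dumps({"prompt": workflow}).encode("utf-8")
request = urllib.request.Request(
    COMFYUI_URL, data=payload, headers={"Content-Type": "application/json"}
)

with urllib.request.urlopen(request) as response:
    print(response.read().decode("utf-8"))  # returns the queued prompt id on success
```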

Custom Nodes

If you see red nodes in the workflow, it means some required custom nodes are missing. Install them so the workflow can run.

  1. Go to ComfyUI Manager  > Click Install Missing Custom Nodes
  2. Check the list for any custom nodes that need to be installed and click Install.
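
If the Manager can't find a node pack, you can also install it by cloning the repo into ComfyUI's custom_nodes folder. Here's a minimal sketch, assuming the missing pack is Kijai's ComfyUI-WanVideoWrapper (the source of the WanVideo nodes used later in this guide) and that ComfyUI lives at ~/ComfyUI; adjust both for your setup.

```python
import subprocess
from pathlib import Path

# Assumption: ComfyUI is installed at ~/ComfyUI; change this for your installation.
custom_nodes_dir = Path.home() / "ComfyUI" / "custom_nodes"

# ComfyUI-WanVideoWrapper provides the WanVideo loader/encode nodes used in this workflow.
repo_url = "https://github.com/kijai/ComfyUI-WanVideoWrapper"
target = custom_nodes_dir / "ComfyUI-WanVideoWrapper"

if not target.exists():
    subprocess.run(["git", "clone", repo_url, str(target)], check=True)

# Restart ComfyUI afterwards so the new nodes are picked up.
```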

Models

For this guide you'll need to download these 5 recommended models.

1. wan2.1_t2v_14B_fp8_e4m3fn.safetensors
2. Wan2_1-VACE_module_14B_fp8_e4m3fn.safetensors
3. Wan2_1_VAE_bf16.safetensors
4. depth_anything_v2_vitl_fp16.safetensors
5. umt5-xxl-enc-bf16.safetensors
  1. Go to ComfyUI Manager  > Click Model Manager
  2. Search for the models listed above; when you find the exact model you're looking for, click Install, and make sure to press Refresh when you're finished.

Optional Model Path Source

Some of these models may not be available in the Model Manager.

You can use the model path source instead: paste the model's link address into ThinkDiffusion MyFiles using Upload URL (see the download sketch after the table below).

Model Name                                      ThinkDiffusion Upload Directory
wan2.1_t2v_14B_fp8_e4m3fn.safetensors           .../comfyui/models/diffusion_models/
Wan2_1-VACE_module_14B_fp8_e4m3fn.safetensors   .../comfyui/models/diffusion_models/
Wan2_1_VAE_bf16.safetensors                     .../comfyui/models/vae/
depth_anything_v2_vitl_fp16.safetensors         .../comfyui/models/depthanything/
umt5-xxl-enc-bf16.safetensors                   .../comfyui/models/text_encoders/
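
If you're running ComfyUI yourself rather than on ThinkDiffusion, here's a minimal sketch of downloading a model file directly into the matching directory. The URL is a placeholder: substitute the real link address for the model you need, and point the base path at your own installation (the "..." prefix in the table is ThinkDiffusion-specific).

```python
from pathlib import Path
import urllib.request

# Placeholder values: substitute the real link address for the model you want,
# and point COMFYUI_DIR at your own ComfyUI installation.
MODEL_URL = "https://example.com/path/to/wan2.1_t2v_14B_fp8_e4m3fn.safetensors"
COMFYUI_DIR = Path.home() / "ComfyUI"

# Target subfolder per the table above (diffusion_models, vae, depthanything, text_encoders).
target_dir = COMFYUI_DIR / "models" / "diffusion_models"
target_dir.mkdir(parents=True, exist_ok=True)

target_file = target_dir / MODEL_URL.rsplit("/", 1)[-1]
print(f"Downloading to {target_file} ...")
urllib.request.urlretrieve(MODEL_URL, target_file)
print("Done. Press Refresh in ComfyUI so the new model shows up.")
```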

Step-by-step Workflow Guide

This workflow is pretty easy to set up and runs well with the default settings. Here are a few steps where you might want to pay extra attention.

1. Set the Models

Set the models as seen in the image. The 14B base model should be paired with the 14B VACE module. If you don't have sageattn installed, use sdpa as the attention mode.
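
If you're unsure which attention mode to pick, a quick check like this (a sketch, assuming the package imports as sageattention) tells you whether sageattn is actually available in your environment:

```python
# Pick sageattn if the sageattention package is importable, otherwise fall back to sdpa.
try:
    import sageattention  # noqa: F401
    attention_mode = "sageattn"
except ImportError:
    attention_mode = "sdpa"

print(f"Set the WanVideo model loader's attention mode to: {attention_mode}")
```
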
2. Load Video as Reference for Control

Load a video whose movement is similar to what you want the reference image to perform. The video should be clear; avoid clips with multiple competing movements. Keep the frame count under 100 and set the resolution to 854x480.
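
If your source clip is longer or larger than that, here's a minimal sketch (assuming OpenCV is installed; the file names are hypothetical) that trims it to under 100 frames and resizes it to 854x480 before you load it:

```python
import cv2

# Hypothetical file names; adjust to your own clips.
INPUT_VIDEO = "control_source.mp4"
OUTPUT_VIDEO = "control_854x480.mp4"
MAX_FRAMES = 96           # keep the frame count under 100
TARGET_SIZE = (854, 480)  # width, height expected by the workflow

cap = cv2.VideoCapture(INPUT_VIDEO)
fps = cap.get(cv2.CAP_PROP_FPS) or 16.0
writer = cv2.VideoWriter(
    OUTPUT_VIDEO, cv2.VideoWriter_fourcc(*"mp4v"), fps, TARGET_SIZE
)

frames_written = 0
while frames_written < MAX_FRAMES:
    ok, frame = cap.read()
    if not ok:
        break
    writer.write(cv2.resize(frame, TARGET_SIZE))
    frames_written += 1

cap.release()
writer.release()
print(f"Wrote {frames_written} frames at {TARGET_SIZE} to {OUTPUT_VIDEO}")
```
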
3. Load Image as Reference for Video

Load a reference image for the subject of the video. The image should be clear and show a distinct subject, such as a person, an animal, or a moving object. The extra padding setting controls how large the outpainted area of the video will be.
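
To get a feel for what the extra padding means, this small sketch (assuming Pillow is installed; the file names and padding value are just for illustration) adds a border around a reference image, roughly the area the model is left to outpaint:

```python
from PIL import Image, ImageOps

# Hypothetical file names; the padding value is only for illustration.
reference = Image.open("reference_subject.png")
extra_padding = 64  # pixels of border on every side, i.e. the area left for outpainting

padded = ImageOps.expand(reference, border=extra_padding, fill="gray")
padded.save("reference_subject_padded.png")
print(f"{reference.size} -> {padded.size}")
```
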
4. Write Prompt

Write a simple prompt that mentions the subject and its movement.
5. Check Sampling Settings

Set the settings as seen in the image. You should only adjust the steps and CFG values.

If you need to retain the subject's appearance and take only the movement from the control video, reduce the WanVideo VACE Encode strength to 0.50.
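
If you want to compare a few strength values without editing the graph by hand, you can tweak the exported API-format JSON and queue each variant. This is a sketch under clear assumptions: that the encode node's class_type contains "VACE" and that its strength input is literally named "strength", which may differ in your version of the node pack.

```python
import json
import urllib.request

COMFYUI_URL = "http://127.0.0.1:8188/prompt"  # default local ComfyUI address

with open("wan21_vace_reference_to_video_api.json", encoding="utf-8") as f:
    base_graph = json.load(f)

# Assumption: the VACE encode node's class_type contains "VACE" and exposes a
# "strength" input; check the actual node in your workflow before relying on this.
for strength in (0.5, 0.75, 1.0):
    graph = json.loads(json.dumps(base_graph))  # cheap deep copy
    for node in graph.values():
        if "VACE" in node.get("class_type", "") and "strength" in node.get("inputs", {}):
            node["inputs"]["strength"] = strength
    payload = json.dumps({"prompt": graph}).encode("utf-8")
    req = urllib.request.Request(
        COMFYUI_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        print(f"strength={strength}: {resp.read().decode('utf-8')}")
```
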
6. Check the Generated Video

💡
When using the workflow, I have observed that generation can take a significant amount of time, particularly when handling outpainted areas and generating additional frames. This is compounded by the 14B-parameter VACE model, which delivers high-quality, detailed results but requires substantial computational resources and processing time for complex video transformations.
💡
I notice that the generated output is a unique blend of the reference image and the control video. The result combines the style and features from the image I provide with the motion and structure of the video input, creating a new animation that maintains character consistency and dynamic movement. I find this approach gives me more creative control and helps produce visually engaging results.
💡
When experimenting with pose estimation tools like DWPose and OpenPose as alternatives to Depth-based ControlNet, I’ve encountered occasional errors, particularly with complex poses or non-human subjects. However, there are several other ControlNet models and techniques that can be used alongside or instead of these tools to achieve better results.
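
For example, if DWPose or OpenPose struggles with your subject, you can generate a depth-based control sequence instead. Here's a minimal sketch using the Hugging Face transformers depth-estimation pipeline; the model id shown is an assumption, and in the ComfyUI workflow the Depth Anything node does this step for you.

```python
from transformers import pipeline
from PIL import Image

# Assumption: this Depth Anything V2 checkpoint id; any depth-estimation model works here.
depth_estimator = pipeline(
    "depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf"
)

frame = Image.open("control_frame.png")  # hypothetical extracted video frame
result = depth_estimator(frame)

# The pipeline returns a PIL depth map you can reassemble into a control video.
result["depth"].save("control_frame_depth.png")
```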

Examples


If you’re having issues with installation or slow hardware, you can try any of these workflows on a more powerful GPU in your browser with ThinkDiffusion.

Stay tuned for our upcoming Wan workflow tutorial that uses VACE!