
Prompt: A woman in casual attire strolls across a vast, sunlit expanse of dry land under a brilliant blue sky. Her relaxed outfit flutters gently in the warm breeze as she walks, surrounded by green grass and distant hills. The scene is rendered in a whimsical, soft Ghibli art style, with vibrant colors and a peaceful, dreamy atmosphere.

What is Video Style Transfer?

Style transfer lets you apply the visual characteristics of one image (like a painting style) to another image or video. Unlike basic filters, AI-powered style transfer analyzes both the content and artistic elements, creating results that look like they were created in that style from scratch.

With Wan 2.1, you can transform ordinary videos into ones that appear hand-painted, animated, or rendered in specific artistic styles while maintaining the original movement and content.
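To make "analyzes both the content and artistic elements" concrete, here is a minimal sketch of classic optimization-based style transfer (the Gatys et al. approach) in PyTorch: deep VGG features pin down the content, while Gram matrices of those features capture the style. This is only an illustration of the general idea, not how Wan 2.1 works internally (Wan 2.1 is a video diffusion model), and the image file names are placeholders.

```python
import torch
import torch.nn.functional as F
from torchvision.io import read_image
from torchvision.models import vgg19, VGG19_Weights

device = "cuda" if torch.cuda.is_available() else "cpu"

# Pretrained VGG-19 is used only as a fixed feature extractor.
vgg = vgg19(weights=VGG19_Weights.DEFAULT).features.to(device).eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def features(img, layers=(1, 6, 11, 20, 29)):
    """Collect activations from a few early-to-late VGG layers."""
    feats, x = [], img
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in layers:
            feats.append(x)
    return feats

def gram(f):
    """Gram matrix: which feature channels fire together, a proxy for 'style'."""
    b, c, h, w = f.shape
    f = f.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def load(path):
    """Load an RGB image as a 1x3x512x512 float tensor in [0, 1]."""
    img = read_image(path).float()[None] / 255.0
    return F.interpolate(img, size=(512, 512), mode="bilinear").to(device)

content = load("content_frame.png")    # placeholder: e.g. a frame of your video
style = load("style_reference.png")    # placeholder: e.g. a style image

with torch.no_grad():
    content_feats = features(content)
    style_grams = [gram(f) for f in features(style)]

# Optimize the pixels of a copy of the content image so its deep features stay
# close to the content while its Gram matrices move toward the style.
target = content.clone().requires_grad_(True)
opt = torch.optim.Adam([target], lr=0.02)

for step in range(300):
    opt.zero_grad()
    feats = features(target)
    content_loss = F.mse_loss(feats[-1], content_feats[-1])
    style_loss = sum(F.mse_loss(gram(f), g) for f, g in zip(feats, style_grams))
    (content_loss + 1e4 * style_loss).backward()
    opt.step()
```

Wan 2.1 replaces this per-image optimization with a generative video model, which is what gives it the frame-to-frame consistency described later in this guide.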

What You'll Learn

In this guide, you'll learn how to use Wan 2.1 to apply artistic styles to your videos. We'll cover:

  1. Setting up the necessary tools and models
  2. Creating a style reference image
  3. Applying that style to your video
  4. Practical tips for getting better results

Why Style Transfer is Important


Prompt: A car speeds along a wide highway beneath a sky streaked with thin, wispy clouds. The landscape stretches out on either side, open and inviting. The scene is depicted in a bold, vibrant Pop Art style, with exaggerated colors, dynamic lines, and a sense of energetic motion.

Style transfer is important in video generation because it allows creators to transform ordinary videos into visually stunning and unique works of art by applying different artistic styles or effects.

  • Deeper analysis: Unlike simple filters, AI style transfer analyzes both content and style elements
  • Frame consistency: The AI maintains consistency across video frames, preventing flickering
  • Artistic adaptation: The style adapts to the specific content of each frame rather than applying a uniform effect

How to run Wan 2.1 Video2Video Style Transfer

Installation Guide

LAUNCH COMFYUI IN THE CLOUD

No installs. No downloads. Run ComfyUI workflows in the Cloud.

Launch ComfyUI Now


👉🏼 Download and Set Up the Workflow

Download the workflow file and drag & drop it into your ComfyUI window.

💡
If you're using ThinkDiffusion, we recommend the Ultra (48GB) machine.
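Drag-and-drop is all you need here, but if you prefer to queue the workflow from a script, ComfyUI also exposes a small HTTP API. The sketch below is a minimal example, assuming ComfyUI is running locally on its default port 8188 and that you exported the workflow in API format (the file name is a placeholder).

```python
# Optional: queue the workflow programmatically instead of using the browser UI.
import json
import urllib.request

COMFY_URL = "http://127.0.0.1:8188"   # adjust if your instance runs elsewhere

# Placeholder file name; this must be the API-format export, not the UI JSON.
with open("wan21_style_transfer_api.json") as f:
    workflow = json.load(f)

payload = json.dumps({"prompt": workflow}).encode("utf-8")
req = urllib.request.Request(f"{COMFY_URL}/prompt", data=payload,
                             headers={"Content-Type": "application/json"})
with urllib.request.urlopen(req) as resp:
    # The response includes a prompt_id you can poll via the /history endpoint.
    print(json.load(resp))
```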

👉🏼 Install Required Custom Nodes

If you see red nodes in your workflow, you need to install missing custom nodes:

  1. Go to ComfyUI Manager
  2. Click "Install Missing Custom Nodes"
  3. Select and install all missing nodes from the list
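If a node pack doesn't show up in the Manager, the usual fallback is to clone its repository into ComfyUI/custom_nodes and restart ComfyUI. The sketch below assumes a local install with git available; the repository URL is a placeholder, so use the repos that correspond to the red nodes in your workflow.

```python
# Manual fallback for custom nodes the Manager can't find: clone each repo into
# ComfyUI/custom_nodes, then restart ComfyUI so the nodes are loaded.
import subprocess
from pathlib import Path

CUSTOM_NODES_DIR = Path("ComfyUI/custom_nodes")   # adjust to your install path

repos = [
    "https://github.com/example/ComfyUI-SomeNodePack",   # hypothetical URL
]

for url in repos:
    target = CUSTOM_NODES_DIR / url.rstrip("/").split("/")[-1]
    if target.exists():
        print(f"{target.name} already installed, skipping")
        continue
    subprocess.run(["git", "clone", url, str(target)], check=True)

print("Done. Check each pack's requirements.txt, then restart ComfyUI.")
```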

👉🏼 Download Required Models

For this guide, you'll need to download the 9 recommended models listed below.

1. flux1-dev-fp8.safetensors (Preloaded on TD)
2. clip_l.safetensors (Preloaded on TD)
3. t5xxl_fp8_e4m3fn.safetensors (Preloaded on TD)
4. ae.sft (Preloaded on TD)
5. open-clip-xlm-roberta-large-vit-huge-14_fp16.safetensors
6. Wan2.1-Fun-Control-14B_fp8_e4m3fn.safetensors
7. Wan2_1_VAE_bf16.safetensors
8. umt5-xxl-enc-bf16.safetensors
9. flux-depth-controlnet-v3.safetensors
  1. Go to ComfyUI Manager > click Model Manager.
  2. Search for the models listed above. When you find the exact model you're looking for, click Install, and press Refresh when you're finished.

Model Path Source

Some of these models may not be available in the Model Manager. If that happens, or if you prefer to install them manually, copy the model's link address and paste it into ThinkDiffusion MyFiles using the upload URL option, targeting the directories below.

Model Name → ThinkDiffusion Upload Directory
flux1-dev-fp8.safetensors (Preloaded on TD) → .../comfyui/models/diffusion_models/
clip_l.safetensors (Preloaded on TD) → .../comfyui/models/clip/
t5xxl_fp8_e4m3fn.safetensors (Preloaded on TD) → .../comfyui/models/clip/
ae.sft (Preloaded on TD) → .../comfyui/models/vae/
open-clip-xlm-roberta-large-vit-huge-14_fp16.safetensors → .../comfyui/models/clip_vision/
Wan2.1-Fun-Control-14B_fp8_e4m3fn.safetensors → .../comfyui/models/diffusion_models/
Wan2_1_VAE_bf16.safetensors → .../comfyui/models/vae/
umt5-xxl-enc-bf16.safetensors → .../comfyui/models/text_encoders/
flux-depth-controlnet-v3.safetensors → .../comfyui/models/xlabs/controlnets/
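Once everything is downloaded, an optional script like the sketch below can confirm that each file sits in the directory from the table above. MODELS_ROOT is an assumption; point it at your own ComfyUI models directory (on ThinkDiffusion, the preloaded models are already in place).

```python
# Optional sanity check: verify the 9 required models are in the expected
# ComfyUI model subfolders before running the workflow.
from pathlib import Path

MODELS_ROOT = Path("ComfyUI/models")   # assumption: adjust to your install

REQUIRED = {
    "diffusion_models": ["flux1-dev-fp8.safetensors",
                         "Wan2.1-Fun-Control-14B_fp8_e4m3fn.safetensors"],
    "clip": ["clip_l.safetensors", "t5xxl_fp8_e4m3fn.safetensors"],
    "vae": ["ae.sft", "Wan2_1_VAE_bf16.safetensors"],
    "clip_vision": ["open-clip-xlm-roberta-large-vit-huge-14_fp16.safetensors"],
    "text_encoders": ["umt5-xxl-enc-bf16.safetensors"],
    "xlabs/controlnets": ["flux-depth-controlnet-v3.safetensors"],
}

missing = [f"{sub}/{name}"
           for sub, names in REQUIRED.items()
           for name in names
           if not (MODELS_ROOT / sub / name).exists()]

print("All 9 models found." if not missing else "Missing:\n" + "\n".join(missing))
```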

Step-by-step Workflow Guide

This workflow is easy to set up and runs well with the default settings.

The process has two main stages:

  1. Creating a style reference image
  2. Applying that style to your video

Pre-Stage: Load Your Video

Step | Recommended Nodes
Load a Video

Set the resolution up to 1280x720 and the frame load cap up to 73. Select the Stage 1 workflow first to generate the style reference, then Stage 2 for Video2Video, which applies that style to the video.

For videos with human subjects, set the preprocessor to Depth+OpenPose; for non-human videos, use Depth and Scribble.
ThinkDiffusion StableDiffusion ComfyUI video2video Wan Style pre stage
💡
Preprocessor Tip: Depth+OpenPose works best for human subjects because it preserves both spatial structure and body positioning. For style transfer with non-human subjects, Scribble often maintains more of the original scene details.
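If your source clip is much larger than 1280x720, you can optionally pre-scale it before the Load Video node. The sketch below assumes ffmpeg is installed and on your PATH; the file names are placeholders, and this step is not part of the workflow itself.

```python
# Optional pre-processing: letterbox the input clip to exactly 1280x720 so the
# Load Video node doesn't have to downscale a very large file.
import subprocess

subprocess.run([
    "ffmpeg", "-y",
    "-i", "input.mp4",                                # placeholder input clip
    "-vf", "scale=1280:720:force_original_aspect_ratio=decrease,"
           "pad=1280:720:(ow-iw)/2:(oh-ih)/2",        # scale, then pad to 1280x720
    "-an",                                            # audio isn't needed here
    "prepped_input.mp4",                              # placeholder output clip
], check=True)

# With the 73-frame load cap, a 24 fps clip gives you roughly 3 seconds of video.
```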

Stage 1: Create Style Reference Image

Step | Recommended Nodes
1. Auto Input First Frame

The workflow automatically uses the first frame from your pre-stage input video.
ThinkDiffusion StableDiffusion ComfyUI video2video Wan Style stage 1 auto input first frame
2. Set Models

Configure the models as shown on the image.
ThinkDiffusion StableDiffusion ComfyUI video2video Wan Style stage 1 set models
3. Write Prompt

Create a detailed prompt describing how you want the styled image to look.
ThinkDiffusion StableDiffusion ComfyUI video2video Wan Style stage 1 write prompt
4. Configure Controlnet

Set the ControlNet settings as shown in the image. Make sure you select version 3 of the ControlNet model (flux-depth-controlnet-v3).
ThinkDiffusion StableDiffusion ComfyUI video2video Wan Style stage 1 check controlnet
5. Check Sampling Settings

Set the settings as shown in the image. If you want a different style each run, set "control after generate" to Randomize.
ThinkDiffusion StableDiffusion ComfyUI video2video Wan Style stage 1 check sampling
6. Generate and Save Output

Save your styled image or copy to clipspace.
ThinkDiffusion StableDiffusion ComfyUI video2video Wan Style stage 1 check output

Stage 2: Apply Style to Video

Step | Recommended Nodes
1. Input Reference Image

Load the image you created in Stage 1.
ThinkDiffusion StableDiffusion ComfyUI video2video Wan Style stage 2 input image
2. Set Models

Configure the models as shown on the image.
ThinkDiffusion StableDiffusion ComfyUI video2video Wan Style stage 2 set models
3. Write a Prompt

Write a prompt; it's recommended to reuse the same prompt from Stage 1. Use only the first text box, as the second one serves as the negative prompt.
ThinkDiffusion StableDiffusion ComfyUI video2video Wan Style stage 2 write prompt
4. Check Sampling Settings

Set the settings as seen on the image.
ThinkDiffusion StableDiffusion ComfyUI video2video Wan Style stage 2 check sampling
5. Check Output

ThinkDiffusion StableDiffusion ComfyUI video2video Wan Style stage 2 check output
💡
When I work on lipsyncing for videos using Depth+OpenPose or Scribble, I find that the best results typically come from close-up shots of the face, where the details of mouth movements are most visible and easier for the AI to track accurately. While these methods can produce convincing lipsync animations, they're not flawless: sometimes the synchronization isn't perfect, and subtle mismatches can occur, especially if the facial movements are complex or the video quality varies. Even with advanced tools, achieving 100% accuracy in every frame is challenging, but close-up videos generally give me the most reliable and natural-looking lipsync results.
💡
When I’m working on extending a video, I realize that if the subject isn’t visible in the final frame, the system won’t have any reference for what the subject originally looked like. To avoid this issue, I make sure to plan ahead so that my subject remains visible at the end of the clip, ensuring continuity for the next segment.

Examples

💡
There are 3 sections in each example below.
1st is the style reference that we want to apply
2nd is the original video
3rd is the generated video with the style applied

Prompt: A sweet couple holds hands and runs joyfully along the beach, the girl clutching a billowing smoke grenade that releases swirling, vibrant colors into the air. Behind them, the sky and sea are alive with dynamic, swirling brushstrokes and bold, contrasting colors reminiscent of Van Gogh’s Starry Night. The scene is rendered in a Post-Impressionist art style, with expressive movement, thick impasto textures, and luminous, dreamlike blues and yellows that capture both the energy of the moment and the emotional intensity of the landscape.


Prompt: Four robots dance energetically on a futuristic platform in a sci-fi city. The background is dark, filled with neon-lit skyscrapers, glowing blue and purple lights, and digital billboards. The robots are painted in bright, vivid colors—such as white, yellow, orange, and green—making them stand out boldly against the shadowy cityscape. The scene is illustrated in a sharp, futuristic sci-fi art style, ensuring the robots are highly visible and eye-catching.


Prompt: A woman walks confidently with city buildings rising behind her, their windows reflecting the afternoon sun. The scene is illustrated in a dreamy watercolor style, with soft brushstrokes, gentle color gradients, and a slightly blurred, ethereal atmosphere.


Prompt: An old woman plays a mechanical violin, her fingers expertly gliding over glowing strings. Neon lights and towering skyscrapers of a bustling cyberpunk city illuminate the background, casting vibrant reflections on rain-slicked streets. The scene is rendered in a futuristic cyberpunk art style, with sharp contrasts, electric colors, and intricate technological details woven into both her instrument and the urban landscape.


Prompt: A playful dog walks along a grassy field, depicted in a lively black and white doodle art style. The grass is illustrated with whimsical, hand-drawn patterns and quirky linework, while the dog is outlined with bold, expressive strokes. The entire scene is filled with imaginative doodles—swirls, dots, and playful shapes—creating a fun and energetic black-and-white composition.


Troubleshooting Common Issues

  • Red nodes appearing: Install missing custom nodes through ComfyUI Manager
  • "CUDA out of memory" error: Try reducing video resolution or frame count
  • Inconsistent style: Make sure you're using the same prompt in both stages
  • Slow processing: This is normal - high-quality video processing is resource-intensive
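For the out-of-memory case, it can help to see how much VRAM is actually free before you start lowering settings. A minimal check, assuming PyTorch with CUDA support is available in your environment:

```python
# Quick VRAM check for diagnosing "CUDA out of memory" errors.
import torch

if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info()          # bytes on the current device
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Free VRAM: {free / 1024**3:.1f} GiB of {total / 1024**3:.1f} GiB")
else:
    print("No CUDA device detected; this workflow needs a GPU.")
```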


    If you’re having issues with installation or slow hardware, you can try any of these workflows on a more powerful GPU in your browser with ThinkDiffusion.
LAUNCH COMFYUI IN THE CLOUD

No installs. No downloads. Run ComfyUI workflows in the Cloud.

Launch ComfyUI Now

If you're interested in exploring more AI video capabilities, check out our guide on Wan 2.1 Image2Video generation.