Lip-synced video from just an audio clip and a base video? We've got you!

LatentSync is an advanced lip sync framework that creates natural-looking speech by analyzing audio and generating matching lip movements. It uses audio-conditioned models for accuracy and runs inside ComfyUI, a powerful node-based AI workflow tool.

One of the biggest challenges is maintaining smooth and consistent lip movements across frames for a realistic result, something we will explore today!


Verified to work on ThinkDiffusion Build: Feb 27, 2025

ComfyUI v0.3.18 with LTX v0.9.1 model support.
Why do we specify the build date? ComfyUI and custom node updates released after this date may change the behavior or outputs of this workflow.

We've also written another guide on an alternate lip syncing technique with Automatic1111 and SadTalker:

SadTalker - Talking head videos
SadTalker enables the creation of lifelike talking head videos through a user-friendly platform on ThinkDiffusion.

The Framework

Source: arXiv paper
💡
The LatentSync framework takes a video, uses AI to predict the correct lip movements from the audio input, and refines the result by comparing it against real frames for accuracy.
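To make that flow concrete, here is a rough Python-shaped sketch of the idea. This is illustrative pseudocode only, not LatentSync's actual code: the names (audio_encoder, unet, vae) are stand-ins, and the real framework works on windows of latents with a noise schedule and a sync loss against real frames.

```python
def lip_sync(video_frames, audio, audio_encoder, unet, vae):
    """Illustrative pseudocode of audio-conditioned latent lip sync.

    All objects here are hypothetical stand-ins, not real APIs.
    """
    audio_features = audio_encoder(audio)             # e.g. Whisper-style embeddings
    output_frames = []
    for frame, features in zip(video_frames, audio_features):
        latent = vae.encode(frame)                    # compress frame to latent space
        latent = unet.denoise(latent, cond=features)  # repaint lips to match audio
        output_frames.append(vae.decode(latent))      # decode back to pixels
    return output_frames
```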

The Difference Between LatentSync and LivePortrait

💡
LatentSync focuses solely on achieving precise lip syncing, which completes in just a few minutes. It is ideal for projects where speed is crucial. In contrast, LivePortrait generates full facial expressions, but I find that this process takes significantly longer.

How to run LatentSync in ComfyUI

Installation guide

💡
Download the workflow and drag & drop it into your ComfyUI window, whether locally or on ThinkDiffusion. If you're using ThinkDiffusion, you'll need at least the Turbo 24GB machine, though we recommend the Ultra 48GB machine.

Custom Node

If there are red nodes in the workflow, your installation is missing required custom nodes. Install them so the workflow can run:

  1. Go to ComfyUI Manager > Click Install Missing Custom Nodes
  2. Check the list for any custom nodes that need to be installed and click Install.

Models

For this guide you'll need two models, both auto-downloaded by the custom node. If that fails, you can download them from Hugging Face and upload them manually (see Option 2).

Option 1

On the first run, the workflow will automatically fetch the necessary models. If the automatic download does not succeed, proceed with Option 2.

Option 2

💡
Download the two model files (latentsync_unet.pt and tiny.pt) from Hugging Face and upload them to the ThinkDiffusion directories listed below.

| Model Name | ThinkDiffusion Upload Directory |
| --- | --- |
| latentsync_unet.pt | ...comfyui/custom_nodes/ComfyUI-LatentSyncWrapper/checkpoints/ |
| tiny.pt | ...comfyui/custom_nodes/ComfyUI-LatentSyncWrapper/checkpoints/whisper/ |
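If you prefer to script the manual download, here is a minimal sketch using huggingface_hub. The ByteDance/LatentSync repo id and file layout are assumptions based on the file names above; check the custom node's README for the authoritative links.

```python
from huggingface_hub import hf_hub_download

# Assumed repo id and paths; verify against the custom node's README.
CKPT_DIR = "comfyui/custom_nodes/ComfyUI-LatentSyncWrapper/checkpoints"

hf_hub_download(repo_id="ByteDance/LatentSync",
                filename="latentsync_unet.pt", local_dir=CKPT_DIR)
# "whisper/tiny.pt" keeps its subfolder, landing in checkpoints/whisper/
hf_hub_download(repo_id="ByteDance/LatentSync",
                filename="whisper/tiny.pt", local_dir=CKPT_DIR)
```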

Step-by-step Workflow Guide

This workflow is easy to set up and runs well on its default settings. Here are a few steps where you'll want to pay extra attention.

1. Load a Video

Load a realistic video with a clear, front-facing face, shot at 25 fps, with no more than one face in frame. The resolution should not exceed 1080p. If your clip doesn't match, see the conversion sketch below.

[Image: the Load Video node]
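A minimal re-encode sketch, assuming ffmpeg is installed and on your PATH and a landscape source (the file names are placeholders):

```python
import subprocess

# Re-encode to 25 fps and cap the width at 1920 px (1080p landscape),
# keeping the aspect ratio; -2 rounds the height to an even number.
subprocess.run([
    "ffmpeg", "-i", "input.mp4",
    "-r", "25",
    "-vf", "scale='min(1920,iw)':-2",
    "input_25fps.mp4",
], check=True)
```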
2. Load an Audio

Load an audio file with no background music or noise; it should be a clear vocal track. A conversion sketch follows below.

[Image: the Load Audio node]
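A minimal sketch for extracting a clean voice track with ffmpeg (16 kHz mono matches what Whisper-based audio encoders expect; the file names are placeholders):

```python
import subprocess

# Extract a mono 16 kHz WAV voice track; -vn drops any video stream.
subprocess.run([
    "ffmpeg", "-i", "speech_source.mp4",
    "-vn", "-ac", "1", "-ar", "16000",
    "speech_16k.wav",
], check=True)
```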
3. Configure the Settings

The Video Length Adjuster node offers three modes: "normal" passes frames through with padding to prevent loss, ideal for standard lip-syncing; "pingpong" creates a forward-backward loop for back-and-forth animations; and "loop_to_audio" repeats frames to match longer audio durations while maintaining synchronization. The sketch below illustrates the three behaviors.

[Image: the Video Length Adjuster settings]
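An illustrative sketch of how each mode fills a frame list to a target length (behavior paraphrased from the node's description, not taken from its source code):

```python
def adjust_length(frames, target_len, mode="normal"):
    if mode == "normal":
        # pad by repeating the last frame so no frames are lost
        return frames + [frames[-1]] * max(0, target_len - len(frames))
    if mode == "pingpong":
        # play forward then backward, repeating until the target length
        cycle = frames + frames[-2:0:-1]
        return [cycle[i % len(cycle)] for i in range(target_len)]
    if mode == "loop_to_audio":
        # restart the clip from the beginning to cover longer audio
        return [frames[i % len(frames)] for i in range(target_len)]
    raise ValueError(f"unknown mode: {mode}")

print(adjust_length(list("abcd"), 10, "pingpong"))
# ['a', 'b', 'c', 'd', 'c', 'b', 'a', 'b', 'c', 'd']
```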
4. Check the Generated Video

Check the results of your generation. If unsatisfactory, generate again.
[Image: the generated video output]
💡
If you encounter a "RuntimeError: Face not detected", set the frame load cap under 150 in the Load Video node and keep the clip under 1 minute. With those limits in place, the tool should detect faces as intended. A quick preflight sketch follows this tip.
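A hedged preflight sketch using OpenCV's stock face detector (pip install opencv-python; input.mp4 is a placeholder, and the Haar cascade is a rough stand-in for whatever detector LatentSync actually uses):

```python
import cv2

cap = cv2.VideoCapture("input.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
ok, first_frame = cap.read()
cap.release()

# Check that a front-facing face is detectable in the first frame.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
faces = detector.detectMultiScale(
    cv2.cvtColor(first_frame, cv2.COLOR_BGR2GRAY)) if ok else []

print(f"fps={fps:.1f} (want 25), frames={frame_count} (cap under 150)")
print(f"faces detected in first frame: {len(faces)} (want exactly 1)")
```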
💡
If you encounter a "tuple index out of range" error, update both ComfyUI and the LatentSync custom nodes; outdated versions of either are the usual cause.
💡
When the final result has visible flickering or isn't in sync, I simply generate it again; I find this re-roll crucial to hitting the quality bar I want for my videos.

Limitations of LatentSync

While testing this, I noticed a few limitations:

  • It works best with videos showing clear, front-facing views of faces.
  • It doesn't support anime or cartoon faces.
  • The video should be at 25 frames per second.
  • The face should stay visible for the whole video, and videos with more than one face are not supported.

Examples


Man reciting a number countdown.


Woman as Optimus Prime.


Man at a desk initiating a sequence.


If you’re having issues with installation or slow hardware, you can try any of these workflows on a more powerful GPU in your browser with ThinkDiffusion.

If you enjoy ComfyUI and you want to test out creating awesome animations, then feel free to check out some more workflows below. And have fun out there with your videos!

AI Video Speed: How LTX is Reshaping Video2Video as We Know It
AI video workflows and models are everywhere, and it might be hard to select the one for you. If you're reading this, you're already on your path to using one of the top open source models currently available.

Unleashing Creativity: How Hunyuan Redefines Video Generation
Hey there, video enthusiasts! It's a thrill to see how quickly things are changing, especially in the way we create videos. Picture this: with just a few clicks, you can transform your existing clips into fresh, creative videos.

Turning Words Into Action: The Magic of CogVideoX
CogVideoX is perfect for beginners who want to start making amazing videos from ANY still image. This Image2Video and Text2Video model can be used inside ComfyUI, a powerful interface that makes working with image-to-video easy and fun.