Introducing SadTalker and talking photos

Imagine generating lifelike talking head videos from just a facial image and an audio clip. With SadTalker, this becomes a reality!

How does SadTalker do it?

SadTalker uses two models, ExpNet and PoseVAE, to predict 3D motion directly from your audio input.

ExpNet captures the facial expressions, while PoseVAE generates the head poses.

By applying the resulting 3D motion coefficients to the face render, Stylized Audio-Driven Talking-head video (SadTalker) produces videos with natural motion and high image quality, improving on previous methods.

The best part? SadTalker is now seamlessly integrated into the user-friendly stable-diffusion-webui platform and is available on ThinkDiffusion as a pre-installed extension.

SadTalker is like sticking your hand up the photo to make it say whatever you want.

This integration simplifies the entire process for designers and creative minds alike. Whether you're a seasoned pro or just starting out, you'll find it a breeze to use. The stable version of SadTalker, combined with the stable-diffusion-webui, guarantees reliable and consistent performance, making it effortless for users to create high-quality talking head videos like never before.

We are loving this demo by Olivia Sarkias 

Using SadTalker

  • (1) Click the SadTalker tab
  • (2) Upload an image of a face
  • (3) Upload a voice recording
    (I use the Sound Recorder app built into Windows. WAV and M4A files seem to work well.)
Just plug in your source image and audio file into SadTalker on ThinkDiffusion, and let A.I. take care of the rest
  • (4) Hit Generate!
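
If you want to sanity-check a recording before uploading it, here is a minimal sketch using Python's standard `wave` module. The helper name `describe_wav` is my own, not part of SadTalker, and it only reads WAV files, so an M4A recording would need converting first.

```python
import wave


def describe_wav(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()      # total number of audio frames
        rate = wav.getframerate()      # frames per second
        channels = wav.getnchannels()  # 1 = mono, 2 = stereo
    return rate, channels, frames / float(rate)
```

A file that fails to open here is probably not a valid WAV, which is worth knowing before you wait on a generation.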

SadTalker Settings

Pose Style

  • This affects the head movement. In my experience, setting it to 45 tends to give the best results.
Setting Pose style to 45 yields the best results in our experience but feel free to play around with the setting!

Face model resolution and GFPGAN face enhancer

  • You can set this to 256 or 512, which will be the resolution of the face.
  • Choosing 512 and selecting GFPGAN as the face enhancer gives a noticeably clearer image.
You can change the face resolution and use GFPGAN options to enhance the final video's quality
  • Hit Generate!

Preprocess Options Explained

These settings control how the source image is cropped or resized to match the target face model resolution.

Crop:
If we pick a different image that includes the upper body and set the preprocess to Crop, SadTalker crops the image down to the face.

Adjust the preprocess options to control how the composition is modified to become the target face model resolution.

Using the Preprocess "Crop" setting 

Resize:
If we run the same image with Resize as the preprocess, the whole image is resized, but the results are not very good, so I don't recommend this option.

Using the Preprocess "Resize" setting 

Full:
If we run the same image with Full as the preprocess, the full image is used, which gives better results than Resize for an upper-body image.

Using the Preprocess "Full" setting 

ExtCrop:
ExtCrop gives a very similar result to Crop, but the face crop is larger. Which one works best is a matter of personal preference.

Using the Preprocess "ExtCrop" setting 
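
To picture what "the face crop is larger" means, here is a small illustrative sketch that grows a crop box around its center and clamps it to the image. This is not SadTalker's actual cropping code; the function name `expand_box` and the scale values are assumptions made purely for illustration.

```python
def expand_box(box, image_size, scale):
    """Grow a (left, top, right, bottom) box around its center by `scale`,
    clamped so it never extends past the image edges."""
    left, top, right, bottom = box
    width, height = image_size
    center_x = (left + right) / 2
    center_y = (top + bottom) / 2
    half_w = (right - left) * scale / 2
    half_h = (bottom - top) * scale / 2
    return (
        max(0, round(center_x - half_w)),
        max(0, round(center_y - half_h)),
        min(width, round(center_x + half_w)),
        min(height, round(center_y + half_h)),
    )


# A hypothetical tight face box, and a looser one with 50% more extent.
tight = (100, 100, 200, 200)
loose = expand_box(tight, image_size=(512, 512), scale=1.5)
```

The looser box keeps more hair, neck, and background around the face, which is roughly the difference you see between the Crop and ExtCrop results.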

ExtFull:
The final preprocess option is ExtFull. It is very similar to Full, but the extent is again larger. Once again, whether Full or ExtFull works best is a matter of personal preference.

Using the Preprocess "ExtFull" setting 
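
If you run SadTalker outside the web UI, the same settings can be passed on the command line. The sketch below only builds the command; it assumes the standalone SadTalker repository's `inference.py` and its flag names (`--source_image`, `--driven_audio`, `--pose_style`, `--size`, `--preprocess`, `--enhancer`), which may differ between versions. The helper name `build_sadtalker_command` and the example file names are hypothetical.

```python
import shlex

# Preprocess options covered in this post (lowercase, as assumed for the CLI).
PREPROCESS_MODES = ("crop", "resize", "full", "extcrop", "extfull")


def build_sadtalker_command(image, audio, pose_style=45, size=512,
                            enhancer="gfpgan", preprocess="crop"):
    """Build an argument list for SadTalker's inference.py (flag names assumed)."""
    if size not in (256, 512):
        raise ValueError("size must be 256 or 512")
    if preprocess not in PREPROCESS_MODES:
        raise ValueError("preprocess must be one of %s" % (PREPROCESS_MODES,))
    cmd = [
        "python", "inference.py",
        "--source_image", image,
        "--driven_audio", audio,
        "--pose_style", str(pose_style),
        "--size", str(size),
        "--preprocess", preprocess,
    ]
    if enhancer:  # pass None to skip the face enhancer
        cmd += ["--enhancer", enhancer]
    return cmd


# Print a copy-pasteable command for an upper-body image.
print(shlex.join(build_sadtalker_command("face.png", "voice.wav", preprocess="full")))
```

The defaults mirror the settings recommended above: pose style 45, a 512 face model, and GFPGAN as the enhancer.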

If you’re having issues with installation or slow hardware, you can use SadTalker preinstalled on a more powerful GPU, right in your browser, with ThinkDiffusion.

If you’d like to start swapping faces before making them talk, check out my post on using Reactor here and have fun making all your photos come to life!