Introducing SadTalker and talking photos
Imagine generating a lifelike talking head video from just a facial image and an audio clip. With SadTalker, this becomes a reality!
How does SadTalker do it?
SadTalker uses two networks, ExpNet and PoseVAE, to predict 3D motion coefficients from your audio input.
ExpNet captures the facial expressions and PoseVAE captures the head poses.
By applying the resulting 3D motion coefficients to the face renderer, Stylized Audio-Driven Talking-head video (SadTalker) produces videos with more natural motion and better image quality than previous methods.
The best part? SadTalker is now seamlessly integrated into the user-friendly stable-diffusion-webui platform and is available on ThinkDiffusion as a pre-installed extension.
This integration simplifies the entire process for designers and creative minds alike. Whether you're a seasoned pro or just starting out, you'll find it a breeze to use. Running the stable version of SadTalker inside stable-diffusion-webui gives reliable, consistent results, so creating high-quality talking head videos takes very little effort.
Using SadTalker
- (1) Click the SadTalker tab
- (2) Upload an image of a face
- (3) Upload a voice recording (I use the Sound Recorder app built into Windows. WAV and M4A files seem to work well)
- (4) Hit Generate!
SadTalker Settings
Pose Style
- This affects the head movement. I have found that setting it to 45 tends to give the best results.
Face model resolution and GFPGAN face enhancer
- You can set this to 256 or 512, which is the resolution of the generated face.
- Choose 512 for a higher resolution and select GFPGAN as the face enhancer to get a much clearer image.
- Hit Generate!
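If you run SadTalker from its own repository instead of the web UI tab, the same settings can be passed on the command line. The sketch below only builds the argument list for the repository's `inference.py`. The flag names (`--source_image`, `--driven_audio`, `--pose_style`, `--size`, `--preprocess`, `--enhancer`) are as I recall them from the SadTalker README, so treat them as assumptions and check them against your installed version. The file names are placeholders.

```python
from pathlib import Path

VALID_SIZES = {256, 512}
VALID_PREPROCESS = {"crop", "resize", "full", "extcrop", "extfull"}


def build_sadtalker_args(image, audio, pose_style=45, size=512,
                         use_gfpgan=True, preprocess="crop"):
    """Return an argv list for SadTalker's inference.py (flag names assumed)."""
    if size not in VALID_SIZES:
        raise ValueError(f"size must be one of {sorted(VALID_SIZES)}")
    if preprocess not in VALID_PREPROCESS:
        raise ValueError(f"unknown preprocess mode: {preprocess}")

    args = [
        "python", "inference.py",
        "--source_image", str(Path(image)),   # the face image
        "--driven_audio", str(Path(audio)),   # the voice recording
        "--pose_style", str(pose_style),      # head movement style
        "--size", str(size),                  # face model resolution
        "--preprocess", preprocess,           # how the image is fitted
    ]
    if use_gfpgan:
        # Optional face enhancer for a clearer result
        args += ["--enhancer", "gfpgan"]
    return args


if __name__ == "__main__":
    # Placeholder file names; replace with your own image and recording.
    print(" ".join(build_sadtalker_args("face.png", "voice.wav")))
```

The list can be handed to `subprocess.run` from inside a SadTalker checkout. Keeping it as a list rather than a single string avoids quoting problems with file names that contain spaces.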
PreProcess Options Explained
These settings control how the input image is adapted to the target face model resolution.
Crop:
If we select a different image that includes the upper body and set the preprocess to crop, the image is cropped down to the face.
Resize:
If we run the same image with resize as the preprocess, the whole image is resized, but the results are not very good, so I don't recommend this option.
Full:
If we run the same image with full as the preprocess, the full image is used, which gives better results than resize for an upper-body image.
ExtCrop:
ExtCrop gives a very similar result to Crop, but the face crop is larger. Which one works best for you is a matter of personal preference.
ExtFull:
The final preprocess, ExtFull, is very similar to Full, but the extent is again larger. Once again, whether Full or ExtFull works best for you is a matter of personal preference.
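To make the differences between the five options easier to picture, here is a simplified geometric model in Python. It is not SadTalker's actual preprocessing code: the face box, the image size, and the `ext_pad` padding factor are all made-up illustration values, and the real extension detects and aligns the face itself. The sketch only shows which region gets animated and which region ends up in the output frame for each mode.

```python
MODES = ("crop", "resize", "full", "extcrop", "extfull")


def expand_box(box, pad, image_size):
    """Grow a (left, top, right, bottom) box by a fraction, clamped to the image."""
    left, top, right, bottom = box
    width, height = image_size
    dx = int((right - left) * pad)
    dy = int((bottom - top) * pad)
    return (max(0, left - dx), max(0, top - dy),
            min(width, right + dx), min(height, bottom + dy))


def plan_preprocess(mode, image_size, face_box, ext_pad=0.25):
    """Return the animated region and the output frame for a preprocess mode.

    Simplified illustration only; ext_pad is a hypothetical value.
    """
    if mode not in MODES:
        raise ValueError(f"unknown preprocess mode: {mode}")
    width, height = image_size
    whole = (0, 0, width, height)

    if mode == "resize":
        # The whole picture is squeezed to the model resolution.
        return {"animated": whole, "output": whole}

    # The "ext" variants use a larger region around the face.
    pad = ext_pad if mode.startswith("ext") else 0.0
    animated = expand_box(face_box, pad, image_size)

    # "full" variants keep the whole picture; "crop" variants keep only the face region.
    output = whole if mode.endswith("full") else animated
    return {"animated": animated, "output": output}


if __name__ == "__main__":
    size = (800, 1200)              # hypothetical upper-body photo
    face = (300, 100, 500, 300)     # hypothetical detected face box
    for mode in MODES:
        print(mode, plan_preprocess(mode, size, face))
```

In this model, Crop and ExtCrop differ only in how much of the picture around the face is kept, while Full and ExtFull both keep the whole picture and differ in how large the animated region is.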
If you’re having issues with installation or slow hardware, you can have SadTalker preinstalled with a more powerful GPU in your browser with ThinkDiffusion.
If you’d like to start swapping faces before making them talk, check out my post on using Reactor here and have fun making all your photos come to life!