
😭 SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

     

Wenxuan Zhang*,1,2   Xiaodong Cun*,2   Xuan Wang3   Yong Zhang2   Xi Shen2
Yu Guo1   Ying Shan2   Fei Wang1

1 Xi'an Jiaotong University   2 Tencent AI Lab   3 Ant Group

CVPR 2023


TL;DR: A realistic and stylized talking head video generation method from a single image and audio


Changelog

  • 2023.03.06 Fixed several bugs in the code and errors in the installation instructions.
  • 2023.03.03 Release the test code for audio-driven single image animation!
  • 2023.02.28 SadTalker has been accepted by CVPR 2023!

Pipeline

(Figure: the main pipeline of SadTalker)

TODO

  • Generating a 2D face from a single image.
  • Generating a 3D face from audio.
  • Generating 4D free-view talking examples from audio and a single image.
  • Gradio/Colab demo.
  • Integrate with stable-diffusion-webui. (stay tuned!)
  • Training code for each component.

Test

Requirements

  • Python
  • PyTorch
  • ffmpeg

Conda Installation

git clone https://github.com/Winfredy/SadTalker.git
cd SadTalker 
conda create -n sadtalker python=3.8
conda activate sadtalker
pip3 install torch torchvision torchaudio
conda config --add channels conda-forge
conda install ffmpeg
pip install ffmpy
pip install Cmake
pip install boost
conda install dlib
pip install -r requirements.txt
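Before downloading the models, it can help to confirm that the environment resolved correctly. The following check script is not part of the repository; it is a minimal sketch that reports whether the key dependencies installed above are available:

```python
import importlib.util
import shutil
import sys

def check_environment():
    """Report whether SadTalker's core dependencies are available."""
    return {
        # Python 3.8 is the version pinned in the conda environment above.
        "python>=3.8": sys.version_info >= (3, 8),
        # PyTorch packages installed via pip.
        "torch": importlib.util.find_spec("torch") is not None,
        "torchvision": importlib.util.find_spec("torchvision") is not None,
        # ffmpeg is installed as a conda package, so check the binary on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }

if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(("OK     " if ok else "MISSING"), name)
```

Any line printed as MISSING points back to the corresponding installation step above.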

Models

Please download our pre-trained models and put them in ./checkpoints.

| Model | Description |
|-------|-------------|
| checkpoints/auido2exp_00300-model.pth | Pre-trained ExpNet in SadTalker. |
| checkpoints/auido2pose_00140-model.pth | Pre-trained PoseVAE in SadTalker. |
| checkpoints/mapping_00229-model.pth.tar | Pre-trained MappingNet in SadTalker. |
| checkpoints/facevid2vid_00189-model.pth.tar | Pre-trained face renderer from the unofficial reproduction of face-vid2vid. |
| checkpoints/epoch_20.pth | Pre-trained 3DMM extractor from Deep3DFaceReconstruction. |
| checkpoints/wav2lip.pth | Highly accurate lip-sync model from Wav2lip. |
| checkpoints/shape_predictor_68_face_landmarks.dat | Face landmark model used by dlib. |
| checkpoints/BFM | 3DMM library files. |
| checkpoints/hub | Face detection models used in face alignment. |
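A quick way to catch a missing download before inference fails is to check the checkpoint directory against the table above. This helper is not part of the repository; the file names mirror the table (note that "auido" is the actual, misspelled name used by the released checkpoints):

```python
from pathlib import Path

# File names taken from the model table above.
REQUIRED_CHECKPOINTS = [
    "auido2exp_00300-model.pth",
    "auido2pose_00140-model.pth",
    "mapping_00229-model.pth.tar",
    "facevid2vid_00189-model.pth.tar",
    "epoch_20.pth",
    "wav2lip.pth",
    "shape_predictor_68_face_landmarks.dat",
]

def missing_checkpoints(checkpoint_dir="./checkpoints"):
    """Return the required checkpoint files not found in checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in REQUIRED_CHECKPOINTS if not (root / name).exists()]

if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:", ", ".join(missing))
    else:
        print("All checkpoints found.")
```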

Generating a 2D face from a single image

python inference.py --driven_audio <audio.wav> --source_image <video.mp4 or picture.png> --result_dir <a folder to store results>
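When running many audio/image pairs, the command above can be assembled programmatically. A small sketch (not part of the repository; the example file names are placeholders) that builds the same argument list for subprocess:

```python
import subprocess

def build_inference_cmd(audio, image, result_dir):
    """Assemble the inference.py command line shown above as an argv list."""
    return [
        "python", "inference.py",
        "--driven_audio", audio,
        "--source_image", image,
        "--result_dir", result_dir,
    ]

cmd = build_inference_cmd("examples/audio.wav", "examples/face.png", "results")
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to actually launch inference
```

Passing an argv list (rather than a shell string) to subprocess.run avoids quoting problems with file names containing spaces.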

Generating a 3D face from audio

To do ...

Generating 4D free-view talking examples from audio and a single image

We use camera_yaw, camera_pitch, and camera_roll to control the camera pose. For example, --camera_yaw -20 30 10 means the camera yaw changes from -20 to 30 degrees and then from 30 to 10 degrees.

python inference.py --driven_audio <audio.wav> \
                    --source_image <video.mp4 or picture.png> \
                    --result_dir <a folder to store results> \
                    --camera_yaw -20 30 10
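One way to picture how a keypoint list like -20 30 10 sweeps the camera is piecewise-linear interpolation between consecutive yaw values. The sketch below is an illustration of that interpretation, not the repository's actual implementation; the function name and frames_per_segment parameter are assumptions:

```python
def interpolate_yaw(keypoints, frames_per_segment=30):
    """Piecewise-linearly interpolate yaw degrees between consecutive keypoints."""
    yaws = []
    for start, end in zip(keypoints, keypoints[1:]):
        for i in range(frames_per_segment):
            t = i / frames_per_segment          # fraction of the way through this segment
            yaws.append(start + (end - start) * t)
    yaws.append(keypoints[-1])                  # include the final keypoint exactly
    return yaws

# --camera_yaw -20 30 10: yaw rises from -20 to 30, then falls back to 10.
path = interpolate_yaw([-20, 30, 10], frames_per_segment=5)
```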

(Figure: free-view synthesis results)

Citation

If you find our work useful in your research, please consider citing:

@article{zhang2022sadtalker,
  title={SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation},
  author={Zhang, Wenxuan and Cun, Xiaodong and Wang, Xuan and Zhang, Yong and Shen, Xi and Guo, Yu and Shan, Ying and Wang, Fei},
  journal={arXiv preprint arXiv:2211.12194},
  year={2022}
}

Acknowledgements

The Facerender code borrows heavily from zhanglonghao's reproduction of face-vid2vid and from PIRender. We thank the authors for sharing their wonderful code. During training, we also use models from Deep3DFaceReconstruction and Wav2lip, and we thank the authors for their wonderful work.
