😭 SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
Wenxuan Zhang, Xiaodong Cun, Xuan Wang, Yong Zhang, Xi Shen, Yu Guo, Ying Shan, Fei Wang
CVPR 2023
TL;DR: A realistic and stylized talking head video generation method from a single image and audio
- 2023.03.06 Fixed some bugs in the code and errors in installation.
- 2023.03.03 Released the test code for audio-driven single-image animation!
- 2023.02.28 SadTalker has been accepted by CVPR 2023!
- Generating 2D faces from a single image.
- Generating 3D faces from audio.
- Generating 4D free-view talking examples from audio and a single image.
- Gradio/Colab demo.
- Integration with stable-diffusion-web-ui. (Stay tuned!)
sadtalker_demo_short.mp4
- Training code for each component.
- Python
- PyTorch
- ffmpeg
git clone https://github.com/Winfredy/SadTalker.git
cd SadTalker
conda create -n sadtalker python=3.8
source activate sadtalker
pip3 install torch torchvision torchaudio
conda config --add channels conda-forge
conda install ffmpeg
pip install ffmpy
pip install Cmake
pip install boost
conda install dlib
pip install -r requirements.txt
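After installing, a quick sanity check can confirm that PyTorch and ffmpeg are visible from the environment. This snippet is only illustrative and not part of the SadTalker repo; a CUDA-enabled PyTorch build is assumed only if you intend to run on GPU.

```python
# Minimal post-install sanity check (illustrative; not part of the SadTalker codebase).
import shutil
import torch
import torchaudio

print("torch:", torch.__version__)
print("torchaudio:", torchaudio.__version__)
print("CUDA available:", torch.cuda.is_available())  # CPU-only inference will be much slower
print("ffmpeg found:", shutil.which("ffmpeg") is not None)
```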
Please download our pre-trained models and put them in ./checkpoints.
| Model | Description |
|---|---|
| checkpoints/auido2exp_00300-model.pth | Pre-trained ExpNet in SadTalker. |
| checkpoints/auido2pose_00140-model.pth | Pre-trained PoseVAE in SadTalker. |
| checkpoints/mapping_00229-model.pth.tar | Pre-trained MappingNet in SadTalker. |
| checkpoints/facevid2vid_00189-model.pth.tar | Pre-trained face-vid2vid model from the unofficial reimplementation of face-vid2vid. |
| checkpoints/epoch_20.pth | Pre-trained 3DMM extractor from Deep3DFaceReconstruction. |
| checkpoints/wav2lip.pth | Highly accurate lip-sync model from Wav2lip. |
| checkpoints/shape_predictor_68_face_landmarks.dat | Face landmark model used in dlib. |
| checkpoints/BFM | 3DMM library file. |
| checkpoints/hub | Face detection models used in face alignment. |
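Before running inference, it can help to verify that everything in the table above actually ended up under ./checkpoints. The helper below is not part of the repo, just a small sketch built from the file names listed in the table.

```python
# Illustrative check (not part of the SadTalker repo) that the downloaded
# checkpoints listed above are present under ./checkpoints.
from pathlib import Path

EXPECTED = [
    "auido2exp_00300-model.pth",
    "auido2pose_00140-model.pth",
    "mapping_00229-model.pth.tar",
    "facevid2vid_00189-model.pth.tar",
    "epoch_20.pth",
    "wav2lip.pth",
    "shape_predictor_68_face_landmarks.dat",
    "BFM",  # directory with 3DMM library files
    "hub",  # directory with face-detection models
]

root = Path("checkpoints")
missing = [name for name in EXPECTED if not (root / name).exists()]
if missing:
    print("Missing checkpoints:", ", ".join(missing))
else:
    print("All checkpoints found.")
```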
python inference.py --driven_audio <audio.wav> --source_image <video.mp4 or picture.png> --result_dir <folder to store results>
To do ...
We use `camera_yaw`, `camera_pitch`, and `camera_roll` to control the camera pose. For example, `--camera_yaw -20 30 10` means the camera yaw angle changes from -20 to 30 degrees and then from 30 back to 10 degrees.
python inference.py --driven_audio <audio.wav> \
--source_image <video.mp4 or picture.png> \
--result_dir <folder to store results> \
--camera_yaw -20 30 10
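For intuition, the keyframe list passed to `--camera_yaw` can be thought of as control points that are interpolated across the generated frames. The sketch below only illustrates piecewise-linear keyframe expansion; the `expand_keyframes` helper and the even spacing of keyframes are assumptions, not SadTalker's actual implementation.

```python
# Hypothetical sketch of how a yaw keyframe list such as "-20 30 10" could be
# expanded into per-frame angles; SadTalker's actual interpolation may differ.
import numpy as np

def expand_keyframes(keyframes, num_frames):
    """Piecewise-linearly interpolate keyframes across num_frames output frames."""
    keyframes = np.asarray(keyframes, dtype=float)
    # Positions of the keyframes, spread evenly over [0, num_frames - 1].
    key_pos = np.linspace(0, num_frames - 1, num=len(keyframes))
    frame_pos = np.arange(num_frames)
    return np.interp(frame_pos, key_pos, keyframes)

yaw = expand_keyframes([-20, 30, 10], num_frames=100)
print(yaw[:5], yaw[-5:])  # yaw rises from -20 toward 30, then falls back to 10
```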
If you find our work useful in your research, please consider citing:
@article{zhang2022sadtalker,
title={SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation},
author={Zhang, Wenxuan and Cun, Xiaodong and Wang, Xuan and Zhang, Yong and Shen, Xi and Guo, Yu and Shan, Ying and Wang, Fei},
journal={arXiv preprint arXiv:2211.12194},
year={2022}
}
Facerender code borrows heavily from zhanglonghao's reimplementation of face-vid2vid and from PIRender. We thank the authors for sharing their wonderful code. In the training process, we also use models from Deep3DFaceReconstruction and Wav2lip, and we thank them for their wonderful work.
- StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN (ECCV 2022)
- CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior (CVPR 2023)
- VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild (SIGGRAPH Asia 2022)
- DPE: Disentanglement of Pose and Expression for General Video Portrait Editing (CVPR 2023)
- 3D GAN Inversion with Facial Symmetry Prior (CVPR 2023)
- T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations (CVPR 2023)


