Welcome to LPIPS-AttnWav2Lip, the ultimate audio-driven lip-syncing wizardry! This magical model transforms your audio into perfectly synced lip movements, making your talking head videos come alive. 🌟
Published in the prestigious journal Speech Communication, this work is a game-changer for talking head generation in the wild. Check out the paper here: LPIPS-AttnWav2Lip: Generic audio-driven lip synchronization for talking head generation in the wild.
Get the pre-trained model here: Baidu Drive (Password: hat7).
LPIPS-AttnWav2Lip combines the power of LPIPS perceptual loss and attention mechanisms to deliver high-quality, synchronized lip movements. Whether it’s noisy environments or complex scenarios, this model has got you covered. 💪
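For readers curious how an LPIPS term typically enters the reconstruction objective, here is a minimal sketch using the `lpips` pip package. The weighting factor and the way the term is combined with an L1 loss are illustrative assumptions for this sketch, not the exact objective from the paper.

```python
# Minimal sketch of an LPIPS-augmented reconstruction loss (illustrative only;
# the actual loss terms and weights used in LPIPS-AttnWav2Lip may differ).
import torch
import lpips  # pip install lpips

# Pretrained VGG-based perceptual metric from the lpips package.
lpips_fn = lpips.LPIPS(net='vgg')

def reconstruction_loss(generated, target, lpips_weight=0.1):
    """L1 + weighted LPIPS between generated and ground-truth frames.

    Both tensors are (B, 3, H, W); lpips expects values roughly in [-1, 1].
    """
    l1 = torch.nn.functional.l1_loss(generated, target)
    perceptual = lpips_fn(generated, target).mean()
    return l1 + lpips_weight * perceptual
```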
Below is the framework of LPIPS-AttnWav2Lip:
Our method outperforms existing approaches in both quality and synchronization. Check out the comparison below:
- Prepare Your Inputs:
  - A video or image with a face (`--face` parameter).
  - An audio file (`--audio` parameter).
- Run the Magic:

  ```bash
  python inference.py \
    --checkpoint_path <path_to_model_weights> \
    --face <path_to_face_video_or_image> \
    --audio <path_to_audio_file> \
    --outfile <path_to_output_video>
  ```

- Voilà! Your synced video is ready to dazzle. ✨
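If you have several audio tracks to sync against the same face, a small driver script can loop over the documented command. The paths below are placeholders for illustration, not files shipped with the repo.

```python
# Hypothetical batch driver around inference.py (all paths are placeholders).
import subprocess
from pathlib import Path

CHECKPOINT = "checkpoints/lpips_attnwav2lip.pth"  # assumed local checkpoint path
FACE = "inputs/speaker.mp4"                       # assumed local face video

for audio in sorted(Path("inputs/audio").glob("*.wav")):
    out = Path("results") / f"{audio.stem}_synced.mp4"
    out.parent.mkdir(parents=True, exist_ok=True)
    # Invoke the repo's inference script once per audio file.
    subprocess.run([
        "python", "inference.py",
        "--checkpoint_path", CHECKPOINT,
        "--face", FACE,
        "--audio", str(audio),
        "--outfile", str(out),
    ], check=True)
```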
- Get the Data:
  - Download the LRS2 dataset from here.
- Process It:

  ```bash
  python preprocess.py \
    --data_root <path_to_LRS2_dataset> \
    --preprocessed_root <path_to_save_preprocessed_data>
  ```

- Done! Your data is now model-ready. 🚀
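As a quick sanity check before training, you can walk the preprocessed root and confirm every clip folder looks complete. The layout assumed below (per-clip folders of frame JPEGs plus an `audio.wav`) follows the original Wav2Lip-style preprocessing convention and may differ in this repo.

```python
# Sanity-check sketch for the preprocessed LRS2 folder. The assumed layout is
# <preprocessed_root>/<speaker_id>/<clip_id>/{*.jpg, audio.wav} (Wav2Lip-style).
from pathlib import Path

def check_preprocessed(root: str) -> None:
    clips = [p for p in Path(root).glob("*/*") if p.is_dir()]
    bad = [c for c in clips
           if not list(c.glob("*.jpg")) or not (c / "audio.wav").exists()]
    print(f"Checked {len(clips)} clips, {len(bad)} look incomplete.")
    for clip in bad[:10]:
        print("  incomplete:", clip)

check_preprocessed("<path_to_save_preprocessed_data>")
```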
- Set the Stage:
  - Tweak the `hparams.py` file to set your training parameters (see the sketch after this list).
- Train Away:

  ```bash
  python hq_wav2lip_train_lpips.py \
    --data_root <path_to_preprocessed_data> \
    --checkpoint_dir <path_to_save_checkpoints> \
    --syncnet_checkpoint_path <path_to_pretrained_syncnet>
  ```

- Save the Day: The model checkpoints will be saved for your future adventures. 🏆
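The exact contents of `hparams.py` are specific to this repository; as a rough guide, Wav2Lip-style projects expose a single `hparams` object whose fields you edit before launching training. The field names and values below are typical examples only, not a verified listing of this repo's file.

```python
# Illustrative hparams.py-style snippet (field names follow common
# Wav2Lip-style conventions and may not match this repository exactly).
from types import SimpleNamespace

hparams = SimpleNamespace(
    img_size=96,                 # crop size of the face region fed to the model
    batch_size=16,               # lower this if you run out of GPU memory
    initial_learning_rate=1e-4,  # optimizer learning rate for the generator
    nepochs=200,                 # total training epochs
    checkpoint_interval=3000,    # steps between checkpoint saves
    syncnet_wt=0.03,             # weight of the SyncNet lip-sync loss term
)
```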
If you find this work useful, please give us a shout-out by citing:
```bibtex
@article{chen2024lpips,
  title={LPIPS-AttnWav2Lip: Generic audio-driven lip synchronization for talking head generation in the wild},
  author={Chen, Zhipeng and Wang, Xinheng and Xie, Lun and Yuan, Haijie and Pan, Hang},
  journal={Speech Communication},
  volume={157},
  pages={103028},
  year={2024},
  publisher={Elsevier}
}
```

Let's make the world a more synchronized place, one lip movement at a time! 😄

