LPIPS-AttnWav2Lip 🎤👄

Welcome to LPIPS-AttnWav2Lip, an audio-driven lip-syncing model! Given a speech track and a target face, it generates lip movements synchronized to the audio, bringing your talking head videos to life. 🌟

Published in the prestigious journal Speech Communication, this work is a game-changer for talking head generation in the wild. Check out the paper here: LPIPS-AttnWav2Lip: Generic audio-driven lip synchronization for talking head generation in the wild.

Model Download Link 🔗

Get the pre-trained model here: Baidu Drive (Password: hat7).

What’s the Buzz About? 🐝

LPIPS-AttnWav2Lip combines the power of LPIPS perceptual loss and attention mechanisms to deliver high-quality, synchronized lip movements. Whether it’s noisy environments or complex scenarios, this model has got you covered. 💪
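To give a feel for the perceptual part of the objective, here is a minimal PyTorch sketch of an L1 + LPIPS reconstruction loss. It uses the standalone lpips package, and the loss weights and crop size are made up for illustration; this is a sketch of the idea, not the training code in this repository.

    import torch
    import torch.nn.functional as F
    import lpips  # pip install lpips

    # Pre-trained LPIPS distance (VGG backbone); expects images scaled to [-1, 1].
    lpips_fn = lpips.LPIPS(net='vgg')

    def reconstruction_loss(generated, target, l1_weight=1.0, lpips_weight=1.0):
        # Pixel-wise L1 term plus perceptual LPIPS term, averaged over the batch.
        l1 = F.l1_loss(generated, target)
        perceptual = lpips_fn(generated, target).mean()
        return l1_weight * l1 + lpips_weight * perceptual

    # Toy example with random "face crops" in [-1, 1].
    fake = torch.rand(4, 3, 96, 96) * 2 - 1
    real = torch.rand(4, 3, 96, 96) * 2 - 1
    print(reconstruction_loss(fake, real).item())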

Framework Overview 🧠

Below is the framework of LPIPS-AttnWav2Lip:

[Figure: Overall framework of LPIPS-AttnWav2Lip]

Comparison with Other Methods 📊

Our method outperforms existing approaches in both quality and synchronization. Check out the comparison below:

[Figure: Comparison with other methods]

How to Infer Like a Pro 🔮

  1. Prepare Your Inputs:
    • A video or image with a face (--face parameter).
    • An audio file (--audio parameter).
  2. Run the Magic:
    python inference.py \
        --checkpoint_path <path_to_model_weights> \
        --face <path_to_face_video_or_image> \
        --audio <path_to_audio_file> \
        --outfile <path_to_output_video>
  3. Voilà! Your synced video is ready to dazzle. ✨
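
If you prefer to launch inference from Python (for example, to batch over many clips), a minimal sketch is below. The paths are placeholders to replace with your own files; only the flags come from the command above.

    import subprocess
    import sys

    # Placeholder paths -- substitute your own checkpoint, face video, and audio.
    subprocess.run(
        [
            sys.executable, "inference.py",
            "--checkpoint_path", "checkpoints/lpips_attnwav2lip.pth",
            "--face", "inputs/speaker.mp4",
            "--audio", "inputs/speech.wav",
            "--outfile", "results/result_voice.mp4",
        ],
        check=True,
    )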

Prepping the Dataset 🛠️

  1. Get the Data:
    • Download the LRS2 dataset from here.
  2. Process It:
    python preprocess.py \
        --data_root <path_to_LRS2_dataset> \
        --preprocessed_root <path_to_save_preprocessed_data>
  3. Done! Your data is now model-ready. 🚀

Training the Beast 🦾

  1. Set the Stage:
    • Tweak the hparams.py file to set your training parameters (an illustrative excerpt is sketched after this list).
  2. Train Away:
    python hq_wav2lip_train_lpips.py \
        --data_root <path_to_preprocessed_data> \
        --checkpoint_dir <path_to_save_checkpoints> \
        --syncnet_checkpoint_path <path_to_pretrained_syncnet>
  3. Save the Day: The model checkpoints will be saved for your future adventures. 🏆
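
The exact contents of hparams.py may differ, but since this project builds on the Wav2Lip codebase, the settings you are most likely to touch resemble the excerpt below. Treat the names and defaults as assumptions and verify them against the file itself.

    # Illustrative excerpt -- names/defaults follow the original Wav2Lip hparams.
    batch_size = 16                 # face crops per training batch
    initial_learning_rate = 1e-4    # generator learning rate
    img_size = 96                   # resolution of the face crops fed to the model
    syncnet_wt = 0.03               # weight of the SyncNet lip-sync loss term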

Citation 📜

If you find this work useful, please give us a shout-out by citing:

@article{chen2024lpips,
  title={LPIPS-AttnWav2Lip: Generic audio-driven lip synchronization for talking head generation in the wild},
  author={Chen, Zhipeng and Wang, Xinheng and Xie, Lun and Yuan, Haijie and Pan, Hang},
  journal={Speech Communication},
  volume={157},
  pages={103028},
  year={2024},
  publisher={Elsevier}
}

Let’s make the world a more synchronized place, one lip movement at a time! 😄
