This repository contains the demo for the audio-to-video synchronisation network (SyncNet). This network can be used for audio-visual synchronisation tasks including:
- Removing temporal lags between the audio and visual streams in a video;
- Determining who is speaking amongst multiple faces in a video.
Please cite the paper below if you make use of the software.
Dependencies:

pip install -r requirements.txt
In addition, ffmpeg is required.
Note: the model expects video at 25 fps and audio at 16 kHz.
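If your input is in a different format, a minimal preprocessing sketch (assuming ffmpeg is on your PATH; the file names are examples only) could look like the following. Note that run_pipeline.py below performs an equivalent conversion itself, as the pyavi outputs listed further down show.

```python
# A minimal sketch, not part of this repository: re-encode an arbitrary
# input to the 25 fps / 16 kHz format the model expects by calling ffmpeg.
# File names below are examples only.
import subprocess

def to_25fps_16khz(input_path: str, output_path: str) -> None:
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-i", input_path,
            "-r", "25",        # constant 25 fps video
            "-ar", "16000",    # 16 kHz audio sample rate
            "-ac", "1",        # mono audio
            output_path,
        ],
        check=True,
    )

if __name__ == "__main__":
    to_25fps_16khz("input.mp4", "input_25fps_16k.avi")
```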
The demos expect cropped videos from the run_pipeline step below.

SyncNet demo:
python demo_syncnet.py --videofile data/example.avi --tmp_dir /path/to/temp/directory
Check that this script returns:
AV offset: 3
Min dist: 5.353
Confidence: 10.021
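A hypothetical way to check this automatically (the regular expressions and the tolerance are assumptions, matched to the sample output above):

```python
# Hypothetical sanity check, not part of the repository: run the demo and
# verify that the printed values roughly match the reference output above.
import re
import subprocess

proc = subprocess.run(
    ["python", "demo_syncnet.py",
     "--videofile", "data/example.avi",
     "--tmp_dir", "/tmp/syncnet_demo"],          # example temp directory
    capture_output=True, text=True, check=True,
)
out = proc.stdout + proc.stderr

offset = int(re.search(r"AV offset:\s*(-?\d+)", out).group(1))
confidence = float(re.search(r"Confidence:\s*([\d.]+)", out).group(1))

assert offset == 3, f"unexpected AV offset: {offset}"
assert abs(confidence - 10.021) < 0.5, f"unexpected confidence: {confidence}"
print("Demo output matches the reference values.")
```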
Feature extraction demo (this also expects that the videos are cropped with the pipeline):
python demo_feature.py --videofile data/example.avi --tmp_dir /path/to/save/features
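To extract features for every cropped face track produced by run_pipeline, one could loop over the .avi files in pycrop. This is a sketch only; the paths are examples.

```python
# Sketch only: call demo_feature.py on every cropped face track produced
# by run_pipeline. Paths below are examples.
import glob
import subprocess

crop_dir = "/path/to/output/pycrop/name_of_video"   # crops from run_pipeline
feat_dir = "/path/to/save/features"

for avi in sorted(glob.glob(f"{crop_dir}/*.avi")):
    subprocess.run(
        ["python", "demo_feature.py", "--videofile", avi, "--tmp_dir", feat_dir],
        check=True,
    )
```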
The pipeline consists of three steps:
- run_pipeline: extracts each detected face into a separate video; saves 224x224 cropped video and audio to $DATA_DIR/pycrop/$REFERENCE/
- run_syncnet: calls the syncnet model on the video streams, gathering features and confidence values
- run_visualise: overlays the detected faces and their confidence values on the original video
Full pipeline (these steps are sequential):
sh download_model.sh
python run_pipeline.py --videofile /path/to/video.mp4 --reference name_of_video --data_dir /path/to/output
python run_syncnet.py --videofile /path/to/video.mp4 --reference name_of_video --data_dir /path/to/output
python run_visualise.py --videofile /path/to/video.mp4 --reference name_of_video --data_dir /path/to/output
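To process several videos in one go, the three stages can be chained from Python (a sketch; the video list and output directory are placeholders):

```python
# Sketch only: run the three pipeline stages sequentially for a list of
# videos. VIDEOS and DATA_DIR are placeholders.
import os
import subprocess

DATA_DIR = "/path/to/output"
VIDEOS = ["/path/to/video1.mp4", "/path/to/video2.mp4"]

for videofile in VIDEOS:
    reference = os.path.splitext(os.path.basename(videofile))[0]
    common = ["--videofile", videofile,
              "--reference", reference,
              "--data_dir", DATA_DIR]
    for script in ("run_pipeline.py", "run_syncnet.py", "run_visualise.py"):
        subprocess.run(["python", script, *common], check=True)
```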
Key Outputs:
$DATA_DIR/pycrop/$REFERENCE/*.avi - cropped face tracks from run_pipeline
$DATA_DIR/pywork/$REFERENCE/offsets.txt - audio-video offset values from run_syncnet (listed in the original readme, but it does not appear to be written by the current code)
$DATA_DIR/pyavi/$REFERENCE/video_out.avi - output video (as shown below)
$DATA_DIR/pyavi/$REFERENCE/framewise_confidences.csv - per-frame confidence values (**assumes only one person is in frame**); see the inspection sketch after this list
$DATA_DIR/pyavi/$REFERENCE/results.txt
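Since the column layout of framewise_confidences.csv is not documented here, a cautious way to inspect it (assuming pandas is available; the path is an example):

```python
# Hypothetical sketch: inspect the per-frame confidence CSV. The column
# names are not documented here, so only generic pandas calls are used.
import pandas as pd

conf_csv = "/path/to/output/pyavi/name_of_video/framewise_confidences.csv"
df = pd.read_csv(conf_csv)

print(df.head())       # check the actual column names
print(df.describe())   # distribution of the per-frame confidence values
```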
All Outputs:
data_dir/
- pyavi/ref/
  - video.avi (original video converted to .avi, resampled to 25 fps)
  - video_only.avi (video stream without audio)
  - audio.wav (audio resampled to a 16 kHz sample rate)
  - video_out.avi (output visualisation)
  - framewise_confidences.csv (per-frame confidence values)
  - results.txt
- pycrop/ref/
  - 000#.avi (224x224 crop around each detected face track; one .avi per track)
  - ...
- pywork/ref/ (see the loading sketch after this list)
  - activesd.pckl (distances: a measure of the likelihood of talking for each face-frame)
  - faces.pckl (detected faces)
  - scene.pckl ('scenes': spans over which faces are continuously detected)
  - tracks.pckl (locations of the detected faces over time)
- pytmp/ref/
  - every face crop
- pyframes/ref/
  - every frame as a jpg
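The .pckl files appear to be standard Python pickles; a cautious way to inspect them (the path is an example, and the object structures should be confirmed interactively):

```python
# Sketch only: the .pckl files are assumed to be standard Python pickles.
# This just loads each one and prints its type and length for inspection.
import pickle

work_dir = "/path/to/output/pywork/name_of_video"   # example path

for name in ("activesd", "faces", "scene", "tracks"):
    with open(f"{work_dir}/{name}.pckl", "rb") as f:
        obj = pickle.load(f)
    size = len(obj) if hasattr(obj, "__len__") else "n/a"
    print(f"{name}.pckl: {type(obj).__name__}, length {size}")
```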
@InProceedings{Chung16a,
author = "Chung, J.~S. and Zisserman, A.",
title = "Out of time: automated lip sync in the wild",
booktitle = "Workshop on Multi-view Lip-reading, ACCV",
year = "2016",
}

