Zheng Chen, Zichen Zou, Kewei Zhang, Xiongfei Su, Xin Yuan, Yong Guo, and Yulun Zhang, "DOVE: Efficient One-Step Diffusion Model for Real-World Video Super-Resolution", 2025
[arXiv] [supplementary material] [dataset] [pretrained models]
- 2025-06-09: Test datasets, inference scripts, and pretrained models are available.
- 2025-05-22: This repo is released.
Abstract: Diffusion models have demonstrated promising performance in real-world video super-resolution (VSR). However, the dozens of sampling steps they require make inference extremely slow. Sampling acceleration techniques, particularly single-step sampling, offer a potential solution. Nonetheless, achieving one-step diffusion in VSR remains challenging, due to the high training overhead on video data and stringent fidelity demands. To tackle the above issues, we propose DOVE, an efficient one-step diffusion model for real-world VSR. DOVE is obtained by fine-tuning a pretrained video diffusion model (i.e., CogVideoX). To effectively train DOVE, we introduce the latent-pixel training strategy. The strategy employs a two-stage scheme to gradually adapt the model to the video super-resolution task. Meanwhile, we design a video processing pipeline to construct a high-quality dataset tailored for VSR, termed HQ-VSR. Fine-tuning on this dataset further enhances the restoration capability of DOVE. Extensive experiments show that DOVE exhibits comparable or superior performance to multi-step diffusion-based VSR methods. It also offers outstanding inference efficiency, achieving up to a 28× speed-up over existing methods such as MGLD-VSR.
Demo videos: VideoLQ-007.mp4 | RealVSR-016.mp4
- Release testing code.
- Release pre-trained models.
- Release training code.
- Release video processing pipeline.
- Release HQ-VSR dataset.
- Provide WebUI.
- Provide HuggingFace demo.
- Python 3.11
- PyTorch>=2.5.0
- Diffusers
```shell
# Clone the github repo and go to the default directory 'DOVE'.
git clone https://github.com/zhengchen1999/DOVE.git
cd DOVE

# Create and activate the conda environment, then install dependencies.
conda create -n DOVE python=3.11
conda activate DOVE
pip install -r requirements.txt
pip install diffusers["torch"] transformers
pip install pyiqa
```

An optional environment check is sketched right after the contents list below.

- Datasets
- Models
- Training
- Testing
- Results
- Acknowledgements
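After installation, you can optionally sanity-check the setup (a minimal sketch; it only verifies that the packages installed above import cleanly and whether CUDA is visible):

```shell
# Optional: verify core dependencies and CUDA availability.
python -c "import torch, diffusers, transformers, pyiqa; print(torch.__version__, torch.cuda.is_available())"
```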
We provide several real-world and synthetic test datasets for evaluation. All datasets follow a consistent directory structure:
| Dataset | Type | # Videos | Download |
|---|---|---|---|
| UDM10 | Synthetic | 10 | Google Drive |
| SPMCS | Synthetic | 30 | Google Drive |
| YouHQ40 | Synthetic | 40 | Google Drive |
| RealVSR | Real-world | 50 | Google Drive |
| MVSR4x | Real-world | 15 | Google Drive |
| VideoLQ | Real-world | 50 | Google Drive |
All datasets are hosted here. Make sure the path is correct (`datasets/test/`) before running inference.
The directory structure is as follows:
```
datasets/
└── test/
    └── [DatasetName]/
        ├── GT/        # Ground Truth: folder of high-quality frames (one subfolder per clip)
        ├── GT-Video/  # Ground Truth (video version): lossless MKV format
        ├── LQ/        # Low-Quality Input: folder of degraded frames (one subfolder per clip)
        └── LQ-Video/  # Low-Quality Input (video version): lossless MKV format
```

We provide pretrained weights for DOVE and DOVE-2B.
| Model Name | Description | HuggingFace | Google Drive | Baidu Disk | Visual Results |
|---|---|---|---|---|---|
| DOVE | Base version, built on CogVideoX1.5-5B | TODO | Download | Download | Download |
| DOVE-2B | Smaller version, based on CogVideoX-2B | TODO | TODO | TODO | TODO |
Place downloaded model files into the `pretrained_models/` folder, e.g., `pretrained_models/DOVE`.
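Before testing, you can quickly confirm the downloads are where the scripts expect them (paths are illustrative; substitute the dataset and model you actually downloaded):

```shell
# Optional: both folders should exist and list files.
ls datasets/test/UDM10/LQ-Video
ls pretrained_models/DOVE
```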
We provide inference commands below. Before running, make sure to download the corresponding pretrained models and test datasets.
For more options and usage, please refer to inference_script.py.
The full testing commands are provided in the shell script: inference.sh.
```shell
# 🔹 Demo inference
python inference_script.py \
--input_dir datasets/demo \
--model_path pretrained_models/DOVE \
--output_path results/DOVE/demo \
--is_vae_st \
--save_format yuv420p
```
```shell
# 🔹 Reproduce paper results
python inference_script.py \
--input_dir datasets/test/UDM10/LQ-Video \
--model_path pretrained_models/DOVE \
--output_path results/DOVE/UDM10 \
--is_vae_st
```
```shell
# 🔹 Evaluate quantitative metrics
python eval_metrics.py \
--gt datasets/test/UDM10/GT \
--pred results/DOVE/UDM10 \
--metrics psnr,ssim,lpips,dists,clipiqa
```

💡 If you encounter out-of-memory (OOM) issues, you can enable chunk-based testing by setting the following parameters: `tile_size_hw`, `overlap_hw`, `chunk_len`, and `overlap_t`; an illustrative invocation is sketched below.
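For example, a chunked run might look like this (a sketch only: the flag names come from the tip above, the values are arbitrary placeholders, and the exact argument syntax should be checked against inference_script.py):

```shell
# Hypothetical chunked inference to reduce peak memory; tune sizes to your GPU.
# NOTE: flag values (and their exact format) are assumptions; see inference_script.py.
python inference_script.py \
--input_dir datasets/test/UDM10/LQ-Video \
--model_path pretrained_models/DOVE \
--output_path results/DOVE/UDM10 \
--is_vae_st \
--tile_size_hw 512 512 \
--overlap_hw 64 64 \
--chunk_len 16 \
--overlap_t 4
```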
💡 Default save format is `yuv444p`. If playback fails, try `save_format=yuv420p` (may slightly affect metrics).

TODO: Add metric computation scripts for FasterVQA, DOVER, and $E^*_{warp}$.
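If you have already saved videos as `yuv444p` and only need playable copies, a standard ffmpeg conversion works (ffmpeg is not part of this repo's tooling, the input path is illustrative, and re-encoding can slightly alter pixel values, so compute metrics on the original outputs):

```shell
# Re-encode a yuv444p output to the widely supported yuv420p for playback.
ffmpeg -i results/DOVE/demo/output.mp4 -pix_fmt yuv420p results/DOVE/demo/output_420p.mp4
```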
We achieve state-of-the-art performance on real-world video super-resolution. Visual results are available at Google Drive.
Qualitative Results
- Results in Fig. 4 of the main paper
If you find the code helpful in your research or work, please cite the following paper(s).
```bibtex
@article{chen2025dove,
  title={DOVE: Efficient One-Step Diffusion Model for Real-World Video Super-Resolution},
  author={Chen, Zheng and Zou, Zichen and Zhang, Kewei and Su, Xiongfei and Yuan, Xin and Guo, Yong and Zhang, Yulun},
  journal={arXiv preprint arXiv:2505.16239},
  year={2025}
}
```