¹TJNU ²ZJU ³NUS
- 🏆 2025.06 🎉 Our paper has been accepted to ICCV 2025!
- 📄 2025.07 📝 The arXiv preprint is now available: arxiv.org/abs/2507.17661
- 🚧 Coming Soon 🛠️ We are preparing the code release. Stay tuned on GitHub!
Monocular Semantic Scene Completion (MSSC) aims to infer voxel-wise occupancy and semantic labels from a single RGB image. Existing methods typically rely on single-stage pipelines that jointly handle visible segmentation and occluded region hallucination. However, these methods often suffer from depth estimation errors and limited generalizability to complex scenes.
MonoMRN is a novel two-stage framework designed to address these challenges:
- Stage 1: Coarse MSSC
- Stage 2: Masked Recurrent Network (MRN)
  - Focuses on refining occluded regions
  - Designs a Masked Sparse Gated Recurrent Unit (MS-GRU) to focus on occupied regions
  - Proposes a Distance Attention Projection to reduce projection errors
- 🔁 Masked Sparse GRU (MS-GRU): Efficient recurrent unit that updates only occupied voxels
- 🎯 Distance Attention Projection: Improves feature projection accuracy
- 🏠 + 🚗 Indoor & Outdoor Scenes: Supports both indoor (NYUv2) and outdoor (SemanticKITTI) scenes
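To make the "update only occupied voxels" idea behind MS-GRU more concrete, here is a minimal sketch in plain Python. It is not the MonoMRN implementation (the code has not been released yet). It assumes a standard GRU cell applied independently to each voxel, without the sparse 3D operations a real implementation would likely use, and it assumes the occupancy mask is already given (for example, derived from the Stage 1 coarse prediction). The names `gru_cell`, `masked_gru_step`, `Wz`, `Wr`, and `Wq` are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def matvec(W: Matrix, v: Sequence[float]) -> Vector:
    """Multiply matrix W (rows x len(v)) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gru_cell(h: Sequence[float], x: Sequence[float], params: Dict[str, Matrix]) -> Vector:
    """Standard GRU update for the feature vector of a single voxel."""
    hx = list(h) + list(x)                               # concatenate [h, x]
    z = [sigmoid(a) for a in matvec(params["Wz"], hx)]   # update gate
    r = [sigmoid(a) for a in matvec(params["Wr"], hx)]   # reset gate
    rhx = [ri * hi for ri, hi in zip(r, h)] + list(x)    # concatenate [r * h, x]
    q = [math.tanh(a) for a in matvec(params["Wq"], rhx)]  # candidate state
    return [(1.0 - zi) * hi + zi * qi for zi, hi, qi in zip(z, h, q)]


def masked_gru_step(
    hidden: Sequence[Sequence[float]],
    inputs: Sequence[Sequence[float]],
    occupied: Sequence[bool],
    params: Dict[str, Matrix],
) -> List[Vector]:
    """Run one recurrent step, updating only voxels flagged as occupied.

    Voxels with occupied[i] == False keep their hidden state unchanged,
    so no GRU computation is spent on them.
    """
    return [
        gru_cell(h, x, params) if m else list(h)
        for h, x, m in zip(hidden, inputs, occupied)
    ]


# Toy example: 2 voxels, hidden size 2, input size 1 (illustrative weights only).
params = {name: [[0.1] * 3 for _ in range(2)] for name in ("Wz", "Wr", "Wq")}
hidden = [[1.0, -2.0], [3.0, 4.0]]
inputs = [[0.5], [0.5]]
occupied = [True, False]  # assumed mask: only the first voxel is occupied
updated = masked_gru_step(hidden, inputs, occupied, params)
print(updated)
```

In this toy setup the second voxel is copied through untouched, which is the point of the mask: the recurrent refinement is spent only where the scene is believed to be occupied.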
@inproceedings{wang2025MonoMRN,
title={Monocular Semantic Scene Completion via Masked Recurrent Networks},
author={Wang, Xuzhi and Wu, Xinran and Wang, Song and Kong, Lingdong and Zhao, Ziping},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
year={2025}
}