Abstract: Event cameras offer a compelling alternative to RGB cameras in many scenarios. While there are recent works on event-based novel-view synthesis, dense 3D mesh reconstruction remains scarcely explored, and existing event-based techniques are severely limited in their 3D reconstruction accuracy. To address this limitation, we present EventNeuS, a self-supervised neural model for learning 3D representations from monocular colour event streams. Our approach is the first to combine 3D signed distance function and density field learning with event-based supervision. Furthermore, we introduce spherical harmonics encodings into our model for enhanced handling of view-dependent effects. EventNeuS outperforms existing approaches by a significant margin, achieving, on average, 34% lower Chamfer distance and 31% lower mean absolute error than the best previous method.

Overview


Overview of the proposed EventNeuS method. We start with a camera trajectory that captures the object from multiple viewpoints (e.g., Seiffert's spherical spiral). We accumulate all events within a time window \([t_0, t_1]\) to form an event frame \(E^k(t_0, t_1)\). For each event frame, we randomly choose a mini-batch of pixels and sample points along the corresponding rays. After applying positional encoding to these 3D coordinates, we feed them into the \(f_{\text{sdf}}\) network, which outputs a signed distance function (SDF) and its gradients. We further refine our sampling near surfaces via importance sampling. Next, we combine the resulting SDF features and gradients with view directions encoded via spherical harmonics \(SH(d)\) in the \(f_{\text{color}}\) network to predict color. Finally, we convert the SDF to a density field, integrate it along each ray to obtain the accumulated transmittance \(\alpha\), and combine it with the color predictions to obtain the rendered color \(c\) through volumetric rendering. We render two views (at the start and at the end of the event window) and take their difference, applying an MSE loss against the ground-truth accumulated event frame. This difference enforces consistency between the rendered scene changes and the actual events recorded during \([t_0, t_1]\).
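The sketch below illustrates the core of this event-based supervision: render a pixel colour at the start and at the end of the window via volumetric rendering, then penalise the difference between the resulting log-intensity change and the accumulated events. It is a minimal, self-contained approximation in PyTorch; the function names, the logistic SDF-to-density mapping, and the per-event contrast threshold are illustrative assumptions rather than the exact EventNeuS implementation.

# Minimal sketch of the event-supervised rendering loss (assumed, not the released code).
import torch

def sdf_to_density(sdf, inv_std=64.0):
    # Logistic "bell" density peaked at the SDF zero level set (NeuS-style assumption).
    s = torch.sigmoid(inv_std * sdf)
    return inv_std * s * (1.0 - s)

def volume_render(sdf, rgb, deltas):
    # sdf: (R, S) signed distances; rgb: (R, S, 3) colours from f_color;
    # deltas: (R, S) distances between consecutive samples; returns (R, 3) pixel colours.
    density = sdf_to_density(sdf)
    alpha = 1.0 - torch.exp(-density * deltas)                      # per-sample opacity
    transmittance = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], dim=-1), dim=-1
    )[:, :-1]                                                       # accumulated transmittance
    weights = alpha * transmittance
    return (weights[..., None] * rgb).sum(dim=1)

def event_loss(c_t0, c_t1, event_frame, contrast=0.25):
    # event_frame: (R, 3) signed event counts per pixel/channel over [t_0, t_1];
    # contrast is an assumed per-event log-intensity threshold.
    eps = 1e-5
    predicted_change = torch.log(c_t1 + eps) - torch.log(c_t0 + eps)
    return torch.nn.functional.mse_loss(predicted_change, contrast * event_frame)

# Toy usage with random tensors: R rays, S samples per ray.
R, S = 1024, 64
c_start = volume_render(torch.randn(R, S), torch.rand(R, S, 3), torch.full((R, S), 0.01))
c_end = volume_render(torch.randn(R, S), torch.rand(R, S, 3), torch.full((R, S), 0.01))
print(event_loss(c_start, c_end, torch.randn(R, 3)).item())

In the full model, the per-sample colours would come from the \(f_{\text{color}}\) network and the samples would follow the importance scheme described above; the random tensors here merely stand in for those components.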

We present a qualitative comparison of 3D mesh reconstructions. Our method, EventNeuS, consistently recovers higher-fidelity geometry with fewer artifacts than the baselines. Interact with the synchronized views below to inspect the details.

Drag to rotate  |  Scroll to zoom  |  All views synchronized

* PAEv3D requires training on a high-resolution event stream (692×520 px) for convergence

@inproceedings{sachan2026eventneus,
  title={EventNeuS: 3D Mesh Reconstruction from a Single Event Camera},
  author={Sachan, Shreyas and Rudnev, Viktor and Elgharib, Mohamed and Theobalt, Christian and Golyanik, Vladislav},
  booktitle={International Conference on 3D Vision (3DV)},
  year={2026}
}
For questions or clarifications, please get in touch with:
Shreyas Sachan
shreyas.sachan@gmail.com
Vladislav Golyanik
golyanik@mpi-inf.mpg.de