Embodied Vision · World Models · Autonomous Driving · 3D Scene Understanding
H-EmbodVis (Huazhong University of Science and Technology Embodied Vision Projects) is a research initiative centered on Embodied AI, with complementary work in Autonomous Driving and Generative Models.
We build intelligent systems that perceive, understand, and interact with the physical world. Key directions include:
- Embodied AI & Agents: Integrating vision, language, and action planning.
- World Models for Autonomous Driving: Developing end-to-end driving frameworks and simulators.
- 3D Vision & Point Cloud Analysis: Efficient architectures for 3D representation learning.
- Multimodal Foundation Models: Large-scale models for diverse data modalities.
Selected projects:
- HERMES (ICCV 2025): A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation.
- Orion (ICCV 2025): Holistic End-to-End Autonomous Driving via Vision-Language Instructed Action Generation.
- Awesome-World-Model: A curated collection of papers on World Models for Autonomous Driving and Robotics.
- PointMamba (NeurIPS 2024): State Space Models (Mamba) applied to Point Cloud Analysis.
- UniSeg3D (NeurIPS 2024): A Unified Framework for 3D Scene Understanding.
- PointGST (IEEE TPAMI): Parameter-Efficient Fine-Tuning in Spectral Domain for Point Cloud Learning.
- EasyCache: Training-Free Video Diffusion Acceleration.
- NAUTILUS (NeurIPS 2025): A Large Multimodal Model for Underwater Scene Understanding.
- GRANT (AAAI 2026 Oral): Teaching Embodied Agents for Parallel Task Execution.
- MERGE (NeurIPS 2025): Unifying Generation and Depth Estimation via Text-to-Image Diffusion Models.
We are always looking for passionate collaborators and students.
- Connect: Reach out via email (dkliang@hust.edu.cn).
- Reuse: Creating impactful open-source software is a core value. Please cite our papers if you use our code.
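
If a generic starting point helps, a skeleton like the one below can be adapted. Every field here is a placeholder, not an official entry; the README of each project repository carries the authoritative BibTeX.

```bibtex
% Placeholder skeleton only -- all fields below are hypothetical.
% Replace them with the official entry from the relevant project repository.
@inproceedings{placeholder_citation_key,
  title     = {<Paper Title>},
  author    = {<Author One and Author Two>},
  booktitle = {<Venue, e.g., Advances in Neural Information Processing Systems>},
  year      = {<Year>}
}
```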