H-EmbodVis

Embodied Vision Projects from Huazhong University of Science and Technology

Embodied Vision · World Models · Autonomous Driving · 3D Scene Understanding

H-EmbodVis (Huazhong University of Science and Technology Embodied Vision Projects) is a research initiative focused primarily on Embodied AI, with additional work in Autonomous Driving and Generative Models.


🔬 Research Areas

We focus on building intelligent systems that can perceive, understand, and interact with the physical world. Key directions include:

  • Embodied AI & Agents: Integrating vision, language, and action planning.
  • World Models for Autonomous Driving: Developing end-to-end driving frameworks and simulators.
  • 3D Vision & Point Cloud Analysis: Efficient architectures for 3D representation learning.
  • Multimodal Foundation Models: Large-scale models for diverse data modalities.

🌟 Featured Projects

Autonomous Driving & World Models

  • HERMES (ICCV 2025): A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation.
  • Orion (ICCV 2025): Holistic End-to-End Autonomous Driving via Vision-Language Instructed Action Generation.
  • Awesome-World-Model: Curated collection of papers on World Models for Autonomous Driving and Robotics.

3D Vision & Efficient Computing

  • PointMamba (NeurIPS 2024): State Space Models (Mamba) applied to Point Cloud Analysis.
  • UniSeg3D (NeurIPS 2024): A Unified Framework for 3D Scene Understanding.
  • PointGST (IEEE TPAMI): Parameter-Efficient Fine-Tuning in Spectral Domain for Point Cloud Learning.
  • EasyCache: Training-Free Video Diffusion Acceleration.

Multimodal & Embodied Agents

  • NAUTILUS (NeurIPS 2025): A Large Multimodal Model for Underwater Scene Understanding.
  • GRANT (AAAI 2026 Oral): Teaching Embodied Agents for Parallel Task Execution.
  • MERGE (NeurIPS 2025): Unifying Generation and Depth Estimation via Text-to-Image Diffusion Models.

Collaboration

We are always looking for passionate collaborators and students.

  • Connect: Reach out via email (dkliang@hust.edu.cn).
  • Reuse: Creating impactful open-source software is a core value. Please cite our papers if you use our code.

🌐 Website | 🎓 Google Scholar | 📂 Repositories

Pinned

  1. GRANT Public

    [AAAI 2026 Oral] Cook and Clean Together: Teaching Embodied Agents for Parallel Task Execution

    Python · 356 stars · 11 forks

  2. PointMamba Public

    Forked from LMD0311/PointMamba

    [NeurIPS 2024] PointMamba: A Simple State Space Model for Point Cloud Analysis

    Python

  3. UniSeg3D Public

    Forked from dk-liang/UniSeg3D

    [NeurIPS 2024] A Unified Framework for 3D Scene Understanding

    Python

  4. Orion Public

    Forked from xiaomi-mlab/Orion

    [ICCV 2025] Official code of "ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation"

    Python

  5. HERMES Public

    Forked from LMD0311/HERMES

    [ICCV 2025] HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation

    Python

  6. PointGST Public

    Forked from jerryfeng2003/PointGST

    [IEEE TPAMI] Parameter-Efficient Fine-Tuning in Spectral Domain for Point Cloud Learning

    Python

Repositories

Showing 10 of 13 repositories
  • .github Public

    readme

    0 stars · 0 forks · 0 open issues · 0 open pull requests · Updated Feb 5, 2026
  • NAUTILUS Public

    [NeurIPS 2025] NAUTILUS: A Large Multimodal Model for Underwater Scene Understanding

    Python · 350 stars · 27 forks · 1 open issue · 0 open pull requests · Updated Dec 18, 2025
  • GRANT Public

    [AAAI 2026 Oral] Cook and Clean Together: Teaching Embodied Agents for Parallel Task Execution

    Python · 356 stars · Apache-2.0 · 11 forks · 0 open issues · 0 open pull requests · Updated Dec 12, 2025
  • MERGE Public

    [NeurIPS 2025] More Than Generation: Unifying Generation and Depth Estimation via Text-to-Image Diffusion Models

    Python · 215 stars · Apache-2.0 · 18 forks · 0 open issues · 0 open pull requests · Updated Oct 31, 2025
  • PointGST Public Forked from jerryfeng2003/PointGST

    [IEEE TPAMI] Parameter-Efficient Fine-Tuning in Spectral Domain for Point Cloud Learning

    Python · 0 stars · Apache-2.0 · 35 forks · 0 open issues · 0 open pull requests · Updated Sep 12, 2025
  • EasyCache Public

    Less is Enough: Training-Free Video Diffusion Acceleration via Runtime-Adaptive Caching

    Python · 283 stars · Apache-2.0 · 4 forks · 1 open issue · 0 open pull requests · Updated Aug 29, 2025
  • Awesome-World-Model Public Forked from LMD0311/Awesome-World-Model

    A collection of papers on World Models for Autonomous Driving and Robotics.

    0 stars · 71 forks · 0 open issues · 0 open pull requests · Updated Jul 14, 2025
  • HERMES Public Forked from LMD0311/HERMES

    [ICCV 2025] HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation

    Python · 0 stars · Apache-2.0 · 12 forks · 0 open issues · 0 open pull requests · Updated Jul 13, 2025
  • Orion Public Forked from xiaomi-mlab/Orion

    [ICCV 2025] Official code of "ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation"

    Python · 0 stars · Apache-2.0 · 59 forks · 0 open issues · 0 open pull requests · Updated Jun 26, 2025
  • PointMamba Public Forked from LMD0311/PointMamba

    [NeurIPS 2024] PointMamba: A Simple State Space Model for Point Cloud Analysis

    Python · 0 stars · Apache-2.0 · 37 forks · 0 open issues · 0 open pull requests · Updated Mar 19, 2025
