🐢 Inspiration

As a graduate student at the University of Maryland, I wanted prospective students, remote learners, and visitors to experience UMD's campus in a truly immersive way — not a flat photo gallery or a boring video tour. When I discovered World Labs' Marble platform and its photorealistic Gaussian splat technology, I immediately thought: what if you could just say "take me to McKeldin Library" and walk through it with your own head movements?

The BigThink × World Labs hackathon was the perfect opportunity to build that.


🛠️ What I Built

Testudo Tours is a multimodal spatial campus explorer that combines:

  • 🎤 Voice navigation — say any UMD landmark and instantly teleport to its 3D world
  • 👤 Head tracking — MediaPipe Face Mesh detects yaw/pitch of your face in real time; turning your head rotates the Marble camera via OS-level mouse simulation
  • 🌍 7 Gaussian splat worlds — Smith School of Business, Xfinity Center, Stamp Student Union, McKeldin Library, Gossett Football House, Riggs Alumni Center, and a Campus Hub portal

The system runs as a local Flask server (server.py) serving a full-screen web dashboard (index.html). When you say a location, Flask launches a Chrome window at the correct Marble world URL, auto-clicks the canvas to focus it, and then streams pyautogui mouse drag commands derived from your head orientation — all at ~25 fps.


⚙️ How I Built It

| Layer | Technology |
| --- | --- |
| 3D Worlds | World Labs Marble (Gaussian splat) |
| Backend | Python · Flask · pyautogui · subprocess |
| Head Tracking | MediaPipe Face Mesh (browser, WebAssembly) |
| Voice Commands | Web Speech API (Chrome) |
| Frontend | Vanilla JS · HTML/CSS · full-screen portal UI |
| Camera Control | OS mouse simulation via pyautogui `dragRel` |

The head-tracking math maps face-landmark deltas to camera rotation:

$$\Delta\theta_{yaw} = x_{nose} - \frac{x_{leftEye} + x_{rightEye}}{2}$$

$$\Delta\phi_{pitch} = \frac{y_{nose} - \frac{y_{top} + y_{chin}}{2}}{y_{chin} - y_{top}}$$

The browser sends these deltas to Flask at ~25 Hz, and the server translates them into pixel-level mouse drag commands inside the Marble Chrome window.
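In code, that mapping might look like the following sketch. The landmark names mirror the symbols in the formulas above, while the gain and clamp values are tuning assumptions, not the project's actual constants:

```python
def head_deltas(landmarks):
    """Compute yaw/pitch deltas from normalized face landmarks.

    `landmarks` maps names to (x, y) tuples in [0, 1] image coordinates,
    mirroring the yaw/pitch formulas: yaw is the nose's horizontal offset
    from the eye midpoint; pitch is the nose's vertical offset from the
    face midpoint, normalized by face height.
    """
    yaw = landmarks["nose"][0] - (landmarks["left_eye"][0] + landmarks["right_eye"][0]) / 2
    face_mid_y = (landmarks["top"][1] + landmarks["chin"][1]) / 2
    pitch = (landmarks["nose"][1] - face_mid_y) / (landmarks["chin"][1] - landmarks["top"][1])
    return yaw, pitch

def deltas_to_drag(yaw, pitch, gain=600, max_px=40):
    """Map deltas to clamped pixel offsets for a relative mouse drag.

    gain and max_px are illustrative tuning values; clamping keeps a noisy
    frame from yanking the camera.
    """
    clamp = lambda v: max(-max_px, min(max_px, int(v * gain)))
    return clamp(yaw), clamp(pitch)
```

A centered face yields a (0, 0) drag, so the camera only moves when the head actually turns.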


🧱 Challenges I Faced

The biggest challenge was that World Labs' Marble platform sets Content-Security-Policy: frame-ancestors 'none', which completely blocks iframe embedding. Every approach to embed the worlds — standard iframes, sandboxed iframes, even proxy rewriting — was blocked at the browser level.
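The blocking directive is easy to spot in a site's response headers. A small helper (hypothetical, not part of the project) that checks whether a Content-Security-Policy value forbids all embedding:

```python
def forbids_embedding(csp_header: str) -> bool:
    """True if a Content-Security-Policy value blocks every iframe parent.

    CSP directives are semicolon-separated; frame-ancestors 'none' means
    no origin may embed the page, which is what Marble sends.
    """
    for directive in csp_header.split(";"):
        parts = directive.strip().split()
        if parts and parts[0] == "frame-ancestors":
            return "'none'" in parts[1:]
    return False
```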

The solution was to abandon embedding entirely and instead open Marble natively in a separate Chrome window, then control its camera externally via pyautogui OS mouse simulation. This required solving a secondary challenge: ensuring Chrome's WebGL canvas was the OS focus target before pyautogui started sending drag events — solved with triple-click focus sequences and a threading lock to prevent drag state corruption across concurrent HTTP requests.
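The locking pattern can be sketched roughly like this. `CameraDriver` and its method names are illustrative; in the real server the injected drag function would be `pyautogui.dragRel`, and the click function a `pyautogui.click`:

```python
import threading

class CameraDriver:
    """Serializes OS-level mouse actions so concurrent HTTP requests cannot
    interleave one drag's press/move/release with another's.

    Illustrative sketch: drag_fn stands in for pyautogui.dragRel.
    """

    def __init__(self, drag_fn):
        self._drag = drag_fn
        self._lock = threading.Lock()
        self.history = []  # record of drags, handy for debugging

    def focus_canvas(self, click_fn, x, y):
        # Triple-click so Chrome's WebGL canvas becomes the OS focus target
        # before any drag events are sent.
        with self._lock:
            for _ in range(3):
                click_fn(x, y)

    def drag(self, dx, dy):
        # One request at a time: the lock prevents drag-state corruption.
        with self._lock:
            self._drag(dx, dy)
            self.history.append((dx, dy))
```

Wiring `pyautogui.dragRel` in as `drag_fn` gives each Flask request handler a safe, single entry point to the mouse.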


📚 What I Learned

  • Gaussian splat worlds are a fundamentally different rendering paradigm — there is no scene graph to manipulate programmatically; the only camera control is mouse drag
  • MediaPipe Face Mesh runs entirely in the browser via WASM at ~30 fps with zero server round-trips for inference
  • Web Speech API's continuous recognition mode works surprisingly well for real-time command detection with minimal latency
  • Cross-origin security policies (CSP, CORS) are strict enough that creative OS-level workarounds sometimes become necessary

🔮 What's Next

  • WebXR / VR headset support — pipe head orientation from the headset's IMU directly to camera control, replacing face tracking
  • Multi-user tours — a guide leads a group through campus worlds simultaneously
  • AI campus guide — an LLM-powered Testudo mascot that answers questions about each building as you explore it
  • Mobile version — use device gyroscope for head-like orientation control on phones
