🐢 Inspiration
As a graduate student at the University of Maryland, I wanted prospective students, remote learners, and visitors to experience UMD's campus in a truly immersive way — not a flat photo gallery or a boring video tour. When I discovered World Labs' Marble platform and its photorealistic Gaussian splat technology, I immediately thought: what if you could just say "take me to McKeldin Library" and walk through it with your own head movements?
The BigThink × World Labs hackathon was the perfect opportunity to build that.
🛠️ What I Built
Testudo Tours is a multimodal spatial campus explorer that combines:
- 🎤 Voice navigation — say any UMD landmark and instantly teleport to its 3D world
- 👤 Head tracking — MediaPipe Face Mesh detects yaw/pitch of your face in real time; turning your head rotates the Marble camera via OS-level mouse simulation
- 🌍 7 Gaussian splat worlds — Smith School of Business, Xfinity Center, Stamp Student Union, McKeldin Library, Gossett Football House, Riggs Alumni Center, and a Campus Hub portal
The system runs as a local Flask server (server.py) serving a full-screen web dashboard (index.html). When you say a location, Flask launches a Chrome window at the correct Marble world URL, auto-clicks the canvas to focus it, and then streams pyautogui mouse drag commands derived from your head orientation — all at ~25 fps.
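The translation from head orientation to per-frame mouse drags might look like the sketch below. The gain and clamp constants are invented tuning values, not the project's actual parameters; in the real server the resulting `(dx, dy)` would be handed to `pyautogui.dragRel`.

```python
# Invented tuning constants for illustration only.
YAW_GAIN_PX = 400    # horizontal pixels dragged per unit of yaw delta
PITCH_GAIN_PX = 300  # vertical pixels dragged per unit of pitch delta
MAX_STEP_PX = 40     # clamp each frame so a sudden head jerk can't fling the camera

def deltas_to_drag(d_yaw: float, d_pitch: float) -> tuple[int, int]:
    """Convert head-orientation deltas (roughly [-1, 1]) into a (dx, dy) drag."""
    clamp = lambda v: max(-MAX_STEP_PX, min(MAX_STEP_PX, v))
    dx = clamp(int(d_yaw * YAW_GAIN_PX))
    dy = clamp(int(d_pitch * PITCH_GAIN_PX))
    return dx, dy
```

Clamping the per-frame step is what keeps ~25 fps of small drags feeling like smooth camera rotation rather than jumps.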
⚙️ How I Built It
| Layer | Technology |
|---|---|
| 3D Worlds | World Labs Marble (Gaussian Splat) |
| Backend | Python · Flask · pyautogui · subprocess |
| Head Tracking | MediaPipe Face Mesh (browser, WebAssembly) |
| Voice Commands | Web Speech API (Chrome) |
| Frontend | Vanilla JS · HTML/CSS · full-screen portal UI |
| Camera Control | OS mouse simulation via pyautogui dragRel |
The head tracking math maps face landmark deltas to camera rotation:
$$\Delta\theta_{yaw} = x_{nose} - \frac{x_{leftEye} + x_{rightEye}}{2}$$
$$\Delta\phi_{pitch} = \frac{y_{nose} - \frac{y_{top} + y_{chin}}{2}}{y_{chin} - y_{top}}$$
These deltas are streamed to the Flask server at 25 Hz, which translates them into pixel-level mouse drag commands inside the Marble Chrome window.
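The two formulas transcribe directly into Python. This sketch assumes each landmark is an `(x, y)` pair in the normalized `[0, 1]` coordinates MediaPipe Face Mesh emits; the parameter names are descriptive, not the actual landmark indices.

```python
def head_deltas(nose, left_eye, right_eye, forehead_top, chin):
    """Compute (yaw, pitch) deltas from (x, y) face landmarks."""
    # Yaw: horizontal offset of the nose from the midpoint between the eyes.
    d_yaw = nose[0] - (left_eye[0] + right_eye[0]) / 2
    # Pitch: vertical offset of the nose from the face's vertical midpoint,
    # normalized by face height so the value is scale-invariant.
    face_mid_y = (forehead_top[1] + chin[1]) / 2
    d_pitch = (nose[1] - face_mid_y) / (chin[1] - forehead_top[1])
    return d_yaw, d_pitch
```

A face looking straight at the camera yields deltas near zero; turning or nodding pushes the nose off-center and produces a signed delta.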
🧱 Challenges I Faced
The biggest challenge was that World Labs' Marble platform sets Content-Security-Policy: frame-ancestors 'none', which completely blocks iframe embedding. Every approach to embed the worlds — standard iframes, sandboxed iframes, even proxy rewriting — was blocked at the browser level.
The solution was to abandon embedding entirely and instead open Marble natively in a separate Chrome window, then control its camera externally via pyautogui OS mouse simulation. This required solving a secondary challenge: ensuring Chrome's WebGL canvas was the OS focus target before pyautogui started sending drag events — solved with triple-click focus sequences and a threading lock to prevent drag state corruption across concurrent HTTP requests.
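The locking pattern for concurrent requests can be sketched in plain Python. This is a minimal, display-free illustration of the idea, assuming one shared drag queue; in the real server the guarded call would be `pyautogui.dragRel(dx, dy)` rather than a list append.

```python
import threading

# One lock serializes all drag activity so overlapping HTTP requests can't
# interleave their mouse-drag sequences and corrupt the drag state.
drag_lock = threading.Lock()
applied_drags: list[tuple[int, int]] = []  # stand-in for actual mouse motion

def apply_drag(dx: int, dy: int) -> None:
    """Apply one drag step; only one request drives the mouse at a time."""
    with drag_lock:
        # Real server: pyautogui.dragRel(dx, dy). Recorded here for testability.
        applied_drags.append((dx, dy))
```

Without the lock, two in-flight tracking requests could each start a drag before the other finishes, which is exactly the state corruption described above.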
📚 What I Learned
- Gaussian splat worlds are a fundamentally different rendering paradigm — there is no scene graph to manipulate programmatically; the only camera control is mouse drag
- MediaPipe Face Mesh runs entirely in the browser via WASM at ~30 fps with zero server round-trips for inference
- Web Speech API's continuous recognition mode works surprisingly well for real-time command detection with minimal latency
- Cross-origin security policies (CSP, CORS) are strict enough that creative OS-level workarounds sometimes become necessary
🔮 What's Next
- WebXR / VR headset support — pipe head orientation from the headset's IMU directly to camera control, replacing face tracking
- Multi-user tours — a guide leads a group through campus worlds simultaneously
- AI campus guide — an LLM-powered Testudo mascot that answers questions about each building as you explore it
- Mobile version — use device gyroscope for head-like orientation control on phones