Build smart AI apps for smart glasses, fast.
GlassKit is an open-source dev suite for building vision-enabled smart glasses apps. It provides SDKs and backends that turn real-time camera and microphone streams into specialized AI responses and actions, tailored to your workflow.
Today: this repository focuses on end-to-end examples you can adapt. Next: reusable SDKs + a production-ready backend.
| IKEA assembly assistant | Sushi speedrun HUD | Privacy filter |
|---|---|---|
| demo.webm | demo.webm | demo.mp4 |
| Code ➡️ · Code (+ RF-DETR) ➡️ | Code ➡️ | Code ➡️ |
| Real-time, vision-enabled voice assistant for Rokid Glasses. Streams mic + camera over WebRTC to the OpenAI Realtime API, plays back speech, and uses tool calls to guide tasks like IKEA assembly steps (tool sketch below). The RF-DETR variant adds object detection and passes annotated frames to OpenAI for better visual understanding. | Real-world speedrun HUD for Rokid Glasses. Streams video over WebRTC with a data channel to the backend, which runs a fine-tuned RF-DETR object detector for automatic, hands-free split completion based on a configured route. | Real-time privacy filter that sits between the camera and the app. Anonymizes the faces of anyone who hasn't given consent, detects and remembers verbal consent, and runs locally with recording support. |
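To give a feel for how tool calls drive the assembly flow, here is a minimal sketch of registering one tool with the OpenAI Realtime API. The tool name `advance_assembly_step`, its schema, and the `websockets` wiring are illustrative assumptions rather than the example's actual code, and the sketch connects over WebSocket for brevity where the example itself uses WebRTC.

```python
# Hypothetical sketch: attach a tool to an OpenAI Realtime API session so the
# model can drive step-by-step guidance. Names and schema are illustrative;
# see examples/ for the real implementation.
import asyncio
import json
import os
import websockets  # pip install websockets

REALTIME_URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"

ADVANCE_STEP_TOOL = {
    "type": "function",
    "name": "advance_assembly_step",  # hypothetical tool name
    "description": "Move the wearer to the next step once the current one is visually confirmed.",
    "parameters": {
        "type": "object",
        "properties": {
            "step": {"type": "integer", "description": "Step number just completed"},
        },
        "required": ["step"],
    },
}

async def configure_session(api_key: str):
    headers = {"Authorization": f"Bearer {api_key}", "OpenAI-Beta": "realtime=v1"}
    # websockets >= 14; older releases take extra_headers= instead.
    async with websockets.connect(REALTIME_URL, additional_headers=headers) as ws:
        # Tools are attached at the session level; the model may then emit
        # tool calls that the backend turns into HUD updates on the glasses.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {"tools": [ADVANCE_STEP_TOOL], "tool_choice": "auto"},
        }))
        # A real assistant would keep streaming audio/video events from here.

if __name__ == "__main__":
    asyncio.run(configure_session(os.environ["OPENAI_API_KEY"]))
```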
Smart glasses apps are hard.
- Generic vision-capable LLMs often fail at real-world task support.
- Each glasses brand has different hardware, form factors, and frameworks.
- Real-time camera + mic streaming is non-trivial to build correctly and ergonomically.
GlassKit is built around:
- Vision model orchestration: choose the right mix of multimodal LLMs and object detectors for the job (see the sketch after this list).
- Visual context management: define what the AI should know and how it is represented.
- Real-time streaming: camera + mic in, responses out, with sane developer ergonomics.
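As a concrete (hypothetical) illustration of the orchestration pillar, the sketch below routes every frame through a fast local detector and only escalates to a multimodal LLM when the wearer actually asks a question. `detect_objects` is a stand-in for a model like RF-DETR, and the OpenAI model name is an assumption, not a GlassKit API.

```python
# Hypothetical orchestration sketch: fast local detector for every frame,
# multimodal LLM only for open-ended questions.
import base64
import cv2  # pip install opencv-python
from openai import OpenAI  # pip install openai

client = OpenAI()

def detect_objects(frame) -> list[dict]:
    """Placeholder for a fast local detector such as RF-DETR.
    Returns detections like [{"label": ..., "box": ...}]."""
    return []

def ask_vision_llm(frame, question: str) -> str:
    """Slow path: send the frame to a multimodal LLM for open-ended reasoning."""
    ok, jpeg = cv2.imencode(".jpg", frame)
    data_url = "data:image/jpeg;base64," + base64.b64encode(jpeg.tobytes()).decode()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model choice
        messages=[{"role": "user", "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": data_url}},
        ]}],
    )
    return resp.choices[0].message.content

def handle_frame(frame, question: str | None = None):
    # Fast path: per-frame detections drive HUD overlays and task triggers.
    detections = detect_objects(frame)
    # Slow path: only pay for the LLM when the wearer asks something.
    answer = ask_vision_llm(frame, question) if question else None
    return detections, answer
```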
You define your AI with visual/textual context and your business logic. Then your app works like this:
- Camera frames and audio stream from the glasses to the backend via the SDK
- The backend processes inputs using vision models and LLMs with your custom context + logic
- Responses stream back to the glasses and the wearer via the SDK
You handle the app logic. GlassKit handles the glasses-to-AI pipeline.
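Here is a minimal sketch of the backend half of that loop, assuming an aiortc-based WebRTC backend (signaling omitted); `run_models` and `GlassesSession` are placeholders for your vision/LLM pipeline, not GlassKit APIs.

```python
# Minimal backend sketch of the glasses-to-AI loop (pip install aiortc).
import asyncio
import json
from aiortc import RTCPeerConnection
from aiortc.mediastreams import MediaStreamError

async def run_models(frame_bgr) -> dict:
    """Placeholder: run your detectors / LLMs and return something renderable."""
    return {"overlay": []}

class GlassesSession:
    """One connected pair of glasses: a peer connection plus a results channel."""

    def __init__(self):
        self.pc = RTCPeerConnection()
        self.channel = None

        @self.pc.on("datachannel")
        def on_datachannel(channel):
            # The glasses open a data channel for receiving responses.
            self.channel = channel

        @self.pc.on("track")
        def on_track(track):
            if track.kind == "video":
                asyncio.ensure_future(self.consume_video(track))

    async def consume_video(self, track):
        try:
            while True:
                frame = await track.recv()              # av.VideoFrame from the glasses
                img = frame.to_ndarray(format="bgr24")  # numpy array for your models
                result = await run_models(img)
                if self.channel and self.channel.readyState == "open":
                    self.channel.send(json.dumps(result))  # stream back to the HUD
        except MediaStreamError:
            pass  # the wearer disconnected
```

On the glasses side, the SDK's job is the mirror image: publish the camera and mic tracks and render whatever arrives on the data channel.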
- Pick an example from `examples/`
- Open its README and follow the setup steps
- Run it, then modify it for your workflow
GlassKit is early and under active development, but the examples are usable today.
- Current focus: end-to-end templates you can clone and adapt
- Coming next: reusable SDKs + production-ready backends
- Developer experience: demo video recording tooling, plus observability and debugging tools
- Platform support today: Rokid Glasses
- Planned support: Meta glasses, Android XR, Mentra, and more
Contributions are welcome!