Vision Module is a Python library that provides webcam-based eye tracking. Extract facial features, train a model, and predict gaze with an easy-to-use interface.
- Real‑time gaze estimation
- Multiple calibration workflows
- Optional filtering (Kalman / KDE)
- Model persistence – save / load a trained `GazeEstimator`
- Virtual-camera overlay that integrates with streaming software (e.g., OBS) via the bundled `eyetrax-virtualcam` CLI
```bash
git clone https://github.com/tgondil/iris && cd iris

# editable install — pick one
python -m pip install -e .
pip install uv && uv sync
```

The Vision Module package provides multiple command-line entry points:
| Command | Purpose |
|---|---|
| `eyetrax-demo` | Run an on-screen gaze overlay demo |
| `eyetrax-virtualcam` | Stream the overlay to a virtual webcam |
| `eyetrax-stream` | Stream gaze data via WebSocket (for the Chrome extension) |
### Options
| Flag | Values | Default | Description |
|---|---|---|---|
| `--filter` | `kalman`, `kde`, `none` | `none` | Smoothing filter |
| `--camera` | int | `0` | Physical webcam index |
| `--calibration` | `9p`, `5p`, `lissajous` | `9p` | Calibration routine |
| `--background` (demo only) | path | — | Background image |
| `--confidence` (KDE only) | 0–1 | `0.5` | Contour probability |
```bash
eyetrax-demo --filter kalman
eyetrax-virtualcam --filter kde --calibration 5p
```

Demo video: `OBS_demo.mp4`
Control Chrome with your voice and eyes! The Iris Voice & Gaze extension enables:
- 🎤 Voice control - Speak into any text field on any webpage
- 👁️ Gaze tracking - Highlight webpage elements based on where you're looking
- 🤝 Combined workflow - Complete hands-free browsing experience
- **Load the Chrome extension:**
  - Open `chrome://extensions`
  - Enable Developer mode
  - Click "Load unpacked"
  - Select the `chrome_gaze_latch` folder
- **Start using voice:**
  - Click any text field on any webpage
  - Start speaking – your words appear automatically!
- **Keyboard shortcut:** `Cmd+Shift+S` (Mac) or `Ctrl+Shift+S` (Windows/Linux) to toggle speech
- **Start the gaze server:**

  ```bash
  eyetrax-stream --filter kalman --calibration 9p
  ```

- **Browse the web** – elements will glow cyan as you look at them! (A minimal Python consumer for the gaze stream is sketched below.)
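If you want to consume the gaze feed outside the extension, a small WebSocket client will do. The sketch below is a minimal example; the port (`8765`) and the JSON payload shape (`{"x": ..., "y": ...}`) are assumptions for illustration, so check the actual output of `eyetrax-stream` on your machine.

```python
# Minimal consumer sketch for the eyetrax-stream WebSocket feed.
# ASSUMPTIONS: the server listens on ws://localhost:8765 and emits
# JSON objects like {"x": 812, "y": 440}; adjust to your setup.
import asyncio
import json

import websockets  # pip install websockets


async def watch_gaze(uri: str = "ws://localhost:8765") -> None:
    async with websockets.connect(uri) as ws:
        async for message in ws:
            point = json.loads(message)
            print(f"gaze at ({point['x']:.0f}, {point['y']:.0f})")


asyncio.run(watch_gaze())
```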
- 🎤 Real-time speech recognition - Uses Chrome's Web Speech API
- 🎯 Automatic activation - Voice starts when text fields are focused
- 👁️ Real-time gaze tracking - Element highlighting based on eye position
- ⏱️ Dwell-time filtering - Prevents accidental highlights (250ms)
- 🔄 Auto-reconnect - Seamless reconnection to gaze server
- ⌨️ Keyboard shortcuts - Quick toggle for speech recognition
- 🖱️ EEG support - Ready for brain-computer interface integration
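Dwell-time filtering is easy to picture in code. The extension implements it in JavaScript; the sketch below restates the idea in Python with a hypothetical `DwellFilter` class (names and structure are illustrative, not the extension's source):

```python
import time

DWELL_SECONDS = 0.25  # matches the extension's 250 ms threshold


class DwellFilter:
    """Report an element only after gaze has rested on it continuously."""

    def __init__(self) -> None:
        self._candidate = None  # element currently under the gaze
        self._since = 0.0       # when the gaze first landed on it

    def update(self, element_id):
        now = time.monotonic()
        if element_id != self._candidate:
            # Gaze moved to a new element: restart the dwell timer.
            self._candidate = element_id
            self._since = now
            return None
        if element_id is not None and now - self._since >= DWELL_SECONDS:
            return element_id  # dwelled long enough; safe to highlight
        return None
```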
- Look at a text field (gaze highlights it)
- Click (mouse, keyboard, or future EEG trigger)
- Speak your text (voice recognition fills it in)
- Look at submit button
- Confirm (click or EEG trigger)
- Quick Start: `chrome_gaze_latch/QUICKSTART.md`
- Installation: `chrome_gaze_latch/INSTALL.md`
- Full Documentation: `chrome_gaze_latch/README.md`
- Test Page: `chrome_gaze_latch/test.html`
See also: `VOICE_CONTROL.md` for all voice control options.
```python
from vision_module import GazeEstimator, run_9_point_calibration
import cv2

# Create estimator and calibrate
estimator = GazeEstimator()
run_9_point_calibration(estimator)

# Save model
estimator.save_model("gaze_model.pkl")

# Load model
estimator = GazeEstimator()
estimator.load_model("gaze_model.pkl")

cap = cv2.VideoCapture(0)
while True:
    # Extract features from frame
    ret, frame = cap.read()
    if not ret:
        break
    features, blink = estimator.extract_features(frame)

    # Predict screen coordinates
    if features is not None and not blink:
        x, y = estimator.predict([features])[0]
        print(f"Gaze: ({x:.0f}, {y:.0f})")

cap.release()
```
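The CLI's `--filter` flag applies smoothing for you; when driving `GazeEstimator` directly, raw per-frame predictions can jitter. Below is a hand-rolled exponential-moving-average smoother as a minimal sketch – not the library's Kalman/KDE implementation, just an illustration of the idea:

```python
class EMASmoother:
    """Exponential moving average over (x, y) gaze predictions."""

    def __init__(self, alpha: float = 0.3) -> None:
        self.alpha = alpha  # higher alpha = snappier but noisier output
        self._state = None  # last smoothed (x, y), if any

    def update(self, x: float, y: float) -> tuple[float, float]:
        if self._state is None:
            self._state = (x, y)  # first sample: nothing to blend yet
        else:
            sx, sy = self._state
            self._state = (
                self.alpha * x + (1 - self.alpha) * sx,
                self.alpha * y + (1 - self.alpha) * sy,
            )
        return self._state
```

In the loop above, pass each prediction through `smoother.update(x, y)` before using the point.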
If you find Vision Module useful, consider starring the repo or contributing. The project is available under the MIT license.

Based on EyeTrax by Chenkai Zhang.
- **Purpose:** Exposes a small FastAPI service that forwards on-screen context to a Letta Cloud personal assistant agent and returns a predicted next action.
- **Setup:** Set these environment variables before starting: `LETTA_PROJECT`, `LETTA_TOKEN`, `LETTA_AGENT_ID`
- **Run:**

  ```bash
  uvicorn letta.app:app --host 0.0.0.0 --port 8001
  ```

- **Health check:**

  ```bash
  curl http://localhost:8001/health
  ```

- **Predict endpoint** (a Python client sketch follows this list):

  ```bash
  curl -X POST http://localhost:8001/letta/predict \
    -H 'Content-Type: application/json' \
    -d '{
      "context_text": "User is in VS Code reviewing API docs; next they want to scaffold an endpoint.",
      "metadata": {"app": "vscode"}
    }'
  ```

  Returns JSON with `action`, `run_id`, and raw `messages` from Letta.
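The same request from Python, as a minimal sketch (assumes the service is running locally on port 8001 as configured above):

```python
# Minimal client sketch for the /letta/predict endpoint shown above.
import requests  # pip install requests

resp = requests.post(
    "http://localhost:8001/letta/predict",
    json={
        "context_text": (
            "User is in VS Code reviewing API docs; "
            "next they want to scaffold an endpoint."
        ),
        "metadata": {"app": "vscode"},
    },
    timeout=30,
)
resp.raise_for_status()
result = resp.json()
print(result["action"], result["run_id"])  # fields documented above
```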
