Vision Module is a Python library that provides webcam-based eye tracking. Extract facial features, train a model, and predict gaze with an easy-to-use interface.
- Real‑time gaze estimation
- Multiple calibration workflows
- Optional filtering (Kalman / KDE)
- Model persistence – save / load a trained `GazeEstimator`
- Virtual-camera overlay that integrates with streaming software (e.g., OBS) via the bundled `eyetrax-virtualcam` CLI
```bash
git clone https://github.com/tgondil/iris && cd iris

# editable install — pick one
python -m pip install -e .
pip install uv && uv sync
```

The Vision Module package provides multiple command-line entry points:
| Command | Purpose |
|---|---|
| `eyetrax-demo` | Run an on-screen gaze overlay demo |
| `eyetrax-virtualcam` | Stream the overlay to a virtual webcam |
| `eyetrax-stream` | Stream gaze data via WebSocket (for the Chrome extension) |
### Options
| Flag | Values | Default | Description |
|---|---|---|---|
| `--filter` | `kalman`, `kde`, `none` | `none` | Smoothing filter |
| `--camera` | int | `0` | Physical webcam index |
| `--calibration` | `9p`, `5p`, `lissajous` | `9p` | Calibration routine |
| `--background` (demo only) | path | — | Background image |
| `--confidence` (KDE only) | 0–1 | `0.5` | Contour probability |
```bash
eyetrax-demo --filter kalman
eyetrax-virtualcam --filter kde --calibration 5p
```

Demo video: `OBS_demo.mp4`
Control Chrome with your voice and eyes! The Iris Voice & Gaze extension enables:
- 🎤 Voice control - Speak into any text field on any webpage
- 👁️ Gaze tracking - Highlight webpage elements based on where you're looking
- 🤝 Combined workflow - Complete hands-free browsing experience
- **Load the Chrome extension:**
  - Open `chrome://extensions`
  - Enable Developer mode
  - Click "Load unpacked"
  - Select the `chrome_gaze_latch` folder
- **Start using voice:**
  - Click any text field on any webpage
  - Start speaking – your words appear automatically!
- **Keyboard shortcut:** `Cmd+Shift+S` (Mac) or `Ctrl+Shift+S` (Windows/Linux) to toggle speech
- **Start the gaze server:**

  ```bash
  eyetrax-stream --filter kalman --calibration 9p
  ```

- **Browse the web** – elements will glow cyan as you look at them! (A minimal Python consumer for the gaze stream is sketched below.)
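If you want to consume the gaze feed outside the extension, a small WebSocket client will do. The sketch below is a minimal example; the port (`8765`) and the JSON payload shape (`{"x": ..., "y": ...}`) are assumptions for illustration, so check the actual output of `eyetrax-stream` on your machine.

```python
# Minimal consumer sketch for the eyetrax-stream WebSocket feed.
# ASSUMPTIONS: the server listens on ws://localhost:8765 and emits
# JSON objects like {"x": 812, "y": 440}; adjust to your setup.
import asyncio
import json

import websockets  # pip install websockets


async def watch_gaze(uri: str = "ws://localhost:8765") -> None:
    async with websockets.connect(uri) as ws:
        async for message in ws:
            point = json.loads(message)
            print(f"gaze at ({point['x']:.0f}, {point['y']:.0f})")


asyncio.run(watch_gaze())
```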
- 🎤 Real-time speech recognition - Uses Chrome's Web Speech API
- 🎯 Automatic activation - Voice starts when text fields are focused
- 👁️ Real-time gaze tracking - Element highlighting based on eye position
- ⏱️ Dwell-time filtering - Prevents accidental highlights (250ms)
- 🔄 Auto-reconnect - Seamless reconnection to gaze server
- ⌨️ Keyboard shortcuts - Quick toggle for speech recognition
- 🖱️ EEG support - Ready for brain-computer interface integration
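Dwell-time filtering is easy to picture in code. The extension implements it in JavaScript; the sketch below restates the idea in Python with a hypothetical `DwellFilter` class (names and structure are illustrative, not the extension's source):

```python
import time

DWELL_SECONDS = 0.25  # matches the extension's 250 ms threshold


class DwellFilter:
    """Report an element only after gaze has rested on it continuously."""

    def __init__(self) -> None:
        self._candidate = None  # element currently under the gaze
        self._since = 0.0       # when the gaze first landed on it

    def update(self, element_id):
        now = time.monotonic()
        if element_id != self._candidate:
            # Gaze moved to a new element: restart the dwell timer.
            self._candidate = element_id
            self._since = now
            return None
        if element_id is not None and now - self._since >= DWELL_SECONDS:
            return element_id  # dwelled long enough; safe to highlight
        return None
```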
- Look at a text field (gaze highlights it)
- Click (mouse, keyboard, or future EEG trigger)
- Speak your text (voice recognition fills it in)
- Look at submit button
- Confirm (click or EEG trigger)
- Quick Start: `chrome_gaze_latch/QUICKSTART.md`
- Installation: `chrome_gaze_latch/INSTALL.md`
- Full Documentation: `chrome_gaze_latch/README.md`
- Test Page: `chrome_gaze_latch/test.html`
See also: `VOICE_CONTROL.md` for all voice control options.
```python
from vision_module import GazeEstimator, run_9_point_calibration
import cv2

# Create estimator and calibrate
estimator = GazeEstimator()
run_9_point_calibration(estimator)

# Save model
estimator.save_model("gaze_model.pkl")

# Load model
estimator = GazeEstimator()
estimator.load_model("gaze_model.pkl")

cap = cv2.VideoCapture(0)
while True:
    # Extract features from frame
    ret, frame = cap.read()
    if not ret:
        break
    features, blink = estimator.extract_features(frame)

    # Predict screen coordinates
    if features is not None and not blink:
        x, y = estimator.predict([features])[0]
        print(f"Gaze: ({x:.0f}, {y:.0f})")

cap.release()
```
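The CLI's `--filter` flag applies smoothing for you; when driving `GazeEstimator` directly, raw per-frame predictions can jitter. Below is a hand-rolled exponential-moving-average smoother as a minimal sketch – not the library's Kalman/KDE implementation, just an illustration of the idea:

```python
class EMASmoother:
    """Exponential moving average over (x, y) gaze predictions."""

    def __init__(self, alpha: float = 0.3) -> None:
        self.alpha = alpha  # higher alpha = snappier but noisier output
        self._state = None  # last smoothed (x, y), if any

    def update(self, x: float, y: float) -> tuple[float, float]:
        if self._state is None:
            self._state = (x, y)  # first sample: nothing to blend yet
        else:
            sx, sy = self._state
            self._state = (
                self.alpha * x + (1 - self.alpha) * sx,
                self.alpha * y + (1 - self.alpha) * sy,
            )
        return self._state
```

In the loop above, pass each prediction through `smoother.update(x, y)` before using the point.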
If you find Vision Module useful, consider starring the repo or contributing. The project is available under the MIT license.

Based on EyeTrax by Chenkai Zhang.
- **Purpose:** Exposes a small FastAPI service that forwards on-screen context to a Letta Cloud personal assistant agent and returns a predicted next action.
- **Setup:** Set these environment variables before starting: `LETTA_PROJECT`, `LETTA_TOKEN`, `LETTA_AGENT_ID`
- **Run:**

  ```bash
  uvicorn letta.app:app --host 0.0.0.0 --port 8001
  ```

- **Health check:**

  ```bash
  curl http://localhost:8001/health
  ```

- **Predict endpoint** (a Python client sketch follows this list):

  ```bash
  curl -X POST http://localhost:8001/letta/predict \
    -H 'Content-Type: application/json' \
    -d '{
      "context_text": "User is in VS Code reviewing API docs; next they want to scaffold an endpoint.",
      "metadata": {"app": "vscode"}
    }'
  ```

  Returns JSON with `action`, `run_id`, and raw `messages` from Letta.
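The same request from Python, as a minimal sketch (assumes the service is running locally on port 8001 as configured above):

```python
# Minimal client sketch for the /letta/predict endpoint shown above.
import requests  # pip install requests

resp = requests.post(
    "http://localhost:8001/letta/predict",
    json={
        "context_text": (
            "User is in VS Code reviewing API docs; "
            "next they want to scaffold an endpoint."
        ),
        "metadata": {"app": "vscode"},
    },
    timeout=30,
)
resp.raise_for_status()
result = resp.json()
print(result["action"], result["run_id"])  # fields documented above
```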
