
Iris Vision Module


Vision Module is a Python library for webcam-based eye tracking: extract facial features, train a model, and predict gaze through an easy-to-use interface.

Features

  • Real‑time gaze estimation
  • Multiple calibration workflows
  • Optional filtering (Kalman / KDE); a rough illustration of what smoothing does follows this list
  • Model persistence – save / load a trained GazeEstimator
  • Virtual-camera overlay that integrates with streaming software (e.g., OBS) via the bundled eyetrax-virtualcam CLI
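
To give a rough idea of what the optional smoothing does, the sketch below runs raw gaze points through a generic constant-velocity Kalman filter built on OpenCV. It is an illustration only, not the library's internal implementation, and the noise settings are arbitrary.

import cv2
import numpy as np

# Generic constant-velocity Kalman filter: state [x, y, vx, vy], measurement [x, y]
kf = cv2.KalmanFilter(4, 2)
kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                [0, 1, 0, 1],
                                [0, 0, 1, 0],
                                [0, 0, 0, 1]], dtype=np.float32)
kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                 [0, 1, 0, 0]], dtype=np.float32)
kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2      # how much the gaze is expected to move
kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1  # how noisy raw points are assumed to be
kf.errorCovPost = np.eye(4, dtype=np.float32)

def smooth(x, y):
    """Feed one raw gaze point in, get a smoothed point back."""
    kf.predict()
    state = kf.correct(np.array([[x], [y]], dtype=np.float32))
    return float(state[0, 0]), float(state[1, 0])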

Installation

From source

git clone https://github.com/tgondil/iris && cd iris

# editable install — pick one
python -m pip install -e .
pip install uv && uv sync

Demo

The Vision Module package provides multiple command-line entry points:

Command              Purpose
eyetrax-demo         Run an on-screen gaze overlay demo
eyetrax-virtualcam   Stream the overlay to a virtual webcam
eyetrax-stream       Stream gaze data via WebSocket (for the Chrome extension)

Options

Flag            Values              Default   Description
--filter        kalman, kde, none   none      Smoothing filter
--camera        int                 0         Physical webcam index
--calibration   9p, 5p, lissajous   9p        Calibration routine
--background    path                -         Background image (demo only)
--confidence    0–1                 0.5       Contour probability (KDE only)

Quick Examples

eyetrax-demo --filter kalman
eyetrax-virtualcam --filter kde --calibration 5p

Virtual camera demo

See OBS_demo.mp4 for a recording of the virtual-camera overlay in OBS.

🌐 Chrome Extension - Voice & Gaze Control

Control Chrome with your voice and eyes! The Iris Voice & Gaze extension enables:

  • 🎤 Voice control - Speak into any text field on any webpage
  • 👁️ Gaze tracking - Highlight webpage elements based on where you're looking
  • 🤝 Combined workflow - Complete hands-free browsing experience

Quick Start - Voice Control

  1. Load the Chrome extension:

    • Open chrome://extensions
    • Enable Developer mode
    • Click "Load unpacked"
    • Select the chrome_gaze_latch folder
  2. Start using voice:

    • Click any text field on any webpage
    • Start speaking - your words appear automatically!
  3. Keyboard shortcut:

    • Cmd+Shift+S (Mac) or Ctrl+Shift+S (Windows/Linux) to toggle speech

Quick Start - Gaze Tracking

  1. Start the gaze server (a client sketch for the WebSocket stream it publishes follows this list):
     eyetrax-stream --filter kalman --calibration 9p
  2. Browse the web - elements will glow cyan as you look at them!
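
The extension is the intended consumer of this stream, but any WebSocket client can read it. A minimal sketch using the third-party websockets package is shown below; the URL and the JSON field names are assumptions for illustration, so check the port and payload format that eyetrax-stream actually uses.

import asyncio
import json

import websockets  # third-party: pip install websockets

async def watch_gaze(url="ws://localhost:8765"):  # port is an assumption
    async with websockets.connect(url) as ws:
        async for message in ws:
            data = json.loads(message)
            # Assumed payload shape: {"x": <screen-x>, "y": <screen-y>}
            print(f"Gaze: ({data['x']:.0f}, {data['y']:.0f})")

asyncio.run(watch_gaze())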

Features

  • 🎤 Real-time speech recognition - Uses Chrome's Web Speech API
  • 🎯 Automatic activation - Voice starts when text fields are focused
  • 👁️ Real-time gaze tracking - Element highlighting based on eye position
  • ⏱️ Dwell-time filtering - Prevents accidental highlights (250ms); see the sketch after this list
  • 🔄 Auto-reconnect - Seamless reconnection to gaze server
  • ⌨️ Keyboard shortcuts - Quick toggle for speech recognition
  • 🖱️ EEG support - Ready for brain-computer interface integration
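
The dwell-time behaviour listed above boils down to: only treat an element as highlighted once the gaze has rested on it for a minimum time. Here is a minimal sketch, with the 250 ms threshold taken from the list and everything else (element ids, clock) purely illustrative rather than the extension's actual code:

import time

class DwellFilter:
    """Report an element only after the gaze has rested on it long enough."""

    def __init__(self, dwell_seconds=0.25):
        self.dwell_seconds = dwell_seconds
        self.current = None   # element currently under the gaze
        self.since = 0.0      # when the gaze first landed on it

    def update(self, element_id):
        now = time.monotonic()
        if element_id != self.current:
            # Gaze moved to a new element: restart the dwell timer
            self.current = element_id
            self.since = now
            return None
        if element_id is not None and now - self.since >= self.dwell_seconds:
            return element_id   # dwelled long enough, highlight it
        return None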

Complete Hands-Free Workflow

  1. Look at a text field (gaze highlights it)
  2. Click (mouse, keyboard, or future EEG trigger)
  3. Speak your text (voice recognition fills it in)
  4. Look at submit button
  5. Confirm (click or EEG trigger)

Documentation

  • Quick Start: chrome_gaze_latch/QUICKSTART.md
  • Installation: chrome_gaze_latch/INSTALL.md
  • Full Documentation: chrome_gaze_latch/README.md
  • Test Page: chrome_gaze_latch/test.html

See also: VOICE_CONTROL.md for all voice control options.

Library Usage

from vision_module import GazeEstimator, run_9_point_calibration
import cv2

# Create estimator and calibrate
estimator = GazeEstimator()
run_9_point_calibration(estimator)

# Save model
estimator.save_model("gaze_model.pkl")

# Load model
estimator = GazeEstimator()
estimator.load_model("gaze_model.pkl")

cap = cv2.VideoCapture(0)

while True:
    # Grab a frame from the webcam
    ret, frame = cap.read()
    if not ret:
        break

    # Extract features from the frame
    features, blink = estimator.extract_features(frame)

    # Predict screen coordinates when a face is found and the eyes are open
    if features is not None and not blink:
        x, y = estimator.predict([features])[0]
        print(f"Gaze: ({x:.0f}, {y:.0f})")

cap.release()

More

If you find this Vision Module useful, consider starring the repo or contributing. The project is available under the MIT license.

Based on EyeTrax by Chenkai Zhang.

Letta API (Personal Assistant)

  • Purpose: Exposes a small FastAPI service to forward on-screen context to a Letta Cloud personal assistant agent and return a predicted next action.

  • Setup

    • Set env vars before starting:
      • LETTA_PROJECT
      • LETTA_TOKEN
      • LETTA_AGENT_ID
  • Run

    uvicorn letta.app:app --host 0.0.0.0 --port 8001
  • Health check

    curl http://localhost:8001/health
  • Predict endpoint

    curl -X POST http://localhost:8001/letta/predict \
      -H 'Content-Type: application/json' \
      -d '{
        "context_text": "User is in VS Code reviewing API docs; next they want to scaffold an endpoint.",
        "metadata": {"app": "vscode"}
      }'

    Returns JSON with action, run_id, and raw messages from Letta.
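
  • Python example

    The same request from Python, as a sketch that mirrors the curl call above (it assumes the service is already running on port 8001 with the LETTA_* variables set):

    import requests

    resp = requests.post(
        "http://localhost:8001/letta/predict",
        json={
            "context_text": "User is in VS Code reviewing API docs; next they want to scaffold an endpoint.",
            "metadata": {"app": "vscode"},
        },
        timeout=30,
    )
    resp.raise_for_status()
    result = resp.json()
    print(result["action"], result["run_id"])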
