@edenreich
Contributor

Adds computer use capabilities: screenshot capture, mouse movement, mouse clicks, and keyboard typing. Includes a complete Docker-based example with an Ubuntu GUI desktop, X11/Wayland support, and live screenshot streaming to the web UI. Closes #358.

Key features:

  • Screenshot tool with streaming support
  • Mouse control (move and click)
  • Keyboard input (text and key combos)
  • Rate limiting and approval system
  • Complete Docker example with web UI integration

Technical details:

  • Supports both X11 and Wayland display servers
  • Screenshot streaming via WebSocket for live desktop viewing
  • Circular buffer for efficient screenshot storage
  • Rate limiting to prevent abuse
  • User approval system for sensitive operations
  • Docker example with headless Ubuntu desktop setup
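
To illustrate the circular buffer idea from the list above, here is a minimal sketch of a fixed-capacity ring that overwrites the oldest screenshot once full. The `Frame` and `FrameRing` names and fields are hypothetical, and the sketch is not safe for concurrent use; the PR's real buffer may differ in both respects.

```go
package main

// Frame is a hypothetical stand-in for one captured screenshot.
type Frame struct {
	Seq  int
	Data []byte
}

// FrameRing keeps the most recent frames up to its capacity,
// overwriting the oldest frame once the ring is full.
type FrameRing struct {
	frames []Frame
	next   int // index where the next frame will be written
	count  int // number of valid frames currently stored
}

// NewFrameRing creates a ring; capacity must be greater than zero.
func NewFrameRing(capacity int) *FrameRing {
	return &FrameRing{frames: make([]Frame, capacity)}
}

// Push stores f, replacing the oldest frame when the ring is full.
func (r *FrameRing) Push(f Frame) {
	r.frames[r.next] = f
	r.next = (r.next + 1) % len(r.frames)
	if r.count < len(r.frames) {
		r.count++
	}
}

// Latest returns the most recently pushed frame, if any.
func (r *FrameRing) Latest() (Frame, bool) {
	if r.count == 0 {
		return Frame{}, false
	}
	idx := (r.next - 1 + len(r.frames)) % len(r.frames)
	return r.frames[idx], true
}

// Snapshot returns the stored frames ordered oldest first.
func (r *FrameRing) Snapshot() []Frame {
	out := make([]Frame, 0, r.count)
	start := (r.next - r.count + len(r.frames)) % len(r.frames)
	for i := 0; i < r.count; i++ {
		out = append(out, r.frames[(start+i)%len(r.frames)])
	}
	return out
}
```

Memory use stays bounded by the capacity no matter how long streaming runs, which is the usual reason to prefer a ring over an ever-growing slice.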

Rename screenshot streaming UI components to use "Preview" terminology:
- Rename screenshot-overlay.js to preview-overlay.js
- Remove emoji from button (📷 Screenshots → Preview)
- Update overlay title from "Live Screenshot" to "Live Preview"
- Update user-facing messages to use "Preview" instead of "Screenshot"

This improves clarity and consistency in the web UI while keeping
internal implementation details (CSS classes, API endpoints) unchanged.

Signed-off-by: Eden Reich <eden.reich@gmail.com>
@edenreich
Contributor Author

edenreich commented Jan 4, 2026

TODOs

  • Check whether it's a good idea to replace switching focus from the terminal to the active window and back with a GUI window that is always on top, similar to how vercept AI does it. Show it only when computer use is enabled and the session is not a remote one over a pty. It is only needed for local computer use, where the window focus changes constantly.
  • I should probably also add a visual indicator that the computer is currently being watched when computer_use.screenshot.streaming is enabled

@edenreich edenreich changed the title from "feat: Add computer use tools for remote GUI automation" to "feat: Add computer use tools for remote and local GUI automation" on Jan 4, 2026
Signed-off-by: Eden Reich <eden.reich@gmail.com>
**Thread Safety & Race Conditions:**
- Add thread-safe WindowCoordinator with serial DispatchQueue
- Protect window arrays (borders, click indicators, move trails) from concurrent access
- Fix segmentation faults during computer use tool execution
- Add process safety checks in writeEvent to prevent writes to dead processes

**Coordinate System Fixes:**
- Fix double coordinate conversion in click/move indicators
- Remove redundant Y-axis flip in Swift (Go already converts to macOS coords)
- Click indicators and move trails now appear at correct screen positions
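
The double-conversion bug described above can be shown with a small sketch. It assumes the conversion in question is a Y-axis flip between a top-left origin and a bottom-left origin; the `Point` type and `ToBottomLeftOrigin` function are hypothetical names, not taken from the PR.

```go
package main

// Point is a hypothetical screen coordinate in pixels.
type Point struct{ X, Y float64 }

// ToBottomLeftOrigin converts a point measured from the top-left corner into
// one measured from the bottom-left corner of a screen of the given height.
func ToBottomLeftOrigin(p Point, screenHeight float64) Point {
	return Point{X: p.X, Y: screenHeight - p.Y}
}
```

Because the flip is its own inverse, applying it a second time returns the original top-left coordinate. That is why flipping once in Go and again in Swift would draw indicators at the wrong height, and why removing the second flip fixes the positions.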

**Control Event Architecture:**
- Add control event forwarder for GUI → TUI communication
- Implement dedicated always-open channel for pause/resume events
- Revert EventBridge.Tap() to simple unidirectional design
- Add GetEventBridge() to StateManager interface
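
As a rough illustration of a dedicated always-open channel for pause/resume events, here is a minimal Go sketch. The `ControlEvent` and `ControlForwarder` names, the buffered channel, and the non-blocking send are assumptions for this example; the PR's actual forwarder and its relation to `EventBridge` are not shown in this conversation.

```go
package main

// ControlEvent is a hypothetical pause/resume signal sent from the GUI to the TUI.
type ControlEvent string

const (
	Pause  ControlEvent = "pause"
	Resume ControlEvent = "resume"
)

// ControlForwarder owns a dedicated buffered channel that is never closed
// during the forwarder's lifetime, so senders cannot hit a closed channel.
type ControlForwarder struct {
	events chan ControlEvent
}

// NewControlForwarder creates a forwarder with the given buffer size.
func NewControlForwarder(buffer int) *ControlForwarder {
	return &ControlForwarder{events: make(chan ControlEvent, buffer)}
}

// Forward enqueues ev without blocking the caller. It reports false when
// the buffer is full and the event was dropped.
func (f *ControlForwarder) Forward(ev ControlEvent) bool {
	select {
	case f.events <- ev:
		return true
	default:
		return false
	}
}

// Events exposes the receive-only side for the consumer.
func (f *ControlForwarder) Events() <-chan ControlEvent {
	return f.events
}
```

Keeping control events on their own channel means a pause request does not have to share a path with, or wait behind, the regular tool event stream.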

**Code Organization:**
- Extract view classes into separate files:
  - ClickIndicator.swift: Circular ring indicator
  - MoveTrail.swift: Arrow showing mouse movement
  - ControlBar.swift: Pause/resume button bar
  - ImageThumbnail.swift: Full-screen image viewer
- Refactor monitorProcess into respawnWindow and restoreBorderOverlay
- Update build.sh to include all view files

**Manual Tool Execution Fixes:**
- Fix completion event status: "complete" → "completed"
- Add image attachments to completion events
- GetLatestScreenshot now properly shows completion and displays images
- Manual tools (!! syntax) now broadcast events to floating window
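
A completion event carrying the corrected status and image attachments might look roughly like the sketch below. Only the "completed" status string comes from the commit message; the `ToolEvent` struct, its JSON field names, and the helper functions are hypothetical.

```go
package main

import "encoding/json"

// StatusCompleted is the corrected status string ("complete" was the bug).
const StatusCompleted = "completed"

// ToolEvent is a hypothetical event payload broadcast to UI listeners.
type ToolEvent struct {
	Tool   string   `json:"tool"`
	Status string   `json:"status"`
	Images []string `json:"images,omitempty"` // attachment references, if any
}

// NewCompletionEvent builds a completion event with optional image attachments.
func NewCompletionEvent(tool string, images []string) ToolEvent {
	return ToolEvent{Tool: tool, Status: StatusCompleted, Images: images}
}

// Encode serializes the event as JSON.
func (e ToolEvent) Encode() (string, error) {
	b, err := json.Marshal(e)
	return string(b), err
}
```

Defining the status as a single constant makes a mismatch such as "complete" versus "completed" between producer and consumer less likely to recur.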

**Other Improvements:**
- Remove dead code in screenshot_server.go
- Simplify chat handler resume logic
- Clean up verbose comments in manager.go

Fixes segfaults during MouseClick/MouseMove operations and ensures
all visual indicators work correctly with proper event completion.

Signed-off-by: Eden Reich <eden.reich@gmail.com>
…ging

Signed-off-by: Eden Reich <eden.reich@gmail.com>
@ig-semantic-release-bot

🎉 This PR is included in version 0.96.0-rc.1 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

Signed-off-by: Eden Reich <eden.reich@gmail.com>
Co-authored-by: semantic-release-bot <semantic-release-bot@martynus.net>
@edenreich edenreich merged commit 7de2498 into main Jan 11, 2026
5 checks passed
@edenreich edenreich deleted the feat/computer-use-tools branch January 11, 2026 22:02
ig-semantic-release-bot bot pushed a commit that referenced this pull request Jan 11, 2026
## [0.96.0](v0.95.1...v0.96.0) (2026-01-11)

### 🚀 Features

* Add computer use tools for remote and local GUI automation ([#359](#359)) ([7de2498](7de2498))
@ig-semantic-release-bot

🎉 This PR is included in version 0.96.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

Development

Successfully merging this pull request may close these issues.

[FEATURE] Add tools for computer use