Merged
19 commits
ebe5cef
feat: Add computer use tools with screenshot streaming and remote GUI…
edenreich Jan 2, 2026
841bb09
refactor(web): Rename screenshot overlay to preview
edenreich Jan 3, 2026
5bd5520
feat: Add typing delay and improve X11 keyboard support
edenreich Jan 3, 2026
9c32649
refactor: Refactor computer use tools with display protocol abstraction
edenreich Jan 3, 2026
ab05f03
docs: Add computer use tools documentation
edenreich Jan 3, 2026
0153abc
feat: Add macOS support and focus management tools
edenreich Jan 4, 2026
69ead75
chore: Enable computer use
edenreich Jan 4, 2026
c863fef
feat: Add floating window for computer use
edenreich Jan 4, 2026
75610a0
refactor: Replace interface{} with any type
edenreich Jan 4, 2026
b10a45f
feat: Add native macOS floating window for computer use
edenreich Jan 7, 2026
ed38ee7
refactor: Improve UI layout and mouse click logic
edenreich Jan 8, 2026
157bc38
refactor: Remove horizontal scrolling from ComputerUse dialog
edenreich Jan 8, 2026
0ed1361
fix: Window title overlapping the conversation history
edenreich Jan 8, 2026
c69d5b4
refactor: Add some space when approval actions appear
edenreich Jan 8, 2026
d0b6631
refactor: Progress on events
edenreich Jan 9, 2026
32adf3b
fix: Thread safety, coordinate fixes, and manual tool execution
edenreich Jan 9, 2026
ffe52bf
chore(deps): Bump claude-code
edenreich Jan 10, 2026
cdc8657
refactor: Forward inference gateway cli logs to stdout for easy debug…
edenreich Jan 10, 2026
a961db4
rc: Feature computer use tools (#361)
edenreich Jan 11, 2026
775 changes: 387 additions & 388 deletions .flox/env/manifest.lock

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion .flox/env/manifest.toml
@@ -29,7 +29,7 @@ ripgrep.version = "^15.1.0"
markdownlint-cli.pkg-path = "markdownlint-cli"
markdownlint-cli.version = "^0.47.0"
claude-code.pkg-path = "claude-code"
-claude-code.version = "^2.0.76"
+claude-code.version = "^2.1.1"
docker.pkg-path = "docker"
docker.version = "^29.1.2"
docker-compose.pkg-path = "docker-compose"
7 changes: 7 additions & 0 deletions .github/workflows/artifacts.yml
@@ -66,6 +66,13 @@ jobs:
golang:1.25-alpine3.23 \
sh -c "go build -ldflags '-w -s -X github.com/inference-gateway/cli/cmd.version=${{ steps.version.outputs.version }} -X github.com/inference-gateway/cli/cmd.commit=${{ steps.version.outputs.commit }} -X github.com/inference-gateway/cli/cmd.date=${{ steps.version.outputs.date }}' -o infer-${{ matrix.goos }}-${{ matrix.goarch }} ."

- name: Computer Use App (macOS only)
if: matrix.goos == 'darwin'
run: |
cd internal/display/macos/ComputerUse
./build.sh
cd ../../../..

- name: Build binary (macOS with CGO for clipboard image support)
if: matrix.goos == 'darwin'
env:
102 changes: 102 additions & 0 deletions .infer/config.yaml
@@ -36,6 +36,7 @@ client:
logging:
debug: false
dir: ""
console_output: ""
tools:
enabled: true
sandbox:
@@ -169,6 +170,41 @@ agent:
- The system supports up to 5 concurrent tool executions by default
- This reduces back-and-forth communication and significantly improves performance

COMPUTER USE TOOLS:
You have TWO ways to interact with the system:
1. Direct terminal tools (PRIMARY): Bash, Read, Write, Edit, Grep, etc.
2. GUI automation tools (FALLBACK): MouseMove, KeyboardType, MouseClick, GetLatestScreenshot

CRITICAL: ALWAYS prefer direct terminal tools over GUI automation when possible.

When to use DIRECT tools (preferred):
- Reading files: Use Read tool, NOT KeyboardType to open an editor
- Writing files: Use Write/Edit tools, NOT GUI text editor
- Running commands: Use Bash tool, NOT KeyboardType in a terminal window
- Searching code: Use Grep tool, NOT opening files via GUI
- File operations: Use Bash/Read/Write, NOT GUI file manager

When to use GUI tools (only when necessary):
- Interacting with graphical applications that have no CLI equivalent
- Testing UI behavior or visual elements
- Automating tasks that MUST be done through a GUI
- Taking screenshots to inspect visual state

Why prefer direct tools:
- 10-100x faster execution (no GUI rendering delays)
- More reliable (no window focus issues, no timing problems)
- Precise output (structured data, not visual interpretation)
- Parallel execution support (batch multiple operations)
- Lower resource usage (no display server overhead)

Example - WRONG approach:
<tool>MouseMove(x=100, y=200)</tool>
<tool>MouseClick(button="left")</tool>
<tool>KeyboardType(text="cat file.txt")</tool>

Example - CORRECT approach:
<tool>Read(file_path="/path/to/file.txt")</tool>

WORKFLOW:
When asked to implement features or fix issues:
1. Plan with TodoWrite
@@ -255,6 +291,32 @@ agent:
FOCUS: System operations, service management, monitoring, diagnostics, and infrastructure tasks.

CONTEXT: This is a shared system environment, not a project workspace. Users may be managing servers, containers, services, or general infrastructure.

COMPUTER USE TOOLS:
You have TWO ways to interact with the system:
1. Direct terminal tools (PRIMARY): Bash, Read, Write, Edit, Grep, etc.
2. GUI automation tools (FALLBACK): MouseMove, KeyboardType, MouseClick, GetLatestScreenshot

CRITICAL: ALWAYS prefer direct terminal tools over GUI automation when possible.

When to use DIRECT tools (preferred):
- Reading files: Use Read tool, NOT KeyboardType to open an editor
- Writing files: Use Write/Edit tools, NOT GUI text editor
- Running commands: Use Bash tool, NOT KeyboardType in a terminal window
- Searching code: Use Grep tool, NOT opening files via GUI
- System operations: Use Bash for systemctl, journalctl, docker, etc.

When to use GUI tools (only when necessary):
- Interacting with graphical applications that have no CLI equivalent
- Testing UI behavior or visual elements
- Remote desktop administration tasks that MUST be done through a GUI

Why prefer direct tools:
- 10-100x faster execution (no GUI rendering delays)
- More reliable (no window focus issues, no timing problems)
- Works over SSH without X11 forwarding
- Precise output (structured data, not visual interpretation)
- Lower resource usage (critical for remote systems)
system_reminders:
enabled: true
interval: 4
@@ -637,4 +699,44 @@ web:
known_hosts_path: ~/.ssh/known_hosts
auto_install: true
install_version: latest
install_dir: ~/.local/bin
servers: []
computer_use:
enabled: false
floating_window:
enabled: true
respawn_on_close: true
position: top-right
always_on_top: true
screenshot:
enabled: true
max_width: 1920
max_height: 1080
target_width: 1024
target_height: 768
format: jpeg
quality: 85
streaming_enabled: true
capture_interval: 3
buffer_size: 5
temp_dir: ""
log_captures: false
show_overlay: true
mouse_move:
enabled: true
mouse_click:
enabled: true
mouse_scroll:
enabled: true
keyboard_type:
enabled: true
max_text_length: 1000
typing_delay_ms: 100
get_focused_app:
enabled: true
activate_app:
enabled: true
rate_limit:
enabled: true
max_actions_per_minute: 60
window_seconds: 60
20 changes: 2 additions & 18 deletions .infer/mcp.yaml
@@ -1,24 +1,8 @@
---
-enabled: true
+enabled: false
connection_timeout: 30
discovery_timeout: 30
liveness_probe_enabled: true
liveness_probe_interval: 10
max_retries: 10
-servers:
-- name: context7
-enabled: true
-description: Context7 - Up-to-date code documentation for LLMs
-run: true
-host: localhost
-scheme: http
-ports:
-- "8010:8010"
-path: /mcp
-oci: mekayelanik/context7-mcp:stable
-startup_timeout: 90
-health_cmd: 'wget --spider -q http://localhost:8010/healthz || exit 1'
-env:
-PORT: "8010"
-NODE_ENV: "production"
-PROTOCOL: "SHTTP"
+servers: []
63 changes: 63 additions & 0 deletions CHANGELOG.md

Some generated files are not rendered by default.

1 change: 1 addition & 0 deletions Dockerfile
@@ -29,6 +29,7 @@ USER infer

ENV INFER_GATEWAY_RUN=false
ENV INFER_GATEWAY_URL=http://inference-gateway:8080
ENV INFER_LOGGING_CONSOLE_OUTPUT=stderr
ENV TERM=xterm-256color
ENV COLORTERM=truecolor

4 changes: 4 additions & 0 deletions Taskfile.yml
@@ -269,7 +269,11 @@ tasks:
- go run github.com/maxbrunsfeld/counterfeiter/v6 -o tests/mocks/domain internal/domain TaskTracker
- go run github.com/maxbrunsfeld/counterfeiter/v6 -o tests/mocks/domain internal/domain A2AAgentService
- go run github.com/maxbrunsfeld/counterfeiter/v6 -o tests/mocks/domain internal/domain MCPClient
- go run github.com/maxbrunsfeld/counterfeiter/v6 -o tests/mocks/domain internal/domain RateLimiter
- go run github.com/maxbrunsfeld/counterfeiter/v6 -o tests/mocks/domain internal/infra/storage ConversationStorage
- mkdir -p tests/mocks/display
- go run github.com/maxbrunsfeld/counterfeiter/v6 -o tests/mocks/display internal/display DisplayController
- go run github.com/maxbrunsfeld/counterfeiter/v6 -o tests/mocks/display internal/display Provider
- mkdir -p tests/mocks/services
- go run github.com/maxbrunsfeld/counterfeiter/v6 -o tests/mocks/services internal/services TitleGenerator
- mkdir -p tests/mocks/shortcuts
4 changes: 2 additions & 2 deletions cmd/agents.go
@@ -242,7 +242,7 @@ type ExternalAgent struct {
}

// getConfig loads the configuration from viper
-func getConfig(cmd *cobra.Command) (*config.Config, error) {
+func getConfig(_ *cobra.Command) (*config.Config, error) {
cfg, err := getConfigFromViper()
if err != nil {
return nil, fmt.Errorf("failed to load config: %w", err)
@@ -425,7 +425,7 @@ func listAgents(cmd *cobra.Command, args []string) error {
format, _ := cmd.Flags().GetString("format")

if format == "json" {
-combinedOutput := map[string]interface{}{
+combinedOutput := map[string]any{
"local": localAgents,
"external": externalAgents,
"total": totalAgents,