AI-CLI-Wrappers

This directory contains robust shell wrappers that provide a unified interface for various AI CLI tools (Claude, Gemini, Codex, OpenCode, Cursor, Antigravity). These wrappers are designed to be orchestrated by go-autonom8/climanager but can also be used independently.

Overview

The wrappers standardize the execution of AI agents by handling:

Persona Extraction: Parsing agent roles from Markdown files (.md).
Context Injection: Automatically loading CONTEXT.md or other project context.
Session Management: Handling session persistence, resumption, and creation across different providers.
Skill Execution: Invoking specific skills with structured input.
Process Lifecycle: Managing timeouts, signal handling, and cleanup of child processes.
Tool Access: Configuring sandbox permissions and MCP tool access (YOLO mode).
Tool Activity Telemetry: Extracting tool/function call counts, classes, and error signals into the response envelope.
JSON Output: Ensuring responses are returned in a structured JSON format for the caller.

Available Wrappers

Wrapper	Provider	Key Features
`claude.sh`	Anthropic Claude	Project-based sessions (`~/.claude/projects/`), cold start handling.
`gemini.sh`	Google Gemini	Native skills registry, MCP server support, index-based sessions.
`codex.sh`	OpenAI/Codex	Sandbox configuration (`danger-full-access`), Playwright browser support.
`opencode.sh`	OpenCode	Catalog-backed model normalization/fallback, optimized for fast code generation.
`cursor.sh`	Cursor Agent	Workspace-aware, beta skills support, auto-approval for MCPs.
`agravity.sh`	Google Antigravity (`agy`)	`--print=` non-interactive runs, UUID conversation resume, transcript-derived reasoning/tool telemetry.

Unified Interface

All wrappers support a common set of arguments to ensure interchangeable usage by the climanager.

Common Flags

Flag	Description
`--persona <ID>`	Selects a specific persona block from the agent file (e.g., `pm-claude`).
`--temperature <0.0-1.0>`	Sets the LLM temperature (if supported by provider).
`--context <File>`	Explicit path to a context file (e.g., `CONTEXT.md`).
`--context-dir <Dir>`	Directory to search for `CONTEXT.md` and project context.
`--context-max <Bytes>`	Max context file size in bytes (default: 51200 / 50KB). Truncates with warning if exceeded.
`--skip-context-file`	Disables context loading (for pure logic/schema tasks).
`--timeout <Seconds>`	Sets a hard timeout for the execution (includes cleanup buffer).
`--yolo`	Enables "YOLO mode" - bypasses permission prompts (e.g., `--dangerously-skip-permissions`).
`--allowed-tools`	Explicitly enables MCP tools/sandboxed execution.
`--model <Name>`	Model selection (e.g., `opus`, `sonnet`, `haiku`, or full model name).
`--permission-mode <Mode>`	Permission mode (e.g., `plan`, `default`). Maps to provider-specific flags.
`--dry-run`	Validates arguments, agent file, and prompt size without making an API call. Returns comprehensive validation JSON.
`--verbose` / `--debug`	Enables debug logging to stderr.

Session Management

Flag	Description
`--session-id <ID>`	Resumes an existing session.
`--resume <ID>`	Alias for `--session-id`.
`--new-session`	Creates a new session and captures the ID from the response (Claude).
`--manage-session <ID>`	Manages a named/tracked session (Gemini/Codex).

Skill Execution

Flag	Description
`--skill <Name>`	Invokes a specific skill instead of a full agent prompt.

Diagnostics & Telemetry

Flag	Description	Availability
`--health-check`	Returns provider CLI availability, version, and latency as JSON. No inference call made.	All wrappers
`--quota-status`	Checks cached usage-limit files and returns quota exhaustion status with estimated reset time.	Claude, Codex, Cursor
`--reasoning-fallback`	Emits reasoning and token telemetry from session logs without invoking the provider CLI. Requires `--session-id`.	All wrappers

Positional Arguments & Input

Agent File: The last positional argument should be the path to the agent definition file (.md).
Input Data: JSON input data is passed via stdin.

Example:

echo '{"task": "Analyze this code"}' | ./bin/claude.sh \
  --persona pm-claude \
  --context-dir /path/to/project \
  --timeout 120 \
  --yolo \
  agents/pm-agent.md

Output Format

The wrappers print JSON to stdout via emit_cli_response. All wrappers share this envelope:

Success Response

{
  "response": "The actual text response from the LLM...",
  "session_id": "uuid-or-index",
  "reasoning": "Extracted thinking/reasoning from the model...",
  "tokens_used": {
    "input_tokens": 1200,
    "output_tokens": 450,
    "estimated_output_tokens": 425,
    "total_tokens": 1650,
    "cost_usd": 0.012,
    "cache_read_input_tokens": 800,
    "cache_creation_input_tokens": 0
  },
  "metadata": {
    "token_usage_available": true,
    "reasoning_available": true,
    "reasoning_source": "session_assistant",
    "reasoning_absent_reason": "available",
    "tool_activity": {
      "call_count": 4,
      "write_count": 1,
      "error_count": 0,
      "tool_names": ["Read", "Grep", "Edit"],
      "result_classes": ["read", "write"],
      "activity_class": "write_active",
      "source": "wrapper:claude"
    }
  },
  "model_resolution": "provider model 'requested' -> 'effective' (fallback)"
}

Optional fields:

"model_resolution" when the wrapper normalized or fell back from the requested model.
"skill" for skill-oriented wrapper paths.
"metadata.tool_activity" only appears when the wrapper observed one or more tool calls; omitted for pure-text responses.

Skill invocations include an extra "skill": "<name>" field. Markdown code fences are stripped automatically.

Error Response

Returned via emit_cli_error_response when a provider call fails:

{
  "response": "",
  "session_id": "uuid-if-available",
  "reasoning": "",
  "tokens_used": {
    "input_tokens": 0,
    "output_tokens": 0,
    "estimated_output_tokens": 0,
    "total_tokens": 0,
    "cost_usd": 0,
    "cache_read_input_tokens": 0,
    "cache_creation_input_tokens": 0
  },
  "metadata": {
    "token_usage_available": false,
    "reasoning_available": false,
    "reasoning_source": "none",
    "reasoning_absent_reason": "error_path"
  },
  "error": "Detailed error message...",
  "error_type": "timeout",
  "exit_code": 124,
  "recoverable": true
}

Error types: timeout, quota, rate_limit, invalid_model, invalid_session, invalid_input, provider_error, unknown. Errors marked recoverable: true signal to the caller that a retry or fallback is appropriate.

Model Selection and Fallback

Wrappers handle invalid model names locally instead of failing the call on the first invalid-model error.

Model resolution order:

Explicit --model <name> from the caller.
AI_CLI_PROVIDERS_CONFIG or AUTONOM8_PROVIDERS_CONFIG, when set.
A nearby repo config discovered from the working directory: providers.yaml, go-autonom8/providers.yaml, .ai-cli-wrappers/providers.yaml, or .autonom8/providers.yaml.
Bundled wrapper defaults in defaults/providers.yaml.
Provider-native current/default model where the CLI exposes a live catalog.
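
The config-file portion of this lookup order (steps 2 to 4) could be sketched roughly as follows. This is an illustration only, not the wrappers' actual code: the function name `find_providers_config` is hypothetical, the sketch checks only the given directory rather than walking up parent directories, and the precedence between the two environment variables is an assumption.

```shell
# Hypothetical helper: print the first providers config that exists,
# following the documented lookup order. Returns 1 when none is found.
find_providers_config() {
  local start_dir=$1    # working directory to search from
  local wrapper_dir=$2  # directory holding the wrappers and defaults/
  local candidate rel

  # Environment overrides, when set (relative order is an assumption).
  for candidate in "${AI_CLI_PROVIDERS_CONFIG:-}" "${AUTONOM8_PROVIDERS_CONFIG:-}"; do
    if [ -n "$candidate" ] && [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done

  # Repo configs near the working directory.
  for rel in providers.yaml go-autonom8/providers.yaml \
             .ai-cli-wrappers/providers.yaml .autonom8/providers.yaml; do
    if [ -f "$start_dir/$rel" ]; then
      printf '%s\n' "$start_dir/$rel"
      return 0
    fi
  done

  # Bundled wrapper defaults.
  if [ -f "$wrapper_dir/defaults/providers.yaml" ]; then
    printf '%s\n' "$wrapper_dir/defaults/providers.yaml"
    return 0
  fi
  return 1
}
```

A caller might use it as `config=$(find_providers_config "$PWD" "$(dirname "$0")")` and fall through to the provider-native default when it returns non-zero.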

Provider configs can use aliases under models: plus default_model:. Wrappers resolve the alias before execution and emit model_resolution whenever normalization or fallback occurs. Provider-specific CLI mechanics still live inside each wrapper; callers should not need to know whether a provider expects a family alias, full model ID, or provider-prefixed ID.

Behavior by provider:

cursor.sh
- validates requested models against cursor-agent models
- falls back to the current/default provider model when the requested or configured model does not exist
opencode.sh
- validates requested models against opencode models
- resolves common tail aliases like gpt-5.1 to full IDs like openai/gpt-5.1
- uses config-backed defaults for no-model calls
- falls back to the live provider catalog if a configured default is stale
claude.sh, codex.sh, gemini.sh
- attempt the requested model first
- if the provider returns an invalid-model class error, retry once with provider default
- gemini.sh additionally retries gemini-2.5-flash when provider-default routing hits Gemini capacity exhaustion
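
As an illustration of the tail-alias behavior described for opencode.sh, the sketch below resolves a requested name against a newline-separated catalog. The function name `resolve_model_alias` is hypothetical, and the catalog format (one model ID per line, as one might expect from `opencode models`) is an assumption.

```shell
# Hypothetical helper: resolve a requested model against a catalog that
# lists one model ID per line. Returns 1 when nothing matches.
resolve_model_alias() {
  local requested=$1 catalog=$2 line

  # Pass 1: an exact full-ID match wins.
  while IFS= read -r line; do
    [ "$line" = "$requested" ] && { printf '%s\n' "$line"; return 0; }
  done <<EOF
$catalog
EOF

  # Pass 2: accept an ID whose tail after the provider prefix matches.
  while IFS= read -r line; do
    case $line in
      */"$requested") printf '%s\n' "$line"; return 0 ;;
    esac
  done <<EOF
$catalog
EOF
  return 1
}

# Usage (catalog contents are illustrative):
#   catalog=$(printf 'openai/gpt-5.1\nopencode/grok-code')
#   resolve_model_alias gpt-5.1 "$catalog"   # prints openai/gpt-5.1
```

A non-zero return is the point at which a wrapper would move on to its configured default or the live provider catalog.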

If all config/default discovery fails and the provider CLI requires an explicit model, the wrapper fails fast with invalid_input instead of silently embedding a task-model choice in shell code.

Health Check Response

{
  "provider": "claude",
  "status": "ok",
  "latency_ms": 142,
  "cli_available": true,
  "version": "1.0.18",
  "session_support": true
}

Quota Status Response

{
  "provider": "claude",
  "quota_exhausted": true,
  "reset_at": "2026-03-12T15:30:00Z",
  "reset_in_seconds": 1800,
  "retry_time": "try again at 3:30 PM",
  "source": "cached"
}

Reasoning Extraction

All wrappers attempt to extract model reasoning/thinking from multiple sources, in priority order:

Raw output — _reasoning, reasoning, thinking, thoughts, analysis fields from the JSON response.
Session logs — thinking content blocks from assistant messages in the session file (Claude format: .message.content[].type == "thinking").
Response payload — Reasoning fields embedded inside fenced JSON blocks in the assistant text.
Stream output — Lines matching thought|thinking|reasoning|analysis|plan:|step [0-9]+ from stderr (last resort).

Extracted reasoning is compacted (newlines collapsed, whitespace normalized) and capped at 600 characters. Placeholder values ({}, [], null, bare code fences) are filtered out.
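
A minimal sketch of the compaction step, assuming plain text input. The function name `compact_reasoning` is hypothetical and the placeholder list covers only the values named above; the real filter may recognize more.

```shell
# Hypothetical helper: collapse whitespace, drop placeholders, cap at 600 chars.
compact_reasoning() {
  local text fence
  # Three backticks, built with an octal escape.
  fence=$(printf '\140\140\140')

  # Turn newlines and tabs into spaces, squeeze runs of spaces, trim the ends.
  text=$(printf '%s' "$1" | tr '\n\t' '  ' | sed -e 's/  */ /g' -e 's/^ //' -e 's/ $//')

  # Placeholder values carry no reasoning, so emit nothing for them.
  case $text in
    ''|'{}'|'[]'|null|"$fence"|"$fence $fence") return 0 ;;
  esac

  # Cap the result at 600 characters.
  printf '%s\n' "$text" | cut -c1-600
}
```

With this sketch, `compact_reasoning "$(printf 'step one\n\n  step   two')"` prints `step one step two`.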

The reasoning_source metadata field indicates which source yielded the reasoning: raw_output, session_assistant, response_payload, stream_log, or none.

Token Usage Tracking

Token usage is extracted from multiple sources, in priority order:

Raw JSON output — Parses usage.input_tokens, usage.output_tokens, usage.cost_usd and variants (inputTokens, prompt_tokens, token_usage.*, etc.).
Session file — Reads the last assistant message's .message.usage from the session JSONL file.
Stream output — Regex extraction of tokens used [N] from stderr progress output.

All sources are normalized to the same schema: {input_tokens, output_tokens, total_tokens, cost_usd}. If total_tokens is zero but input + output are available, the total is computed automatically.
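
The total-token fallback amounts to a small arithmetic rule. The sketch below shows it on plain integers; the function name `normalize_total_tokens` is hypothetical, and the real wrappers presumably apply the rule to JSON fields.

```shell
# Hypothetical helper: return total_tokens, computing it from input + output
# when the reported total is zero.
normalize_total_tokens() {
  local input=${1:-0} output=${2:-0} total=${3:-0}
  if [ "$total" -eq 0 ] && [ $((input + output)) -gt 0 ]; then
    total=$((input + output))
  fi
  printf '%s\n' "$total"
}

# Usage: a reported total of 0 is replaced by 1200 + 450.
#   normalize_total_tokens 1200 450 0   # prints 1650
```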

Tool Activity Telemetry

All wrappers source lib/tool-telemetry.sh to emit a best-effort summary of tool/function calls observed during the run. When any calls are detected, the summary is merged into metadata.tool_activity on the success envelope; when none are detected, the field is omitted.

Two functions drive this:

autonom8_tool_activity_json <raw_output> <stream_output> <source> — parses JSON payloads and stream text to produce the telemetry object.
autonom8_merge_tool_activity <tool_activity_json> — piped after the final jq stage to fold the object into metadata.tool_activity only when activity was observed.

Extraction Sources

The library inspects both the raw CLI output and the streamed stderr for tool-call evidence:

Structured JSON — recognizes toolCalls[], tool_calls[], functionCall, function_call, and events whose type matches tool, tool_use, tool_call, function_call, tool-call, tool.start, or tool_start.
Stream text — scans for patterns like Tool <name> executed|called|started|completed|failed (also matches function and mcp prefixes).
Provider-native stores — opencode.sh additionally reads tool events from the OpenCode session SQLite (~/.local/share/opencode/opencode.db, part table) so tool activity is captured even when the CLI did not stream it.

Tool names are compacted: functions.<name> is stripped, mcp__ns__tool is normalized to ns.tool, and non-alphanumeric noise is collapsed to _.
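
The name compaction rules can be sketched as below. The function name `compact_tool_name` is hypothetical, and the choice to keep `.`, `_`, and `-` intact while collapsing everything else is an assumption about what counts as "noise".

```shell
# Hypothetical helper: compact a raw tool name.
compact_tool_name() {
  local name=$1

  # Strip a leading "functions." namespace.
  name=${name#functions.}

  # Rewrite mcp__<ns>__<tool> as <ns>.<tool>.
  case $name in
    mcp__*__*)
      name=${name#mcp__}
      name="${name%%__*}.${name#*__}"
      ;;
  esac

  # Collapse runs of other characters to a single underscore
  # (letters, digits, ".", "_" and "-" are kept; this set is an assumption).
  printf '%s\n' "$name" | sed 's/[^A-Za-z0-9._-][^A-Za-z0-9._-]*/_/g'
}

# Usage:
#   compact_tool_name functions.apply_patch       # prints apply_patch
#   compact_tool_name mcp__github__create_issue   # prints github.create_issue
```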

Schema

{
  "call_count": 4,
  "write_count": 1,
  "error_count": 0,
  "tool_names": ["Read", "Grep", "Edit"],
  "result_classes": ["read", "write"],
  "activity_class": "write_active",
  "source": "wrapper:claude"
}

Field	Description
`call_count`	Total tool invocations observed (duplicates counted).
`write_count`	Invocations classified as mutating (see below).
`error_count`	Tool calls reporting `is_error`, `ok: false`, or error-class status/result, plus stream matches for `tool ... error/failed/failure`.
`tool_names`	Deduplicated list of compacted tool names.
`result_classes`	Deduplicated list of behavior classes seen.
`activity_class`	Roll-up: `tool_errors` | `write_active` | `tool_active` | `none`.
`source`	Origin tag, e.g. `wrapper:claude`, `wrapper:opencode`.
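
One plausible way to derive `activity_class` from the three counters is shown below. The precedence (errors first, then writes, then any call) is an assumption inferred from the order in which the values are listed, and the function name `activity_class` is hypothetical.

```shell
# Hypothetical helper: roll up call/write/error counts into one class.
# Precedence is assumed: errors > writes > any tool call > none.
activity_class() {
  local calls=${1:-0} writes=${2:-0} errors=${3:-0}
  if [ "$errors" -gt 0 ]; then
    echo tool_errors
  elif [ "$writes" -gt 0 ]; then
    echo write_active
  elif [ "$calls" -gt 0 ]; then
    echo tool_active
  else
    echo none
  fi
}

# Usage: the schema example above (4 calls, 1 write, 0 errors).
#   activity_class 4 1 0   # prints write_active
```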

Tool Classification

Tool names are classified by regex over their lowercased form:

Class	Matches
`write`	`apply_patch`, `write`, `edit`, `multi_edit`, `replace`, `create`, `delete`, `remove`, `move`, `rename`, `insert`
`read`	`read`, `cat`, `open`, `view`, `list`, `ls`, `find`, `grep`, `rg`, `search`
`browser`	`browser`, `playwright`, `screenshot`, `page`, `dom`, `axe`, `lighthouse`
`web`	`web`, `fetch`, `http`, `url`, `search_query`
`shell`	`exec`, `bash`, `shell`, `command`, `terminal`
`other`	anything else
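
The table can be approximated with shell `case` globs instead of regular expressions, as in the sketch below. The function name `classify_tool` is hypothetical, and the evaluation order (write, browser, web, shell, read) is an assumption chosen so that `search_query` lands in `web` before the broader `search` pattern in `read` can match. Substring matching also means unrelated names can match by accident.

```shell
# Hypothetical helper: classify a tool name by substring match on its
# lowercased form. Pattern order is an assumption, not documented behavior.
classify_tool() {
  local name
  name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case $name in
    # "multi_edit" is already covered by the *edit* pattern.
    *apply_patch*|*write*|*edit*|*replace*|*create*|*delete*|*remove*|*move*|*rename*|*insert*)
      echo write ;;
    *browser*|*playwright*|*screenshot*|*page*|*dom*|*axe*|*lighthouse*)
      echo browser ;;
    *web*|*fetch*|*http*|*url*|*search_query*)
      echo web ;;
    *exec*|*bash*|*shell*|*command*|*terminal*)
      echo shell ;;
    *read*|*cat*|*open*|*view*|*list*|*ls*|*find*|*grep*|*rg*|*search*)
      echo read ;;
    *)
      echo other ;;
  esac
}

# Usage:
#   classify_tool Edit   # prints write
#   classify_tool Grep   # prints read
```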

Consumer Guidance

Use activity_class == "tool_errors" as a signal to inspect logs before trusting the response.
write_active indicates the agent mutated the working tree; callers may want to diff before committing.
tool_active with only read/browser/web classes implies a review or investigation pass rather than a change pass.
Missing tool_activity does not prove zero tool use — it means the wrapper could not detect any from the available signals.

Agent Stream Logging

When the environment variable A8_TICKET_ID is set (typically by the Go CLIManager), wrappers create per-invocation log files:

<work_dir>/.autonom8/agent_logs/<ticket_id>_<workflow>_<timestamp>.log

These logs capture:

Header with ticket ID, workflow name, provider, and start timestamp.
Stderr output from the CLI (progress, warnings, tool calls) via tee.
Full stdout response appended after completion.

This enables post-hoc debugging and audit trails per ticket.

Prompt Size Management

Each wrapper defines provider-appropriate limits:

Constant	Default	Purpose
`PROMPT_MAX_CHARS`	200,000	Hard limit (~50K tokens for Claude's 200K context)
`PROMPT_WARN_THRESHOLD`	160,000	Warning threshold (~40K tokens)

check_prompt_size logs warnings to stderr when approaching or exceeding limits.
get_prompt_stats returns a JSON object with prompt_size_chars, estimated_tokens, max_chars, over_limit.
save_debug_prompt saves the full prompt to disk when DEBUG_PROMPTS=true (for offline inspection).
--context-max truncates the context file before it enters the prompt, with a [... CONTEXT TRUNCATED ...] marker.
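
The sketch below illustrates the size accounting and the context truncation described above. Both function names (`prompt_stats`, `truncate_context`) are hypothetical stand-ins rather than the wrappers' real helpers, the 4-characters-per-token estimate is inferred from the table (200,000 characters for roughly 50K tokens), and the exact placement of the truncation marker is an assumption.

```shell
PROMPT_MAX_CHARS=200000

# Hypothetical helper: report prompt size as a JSON object.
prompt_stats() {
  local chars=${#1}
  local over=false
  [ "$chars" -gt "$PROMPT_MAX_CHARS" ] && over=true
  # Roughly 4 characters per token (assumption inferred from the table).
  printf '{"prompt_size_chars":%d,"estimated_tokens":%d,"max_chars":%d,"over_limit":%s}\n' \
    "$chars" $((chars / 4)) "$PROMPT_MAX_CHARS" "$over"
}

# Hypothetical helper: print a context file, truncated to a byte limit.
truncate_context() {
  local file=$1 max=${2:-51200} size
  size=$(wc -c < "$file" | tr -d ' ')
  if [ "$size" -gt "$max" ]; then
    head -c "$max" "$file"
    printf '\n[... CONTEXT TRUNCATED ...]\n'
    echo "warning: context truncated from $size to $max bytes" >&2
  else
    cat "$file"
  fi
}
```

For example, `prompt_stats 'abcdefgh'` reports 8 characters and an estimate of 2 tokens, and `truncate_context CONTEXT.md 51200` passes small files through unchanged.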

Session Resume Optimization

When resuming an existing session (--session-id or --resume), wrappers skip injecting the persona block into the prompt — the persona is already in the session context from the initial invocation. Only the new task data and critical instructions are sent. This reduces token usage significantly on multi-turn workflows.
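
In outline, the resume optimization is a branch on whether a session ID is present. The sketch below is a simplification under stated assumptions: the function name `build_prompt` is hypothetical, and the "critical instructions" mentioned above are left out because their content is not described here.

```shell
# Hypothetical helper: assemble the prompt, skipping the persona on resume.
build_prompt() {
  local session_id=$1 persona=$2 task=$3
  if [ -n "$session_id" ]; then
    # Resumed session: the persona is already in the session context.
    printf '%s\n' "$task"
  else
    # Fresh session: send the persona block followed by the task.
    printf '%s\n\n%s\n' "$persona" "$task"
  fi
}
```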

Error Handling

Wrappers source lib/error_utils.sh (if available) for standardized error classification. The classify_error function maps stderr output to error types:

Error Type	Trigger	Recoverable
`quota`	"usage limit", "rate limit exceeded", "try again at"	Yes
`rate_limit`	"429", "too many requests"	Yes
`timeout`	Exit code 124, "context deadline exceeded"	Yes
`invalid_session`	"session not found", "invalid session"	Yes
`invalid_input`	Bad persona, missing agent file, malformed JSON	No
`provider_error`	CLI crash, non-zero exit, empty response	No
`unknown`	Unclassified	No
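
A rough sketch of how the table could map onto shell pattern matching follows. It is not the contents of lib/error_utils.sh: the function names `classify_error_sketch` and `is_recoverable` are hypothetical, the check order is an assumption, and `invalid_input` and `provider_error` are omitted because they depend on wrapper state rather than on stderr text alone.

```shell
# Hypothetical helper: classify a failure from its exit code and stderr text.
classify_error_sketch() {
  local exit_code=$1 msg
  msg=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')

  # Exit code 124 is the conventional timeout status.
  if [ "$exit_code" -eq 124 ]; then
    echo timeout
    return
  fi

  case $msg in
    *"usage limit"*|*"rate limit exceeded"*|*"try again at"*) echo quota ;;
    *429*|*"too many requests"*)                              echo rate_limit ;;
    *"context deadline exceeded"*)                            echo timeout ;;
    *"session not found"*|*"invalid session"*)                echo invalid_session ;;
    *)                                                        echo unknown ;;
  esac
}

# Hypothetical helper: mirror the "Recoverable" column of the table.
is_recoverable() {
  case $1 in
    quota|rate_limit|timeout|invalid_session) echo true ;;
    *) echo false ;;
  esac
}
```

A caller could combine the two, for example `is_recoverable "$(classify_error_sketch "$status" "$stderr_text")"`, to decide between retrying and failing.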

For quota errors, a system message file is written to <core_dir>/context/system-messages/inbox/ with timestamp, retry time, and severity — enabling upstream orchestrators to schedule retries.

Provider-Specific Details

Claude (`claude.sh`)

Sessions stored in ~/.claude/projects/<encoded-path>/. Path encoding replaces both / and _ with -.
--output-format json captures session_id and result from Claude's response envelope.
--resume <ID> for session continuation with persona-skip optimization.
Quota status via cached system message files (*-claude-usage-limit.json).
Supports try again at ... parsing for rate limit handling.
Model selection: --model opus, --model sonnet, --model haiku.
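
The path encoding mentioned above is a single substitution. A minimal sketch, with the hypothetical function name `encode_claude_project_path`:

```shell
# Hypothetical helper: encode a project path the way the session directory
# name is described above, replacing both "/" and "_" with "-".
encode_claude_project_path() {
  printf '%s\n' "$1" | sed 's|[/_]|-|g'
}

# Usage: the session directory for /work/my_app would then be
#   "$HOME/.claude/projects/$(encode_claude_project_path /work/my_app)"
# which ends in "-work-my-app".
```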

Gemini (`gemini.sh`)

Native skills: checks .gemini/skills/ and registers any skills found via the --skills flag.
MCP server support with auto-registration.
Filters out informational logs ("YOLO mode enabled") to preserve JSON output.
Maps UUID session IDs to Gemini's internal numeric indices.

Codex (`codex.sh`)

Sessions stored in ~/.codex/sessions/.
Uses --sandbox danger-full-access when --yolo or --allowed-tools is set.
Exports SKIP_WEBKIT=1 and SKIP_FIREFOX=1 for Playwright stability.
Quota status via cached system message files.

OpenCode (`opencode.sh`)

Defaults to opencode/grok-code model.
Full session management, prompt size checking, and agent stream logging.
Model selection via --model flag.

Cursor (`cursor.sh`)

Uses cursor-agent CLI.
Supports workspace configuration for context awareness.
Auto-approves MCP tool usage when --allowed-tools is set.
Beta skills support via .cursor/skills/.
Quota status via cached system message files.

Antigravity (`agravity.sh`)

Uses Google's agy CLI (Antigravity).
Non-interactive runs use --print=<prompt>; the wrapper always sets --print-timeout=<CLI_TIMEOUT>s so timeouts are enforced on both sides.
--yolo / --allowed-tools maps to --dangerously-skip-permissions; --add-dir <path> is forwarded to agy (and the resolved workspace is added automatically).
Conversation state lives at ~/.gemini/antigravity-cli/conversations/<UUID>.pb; the wrapper discovers the freshest conversation id after each run.
--session-id <UUID> resumes via --conversation=<UUID>; non-UUID logical ids start fresh and the real provider id is captured on completion.
Reasoning and tool telemetry are mined from ~/.gemini/antigravity-cli/brain/<UUID>/.system_generated/logs/transcript.jsonl (assistant thinking blocks; tool_calls[] per step).
The agy CLI exposes no --model flag — the active model is configured via ~/.gemini/antigravity-cli/settings.json. The wrapper accepts --model for interface parity and records it in model_resolution, but does not pass it through.
Antigravity does not emit usage metadata; tokens_used reports estimated output tokens and a transcript-size-based total estimate.

AWS Lambda Integration

This repository also contains serverless implementations for AI agents, located in the aws-lambdas/ directory. These Lambdas provide direct API access to agent capabilities without requiring a local CLI environment.

Components

AI-Agent-Claude: AWS Lambda implementation for Anthropic's Claude.
AI-Agent-Gemini: AWS Lambda implementation for Google's Gemini.
AI-Agent-Codex: AWS Lambda implementation for OpenAI's Codex/GPT models.
sync_config.py: Utility script to sync persona and skill definitions to DynamoDB or S3.

See aws-lambdas/README.md for detailed deployment and usage instructions.

Integration with CLIManager

The go-autonom8/climanager package relies on these wrappers to:

Orchestrate Calls: It builds the command line arguments based on the CLIRequest.
Handle Fallbacks: If one wrapper fails (exit code != 0), it tries the next in the chain.
Manage Resources: It tracks PIDs and process groups to ensure clean termination on timeouts.
Parse Output: It decodes the JSON output and normalizes it for the application.
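
The fallback behavior can be illustrated in shell, although CLIManager itself is a Go package and this is not its implementation. In the sketch, `run_with_fallback` is a hypothetical name and each wrapper is treated as a bare command that reads JSON on stdin; real invocations also carry flags and an agent file.

```shell
# Hypothetical helper: feed the same input to each wrapper in turn and
# print the output of the first one that exits with status 0.
run_with_fallback() {
  local input=$1 wrapper output
  shift
  for wrapper in "$@"; do
    if output=$(printf '%s' "$input" | "$wrapper"); then
      printf '%s\n' "$output"
      return 0
    fi
    echo "wrapper $wrapper failed, trying next" >&2
  done
  return 1
}

# Usage (wrapper paths are illustrative):
#   run_with_fallback '{"task": "Analyze this code"}' ./claude.sh ./gemini.sh
```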

Cursor macOS Keychain Bootstrap

cursor.sh refreshes the macOS login keychain before each Cursor CLI process when running on Darwin. This is required for SSH-launched Mac Mini workers because Cursor stores CLI credentials in login.keychain-db, while a non-GUI worker process can outlive or miss a manual keychain unlock. The wrapper only unlocks the keychain so an already-authenticated Cursor CLI can read its credentials; it does not log in to Cursor or create credentials.

Configuration:

Variable	Default	Description
`AUTONOM8_CURSOR_UNLOCK_KEYCHAIN`	`1`	Set to `0` to disable Cursor-specific keychain refresh.
`AUTONOM8_UNLOCK_KEYCHAIN`	`1`	Shared fallback opt-out when the Cursor-specific variable is unset.
`AUTONOM8_KEYCHAIN_PASSWORD`	unset	Explicit keychain password. Preferred for process supervisors that inject secrets directly.
`AUTONOM8_KEYCHAIN_PASSWORD_ENV`	`mini`	Name of the env var containing the keychain password.
`AUTONOM8_KEYCHAIN_ENV_FILE`	`<login-home>/.env`	File sourced or parsed when the password env var is not already exported. Falls back to the login user's home if `HOME` is sanitized by a worker.
`AUTONOM8_KEYCHAIN_PATH`	`<login-home>/Library/Keychains/login.keychain-db`	Keychain unlocked before invoking Cursor. Falls back to the login user's home if `HOME` is sanitized by a worker.
`AUTONOM8_KEYCHAIN_UNLOCK_TIMEOUT_SECONDS`	`21600`	Timeout passed to `security set-keychain-settings -lut`.
`AUTONOM8_KEYCHAIN_SET_TIMEOUT`	`1`	Set to `0` to unlock without refreshing the keychain timeout.
`AUTONOM8_CURSOR_NORMALIZE_HOME`	`1`	Set to `0` to keep process `HOME` unchanged. By default Cursor invocations on macOS use the login user's home so Cursor Agent resolves the same keychain/config as an interactive session.

Security and operations notes:

Do not print or commit the mini value or any .env file containing it.
The wrapper first checks the already-exported password variable, then sources AUTONOM8_KEYCHAIN_ENV_FILE, then falls back to a simple KEY=value parser. This mirrors the operator command set -a; . "$HOME/.env"; security unlock-keychain -p "$mini" ... without logging the secret.
Missing or failed unlock is intentionally non-fatal. The Cursor call continues so the caller receives the provider's real credential_unavailable error.
If credential_unavailable only appears when multiple Cursor sessions start concurrently, treat that as a routing/concurrency issue rather than a password issue; avoid overlapping Cursor QA and implement calls or add a provider-level Cursor lock.
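
The last step of the password lookup, the simple KEY=value parser, might look like the sketch below. The function name `read_env_value` is hypothetical and the quoting rules (one optional pair of surrounding quotes) are an assumption. The value goes to stdout so the caller can capture it in a variable; it should never be echoed to a log.

```shell
# Hypothetical helper: read one KEY=value entry from an env file without
# sourcing it. Returns 1 if the file or the key is missing.
read_env_value() {
  local key=$1 file=$2 line value
  [ -f "$file" ] || return 1
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      "$key="*)
        value=${line#"$key="}
        # Strip one pair of surrounding double or single quotes, if present.
        case $value in
          \"*\") value=${value#\"}; value=${value%\"} ;;
          \'*\') value=${value#\'}; value=${value%\'} ;;
        esac
        printf '%s\n' "$value"
        return 0
        ;;
    esac
  done < "$file"
  return 1
}

# Usage: capture the value, do not print it.
#   password=$(read_env_value "${AUTONOM8_KEYCHAIN_PASSWORD_ENV:-mini}" "$HOME/.env")
```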
