This directory contains robust shell wrappers that provide a unified interface for various AI CLI tools (Claude, Gemini, Codex, OpenCode, Cursor, Antigravity). These wrappers are designed to be orchestrated by go-autonom8/climanager but can also be used independently.
The wrappers standardize the execution of AI agents by handling:
- Persona Extraction: Parsing agent roles from Markdown files (`.md`).
- Context Injection: Automatically loading `CONTEXT.md` or other project context.
- Session Management: Handling session persistence, resumption, and creation across different providers.
- Skill Execution: Invoking specific skills with structured input.
- Process Lifecycle: Managing timeouts, signal handling, and cleanup of child processes.
- Tool Access: Configuring sandbox permissions and MCP tool access (YOLO mode).
- Tool Activity Telemetry: Extracting tool/function call counts, classes, and error signals into the response envelope.
- JSON Output: Ensuring responses are returned in a structured JSON format for the caller.

| Wrapper | Provider | Key Features |
|---|---|---|
| `claude.sh` | Anthropic Claude | Project-based sessions (`~/.claude/projects/`), cold start handling. |
| `gemini.sh` | Google Gemini | Native skills registry, MCP server support, index-based sessions. |
| `codex.sh` | OpenAI/Codex | Sandbox configuration (`danger-full-access`), Playwright browser support. |
| `opencode.sh` | OpenCode | Catalog-backed model normalization/fallback, optimized for fast code generation. |
| `cursor.sh` | Cursor Agent | Workspace-aware, beta skills support, auto-approval for MCPs. |
| `agravity.sh` | Google Antigravity (`agy`) | `--print=` non-interactive runs, UUID conversation resume, transcript-derived reasoning/tool telemetry. |

All wrappers support a common set of arguments so that climanager can use them interchangeably.

| Flag | Description |
|---|---|
| `--persona <ID>` | Selects a specific persona block from the agent file (e.g., `pm-claude`). |
| `--temperature <0.0-1.0>` | Sets the LLM temperature (if supported by provider). |
| `--context <File>` | Explicit path to a context file (e.g., `CONTEXT.md`). |
| `--context-dir <Dir>` | Directory to search for `CONTEXT.md` and project context. |
| `--context-max <Bytes>` | Max context file size in bytes (default: 51200 / 50KB). Truncates with warning if exceeded. |
| `--skip-context-file` | Disables context loading (for pure logic/schema tasks). |
| `--timeout <Seconds>` | Sets a hard timeout for the execution (includes cleanup buffer). |
| `--yolo` | Enables "YOLO mode", which bypasses permission prompts (e.g., `--dangerously-skip-permissions`). |
| `--allowed-tools` | Explicitly enables MCP tools/sandboxed execution. |
| `--model <Name>` | Model selection (e.g., `opus`, `sonnet`, `haiku`, or full model name). |
| `--permission-mode <Mode>` | Permission mode (e.g., `plan`, `default`). Maps to provider-specific flags. |
| `--dry-run` | Validates arguments, agent file, and prompt size without making an API call. Returns comprehensive validation JSON. |
| `--verbose` / `--debug` | Enables debug logging to stderr. |

Session management flags:

| Flag | Description |
|---|---|
| `--session-id <ID>` | Resumes an existing session. |
| `--resume <ID>` | Alias for `--session-id`. |
| `--new-session` | Creates a new session and captures the ID from the response (Claude). |
| `--manage-session <ID>` | Manages a named/tracked session (Gemini/Codex). |

Skill flags:

| Flag | Description |
|---|---|
| `--skill <Name>` | Invokes a specific skill instead of a full agent prompt. |

Diagnostic flags:

| Flag | Description | Availability |
|---|---|---|
| `--health-check` | Returns provider CLI availability, version, and latency as JSON. No inference call made. | All wrappers |
| `--quota-status` | Checks cached usage-limit files and returns quota exhaustion status with estimated reset time. | Claude, Codex, Cursor |
| `--reasoning-fallback` | Emits reasoning and token telemetry from session logs without invoking the provider CLI. Requires `--session-id`. | All wrappers |

- Agent File: The last positional argument should be the path to the agent definition file (`.md`).
- Input Data: JSON input data is passed via stdin.

Example:

```shell
echo '{"task": "Analyze this code"}' | ./bin/claude.sh \
  --persona pm-claude \
  --context-dir /path/to/project \
  --timeout 120 \
  --yolo \
  agents/pm-agent.md
```

The wrappers print JSON to stdout via `emit_cli_response`. All wrappers share this envelope:

```json
{
  "response": "The actual text response from the LLM...",
  "session_id": "uuid-or-index",
  "reasoning": "Extracted thinking/reasoning from the model...",
  "tokens_used": {
    "input_tokens": 1200,
    "output_tokens": 450,
    "estimated_output_tokens": 425,
    "total_tokens": 1650,
    "cost_usd": 0.012,
    "cache_read_input_tokens": 800,
    "cache_creation_input_tokens": 0
  },
  "metadata": {
    "token_usage_available": true,
    "reasoning_available": true,
    "reasoning_source": "session_assistant",
    "reasoning_absent_reason": "available",
    "tool_activity": {
      "call_count": 4,
      "write_count": 1,
      "error_count": 0,
      "tool_names": ["Read", "Grep", "Edit"],
      "result_classes": ["read", "write"],
      "activity_class": "write_active",
      "source": "wrapper:claude"
    }
  },
  "model_resolution": "provider model 'requested' -> 'effective' (fallback)"
}
```

Optional fields:
- `"model_resolution"` when the wrapper normalized or fell back from the requested model.
- `"skill"` for skill-oriented wrapper paths.
- `"metadata.tool_activity"` only appears when the wrapper observed one or more tool calls; omitted for pure-text responses.

Skill invocations include an extra "skill": "<name>" field. Markdown code fences are stripped automatically.

The error envelope is returned via `emit_cli_error_response` when a provider call fails:

```json
{
  "response": "",
  "session_id": "uuid-if-available",
  "reasoning": "",
  "tokens_used": {
    "input_tokens": 0,
    "output_tokens": 0,
    "estimated_output_tokens": 0,
    "total_tokens": 0,
    "cost_usd": 0,
    "cache_read_input_tokens": 0,
    "cache_creation_input_tokens": 0
  },
  "metadata": {
    "token_usage_available": false,
    "reasoning_available": false,
    "reasoning_source": "none",
    "reasoning_absent_reason": "error_path"
  },
  "error": "Detailed error message...",
  "error_type": "timeout",
  "exit_code": 124,
  "recoverable": true
}
```

Error types: `timeout`, `quota`, `rate_limit`, `invalid_model`, `invalid_session`, `invalid_input`, `provider_error`, `unknown`. Errors marked `recoverable: true` signal to the caller that a retry or fallback is appropriate.

Wrappers handle invalid models locally rather than failing the call at the first invalid-model error.

Model resolution order:
- Explicit `--model <name>` from the caller.
- `AI_CLI_PROVIDERS_CONFIG` or `AUTONOM8_PROVIDERS_CONFIG`, when set.
- A nearby repo config discovered from the working directory: `providers.yaml`, `go-autonom8/providers.yaml`, `.ai-cli-wrappers/providers.yaml`, or `.autonom8/providers.yaml`.
- Bundled wrapper defaults in `defaults/providers.yaml`.
- Provider-native current/default model where the CLI exposes a live catalog.

Provider configs can use aliases under models: plus default_model:. Wrappers resolve the alias before execution and emit model_resolution whenever normalization or fallback occurs. Provider-specific CLI mechanics still live inside each wrapper; callers should not need to know whether a provider expects a family alias, full model ID, or provider-prefixed ID.
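
The config-discovery part of that order can be sketched as follows. This is a minimal illustration, not the wrappers' own code: the function name `find_providers_config` is hypothetical, and it assumes that "nearby" means the working directory itself (no walk up through parent directories).

```shell
# Hypothetical helper: print the first providers config found.
# Assumption: only the given directory is searched, not its parents.
find_providers_config() {
  local start_dir candidate
  start_dir="${1:-$PWD}"

  # Environment overrides take precedence over repo configs.
  for candidate in "${AI_CLI_PROVIDERS_CONFIG:-}" "${AUTONOM8_PROVIDERS_CONFIG:-}"; do
    if [ -n "$candidate" ] && [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done

  # Repo-level candidates, checked in the documented order.
  for candidate in \
    providers.yaml \
    go-autonom8/providers.yaml \
    .ai-cli-wrappers/providers.yaml \
    .autonom8/providers.yaml; do
    if [ -f "$start_dir/$candidate" ]; then
      printf '%s\n' "$start_dir/$candidate"
      return 0
    fi
  done

  # Nothing found: a caller would fall through to bundled defaults.
  return 1
}
```

A caller could run `find_providers_config "$PWD"` and use `defaults/providers.yaml` when it returns non-zero.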

Behavior by provider:
- `cursor.sh`
  - validates requested models against `cursor-agent models`
  - falls back to the current/default provider model when the requested or configured model does not exist
- `opencode.sh`
  - validates requested models against `opencode models`
  - resolves common tail aliases like `gpt-5.1` to full IDs like `openai/gpt-5.1`
  - uses config-backed defaults for no-model calls
  - falls back to the live provider catalog if a configured default is stale
- `claude.sh`, `codex.sh`, `gemini.sh`
  - attempt the requested model first
  - if the provider returns an invalid-model class error, retry once with provider default
  - `gemini.sh` additionally retries `gemini-2.5-flash` when provider-default routing hits Gemini capacity exhaustion

If all config/default discovery fails and the provider CLI requires an explicit model, the wrapper fails fast with invalid_input instead of silently embedding a task-model choice in shell code.

Example `--health-check` response:

```json
{
  "provider": "claude",
  "status": "ok",
  "latency_ms": 142,
  "cli_available": true,
  "version": "1.0.18",
  "session_support": true
}
```

Example `--quota-status` response:

```json
{
  "provider": "claude",
  "quota_exhausted": true,
  "reset_at": "2026-03-12T15:30:00Z",
  "reset_in_seconds": 1800,
  "retry_time": "try again at 3:30 PM",
  "source": "cached"
}
```

All wrappers attempt to extract model reasoning/thinking from multiple sources, in priority order:
- Raw output: `_reasoning`, `reasoning`, `thinking`, `thoughts`, `analysis` fields from the JSON response.
- Session logs: `thinking` content blocks from assistant messages in the session file (Claude format: `.message.content[].type == "thinking"`).
- Response payload: Reasoning fields embedded inside fenced JSON blocks in the assistant text.
- Stream output: Lines matching `thought|thinking|reasoning|analysis|plan:|step [0-9]+` from stderr (last resort).

Extracted reasoning is compacted (newlines collapsed, whitespace normalized) and capped at 600 characters. Placeholder values ({}, [], null, bare code fences) are filtered out.
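
The compaction step can be illustrated with a short sketch. The function name `compact_reasoning` is hypothetical and the exact placeholder list is an assumption based on the values named above; the wrappers' real implementation may differ.

```shell
# Hypothetical sketch: collapse whitespace, drop placeholders, cap length.
compact_reasoning() {
  local text max_chars stripped
  max_chars="${2:-600}"

  # Turn newlines and tabs into spaces, squeeze runs of spaces, trim both ends.
  text="$(printf '%s' "$1" | tr '\n\r\t' '   ' | sed -e 's/  */ /g' -e 's/^ //' -e 's/ $//')"

  # Ignore backticks and spaces when testing for placeholder-only values
  # such as {}, [], null, or a bare (optionally json-tagged) code fence.
  stripped="$(printf '%s' "$text" | tr -d '` ')"
  case "$stripped" in
    ''|'{}'|'[]'|null|json) text="" ;;
  esac

  # Cap the result at max_chars characters.
  printf '%s' "$text" | cut -c "1-$max_chars"
}
```

For example, `compact_reasoning "$raw_reasoning"` yields a single line of at most 600 characters, or nothing when the input was only a placeholder.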
The reasoning_source metadata field indicates which source yielded the reasoning: raw_output, session_assistant, response_payload, stream_log, or none.
Token usage is extracted from multiple sources, in priority order:
- Raw JSON output: Parses `usage.input_tokens`, `usage.output_tokens`, `usage.cost_usd` and variants (`inputTokens`, `prompt_tokens`, `token_usage.*`, etc.).
- Session file: Reads the last assistant message's `.message.usage` from the session JSONL file.
- Stream output: Regex extraction of `tokens used [N]` from stderr progress output.

All sources are normalized to the same schema: {input_tokens, output_tokens, total_tokens, cost_usd}. If total_tokens is zero but input + output are available, the total is computed automatically.
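
The total-token fallback amounts to a small piece of arithmetic. The sketch below uses a hypothetical function name, `normalize_total_tokens`, and takes the three counts as plain integers rather than parsing JSON.

```shell
# Hypothetical sketch: fill in total_tokens when the source reported zero.
normalize_total_tokens() {
  local input output total
  input="${1:-0}"
  output="${2:-0}"
  total="${3:-0}"

  # Only compute the sum when no total was reported but parts are available.
  if [ "$total" -eq 0 ] && [ $((input + output)) -gt 0 ]; then
    total=$((input + output))
  fi
  printf '%s\n' "$total"
}
```

With the envelope example's values, `normalize_total_tokens 1200 450 0` prints `1650`.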
All wrappers source lib/tool-telemetry.sh to emit a best-effort summary of tool/function calls observed during the run. When any calls are detected, the summary is merged into metadata.tool_activity on the success envelope; when none are detected, the field is omitted.
Two functions drive this:
- `autonom8_tool_activity_json <raw_output> <stream_output> <source>`: parses JSON payloads and stream text to produce the telemetry object.
- `autonom8_merge_tool_activity <tool_activity_json>`: piped after the final `jq` stage to fold the object into `metadata.tool_activity` only when activity was observed.

The library inspects both the raw CLI output and the streamed stderr for tool-call evidence:
- Structured JSON: recognizes `toolCalls[]`, `tool_calls[]`, `functionCall`, `function_call`, and events whose `type` matches `tool`, `tool_use`, `tool_call`, `function_call`, `tool-call`, `tool.start`, or `tool_start`.
- Stream text: scans for patterns like `Tool <name> executed|called|started|completed|failed` (also matches `function` and `mcp` prefixes).
- Provider-native stores: `opencode.sh` additionally reads tool events from the OpenCode session SQLite database (`~/.local/share/opencode/opencode.db`, `part` table) so tool activity is captured even when the CLI did not stream it.

Tool names are compacted: functions.<name> is stripped, mcp__ns__tool is normalized to ns.tool, and non-alphanumeric noise is collapsed to _.
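
A rough sketch of those three rewrites, using `sed`. The function name `compact_tool_name` is hypothetical, and keeping `.`, `_`, and `-` as non-noise characters is an assumption (the dot has to survive for `ns.tool` to be a valid result).

```shell
# Hypothetical sketch of tool-name compaction.
# Assumption: letters, digits, '.', '_' and '-' are kept; anything else is noise.
compact_tool_name() {
  printf '%s' "$1" | sed -E \
    -e 's/^functions\.//' \
    -e 's/^mcp__(.+)__(.+)$/\1.\2/' \
    -e 's/[^A-Za-z0-9._-]+/_/g'
}
```

For instance, `compact_tool_name 'mcp__github__create_issue'` prints `github.create_issue`.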

```json
{
  "call_count": 4,
  "write_count": 1,
  "error_count": 0,
  "tool_names": ["Read", "Grep", "Edit"],
  "result_classes": ["read", "write"],
  "activity_class": "write_active",
  "source": "wrapper:claude"
}
```

| Field | Description |
|---|---|
| `call_count` | Total tool invocations observed (duplicates counted). |
| `write_count` | Invocations classified as mutating (see below). |
| `error_count` | Tool calls reporting `is_error`, `ok: false`, or error-class status/result, plus stream matches for `tool ... error/failed/failure`. |
| `tool_names` | Deduplicated list of compacted tool names. |
| `result_classes` | Deduplicated list of behavior classes seen. |
| `activity_class` | Roll-up: one of `tool_errors`, `write_active`, `tool_active`, or `none`. |
| `source` | Origin tag, e.g. `wrapper:claude`, `wrapper:opencode`. |

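
The `activity_class` roll-up can be read as a precedence check over the three counters. The sketch below assumes that errors outrank writes and writes outrank plain activity, which matches the order the values are listed in but is not confirmed by the library source; the function name `activity_class_for` is hypothetical.

```shell
# Hypothetical sketch: derive activity_class from the counters.
# Assumed precedence: errors > writes > any calls > none.
activity_class_for() {
  local call_count write_count error_count
  call_count="${1:-0}"
  write_count="${2:-0}"
  error_count="${3:-0}"

  if [ "$error_count" -gt 0 ]; then
    echo tool_errors
  elif [ "$write_count" -gt 0 ]; then
    echo write_active
  elif [ "$call_count" -gt 0 ]; then
    echo tool_active
  else
    echo none
  fi
}
```

With the example object's counters, `activity_class_for 4 1 0` prints `write_active`.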
Tool names are classified by regex over their lowercased form:

| Class | Matches |
|---|---|
| `write` | `apply_patch`, `write`, `edit`, `multi_edit`, `replace`, `create`, `delete`, `remove`, `move`, `rename`, `insert` |
| `read` | `read`, `cat`, `open`, `view`, `list`, `ls`, `find`, `grep`, `rg`, `search` |
| `browser` | `browser`, `playwright`, `screenshot`, `page`, `dom`, `axe`, `lighthouse` |
| `web` | `web`, `fetch`, `http`, `url`, `search_query` |
| `shell` | `exec`, `bash`, `shell`, `command`, `terminal` |
| `other` | anything else |

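
As an illustration, the table can be approximated with a `case` statement over the lowercased name. Two things here are assumptions rather than documented behavior: patterns are treated as plain substrings, and `web` is checked before `read` so that `search_query` is not swallowed by `search`. The function name `classify_tool_name` is hypothetical.

```shell
# Hypothetical sketch: classify a tool name by substring match.
# Assumed precedence: write, web, read, browser, shell, other.
classify_tool_name() {
  local name
  name="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"

  case "$name" in
    *apply_patch*|*write*|*edit*|*replace*|*create*|*delete*|*remove*|*move*|*rename*|*insert*)
      echo write ;;
    *web*|*fetch*|*http*|*url*|*search_query*)
      echo web ;;
    *read*|*cat*|*open*|*view*|*list*|*ls*|*find*|*grep*|*rg*|*search*)
      echo read ;;
    *browser*|*playwright*|*screenshot*|*page*|*dom*|*axe*|*lighthouse*)
      echo browser ;;
    *exec*|*bash*|*shell*|*command*|*terminal*)
      echo shell ;;
    *)
      echo other ;;
  esac
}
```

Substring matching is deliberately loose; a real regex with word boundaries would classify fewer names by accident.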

- Use `activity_class == "tool_errors"` as a signal to inspect logs before trusting the response.
- `write_active` indicates the agent mutated the working tree; callers may want to diff before committing.
- `tool_active` with only `read`/`browser`/`web` classes implies a review or investigation pass rather than a change pass.
- Missing `tool_activity` does not prove zero tool use; it means the wrapper could not detect any from the available signals.

When the environment variable A8_TICKET_ID is set (typically by the Go CLIManager), wrappers create per-invocation log files:

```
<work_dir>/.autonom8/agent_logs/<ticket_id>_<workflow>_<timestamp>.log
```

These logs capture:
- Header with ticket ID, workflow name, provider, and start timestamp.
- Stderr output from the CLI (progress, warnings, tool calls) via `tee`.
- Full stdout response appended after completion.

This enables post-hoc debugging and audit trails per ticket.
Each wrapper defines provider-appropriate limits:

| Constant | Default | Purpose |
|---|---|---|
| `PROMPT_MAX_CHARS` | 200,000 | Hard limit (~50K tokens for Claude's 200K context) |
| `PROMPT_WARN_THRESHOLD` | 160,000 | Warning threshold (~40K tokens) |

- `check_prompt_size` logs warnings to stderr when approaching or exceeding limits.
- `get_prompt_stats` returns a JSON object with `prompt_size_chars`, `estimated_tokens`, `max_chars`, `over_limit`.
- `save_debug_prompt` saves the full prompt to disk when `DEBUG_PROMPTS=true` (for offline inspection).
- `--context-max` truncates the context file before it enters the prompt, with a `[... CONTEXT TRUNCATED ...]` marker.

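
The threshold logic can be sketched as below. This is not the wrappers' `check_prompt_size`; the function name `prompt_size_status`, its return strings, and the use of strict "greater than" comparisons are all assumptions made for illustration.

```shell
# Defaults taken from the table above.
PROMPT_MAX_CHARS=200000
PROMPT_WARN_THRESHOLD=160000

# Hypothetical sketch: warn on stderr, print a status word on stdout.
prompt_size_status() {
  local size
  size="${#1}"

  if [ "$size" -gt "$PROMPT_MAX_CHARS" ]; then
    echo "WARN: prompt is $size chars, over the $PROMPT_MAX_CHARS limit" >&2
    echo over_limit
  elif [ "$size" -gt "$PROMPT_WARN_THRESHOLD" ]; then
    echo "WARN: prompt is $size chars, nearing the $PROMPT_MAX_CHARS limit" >&2
    echo warn
  else
    echo ok
  fi
}
```

The token figures in the table imply roughly four characters per token, so an estimate such as `$(( ${#prompt} / 4 ))` would be consistent with them.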
When resuming an existing session (--session-id or --resume), wrappers skip injecting the persona block into the prompt — the persona is already in the session context from the initial invocation. Only the new task data and critical instructions are sent. This reduces token usage significantly on multi-turn workflows.
Wrappers source lib/error_utils.sh (if available) for standardized error classification. The classify_error function maps stderr output to error types:

| Error Type | Trigger | Recoverable |
|---|---|---|
| `quota` | "usage limit", "rate limit exceeded", "try again at" | Yes |
| `rate_limit` | "429", "too many requests" | Yes |
| `timeout` | Exit code 124, "context deadline exceeded" | Yes |
| `invalid_session` | "session not found", "invalid session" | Yes |
| `invalid_input` | Bad persona, missing agent file, malformed JSON | No |
| `provider_error` | CLI crash, non-zero exit, empty response | No |
| `unknown` | Unclassified | No |

For quota errors, a system message file is written to <core_dir>/context/system-messages/inbox/ with timestamp, retry time, and severity — enabling upstream orchestrators to schedule retries.
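
To make the table concrete, here is a minimal sketch of a classifier in the same spirit. It is not the `classify_error` from `lib/error_utils.sh`: the name `sketch_classify_error`, its argument order, and its "type recoverable" output format are hypothetical, and `invalid_input` is left out because it comes from the wrapper's own argument validation rather than from stderr text.

```shell
# Hypothetical sketch: map an exit code and stderr text to "<type> <recoverable>".
sketch_classify_error() {
  local exit_code msg
  exit_code="${1:-0}"
  msg="$(printf '%s' "${2:-}" | tr '[:upper:]' '[:lower:]')"

  # Exit code 124 is the conventional timeout status.
  if [ "$exit_code" -eq 124 ]; then
    echo "timeout true"
    return 0
  fi

  case "$msg" in
    *"usage limit"*|*"rate limit exceeded"*|*"try again at"*) echo "quota true" ;;
    *429*|*"too many requests"*)                              echo "rate_limit true" ;;
    *"context deadline exceeded"*)                            echo "timeout true" ;;
    *"session not found"*|*"invalid session"*)                echo "invalid_session true" ;;
    *)
      # No known phrase matched: fall back on the exit status.
      if [ "$exit_code" -ne 0 ]; then
        echo "provider_error false"
      else
        echo "unknown false"
      fi
      ;;
  esac
}
```

Checking `quota` before `rate_limit` matters because "rate limit exceeded" is listed as a quota trigger.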

`claude.sh`:
- Sessions stored in `~/.claude/projects/<encoded-path>/`. Path encoding replaces both `/` and `_` with `-`.
- `--output-format json` captures `session_id` and `result` from Claude's response envelope.
- `--resume <ID>` for session continuation with persona-skip optimization.
- Quota status via cached system message files (`*-claude-usage-limit.json`).
- Supports `try again at ...` parsing for rate limit handling.
- Model selection: `--model opus`, `--model sonnet`, `--model haiku`.

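
The path encoding described in the Claude notes above is a single character substitution. A minimal sketch, with the hypothetical function name `encode_claude_project_path`:

```shell
# Hypothetical sketch: replace every '/' and '_' in a path with '-'.
encode_claude_project_path() {
  printf '%s\n' "$1" | sed 's|[/_]|-|g'
}
```

Under this rule, `/home/dev/my_app` maps to `-home-dev-my-app`, so its sessions would be looked up under `~/.claude/projects/-home-dev-my-app/`.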

`gemini.sh`:
- Supports Native Skills: Checks `.gemini/skills/` and registers them via `--skills` flag.
- MCP server support with auto-registration.
- Filters out informational logs ("YOLO mode enabled") to preserve JSON output.
- Maps UUID session IDs to Gemini's internal numeric indices.


`codex.sh`:
- Sessions stored in `~/.codex/sessions/`.
- Uses `--sandbox danger-full-access` when `--yolo` or `--allowed-tools` is set.
- Exports `SKIP_WEBKIT=1` and `SKIP_FIREFOX=1` for Playwright stability.
- Quota status via cached system message files.


`opencode.sh`:
- Defaults to `opencode/grok-code` model.
- Full session management, prompt size checking, and agent stream logging.
- Model selection via `--model` flag.


`cursor.sh`:
- Uses `cursor-agent` CLI.
- Supports workspace configuration for context awareness.
- Auto-approves MCP tool usage when `--allowed-tools` is set.
- Beta skills support via `.cursor/skills/`.
- Quota status via cached system message files.


`agravity.sh`:
- Uses Google's `agy` CLI (Antigravity).
- Non-interactive runs use `--print=<prompt>`; the wrapper always sets `--print-timeout=<CLI_TIMEOUT>s` so timeouts are enforced on both sides.
- `--yolo`/`--allowed-tools` maps to `--dangerously-skip-permissions`; `--add-dir <path>` is forwarded to `agy` (and the resolved workspace is added automatically).
- Conversation state lives at `~/.gemini/antigravity-cli/conversations/<UUID>.pb`; the wrapper discovers the freshest conversation id after each run.
- `--session-id <UUID>` resumes via `--conversation=<UUID>`; non-UUID logical ids start fresh and the real provider id is captured on completion.
- Reasoning and tool telemetry are mined from `~/.gemini/antigravity-cli/brain/<UUID>/.system_generated/logs/transcript.jsonl` (assistant `thinking` blocks; `tool_calls[]` per step).
- The `agy` CLI exposes no `--model` flag; the active model is configured via `~/.gemini/antigravity-cli/settings.json`. The wrapper accepts `--model` for interface parity and records it in `model_resolution`, but does not pass it through.
- Antigravity does not emit usage metadata; `tokens_used` reports estimated output tokens and a transcript-size-based total estimate.

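
The UUID-versus-logical-id decision for Antigravity sessions could look like the sketch below. The function names `is_uuid` and `conversation_arg` are hypothetical; only the `--conversation=<UUID>` flag comes from the notes above.

```shell
# Hypothetical sketch: succeed only for a canonical 8-4-4-4-12 hex UUID.
is_uuid() {
  printf '%s\n' "$1" | grep -Eiq '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
}

# Print the resume flag for a UUID; print nothing for a logical id,
# which means the run starts a fresh conversation.
conversation_arg() {
  if is_uuid "$1"; then
    printf '%s\n' "--conversation=$1"
  fi
}
```

A logical id such as `ticket-42` produces no flag, so the provider's real id would have to be captured after the run completes.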
This repository also contains serverless implementations for AI agents, located in the aws-lambdas/ directory. These Lambdas provide direct API access to agent capabilities without requiring a local CLI environment.
- AI-Agent-Claude: AWS Lambda implementation for Anthropic's Claude.
- AI-Agent-Gemini: AWS Lambda implementation for Google's Gemini.
- AI-Agent-Codex: AWS Lambda implementation for OpenAI's Codex/GPT models.
- sync_config.py: Utility script to sync persona and skill definitions to DynamoDB or S3.
See aws-lambdas/README.md for detailed deployment and usage instructions.
The go-autonom8/climanager package relies on these wrappers to:
- Orchestrate Calls: It builds the command line arguments based on the `CLIRequest`.
- Handle Fallbacks: If one wrapper fails (exit code != 0), it tries the next in the chain.
- Manage Resources: It tracks PIDs and process groups to ensure clean termination on timeouts.
- Parse Output: It decodes the JSON output and normalizes it for the application.
cursor.sh refreshes the macOS login keychain before each Cursor CLI process when running on Darwin. This is required for SSH-launched Mac Mini workers because Cursor stores CLI credentials in login.keychain-db, while a non-GUI worker process can outlive or miss a manual keychain unlock. The wrapper only unlocks the keychain so an already-authenticated Cursor CLI can read its credentials; it does not log in to Cursor or create credentials.
Configuration:

| Variable | Default | Description |
|---|---|---|
| `AUTONOM8_CURSOR_UNLOCK_KEYCHAIN` | `1` | Set to `0` to disable Cursor-specific keychain refresh. |
| `AUTONOM8_UNLOCK_KEYCHAIN` | `1` | Shared fallback opt-out when the Cursor-specific variable is unset. |
| `AUTONOM8_KEYCHAIN_PASSWORD` | unset | Explicit keychain password. Preferred for process supervisors that inject secrets directly. |
| `AUTONOM8_KEYCHAIN_PASSWORD_ENV` | `mini` | Name of the env var containing the keychain password. |
| `AUTONOM8_KEYCHAIN_ENV_FILE` | `<login-home>/.env` | File sourced or parsed when the password env var is not already exported. Falls back to the login user's home if `HOME` is sanitized by a worker. |
| `AUTONOM8_KEYCHAIN_PATH` | `<login-home>/Library/Keychains/login.keychain-db` | Keychain unlocked before invoking Cursor. Falls back to the login user's home if `HOME` is sanitized by a worker. |
| `AUTONOM8_KEYCHAIN_UNLOCK_TIMEOUT_SECONDS` | `21600` | Timeout passed to `security set-keychain-settings -lut`. |
| `AUTONOM8_KEYCHAIN_SET_TIMEOUT` | `1` | Set to `0` to unlock without refreshing the keychain timeout. |
| `AUTONOM8_CURSOR_NORMALIZE_HOME` | `1` | Set to `0` to keep process `HOME` unchanged. By default Cursor invocations on macOS use the login user's home so Cursor Agent resolves the same keychain/config as an interactive session. |

Security and operations notes:
- Do not print or commit the `mini` value or any `.env` file containing it.
- The wrapper first checks the already-exported password variable, then sources `AUTONOM8_KEYCHAIN_ENV_FILE`, then falls back to a simple `KEY=value` parser. This mirrors the operator command `set -a; . "$HOME/.env"; security unlock-keychain -p "$mini" ...` without logging the secret.
- Missing or failed unlock is intentionally non-fatal. The Cursor call continues so the caller receives the provider's real `credential_unavailable` error.
- If `credential_unavailable` only appears when multiple Cursor sessions start concurrently, treat that as a routing/concurrency issue rather than a password issue; avoid overlapping Cursor QA and implement calls or add a provider-level Cursor lock.
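
The "simple `KEY=value` parser" fallback might resemble the sketch below. The function name `read_env_value` is hypothetical, and the handling of an optional `export ` prefix and of surrounding quotes is an assumption about what such a parser would reasonably tolerate. The value is only written to stdout for the caller to capture; it is never logged.

```shell
# Hypothetical sketch: read one KEY=value entry from an env file without sourcing it.
# Assumption: the key is a plain identifier with no regex metacharacters.
read_env_value() {
  local key file line value
  key="$1"
  file="$2"
  [ -r "$file" ] || return 1

  # Use the last assignment for the key, allowing an optional "export " prefix.
  line="$(grep -E "^(export[[:space:]]+)?${key}=" "$file" | tail -n 1)"
  [ -n "$line" ] || return 1
  value="${line#*=}"

  # Strip one pair of matching surrounding quotes, if present.
  case "$value" in
    \"*\") value="${value#\"}"; value="${value%\"}" ;;
    \'*\') value="${value#\'}"; value="${value%\'}" ;;
  esac
  printf '%s\n' "$value"
}
```

A caller would capture the result in a variable and pass it straight to `security unlock-keychain -p`, keeping it out of any debug output.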