Release v0.2.10#6

Merged
joshwilhelmi merged 159 commits into main from dev
Feb 3, 2026

@joshwilhelmi joshwilhelmi commented Feb 3, 2026

Summary

  • Fix test suite alignment with production code changes
  • Add MCP templates for Codex, Copilot, Cursor, Windsurf
  • Deprecate unused workflow files
  • Add new meeseeks-box workflows

Test plan

  • All 10,040 tests pass
  • Coverage at 81%

Summary by CodeRabbit

  • New Features

    • Added support for Cursor, Windsurf, and Copilot IDEs alongside Claude Code, Gemini, and Codex
    • Introduced pipelines with sequential execution, data flow, and approval gates for deterministic automation
    • Launched Lobster-to-Gobby pipeline migration support with CLI import/export
    • Added dedicated code review workflow with approval gates and change requests
  • Enhancements

    • Improved memory extraction criteria (5-minute decision rule; higher importance threshold)
    • Consolidated skill injection into workflow lifecycle system
    • Enhanced terminal output streaming and PTY management
    • Expanded MCP tool registry with pipeline and discovery tools
  • Documentation

    • New guides for pipelines, Lobster migration, and extended hook support
    • Updated CLI installation and hook configuration documentation

joshwilhelmi and others added 30 commits January 31, 2026 19:09
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…D generation

ID generation now uses only content hash (not project_id), aligning with
content_exists() which already performs global deduplication. This prevents
duplicate memories when the same content is stored with different project_ids.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
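The content-only ID generation described in the commit above can be sketched as follows. This is an illustration only: the hash function (SHA-256), the whitespace normalization, the `mm-` prefix, the 16-character truncation, and the helper name `memory_id` are all assumptions, not taken from the Gobby codebase.

```python
import hashlib


def memory_id(content: str) -> str:
    """Derive a deterministic ID from content alone.

    project_id is deliberately NOT an input, so identical content stored
    under different projects maps to the same ID (hypothetical helper).
    """
    normalized = content.strip()
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return f"mm-{digest[:16]}"  # prefix and length are illustrative


id_a = memory_id("Use uv for dependency installs")
id_b = memory_id("Use uv for dependency installs")
id_c = memory_id("Run migrations before tests")
```

Because the ID depends only on the content, a second store of the same text collides with the first instead of creating a duplicate row, which matches the global deduplication the commit attributes to content_exists().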
Key changes:
- Raise min_importance default from 0.7 to 0.8
- Reduce max_memories default from 5 to 3
- Add "5-minute rule": only capture if rediscovery would take >5 minutes
- Explicit DO NOT list: generic patterns, vague speculation, obvious info
- Encourage empty arrays for sessions with nothing notable

Also cleaned 123 low-value memories from DB via SQL patterns.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
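A minimal sketch of how the new defaults from the commit above (min_importance 0.8, max_memories 3) might be applied to extraction candidates. The `Candidate` type, the `select_memories` name, the inclusive comparison, and the sort-by-importance step are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    content: str
    importance: float


def select_memories(candidates, min_importance=0.8, max_memories=3):
    """Keep only candidates at or above the threshold, capped at max_memories."""
    kept = [c for c in candidates if c.importance >= min_importance]
    kept.sort(key=lambda c: c.importance, reverse=True)
    return kept[:max_memories]  # an empty list is a valid result


candidates = [
    Candidate("a", 0.95),
    Candidate("b", 0.75),
    Candidate("c", 0.80),
    Candidate("d", 0.90),
    Candidate("e", 0.85),
]
picked = select_memories(candidates)
```

Returning an empty list when nothing clears the threshold mirrors the "encourage empty arrays" guidance in the commit.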
Add lowercase 'true' and 'false' to allowed_globals in ConditionEvaluator.
YAML/JSON uses lowercase booleans, but Python's eval expects True/False.
This caused "name 'true' is not defined" errors in workflow conditions.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
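The fix in the commit above can be illustrated with a small evaluator that exposes lowercase boolean aliases to eval. This is a sketch, not the real ConditionEvaluator: the `ALLOWED_GLOBALS` contents and the `evaluate` function are assumptions, and restricting globals this way is not a security sandbox.

```python
from typing import Any

# Names visible to condition expressions. 'true'/'false' mirror YAML/JSON
# spelling so that "flag == true" no longer raises NameError.
ALLOWED_GLOBALS: dict = {
    "__builtins__": {},
    "true": True,
    "false": False,
    "len": len,
}


def evaluate(condition: str, variables: dict) -> bool:
    """Evaluate a workflow condition against the given variables."""
    return bool(eval(condition, ALLOWED_GLOBALS, dict(variables)))


claimed = evaluate("task_claimed == true", {"task_claimed": True})
disabled = evaluate("enabled == false", {"enabled": True})
```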
Root cause: Gemini preflight terminated before session persisted to disk,
causing resume with -r flag to fail (session not found).

Fix: Skip preflight entirely for Gemini. Instead:
- Pass GOBBY_SESSION_ID via environment when spawning terminal
- Hook dispatcher reads env vars and includes in terminal_context
- Session handler looks up pre-created session by gobby_session_id
- Updates external_id with Gemini's native session_id at SessionStart

Files changed:
- hook_dispatcher.py: Read GOBBY_* env vars into terminal_context
- spawn_executor.py: Use prepare_terminal_spawn, build_cli_command
- _session.py: Link sessions via gobby_session_id from terminal_context
- test_spawn_executor.py: Updated tests for new approach

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
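One way the dispatcher side of the Gemini fix above could look: collect GOBBY_* environment variables and merge them into terminal_context. The helper name, the lowercase key convention, and the dropping of empty values are assumptions; only GOBBY_SESSION_ID and the gobby_session_id lookup key come from the commit message.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def gobby_context_from_env(environ: Mapping[str, str] | None = None) -> dict[str, str]:
    """Return GOBBY_* variables keyed by their lowercase names (hypothetical helper)."""
    env = os.environ if environ is None else environ
    return {
        key.lower(): value
        for key, value in env.items()
        if key.startswith("GOBBY_") and value
    }


# The spawner would set GOBBY_SESSION_ID before launching the terminal;
# here a plain dict stands in for the real environment.
fake_env = {"GOBBY_SESSION_ID": "sess-123", "GOBBY_EMPTY": "", "PATH": "/usr/bin"}
terminal_context = {"tty": None, **gobby_context_from_env(fake_env)}
```

The session handler can then look up the pre-created session through terminal_context["gobby_session_id"] instead of relying on a preflight run.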
Session context is already injected via additionalContext at SessionStart
by the session handler. The prompt prefix was redundant.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Create proactive-memory alwaysApply skill with capture guidelines
- Add explicit alwaysApply: false to memory skill
- Remove automatic memory_extract from session lifecycle

Agents now save memories during work when they discover valuable insights,
rather than post-session LLM extraction.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…n events

Add default tool_name='(tool_selection)' for before_tool_selection events since
these fire before a specific tool is selected, so tool_name is not available.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add mcp__gobby__list_mcp_servers and mcp__gobby__search_tools to all workflow
steps that have MCP tool access for consistent tool discovery behavior.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add llm_service parameter to GitHubCollectionProvider for description synthesis
- Add _fetch_skill_content() to fetch SKILL.md from GitHub API
- Add _synthesize_description() to generate concise descriptions via LLM
- Update get_skill_details() to fetch content, synthesize, and cache results
- Update HubManager to pass llm_service to GitHubCollectionProvider
- Add 14 new tests covering all new functionality

Uses lazy-fetch + LLM synthesis pattern matching MCP tool progressive disclosure.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Move skill_manager initialization before ActionExecutor creation and
pass it to the ActionExecutor constructor (line 285).

Changes:
- Moved HookSkillManager() initialization to line 261 (before ActionExecutor)
- Added skill_manager=self._skill_manager to ActionExecutor constructor

Tests:
- Added test_init_passes_skill_manager_to_action_executor to verify wiring
- All 74 HookManager tests pass

Part of epic #6640: Consolidate Skill Injection into Workflows

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add support for source='skills' in inject_context to enable workflow-based
skill injection. This is part of epic #6640 to consolidate skill injection
into workflows.

Changes:
- Add skill_manager and filter parameters to inject_context function
- Add 'skills' source branch that uses skill_manager.discover_core_skills()
- Add _format_skills() helper to format skills as markdown
- Support filter='always_apply' to only inject always-apply skills
- Update handle_inject_context to pass skill_manager from ActionContext
- Add 7 comprehensive tests for the new functionality

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add support for source='task_context' in inject_context to enable
workflow-based task context injection. This is part of epic #6640 to
consolidate skill injection into workflows.

Changes:
- Add session_task_manager parameter to inject_context function
- Add 'task_context' source branch that uses session_task_manager.get_session_tasks()
- Filter for 'worked_on' tasks to get the active task(s)
- Add _format_task_context() helper to format task info as markdown
- Update handle_inject_context to pass session_task_manager from ActionContext
- Add 7 comprehensive tests for the new functionality

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add support for source='memories' in inject_context to enable
workflow-based memory injection. This is part of epic #6640 to
consolidate skill injection into workflows.

Changes:
- Add memory_manager, prompt_text, limit, min_importance parameters to inject_context
- Add 'memories' source branch that uses memory_manager.recall()
- Add _format_memories() helper to format memories as markdown
- Update handle_inject_context to pass memory_manager from ActionContext
- Get prompt_text from event_data if not explicitly passed
- Add 9 comprehensive tests for the new functionality

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add support for passing a list of sources to inject_context, enabling
workflows to inject multiple context types in a single action. This is
part of epic #6640 to consolidate skill injection into workflows.

Changes:
- Update source parameter type to str | list[str] | None
- Add list handling logic that recursively processes each source
- Combine content from multiple sources with double newlines
- Skip sources that return no content
- Support template rendering for combined content
- Block only when require=True and ALL sources return empty
- Add 7 comprehensive tests for array source functionality

Example usage in workflows:
  source: [skills, task_context, memories]

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
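The list-handling rules in the commit above (skip empty sources, join with double newlines, block only when require is true and every source is empty) can be sketched like this. The function name, the `providers` mapping, and the returned dict shape are hypothetical; the commit describes recursion per source, while this sketch uses a plain loop for brevity.

```python
from typing import Callable, Dict, List, Optional, Union

Provider = Callable[[], Optional[str]]


def inject_context_sketch(
    source: Union[str, List[str]],
    providers: Dict[str, Provider],
    require: bool = False,
) -> dict:
    """Combine content from one or more sources (illustrative only)."""
    names = [source] if isinstance(source, str) else list(source)
    parts = []
    for name in names:
        provider = providers.get(name)
        content = provider() if provider else None
        if content:  # skip sources that return no content
            parts.append(content)
    if not parts:
        # Block only when content was required and ALL sources were empty.
        return {"blocked": require, "content": None}
    return {"blocked": False, "content": "\n\n".join(parts)}


providers = {
    "skills": lambda: "## Skills",
    "task_context": lambda: None,
    "memories": lambda: "## Memories",
}
combined = inject_context_sketch(["skills", "task_context", "memories"], providers)
```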
Add skill and task context injection to on_session_start trigger.
This is part of epic #6640 to consolidate skill injection into workflows.

Changes to session-lifecycle.yaml:
- Add inject_context action for skills source with filter: always_apply
  - Injects always-apply skills for all session starts
  - Uses template to format skills list
- Add inject_context action for task_context source
  - Injects active task info if session has a claimed task
- Both actions run after task_sync_import to ensure task context is available

Updated both:
- src/gobby/install/shared/workflows/lifecycle/session-lifecycle.yaml
- .gobby/workflows/lifecycle/session-lifecycle.yaml

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove deprecated _build_skill_injection_context() and
_restore_skills_from_parent() methods. Skill injection is now
handled by workflows via inject_context action with source=skills.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove startup and skill_discovery sections - now handled by
workflow inject_context action. Keep tool_discovery and rules
sections for progressive disclosure and task requirements.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Progressive disclosure instructions are now in MCP instructions.
Keep plan mode prompt focused on planning workflow only.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Progressive disclosure is now covered by MCP instructions.
Agents can still get_skill('discovering-tools') on demand.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add section documenting skills, task_context, memories sources
with examples for each pattern including array syntax.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Note that skill injection is now handled by workflow inject_context
action rather than hooks directly.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Note that list_mcp_servers() and list_skills() are not required at
session start. Core skills are auto-injected via workflows.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The note applies to all CLIs (Claude, Gemini, Codex), not just Claude Code.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update memories.jsonl and tasks.jsonl
- Rename and reorganize prompt files

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove duplicate type hint for render_context variable on line 140.
The first declaration on line 104 already defines the type.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…t access

DownloadResult is a dataclass, not a dict. Changed .get() and indexing
to attribute access (.success, .path, .error).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
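For context on the fix above, a dataclass instance supports neither .get() nor indexing. The sketch below uses the three field names from the commit message (.success, .path, .error); the field types, defaults, and the `describe` helper are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DownloadResult:
    success: bool
    path: Optional[str] = None
    error: Optional[str] = None


def describe(result: DownloadResult) -> str:
    """Use attribute access; result['success'] would raise TypeError."""
    if result.success:
        return f"downloaded to {result.path}"
    return f"download failed: {result.error}"


ok = describe(DownloadResult(success=True, path="/tmp/skill.zip"))
failed = describe(DownloadResult(success=False, error="404"))
```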
- Rename docs/plans/moltbot-plugin.md → openclaw-plugin.md
- Update all moltbot/Moltbot references to openclaw/OpenClaw
- Add openclaw.plugin.json manifest to file structure
- Update reference paths and OpenClawPluginApi type
- Create expand-ready plan at .gobby/plans/openclaw-plugin.md

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
joshwilhelmi and others added 26 commits February 2, 2026 20:11
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When spawning an agent without an explicit base_branch, use git_manager
to detect the caller's current branch. This allows worktrees to be
based off whatever branch the caller is working on.

Fallback order:
1. Explicit base_branch parameter
2. Agent definition's base_branch
3. Auto-detected current branch via git_manager
4. Default to "main" as last resort

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
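The fallback order listed in the commit above can be expressed as a small resolver. This is a sketch: `resolve_base_branch` and the `detect_current` callable are hypothetical stand-ins for the real spawn code and git_manager, and treating a detection error as "no branch" is an assumption.

```python
from typing import Callable, Optional


def resolve_base_branch(
    explicit: Optional[str],
    agent_default: Optional[str],
    detect_current: Callable[[], Optional[str]],
) -> str:
    """Apply the four-step fallback order for a worktree's base branch."""
    if explicit:  # 1. explicit base_branch parameter
        return explicit
    if agent_default:  # 2. agent definition's base_branch
        return agent_default
    try:
        detected = detect_current()  # 3. auto-detected current branch
    except Exception:
        detected = None  # e.g. not inside a git repository
    return detected or "main"  # 4. last resort


branch = resolve_base_branch(None, None, lambda: "feature/pipelines")
```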
…140]

Add full multi-editor support for Cursor, Windsurf, and GitHub Copilot:

- Add CURSOR, WINDSURF, COPILOT values to SessionSource enum
- Add EVENT_TYPE_CLI_SUPPORT mappings for new sources
- Create thin adapter subclasses inheriting from ClaudeCodeAdapter
- Update agent system (isolation, sandbox, spawn, registry) to recognize new CLIs
- Register new sources in transcript parser registry
- Update MCP tools with expanded Literal types
- Update documentation (README, hook-schemas, sessions guide)
- Add comprehensive tests for new adapters and enum values

These editors use Claude Code's hook format, so they share the same
adapter logic via inheritance.

Co-Authored-By: Gemini <noreply@google.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ck schema [#6854]

The mock_db fixture had an outdated sessions table schema missing
the step_variables column added in migration 81.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Security fixes:
- Fix info disclosure in websocket.py by returning generic error messages

Bug fixes:
- Fix undefined 'name' variable in cli/pipelines.py when using --lobster flag
- Add None guards in workflows/actions.py for inputs and state.variables

Error handling improvements:
- Add proper error handling for file I/O and YAML parsing in lobster_compat.py
- Catch specific exceptions in _discovery.py, log unexpected errors
- Add max_iterations guard to meeseeks-box-pipeline.yaml to prevent infinite recursion

Configuration fixes:
- Remove hardcoded IP from vite.config.ts, use env variable
- Use consistent wsUrl pattern in useTerminal.ts
- Use relative URLs in usePipeline.ts instead of hardcoded localhost
- Fix state mutations in usePipeline.ts with immutable updates
- Return boolean from sendMessage in useChat.ts for connection status

Documentation and type fixes:
- Fix docstring in analyzer.py (50 chars -> 100 chars)
- Fix truncation length in github_collection.py (150 -> 100 to match prompt)
- Remove redundant hasattr check in hook_manager.py
- Fix return type annotation in routes/pipelines.py
- Use structured logging in pipelines/__init__.py
- Remove unused runId prop from Terminal.tsx
- Restrict CSS module declaration in vite-env.d.ts
- Add return type annotations to test fixtures
- Fix invalid git merge --squash -m command in meeseeks-box.yaml

Data cleanup:
- Remove invalid memory entries with empty/truncated content

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…rs [#6857] [#6858] [#6859] [#6860]

Replace empty stubs with full implementations:

CopilotAdapter:
- EVENT_MAP for camelCase hooks (sessionStart, preToolUse, etc.)
- Normalize toolName/toolArgs/toolResult to snake_case
- Response uses permissionDecision format
- MCP call extraction from toolArgs

WindsurfAdapter:
- EVENT_MAP for Cascade action names (pre_read_code, post_write_code, etc.)
- TOOL_MAP normalizes actions to Read/Write/Bash/mcp_call
- Extract tool details from nested tool_info structure
- Handle read_code, write_code, run_command, mcp_tool_use actions

CursorAdapter:
- Documented stub explaining NDJSON streaming vs hook interception
- Minimal EVENT_MAP for NDJSON type:subtype patterns
- Future integration options documented

Tests: 50 comprehensive tests covering event maps, translations, and round-trips.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
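A rough sketch of the two normalization ideas named in the commit above: renaming Copilot's camelCase keys to snake_case, and mapping Windsurf Cascade actions to tool names. The key names toolName/toolArgs/toolResult and the action and tool names come from the commit message, but the exact action-to-tool pairing and both helper functions are inferred, not confirmed against the adapters.

```python
from typing import Any, Dict, Optional

COPILOT_KEY_MAP = {
    "toolName": "tool_name",
    "toolArgs": "tool_args",
    "toolResult": "tool_result",
}

# Pairing inferred from the commit text; treat as an assumption.
WINDSURF_TOOL_MAP = {
    "read_code": "Read",
    "write_code": "Write",
    "run_command": "Bash",
    "mcp_tool_use": "mcp_call",
}


def normalize_copilot_payload(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Rename known camelCase keys; leave everything else untouched."""
    return {COPILOT_KEY_MAP.get(key, key): value for key, value in payload.items()}


def windsurf_tool_for(event_name: str) -> Optional[str]:
    """Strip the pre_/post_ prefix and look up the normalized tool name."""
    action = event_name.partition("_")[2]
    return WINDSURF_TOOL_MAP.get(action)


normalized = normalize_copilot_payload(
    {"toolName": "bash", "toolArgs": {"command": "ls"}, "cwd": "/tmp"}
)
```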
- Add isinstance checks for WorkflowDefinition vs PipelineDefinition unions
- Add proper type annotations for dict variables
- Rename tests/storage/test_pipelines.py to test_pipeline_storage.py to avoid module collision

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add nosec comments to ui.py for hardcoded npm subprocess calls
- Fix B110 try-except-pass in spawn_agent.py, _lifecycle.py, task_policy.py
- Add nosec B608 comment to pipelines.py for safe SQL construction
- Add pip>=26.0 to dev dependencies for CVE-2026-1703

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ation [#6863]

The previous CursorAdapter was based on incorrect research that claimed Cursor
uses NDJSON streaming. Cursor actually has a full hooks system very similar to
Claude Code, documented at https://cursor.com/docs/agent/hooks

Changes:
- EVENT_MAP with 17 camelCase events (sessionStart, preToolUse,
  beforeShellExecution, afterMCPExecution, etc.)
- HOOK_TO_TOOL_TYPE mapping for granular hooks (shell→Bash, MCP→mcp_call)
- Proper response formats per hook type:
  - preToolUse: decision allow/deny
  - beforeShellExecution: permission allow/deny
  - sessionStart: continue true/false with additional_context
  - stop: followup_message
- MCP info extraction for beforeMCPExecution hooks
- 27 comprehensive tests replacing NDJSON stub tests

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
All three adapters (Cursor, Windsurf, Copilot) are now implemented.
Added status table noting the CursorAdapter fix in commit 6a532de.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Updated CLI Support table with hook counts per adapter
- Added configuration examples for Cursor, Windsurf, and Copilot
- Changed status from "Claude Code hook format" to "Native adapter"

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace oversimplified curl examples with proper gobby install
instructions. The hook dispatchers capture terminal context and
handle HTTP communication properly.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add install assets for new CLIs:
- cursor/hooks/hook_dispatcher.py + hooks-template.json (17 camelCase events)
- windsurf/hooks/hook_dispatcher.py + hooks-template.json (11 snake_case events)
- copilot/hooks/hook_dispatcher.py + hooks-template.json (6 camelCase events)

Each dispatcher:
- Captures terminal context (TTY, parent PID, session IDs)
- Communicates with daemon via HTTP POST to /hooks/execute
- Handles exit code 2 for blocking actions

Auto-installation via gobby install coming in a follow-up.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
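The dispatcher responsibilities listed in the commit above (POST to /hooks/execute, exit code 2 for blocking actions) might be sketched as below. The request body fields, the `decision` response field and its values, and the daemon address are hypothetical; only the endpoint path and the meaning of exit code 2 come from the commit message. The request is built but not sent, so the sketch runs without a daemon.

```python
import json
import urllib.request


def build_request(base_url, hook_type, payload, terminal_context):
    """Build the POST request a dispatcher would send to the daemon."""
    body = json.dumps(
        {
            "hook_type": hook_type,  # hypothetical field names
            "input_data": payload,
            "terminal_context": terminal_context,
        }
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/hooks/execute",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def exit_code_for(response: dict) -> int:
    """Map a daemon response to a process exit code (2 means block)."""
    return 2 if response.get("decision") in {"block", "deny"} else 0


request = build_request(
    "http://127.0.0.1:8765",  # assumed address, not from the source
    "preToolUse",
    {"tool_name": "Bash"},
    {"tty": "/dev/ttys001"},
)
```

A real dispatcher would pass the request to urllib.request.urlopen with a timeout and then call sys.exit(exit_code_for(...)) on the decoded response.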
…6867]

Add installer modules and CLI detection for new CLIs:
- cursor.py, windsurf.py, copilot.py installer modules
- Detection functions for each platform (macOS, Windows, Linux)
- New CLI flags: --cursor, --windsurf, --copilot
- Auto-detection during `gobby install`

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…rn types

FastAPI cannot generate response models from union types like
dict[str, Any] | JSONResponse. Adding response_model=None disables
automatic response model generation for these endpoints.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Update cli-commands.md:
- Add --cursor, --windsurf, --copilot flags to install/uninstall
- Add table showing hook locations and config files per CLI

Update hook-schemas.md:
- Add detailed Cursor hooks section (17 events)
- Add detailed Windsurf hooks section (11 events)
- Add detailed Copilot hooks section (6 events)
- Add hook dispatcher examples for each CLI

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- tests/config/test_skills_config.py: Update test_hub_config_invalid_type to check
  for claude-plugins in error, update test_hubs_empty_default to expect built-in hubs
- tests/cli/test_pipelines.py: Add ANY import, update assertions to use project_path=ANY
- tests/agents/test_isolation.py: Add get_current_branch and has_unpushed_commits mocks
- tests/agents/test_runner.py: Add spec=WorkflowDefinition to mock workflows

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Mock task_manager in ActionExecutor for claim detection
- Fix missing workflow type/variables in lifecycle evaluation tests
- Update incorrect assertions in stuck workflow tests
- Fix broken imports and syntax errors in test files
Move nosec comments to same line as subprocess.run calls and use
space-separated format '# nosec B603 B607' instead of comma-separated
with explanation text, which was being parsed incorrectly by bandit.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
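For illustration, the comment placement described in the commit above looks like this: the suppression sits on the same line as the subprocess.run call, with space-separated test IDs and no trailing explanation. The command itself is a stand-in chosen so the snippet runs anywhere.

```python
import subprocess
import sys

# Any rationale goes on its own line above the call, not inside the nosec comment.
result = subprocess.run([sys.executable, "-c", "print('ok')"], capture_output=True, text=True, check=True)  # nosec B603 B607
```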
- Update tests for new CLI detection (Cursor, Windsurf, Copilot)
- Fix MCP tool tests to use claim_task/close_task instead of update_task
- Update workflow tests to use real WorkflowDefinition objects
- Fix session source count tests (6 sources: claude, gemini, codex, cursor, windsurf, copilot)
- Add MCP templates for Codex, Copilot, Cursor, Windsurf
- Deprecate unused workflow files, add new meeseeks-box workflows

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

coderabbitai bot commented Feb 3, 2026

Caution

Review failed

The pull request is closed.

Note

.coderabbit.yaml has unrecognized properties

CodeRabbit is using all valid settings from your configuration. Unrecognized properties (listed below) have been ignored and may indicate typos or deprecated fields that can be removed.

⚠️ Parsing warnings (1)
Validation error: Unrecognized key(s) in object: 'tools', 'ignore'
⚙️ Configuration instructions
  • Please see the configuration documentation for more information.
  • You can also validate your configuration using the online YAML validator.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
📝 Walkthrough

Walkthrough

This PR introduces a comprehensive pipeline system for deterministic automation with approval gates, support for three new CLIs (Cursor, Windsurf, Copilot), consolidated skill injection into workflows, and Lobster pipeline migration tooling. It significantly expands agent capabilities, improves worktree isolation with unpushed commit detection, and adds streaming support for pipeline events and terminal output. The changes span adapters, workflows, storage, CLI, HTTP routing, and documentation.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| New CLI Adapters (Cursor, Windsurf, Copilot) | src/gobby/adapters/cursor.py, src/gobby/adapters/windsurf.py, src/gobby/adapters/copilot.py, src/gobby/adapters/__init__.py | Three new adapters translate between CLI-specific hook payloads and unified HookEvent/HookResponse models, with event type mappings and tool normalization for each CLI. |
| CLI Installer Support | src/gobby/cli/install.py, src/gobby/cli/installers/*.py, src/gobby/install/cursor/*, src/gobby/install/windsurf/*, src/gobby/install/copilot/* | New installation/uninstallation flows for Cursor, Windsurf, and Copilot CLIs with hook dispatcher scripts, MCP templates, and hooks templates. |
| Pipeline System Core | src/gobby/workflows/pipeline_executor.py, src/gobby/workflows/pipeline_state.py, src/gobby/workflows/definitions.py, src/gobby/workflows/pipeline_webhooks.py, src/gobby/workflows/lobster_compat.py | Complete pipeline execution framework with support for typed data flow, approval gates, nested pipelines, webhook integration, and Lobster format conversion. |
| Pipeline Storage & Management | src/gobby/storage/pipelines.py, src/gobby/storage/migrations.py, src/gobby/storage/sessions.py | New database tables for pipeline executions/step executions, session step_variables field, and LocalPipelineExecutionManager for CRUD operations. |
| Pipeline CLI & HTTP Routes | src/gobby/cli/pipelines.py, src/gobby/servers/routes/pipelines.py, src/gobby/mcp_proxy/tools/pipelines/* | CLI commands and HTTP endpoints for listing, running, approving, and managing pipelines; MCP tools for pipeline discovery and execution. |
| Workflow Definition & Loading | src/gobby/workflows/loader.py, src/gobby/agents/definitions.py, src/gobby/agents/runner.py | Support for named workflows in agent definitions, pipeline type detection, inline workflow registration, and agent spawning validation against pipeline definitions. |
| Multi-Workflow Orchestration | .gobby/workflows/meeseeks-box.yaml, .gobby/workflows/meeseeks-box-pipeline.yaml, .gobby/workflows/worktree-merge.yaml, .gobby/workflows/code-review.yaml, src/gobby/install/shared/workflows/* | New multi-step orchestration workflows for task discovery, worker spawning, code review, and merge pipelines; removed legacy auto-task-claude workflow. |
| Agent Spawning & Isolation | src/gobby/agents/spawn.py, src/gobby/agents/spawn_executor.py, src/gobby/agents/isolation.py, src/gobby/worktrees/git.py | Enhanced terminal spawn flow with step_variables propagation, unpushed commit detection for base branch selection, and new git helper methods (get_current_branch, has_unpushed_commits). |
| Session & Memory Management | src/gobby/sessions/session.py, src/gobby/storage/sessions.py, src/gobby/storage/memories.py, src/gobby/sync/memories.py, src/gobby/install/shared/prompts/memory/extract.md | Added step_variables field to sessions, memory ID deduplication by content only, updated memory extraction thresholds and criteria (min_importance 0.7→0.8, max_memories 5→3). |
| Consolidated Skill Injection | src/gobby/workflows/context_actions.py, src/gobby/install/shared/workflows/lifecycle/session-lifecycle.yaml, docs/guides/workflows.md | Moved skill discovery from hooks into workflow-based inject_context action with sources for skills, task_context, and memories; removed hook-based skill injection guidance. |
| Hook Management & Event Handling | src/gobby/hooks/events.py, src/gobby/hooks/hook_manager.py, src/gobby/hooks/event_handlers/_session.py, src/gobby/hooks/broadcaster.py | Added CURSOR/WINDSURF/COPILOT session sources, wired pipeline executor into HookManager, removed redundant skill injection from session start hooks. |
| Streaming & Terminal Output | src/gobby/agents/pty_reader.py, src/gobby/servers/websocket.py, src/gobby/llm/claude.py, src/gobby/llm/service.py | New PTYReaderManager for terminal output streaming from agents, WebSocket support for pipeline events and terminal data, LLM streaming events (TextChunk, ToolCallEvent, DoneEvent). |
| MCP Proxy Enhancements | src/gobby/mcp_proxy/tools/spawn_agent.py, src/gobby/mcp_proxy/tools/workflows/_lifecycle.py, src/gobby/mcp_proxy/tools/tasks/_lifecycle.py, src/gobby/mcp_proxy/registries.py | Pipeline executor integration into spawn_agent, workflow type validation (reject pipelines for step-based operations), task lifecycle improvements with UUID resolution and state updates. |
| Provider Support Expansion | src/gobby/mcp_proxy/tools/orchestration/orchestrate.py, src/gobby/mcp_proxy/tools/orchestration/review.py, src/gobby/mcp_proxy/tools/worktrees.py, src/gobby/agents/spawners/command_builder.py | Extended provider support to include cursor, windsurf, copilot across orchestration, review, and worktree tools; updated CLI command building. |
| Core Infrastructure Wiring | src/gobby/runner.py, src/gobby/app_context.py, src/gobby/servers/http.py, src/gobby/config/app.py, src/gobby/install/gemini/hooks/hook_dispatcher.py | Pipeline components initialization (WorkflowLoader, PipelineExecutor, LocalPipelineExecutionManager), bind_host config, WebSocket host binding, Gobby context propagation in hook dispatchers. |
| Documentation & Planning | docs/guides/pipelines.md, docs/guides/lobster-migration.md, docs/guides/workflows.md, docs/plans/completed/lobster-compatible-workflows.md, CLAUDE.md, README.md | New comprehensive pipeline documentation, Lobster migration guide, updated workflow documentation with inject_context sources, README with expanded CLI support. |
| Configuration & Metadata | .gitignore, pyproject.toml, .gobby/tasks_meta.json, .gobby/agents/meeseeks.yaml, .gobby/prompts/import/* | Version bump (0.2.9→0.2.10), pip dependency update, meeseeks agent reconfiguration with multi-workflow support, prompt restructuring. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant HTTP as HTTP Server
    participant Executor as Pipeline Executor
    participant LLM as LLM Service
    participant DB as Database
    participant Webhook as Webhook Endpoint

    Client->>HTTP: POST /api/pipelines/run {name, inputs}
    HTTP->>DB: Load pipeline definition
    DB-->>HTTP: PipelineDefinition
    HTTP->>Executor: execute(pipeline, inputs, project_id)
    Executor->>DB: Create pipeline execution (RUNNING)
    DB-->>Executor: execution_id
    
    loop For each step
        Executor->>Executor: Check condition
        alt Step skipped
            Executor->>DB: Update step (SKIPPED)
        else Step approved
            alt Exec step
                Executor->>LLM: Execute command
                LLM-->>Executor: result
            else Prompt step
                Executor->>LLM: Generate with allowed_tools
                LLM-->>Executor: response
            else Nested pipeline
                Executor->>Executor: Recursive execute
            end
            
            alt Approval required
                Executor->>DB: Update execution (WAITING_APPROVAL)
                Executor->>Webhook: notify_approval_pending
                Webhook-->>Client: webhook event
                Client->>HTTP: POST /api/pipelines/approve/{token}
                HTTP->>Executor: approve(token)
                Executor->>DB: Resume execution
                Executor->>Executor: Continue remaining steps
            end
            
            Executor->>DB: Update step (COMPLETED) with output
        end
    end
    
    Executor->>DB: Update execution (COMPLETED) with outputs
    Executor->>Webhook: notify_complete
    Webhook-->>Client: completion webhook
    HTTP-->>Client: {status, execution_id, outputs}

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

This PR introduces substantial new functionality with high heterogeneity across multiple subsystems: a complete pipeline execution system (executor, state models, webhooks, storage), support for three new CLI adapters with integration points throughout the codebase, significant workflow restructuring (multi-workflow support, consolidated skill injection), and enhanced agent spawning/isolation logic. While many changes follow consistent patterns (adding new CLI sources, installer implementations), the core pipeline system is dense with logic requiring careful validation, and the distributed integration touches hook handling, MCP proxies, HTTP routing, storage migrations, and CLI commands. The workflow and session-lifecycle changes introduce behavioral shifts that require understanding the broader context. However, the scope is well-organized into cohorts with clear responsibilities, and test coverage appears present for critical paths.

Possibly related PRs

  • Release v0.2.8 #4: Overlapping changes to agents, workflows, adapters, hooks, and MCP proxy subsystems including meeseeks workflow files, auto-task-claude modifications, adapter enhancements, HookManager activation paths, and spawn/registry tooling—shares the same architectural patterns and integration points.

# Conflicts:
#	.gobby/tasks_meta.json
@joshwilhelmi joshwilhelmi merged commit 023f787 into main Feb 3, 2026
6 of 8 checks passed

claude bot commented Feb 3, 2026

Pull Request Review - v0.2.10

Summary

Comprehensive review of 460 changed files with 35k additions. The PR adds three new CLI adapters (Copilot, Cursor, Windsurf), a pipeline execution system, and agent/hook improvements. Overall architecture is solid with good test coverage (81%, 10,040 tests passing).


Critical Issues 🔴

1. PTY Reader Missing Test Coverage

File: src/gobby/agents/pty_reader.py (192 new lines, 0 test coverage)

This is a critical component for agent PTY management but has no tests. Missing coverage for:

  • Starting/stopping readers
  • FD closure during read
  • Concurrent reader operations
  • Resource cleanup
  • Error handling in callbacks
  • UTF-8 decode errors

Action Required: Add tests/agents/test_pty_reader.py before merge.

2. File Descriptor Leak Risk

File: src/gobby/agents/pty_reader.py:141-179

The _read_loop method catches FD errors but doesn't explicitly close master_fd. This could leak file descriptors over time.

Recommendation: Add explicit os.close(master_fd) in the finally block or document FD closure responsibility.
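The recommendation above can be sketched with a synchronous read loop that always closes the descriptor it owns. This is a stand-in, not the real _read_loop: it uses a plain pipe instead of a PTY, blocking reads instead of asyncio, and the `drain_fd` name is hypothetical. It also assumes the reader owns master_fd; if the spawner owns it, closing here would be wrong.

```python
import os


def drain_fd(master_fd: int, chunk_size: int = 4096) -> bytes:
    """Read until EOF or error, then close the descriptor exactly once."""
    chunks = []
    try:
        while True:
            data = os.read(master_fd, chunk_size)
            if not data:  # EOF: the writer closed its end
                break
            chunks.append(data)
    except OSError:
        # On Linux, reading a PTY master after the child exits typically
        # raises EIO; treat it as end of stream.
        pass
    finally:
        try:
            os.close(master_fd)
        except OSError:
            pass  # already closed elsewhere
    return b"".join(chunks)


read_fd, write_fd = os.pipe()
os.write(write_fd, b"hello")
os.close(write_fd)
output = drain_fd(read_fd)
```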


High Priority Issues ⚠️

3. Session Handler Complexity

File: src/gobby/hooks/event_handlers/_session.py:15-199

The handle_session_start method is 185 lines with deeply nested logic, making it hard to test all paths.

Recommendation: Refactor into smaller methods:

  • _find_existing_session()
  • _find_parent_session()
  • _register_new_session()
  • _activate_workflows()

4. Missing Adapter Edge Case Tests

File: tests/adapters/test_new_adapters.py

Good basic coverage but missing:

  • Malformed input handling (None, empty dict, wrong types)
  • Large payload handling (10MB+ outputs)
  • Concurrent request handling
  • Error recovery scenarios

Recommendation: Add negative test cases and stress tests.

5. Input Validation Gaps

File: src/gobby/cli/pipelines.py:76-81, 258-266

Pipeline inputs aren't validated for:

  • Key name format (special characters allowed)
  • Value length limits (DoS potential)
  • Content sanitization

Recommendation:

  • Validate keys are alphanumeric + underscore only
  • Add max length (e.g., 1000 chars per value)
  • Document input format constraints
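A minimal sketch of the validation recommended above, assuming pipeline inputs arrive as key=value strings (the actual CLI format is not shown in this review). The `parse_input` name, the regular expression, and the error messages are illustrative; the 1000-character limit is the example figure from the recommendation.

```python
import re

KEY_PATTERN = re.compile(r"[A-Za-z0-9_]+")
MAX_VALUE_LENGTH = 1000  # example limit from the review comment


def parse_input(raw: str):
    """Split 'key=value' and enforce key format and value length."""
    key, sep, value = raw.partition("=")
    if not sep:
        raise ValueError(f"expected key=value, got {raw!r}")
    if not KEY_PATTERN.fullmatch(key):
        raise ValueError(f"invalid input key {key!r}: use letters, digits, underscore")
    if len(value) > MAX_VALUE_LENGTH:
        raise ValueError(f"value for {key!r} exceeds {MAX_VALUE_LENGTH} characters")
    return key, value


parsed = parse_input("target_branch=main")
```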

Medium Priority Issues

6. Path Traversal in Pipeline Import

File: src/gobby/cli/pipelines.py:586-593

pipeline.name from Lobster files is used directly as filename without sanitization. Could allow path traversal like ../../../etc/passwd.

Fix:

dest_path = workflows_dir / Path(pipeline.name).name  # Strip directory components
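A short demonstration of what the suggested fix does, plus one edge case worth guarding: taking .name of ".." still yields "..". The `safe_pipeline_filename` helper, the backslash normalization, and the .yaml suffix are assumptions for illustration.

```python
from pathlib import PurePosixPath


def safe_pipeline_filename(name: str, suffix: str = ".yaml") -> str:
    """Reduce an untrusted pipeline name to a single path component."""
    # Treat Windows-style separators as separators too.
    base = PurePosixPath(name.replace("\\", "/")).name
    if base in {"", ".", ".."}:
        raise ValueError(f"unsafe pipeline name: {name!r}")
    return base + suffix


stripped = safe_pipeline_filename("../../../etc/passwd")
```

Only the final component survives, so the traversal example from the review resolves to a file inside workflows_dir.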

7. Missing Input Validation in Adapters

Files: src/gobby/adapters/copilot.py:159-164 (and similar in cursor.py, windsurf.py)

No validation that native_event is a dict or contains required fields. Malformed input could cause exceptions.

Recommendation: Add type checks at start of translate_to_hook_event().
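The type checks recommended above might look like the guard below, called at the top of translate_to_hook_event(). The function name and the required field `hook_type` are hypothetical, since the review does not list which fields each adapter needs.

```python
from typing import Any, Dict, Tuple


def validate_native_event(
    native_event: Any, required: Tuple[str, ...] = ("hook_type",)
) -> Dict[str, Any]:
    """Reject non-dict payloads and payloads missing required fields."""
    if not isinstance(native_event, dict):
        raise TypeError(f"native_event must be a dict, got {type(native_event).__name__}")
    missing = [field for field in required if field not in native_event]
    if missing:
        raise ValueError(f"native_event missing required fields: {missing}")
    return native_event


event = validate_native_event({"hook_type": "preToolUse", "toolName": "bash"})
```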

8. Unbounded Memory Growth in Pipelines

File: src/gobby/workflows/pipeline_executor.py:138-141, 230

All step outputs are stored in memory. Long pipelines with large outputs could consume excessive memory.

Recommendation: Add memory limits or document maximum pipeline size.


Low Priority Issues ℹ️

9. PTY Reader Race Condition

File: src/gobby/agents/pty_reader.py:88-104

Lock is released before task.cancel() completes. Concurrent start_reader() calls could create duplicate tasks.

Fix: Hold lock until after wait_for() completes.
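A sketch of the locking pattern suggested above: the lock stays held while the old task is cancelled and awaited, so a concurrent start_reader() cannot slip in between. `ReaderRegistry` and its methods are hypothetical and much simpler than the real PTY reader manager; the 2-second timeout is arbitrary. Note that catching CancelledError here would also swallow cancellation of the caller, which production code may need to handle differently.

```python
import asyncio


class ReaderRegistry:
    def __init__(self):
        self._lock = asyncio.Lock()
        self._tasks = {}

    async def start_reader(self, run_id, coro_factory) -> bool:
        async with self._lock:
            existing = self._tasks.get(run_id)
            if existing is not None and not existing.done():
                return False  # a reader is already active
            self._tasks[run_id] = asyncio.create_task(coro_factory())
            return True

    async def stop_reader(self, run_id) -> None:
        async with self._lock:  # held until cancellation has completed
            task = self._tasks.pop(run_id, None)
            if task is None:
                return
            task.cancel()
            try:
                await asyncio.wait_for(task, timeout=2.0)
            except (asyncio.CancelledError, asyncio.TimeoutError):
                pass


async def _demo():
    registry = ReaderRegistry()  # created inside the running loop

    async def forever():
        await asyncio.sleep(3600)

    first = await registry.start_reader("run-1", forever)
    second = await registry.start_reader("run-1", forever)
    await registry.stop_reader("run-1")
    return first, second, len(registry._tasks)


result = asyncio.run(_demo())
```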

10. Weak Type Annotations in Installers

Files: src/gobby/cli/installers/*.py

Using dict[str, Any] loses type safety. Prefer TypedDict or dataclass for return types.


Excellent Work ✅

  1. No command injection vulnerabilities - Proper use of shlex.split() and create_subprocess_exec()
  2. Comprehensive adapter documentation - Clear docstrings explaining hook mappings and conventions
  3. Consistent adapter architecture - All three CLIs follow same translation pattern
  4. Good test coverage overall - 81% coverage with 10,040 passing tests
  5. Proper async patterns - Good use of locks and cancellation

Overall Recommendation

Approve with changes requested. The architecture is solid and most implementation is high quality, but:

Must fix before merge:

  • Add test coverage for PTY reader (critical new component)
  • Fix file descriptor leak in PTY reader

Strongly recommended:

  • Add input validation to pipeline system
  • Fix path traversal in import
  • Refactor complex session handler
  • Add negative test cases for adapters

The pipeline system and adapter implementations are well-designed. Once PTY tests are added and the FD leak is fixed, this is ready to merge.


Test Results Validation: ✅ 10,040 tests passing, 81% coverage
Security Review: ⚠️ Minor issues identified above
Performance: ✅ No major concerns
Code Quality: ⚠️ Some refactoring recommended

Great work on this feature-rich release!
