Skip to content

fix(agent): reconcile skills-system review findings#4851

Merged
mmabrouk merged 1 commit into
big-agentsfrom
fix/agent-skills-system-review
Jun 25, 2026
Merged

fix(agent): reconcile skills-system review findings#4851
mmabrouk merged 1 commit into
big-agentsfrom
fix/agent-skills-system-review

Conversation

@mmabrouk

Copy link
Copy Markdown
Member

Context

A review of the skills system surfaced four small drift and coverage gaps. None changes runtime behavior; they remove a dead, diverged file, lock a contract that regressed twice, and fix a comment that misstated how forced tools work.

Changes

Delete the orphan getting-started skill. services/agent/skills/agenta-getting-started/SKILL.md was a dead, diverged copy. Nothing reads that path: the runner materializes skills from the /run wire into per-run temp dirs (engines/skills.ts), and a grep across .ts/.py/.json found no code reference to the file. The live getting-started body is api/oss/src/core/workflows/platform_catalog.py (_GETTING_STARTED_BODY), served through the _agenta.* catalog. The catalog is the single source of truth, so the orphan is deleted (not wired in, which would change the served skill body).

Pin Claude-carries-skills in the cross-language wire contract. The skills seam was only exercised for Pi. run_request.pi_core.json carried a skills field; run_request.claude.json did not, so a merge that dropped Claude's skills would pass the contract test. This regressed twice before via merge-loss. The Claude golden now carries a resolved inline skills package, and both test_wire_contract.py (Python producer) and wire-contract.test.ts (TypeScript consumer) assert it, so the Claude skill seam is now locked the same way Pi's is.

Reconcile the overclaiming forced-tools comment. The comment on AGENTA_FORCED_TOOLS in sdks/python/agenta/sdk/agents/adapters/agenta_builtins.py claimed read/bash are forced "because Pi only renders skills when read is available." That is not how it works today. The runner never consumes request.tools / builtin_names to grant builtins (verified: no engine code reads the wire tools field for delivery). Pi gets read and bash from its own DEFAULTS, so skills render regardless of this list. The comment now states reality and notes that wiring the forced set to actually deliver builtins over the wire is deferred (it works via Pi defaults today). Comment-only change.

Guard the disable-model-invocation spelling. Added a runner test asserting the composed Pi skill frontmatter uses exactly the hyphenated disable-model-invocation: key and never the camelCase (disableModelInvocation) or snake_case (disable_model_invocation) spellings. This guards the three-spelling drop risk in engines/skills.ts.

Scope / risk

No runtime behavior changes. The deletion removes a file no code path reads; the golden + test additions only tighten the existing contract; the comment and the new test add no production code. The Claude golden gains a skills key, which the Claude config already emitted via the inherited wire_skills seam, so this only records what the wire already produces. Decisions for the deletion and the comment reconcile are logged in docs/design/agent-workflows/projects/qa/final-sweep-decisions.md (D-001, D-002).

Tests / notes

  • Runner: pnpm test in services/agent (226 tests) and pnpm run typecheck (the compile-time wire-key guard) both green.
  • SDK: the full oss/tests/pytest/unit/agents/ suite (348) plus the SDK skill-flag/catalog suites and the API test_platform_catalog.py (29) all green.
  • ruff format and ruff check clean on the touched Python files.
  • The golden JSON files are intentionally hand-formatted in a compact style (the sibling run_request.pi_core.json and run_result.ok.json also do not pass prettier --check); the new Claude skills block matches that style.

How to QA

Prerequisites: a local checkout on this branch. No running stack needed; this is a test/contract change.

Steps:

  1. Run the runner wire + skills suites:
    cd services/agent && pnpm exec vitest run tests/unit/wire-contract.test.ts tests/unit/skills.test.ts tests/unit/sandbox-agent-workspace.test.ts && pnpm run typecheck
    
  2. Run the SDK wire + agents suite:
    cd sdks/python && uv run python -m pytest oss/tests/pytest/unit/agents/ -n0 -q
    
  3. Confirm the orphan file is gone:
    ls services/agent/skills/agenta-getting-started/SKILL.md
    

Expected result: Step 1 passes (the Claude golden now asserts a skills block; the new disable-model-invocation spelling test passes). Step 2 passes (348 tests; the Claude wire-contract test asserts skills). Step 3 reports the file does not exist.

Automated tests:

cd services/agent && pnpm test
cd sdks/python && uv run python -m pytest oss/tests/pytest/unit/agents/ -n0 -q

Edge cases: The platform getting-started skill must still resolve from the catalog after the file deletion. api/oss/tests/pytest/unit/workflows/test_platform_catalog.py covers this (the slug _agenta.agenta-getting-started resolves to _GETTING_STARTED_BODY); it stays green.

https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc

Low-risk cleanup of skills-system review findings:

- Delete the orphan, diverged services/agent/skills/agenta-getting-started/SKILL.md.
  Nothing reads that path: the runner materializes skills from the /run wire into
  per-run temp dirs. The live getting-started body is platform_catalog.py
  (_GETTING_STARTED_BODY), so the catalog is the single source of truth.
- Pin Claude-carries-skills in the cross-language wire contract: add a skills field
  to the run_request.claude.json golden plus Python and TypeScript assertions, so the
  Claude skill seam (which regressed twice via merge-loss) is locked.
- Reconcile the overclaiming AGENTA_FORCED_TOOLS comment in agenta_builtins.py: the
  runner never consumes request.tools/builtin_names; Pi gets read/bash from its own
  DEFAULTS. Wiring the forced set to deliver builtins over the wire is deferred.
- Add a runner test asserting the composed Pi skill frontmatter uses the hyphenated
  disable-model-invocation: key, never the camelCase/snake_case spellings.

Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc
@dosubot dosubot Bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Jun 25, 2026
@vercel

vercel Bot commented Jun 25, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agenta-documentation Ready Ready Preview, Comment Jun 25, 2026 7:17pm

Request Review

@dosubot dosubot Bot added the tests label Jun 25, 2026
@coderabbitai

coderabbitai Bot commented Jun 25, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: b884bfb3-8f1c-47cb-a04c-7c0e2c431f63

📥 Commits

Reviewing files that changed from the base of the PR and between 0c8226a and 9b24931.

📒 Files selected for processing (6)
  • sdks/python/agenta/sdk/agents/adapters/agenta_builtins.py
  • sdks/python/oss/tests/pytest/unit/agents/golden/run_request.claude.json
  • sdks/python/oss/tests/pytest/unit/agents/test_wire_contract.py
  • services/agent/skills/agenta-getting-started/SKILL.md
  • services/agent/tests/unit/skills.test.ts
  • services/agent/tests/unit/wire-contract.test.ts
💤 Files with no reviewable changes (1)
  • services/agent/skills/agenta-getting-started/SKILL.md

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added support for including skill instructions in Claude-related request payloads, with the expected file and execution settings preserved.
    • Skill metadata is now rendered with consistent frontmatter naming for disabled model invocation.
  • Documentation

    • Removed outdated placeholder “getting started” skill content.
  • Tests

    • Expanded contract and unit coverage for skill payloads and frontmatter rendering.

Walkthrough

Claude wire-request fixtures now include inline release-notes skill data, tests assert the serialized skills payload on both Python and TypeScript sides, the SKILL.md materializer renders a hyphenated disable-model-invocation key, and a builtins comment clarifies forced-tools metadata.

Changes

Inline skill contract

Layer / File(s) Summary
Claude skill payload contract
sdks/python/oss/tests/pytest/unit/agents/golden/run_request.claude.json, sdks/python/oss/tests/pytest/unit/agents/test_wire_contract.py, services/agent/tests/unit/wire-contract.test.ts
The Claude request fixture adds a release-notes skill, and the wire-contract tests now assert the serialized skills seam includes the expected skill fields, file path, executable flag, and disableModelInvocation flag.
SKILL.md frontmatter key
services/agent/tests/unit/skills.test.ts
The SKILL.md materializer test now expects disableModelInvocation to render as the hyphenated YAML key disable-model-invocation and excludes the camelCase and snake_case forms.
Forced-tools metadata note
sdks/python/agenta/sdk/agents/adapters/agenta_builtins.py
The forced builtins comment now states that the forced-tools list is configuration metadata, that request.tools/builtin_names do not grant builtins over the wire, and that read/bash come from Pi defaults.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 60.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed It clearly summarizes the main change set: reconciling skills-system review findings in the agent.
Description check ✅ Passed The description matches the PR's cleanup and contract-hardening changes and is on-topic.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/agent-skills-system-review

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@mmabrouk mmabrouk added the needs-review Agent updated; awaiting Mahmoud's review label Jun 25, 2026
@mmabrouk

Copy link
Copy Markdown
Member Author

What this PR does (maps to the skills-system review findings):

  • Finding 1 (HIGH) — verified dead, deleted. services/agent/skills/agenta-getting-started/SKILL.md was an orphan, diverged copy. Grep-confirmed no code reads that path (the runner materializes skills from the wire into temp dirs). Live source is the catalog (platform_catalog.py _GETTING_STARTED_BODY). Deleted, not wired in.
  • Finding 2 (HIGH) — already covered, verified. The Claude project-local skill-install tests already exist in sandbox-agent-workspace.test.ts (local + Daytona branches assert the skill lands at <cwd>/.claude/skills/<name>/SKILL.md). No new test needed; confirmed green.
  • Finding 3 (MEDIUM) — added. The Claude golden (run_request.claude.json) now carries a skills field, with assertions on both the Python (test_wire_contract.py) and TypeScript (wire-contract.test.ts) sides, so the Claude skill seam is locked (it regressed twice via merge-loss).
  • Finding 4 (MEDIUM) — comment reconciled. Fixed the overclaiming AGENTA_FORCED_TOOLS comment in agenta_builtins.py (the runner never reads request.tools; Pi gets read from its DEFAULTS). Wiring the forced set to deliver builtins over the wire is explicitly deferred.
  • Finding 6 (LOW) — added. New runner test pins the hyphenated disable-model-invocation: frontmatter key and asserts the camelCase/snake_case spellings do not leak.

Finding 5 (live Claude auto-discovery) is a QA check, not a code fix, and is out of scope here.

What feedback is needed: a code review of the contract additions, specifically (a) whether the Claude golden skills block is the shape you want pinned, and (b) whether the reconciled AGENTA_FORCED_TOOLS comment + the logged deferral (D-002) read correctly. Decisions D-001 (delete the orphan) and D-002 (defer the forced-tools wiring) are logged in docs/design/agent-workflows/projects/qa/final-sweep-decisions.md.

Tests: runner pnpm test 226 + typecheck green; SDK agents 348 + API platform-catalog 29 green; ruff clean.

@mmabrouk

Copy link
Copy Markdown
Member Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 25, 2026

Copy link
Copy Markdown
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@mmabrouk mmabrouk left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@mmabrouk mmabrouk added lgtm This PR has been approved by a maintainer and removed needs-review Agent updated; awaiting Mahmoud's review labels Jun 25, 2026
@mmabrouk mmabrouk merged commit d5d8dc7 into big-agents Jun 25, 2026
26 of 28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

lgtm This PR has been approved by a maintainer size:M This PR changes 30-99 lines, ignoring generated files. tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant