fix(agent): reconcile skills-system review findings by mmabrouk · Pull Request #4851 · Agenta-AI/agenta

mmabrouk · 2026-06-25T19:17:09Z

Context

A review of the skills system surfaced four small drift and coverage gaps. None changes runtime behavior; they remove a dead, diverged file, lock a contract that regressed twice, and fix a comment that misstated how forced tools work.

Changes

Delete the orphan getting-started skill. services/agent/skills/agenta-getting-started/SKILL.md was a dead, diverged copy. Nothing reads that path: the runner materializes skills from the /run wire into per-run temp dirs (engines/skills.ts), and a grep across .ts/.py/.json found no code reference to the file. The live getting-started body is api/oss/src/core/workflows/platform_catalog.py (_GETTING_STARTED_BODY), served through the _agenta.* catalog. The catalog is the single source of truth, so the orphan is deleted (not wired in, which would change the served skill body).

Pin Claude-carries-skills in the cross-language wire contract. The skills seam was only exercised for Pi. run_request.pi_core.json carried a skills field; run_request.claude.json did not, so a merge that dropped Claude's skills would pass the contract test. This regressed twice before via merge-loss. The Claude golden now carries a resolved inline skills package, and both test_wire_contract.py (Python producer) and wire-contract.test.ts (TypeScript consumer) assert it, so the Claude skill seam is now locked the same way Pi's is.

Reconcile the overclaiming forced-tools comment. The comment on AGENTA_FORCED_TOOLS in sdks/python/agenta/sdk/agents/adapters/agenta_builtins.py claimed read/bash are forced "because Pi only renders skills when read is available." That is not how it works today. The runner never consumes request.tools / builtin_names to grant builtins (verified: no engine code reads the wire tools field for delivery). Pi gets read and bash from its own DEFAULTS, so skills render regardless of this list. The comment now states reality and notes that wiring the forced set to actually deliver builtins over the wire is deferred (it works via Pi defaults today). Comment-only change.

Guard the disable-model-invocation spelling. Added a runner test asserting the composed Pi skill frontmatter uses exactly the hyphenated disable-model-invocation: key and never the camelCase (disableModelInvocation) or snake_case (disable_model_invocation) spellings. This guards the three-spelling drop risk in engines/skills.ts.

Scope / risk

No runtime behavior changes. The deletion removes a file no code path reads; the golden + test additions only tighten the existing contract; the comment and the new test add no production code. The Claude golden gains a skills key, which the Claude config already emitted via the inherited wire_skills seam, so this only records what the wire already produces. Decisions for the deletion and the comment reconcile are logged in docs/design/agent-workflows/projects/qa/final-sweep-decisions.md (D-001, D-002).

Tests / notes

Runner: pnpm test in services/agent (226 tests) and pnpm run typecheck (the compile-time wire-key guard) both green.
SDK: the full oss/tests/pytest/unit/agents/ suite (348) plus the SDK skill-flag/catalog suites and the API test_platform_catalog.py (29) all green.
ruff format and ruff check clean on the touched Python files.
The golden JSON files are intentionally hand-formatted in a compact style (the sibling run_request.pi_core.json and run_result.ok.json also do not pass prettier --check); the new Claude skills block matches that style.

How to QA

Prerequisites: a local checkout on this branch. No running stack needed; this is a test/contract change.

Steps:

Run the runner wire + skills suites:

cd services/agent && pnpm exec vitest run tests/unit/wire-contract.test.ts tests/unit/skills.test.ts tests/unit/sandbox-agent-workspace.test.ts && pnpm run typecheck

Run the SDK wire + agents suite:

cd sdks/python && uv run python -m pytest oss/tests/pytest/unit/agents/ -n0 -q

Confirm the orphan file is gone:

ls services/agent/skills/agenta-getting-started/SKILL.md

Expected result: Step 1 passes (the Claude golden now asserts a skills block; the new disable-model-invocation spelling test passes). Step 2 passes (348 tests; the Claude wire-contract test asserts skills). Step 3 reports the file does not exist.

Automated tests:

cd services/agent && pnpm test
cd sdks/python && uv run python -m pytest oss/tests/pytest/unit/agents/ -n0 -q

Edge cases: The platform getting-started skill must still resolve from the catalog after the file deletion. api/oss/tests/pytest/unit/workflows/test_platform_catalog.py covers this (the slug _agenta.agenta-getting-started resolves to _GETTING_STARTED_BODY); it stays green.

https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc

Low-risk cleanup of skills-system review findings: - Delete the orphan, diverged services/agent/skills/agenta-getting-started/SKILL.md. Nothing reads that path: the runner materializes skills from the /run wire into per-run temp dirs. The live getting-started body is platform_catalog.py (_GETTING_STARTED_BODY), so the catalog is the single source of truth. - Pin Claude-carries-skills in the cross-language wire contract: add a skills field to the run_request.claude.json golden plus Python and TypeScript assertions, so the Claude skill seam (which regressed twice via merge-loss) is locked. - Reconcile the overclaiming AGENTA_FORCED_TOOLS comment in agenta_builtins.py: the runner never consumes request.tools/builtin_names; Pi gets read/bash from its own DEFAULTS. Wiring the forced set to deliver builtins over the wire is deferred. - Add a runner test asserting the composed Pi skill frontmatter uses the hyphenated disable-model-invocation: key, never the camelCase/snake_case spellings. Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc

vercel · 2026-06-25T19:17:16Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
agenta-documentation	Ready	Preview, Comment	Jun 25, 2026 7:17pm

coderabbitai · 2026-06-25T19:17:19Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: b884bfb3-8f1c-47cb-a04c-7c0e2c431f63

📥 Commits

Reviewing files that changed from the base of the PR and between 0c8226a and 9b24931.

📒 Files selected for processing (6)

sdks/python/agenta/sdk/agents/adapters/agenta_builtins.py
sdks/python/oss/tests/pytest/unit/agents/golden/run_request.claude.json
sdks/python/oss/tests/pytest/unit/agents/test_wire_contract.py
services/agent/skills/agenta-getting-started/SKILL.md
services/agent/tests/unit/skills.test.ts
services/agent/tests/unit/wire-contract.test.ts

💤 Files with no reviewable changes (1)

services/agent/skills/agenta-getting-started/SKILL.md

📝 Walkthrough

Summary by CodeRabbit

New Features
- Added support for including skill instructions in Claude-related request payloads, with the expected file and execution settings preserved.
- Skill metadata is now rendered with consistent frontmatter naming for disabled model invocation.
Documentation
- Removed outdated placeholder “getting started” skill content.
Tests
- Expanded contract and unit coverage for skill payloads and frontmatter rendering.

Walkthrough

Claude wire-request fixtures now include inline release-notes skill data, tests assert the serialized skills payload on both Python and TypeScript sides, the SKILL.md materializer renders a hyphenated disable-model-invocation key, and a builtins comment clarifies forced-tools metadata.

Changes

Inline skill contract

Layer / File(s)	Summary
Claude skill payload contract `sdks/python/oss/tests/pytest/unit/agents/golden/run_request.claude.json`, `sdks/python/oss/tests/pytest/unit/agents/test_wire_contract.py`, `services/agent/tests/unit/wire-contract.test.ts`	The Claude request fixture adds a release-notes skill, and the wire-contract tests now assert the serialized skills seam includes the expected skill fields, file path, executable flag, and disableModelInvocation flag.
SKILL.md frontmatter key `services/agent/tests/unit/skills.test.ts`	The SKILL.md materializer test now expects disableModelInvocation to render as the hyphenated YAML key disable-model-invocation and excludes the camelCase and snake_case forms.
Forced-tools metadata note `sdks/python/agenta/sdk/agents/adapters/agenta_builtins.py`	The forced builtins comment now states that the forced-tools list is configuration metadata, that request.tools/builtin_names do not grant builtins over the wire, and that read/bash come from Pi defaults.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 60.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	It clearly summarizes the main change set: reconciling skills-system review findings in the agent.
Description check	✅ Passed	The description matches the PR's cleanup and contract-hardening changes and is on-topic.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/agent-skills-system-review

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

mmabrouk · 2026-06-25T19:17:42Z

What this PR does (maps to the skills-system review findings):

Finding 1 (HIGH) — verified dead, deleted. services/agent/skills/agenta-getting-started/SKILL.md was an orphan, diverged copy. Grep-confirmed no code reads that path (the runner materializes skills from the wire into temp dirs). Live source is the catalog (platform_catalog.py _GETTING_STARTED_BODY). Deleted, not wired in.
Finding 2 (HIGH) — already covered, verified. The Claude project-local skill-install tests already exist in sandbox-agent-workspace.test.ts (local + Daytona branches assert the skill lands at <cwd>/.claude/skills/<name>/SKILL.md). No new test needed; confirmed green.
Finding 3 (MEDIUM) — added. The Claude golden (run_request.claude.json) now carries a skills field, with assertions on both the Python (test_wire_contract.py) and TypeScript (wire-contract.test.ts) sides, so the Claude skill seam is locked (it regressed twice via merge-loss).
Finding 4 (MEDIUM) — comment reconciled. Fixed the overclaiming AGENTA_FORCED_TOOLS comment in agenta_builtins.py (the runner never reads request.tools; Pi gets read from its DEFAULTS). Wiring the forced set to deliver builtins over the wire is explicitly deferred.
Finding 6 (LOW) — added. New runner test pins the hyphenated disable-model-invocation: frontmatter key and asserts the camelCase/snake_case spellings do not leak.

Finding 5 (live Claude auto-discovery) is a QA check, not a code fix, and is out of scope here.

What feedback is needed: a code review of the contract additions, specifically (a) whether the Claude golden skills block is the shape you want pinned, and (b) whether the reconciled AGENTA_FORCED_TOOLS comment + the logged deferral (D-002) read correctly. Decisions D-001 (delete the orphan) and D-002 (defer the forced-tools wiring) are logged in docs/design/agent-workflows/projects/qa/final-sweep-decisions.md.

Tests: runner pnpm test 226 + typecheck green; SDK agents 348 + API platform-catalog 29 green; ruff clean.

mmabrouk · 2026-06-25T19:17:47Z

@coderabbitai review

coderabbitai · 2026-06-25T19:17:54Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

mmabrouk

lgtm

dosubot Bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Jun 25, 2026

dosubot Bot added the tests label Jun 25, 2026

mmabrouk added the needs-review Agent updated; awaiting Mahmoud's review label Jun 25, 2026

mmabrouk commented Jun 25, 2026

View reviewed changes

mmabrouk added lgtm This PR has been approved by a maintainer and removed needs-review Agent updated; awaiting Mahmoud's review labels Jun 25, 2026

mmabrouk merged commit d5d8dc7 into big-agents Jun 25, 2026
26 of 28 checks passed

mmabrouk mentioned this pull request Jun 25, 2026

fix(agent): restore services/agent/skills dir so the sandbox-agent image build works #4852

Merged

This was referenced Jun 28, 2026

[fix] Clean up Skill Template #4900

Merged

docs(agent): platform skills design for agent self-building #4918

Closed

feat(agent): build skills (#4918) #4924

Closed

feat(agent): build skills (#4918) #4930

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(agent): reconcile skills-system review findings#4851

fix(agent): reconcile skills-system review findings#4851
mmabrouk merged 1 commit into
big-agentsfrom
fix/agent-skills-system-review

mmabrouk commented Jun 25, 2026

Uh oh!

vercel Bot commented Jun 25, 2026

Uh oh!

coderabbitai Bot commented Jun 25, 2026 •

edited

Loading

Summary by CodeRabbit

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

mmabrouk commented Jun 25, 2026

Uh oh!

mmabrouk commented Jun 25, 2026

Uh oh!

coderabbitai Bot commented Jun 25, 2026 •

edited

Loading

Uh oh!

mmabrouk left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

mmabrouk commented Jun 25, 2026

Context

Changes

Scope / risk

Tests / notes

How to QA

Uh oh!

vercel Bot commented Jun 25, 2026

Uh oh!

coderabbitai Bot commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

mmabrouk commented Jun 25, 2026

Uh oh!

mmabrouk commented Jun 25, 2026

Uh oh!

coderabbitai Bot commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mmabrouk left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented Jun 25, 2026 •

edited

Loading

coderabbitai Bot commented Jun 25, 2026 •

edited

Loading