fix(agent): reconcile skills-system review findings#4851
Conversation
Low-risk cleanup of skills-system review findings: - Delete the orphan, diverged services/agent/skills/agenta-getting-started/SKILL.md. Nothing reads that path: the runner materializes skills from the /run wire into per-run temp dirs. The live getting-started body is platform_catalog.py (_GETTING_STARTED_BODY), so the catalog is the single source of truth. - Pin Claude-carries-skills in the cross-language wire contract: add a skills field to the run_request.claude.json golden plus Python and TypeScript assertions, so the Claude skill seam (which regressed twice via merge-loss) is locked. - Reconcile the overclaiming AGENTA_FORCED_TOOLS comment in agenta_builtins.py: the runner never consumes request.tools/builtin_names; Pi gets read/bash from its own DEFAULTS. Wiring the forced set to deliver builtins over the wire is deferred. - Add a runner test asserting the composed Pi skill frontmatter uses the hyphenated disable-model-invocation: key, never the camelCase/snake_case spellings. Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (6)
💤 Files with no reviewable changes (1)
📝 WalkthroughSummary by CodeRabbit
WalkthroughClaude wire-request fixtures now include inline release-notes skill data, tests assert the serialized skills payload on both Python and TypeScript sides, the SKILL.md materializer renders a hyphenated disable-model-invocation key, and a builtins comment clarifies forced-tools metadata. ChangesInline skill contract
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
What this PR does (maps to the skills-system review findings):
Finding 5 (live Claude auto-discovery) is a QA check, not a code fix, and is out of scope here. What feedback is needed: a code review of the contract additions, specifically (a) whether the Claude golden Tests: runner |
|
@coderabbitai review |
✅ Action performedReview finished.
|
Context
A review of the skills system surfaced four small drift and coverage gaps. None changes runtime behavior; they remove a dead, diverged file, lock a contract that regressed twice, and fix a comment that misstated how forced tools work.
Changes
Delete the orphan getting-started skill.
services/agent/skills/agenta-getting-started/SKILL.mdwas a dead, diverged copy. Nothing reads that path: the runner materializes skills from the/runwire into per-run temp dirs (engines/skills.ts), and a grep across.ts/.py/.jsonfound no code reference to the file. The live getting-started body isapi/oss/src/core/workflows/platform_catalog.py(_GETTING_STARTED_BODY), served through the_agenta.*catalog. The catalog is the single source of truth, so the orphan is deleted (not wired in, which would change the served skill body).Pin Claude-carries-skills in the cross-language wire contract. The
skillsseam was only exercised for Pi.run_request.pi_core.jsoncarried askillsfield;run_request.claude.jsondid not, so a merge that dropped Claude's skills would pass the contract test. This regressed twice before via merge-loss. The Claude golden now carries a resolved inlineskillspackage, and bothtest_wire_contract.py(Python producer) andwire-contract.test.ts(TypeScript consumer) assert it, so the Claude skill seam is now locked the same way Pi's is.Reconcile the overclaiming forced-tools comment. The comment on
AGENTA_FORCED_TOOLSinsdks/python/agenta/sdk/agents/adapters/agenta_builtins.pyclaimedread/bashare forced "because Pi only renders skills when read is available." That is not how it works today. The runner never consumesrequest.tools/builtin_namesto grant builtins (verified: no engine code reads the wiretoolsfield for delivery). Pi getsreadandbashfrom its own DEFAULTS, so skills render regardless of this list. The comment now states reality and notes that wiring the forced set to actually deliver builtins over the wire is deferred (it works via Pi defaults today). Comment-only change.Guard the
disable-model-invocationspelling. Added a runner test asserting the composed Pi skill frontmatter uses exactly the hyphenateddisable-model-invocation:key and never the camelCase (disableModelInvocation) or snake_case (disable_model_invocation) spellings. This guards the three-spelling drop risk inengines/skills.ts.Scope / risk
No runtime behavior changes. The deletion removes a file no code path reads; the golden + test additions only tighten the existing contract; the comment and the new test add no production code. The Claude golden gains a
skillskey, which the Claude config already emitted via the inheritedwire_skillsseam, so this only records what the wire already produces. Decisions for the deletion and the comment reconcile are logged indocs/design/agent-workflows/projects/qa/final-sweep-decisions.md(D-001, D-002).Tests / notes
pnpm testinservices/agent(226 tests) andpnpm run typecheck(the compile-time wire-key guard) both green.oss/tests/pytest/unit/agents/suite (348) plus the SDK skill-flag/catalog suites and the APItest_platform_catalog.py(29) all green.ruff formatandruff checkclean on the touched Python files.run_request.pi_core.jsonandrun_result.ok.jsonalso do not passprettier --check); the new Claudeskillsblock matches that style.How to QA
Prerequisites: a local checkout on this branch. No running stack needed; this is a test/contract change.
Steps:
Expected result: Step 1 passes (the Claude golden now asserts a
skillsblock; the newdisable-model-invocationspelling test passes). Step 2 passes (348 tests; the Claude wire-contract test assertsskills). Step 3 reports the file does not exist.Automated tests:
Edge cases: The platform getting-started skill must still resolve from the catalog after the file deletion.
api/oss/tests/pytest/unit/workflows/test_platform_catalog.pycovers this (the slug_agenta.agenta-getting-startedresolves to_GETTING_STARTED_BODY); it stays green.https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc