Skip to content

Conversation

@nielspeter
Copy link
Owner

Summary

This PR implements a dynamic skills system that loads domain expertise on-demand at runtime, replacing the previous static frontmatter-based approach.

Key Changes

🎯 Core Implementation

  • Dynamic Skill Loading (skill.tool.ts): New tool that loads skills at runtime via skill({name: "skill-name"}) calls
  • Default Tool Integration: Skill tool now included in .default() builtin tools (no manual setup required)
  • Conversation History as Cache: Skills cached naturally in conversation context via Anthropic prompt caching (90% cost savings)
  • Skill Hints System: Middleware suggests relevant skills based on prompt keywords

📚 Documentation & Examples

  • Complete Documentation Rewrite: docs/skills.md updated for dynamic skills system
  • Skills Demo Example: Production-quality example (packages/examples/skills-demo/) using real udbud skills:
    • Tender analysis workflow
    • Demonstrates loading 3 skills dynamically
    • Shows conversation caching in action
    • Cache efficiency: 246,120%!

✅ Testing

  • Unit Tests: 12 tests for skill tool (187 LOC)
  • Integration Tests: 5 tests migrated to dynamic loading (152 LOC)
  • Test Results: 608/609 passing (1 pre-existing unrelated failure)

🔄 Migration

  • Migrated 4 udbud agents from static to dynamic skills
  • Removed obsolete planning docs (DYNAMIC-SKILLS-FINAL.md)

Breaking Changes

Static skills: frontmatter removed from agent definitions.

Before:

---
name: my-agent
skills:
  - my-skill
---

After:

---
name: my-agent
tools: [read, write, skill]
---

Load skills dynamically:
skill({name: "my-skill"})

Migration Guide

  1. Remove skills: array from agent frontmatter
  2. Add skill to agent's tools list (if not using .default())
  3. Update agent instructions to load skills via tool calls
  4. Skills will be cached in conversation history automatically

Benefits

Load only what you need - Skills loaded on-demand, not at build-time
Zero configuration - Works out of the box with .default()
Efficient caching - 90% cost savings via Anthropic prompt caching
Simpler architecture - No separate cache mechanism needed
Easy to extend - Just add markdown files to skills directory

Statistics

  • Files changed: 29
  • Lines added: +1,823
  • Lines removed: -1,037
  • Net change: +786 lines

Demo Output

Cache efficiency demonstrates conversation-history-as-cache:

  • Turn 2: 38,200% cache efficiency
  • Turn 3: 98,020% cache efficiency
  • Turn 4: 144,300% cache efficiency
  • Turn 5: 246,120% cache efficiency

Checklist

  • Implementation complete
  • Unit tests passing (12 tests)
  • Integration tests passing (5 tests)
  • Documentation updated
  • Demo example created
  • TypeScript compilation clean
  • ESLint passing
  • Migration guide provided

Implemented comprehensive skills system following PRD Phase 1-3:
- Skills are reusable domain knowledge modules loaded from skills/ directory
- Agents declare skills in frontmatter, loaded during instantiation
- Skills injected into system prompt before agent instructions
- Convention over configuration: default skills/ directory (like agents/)

Core Implementation (Phase 1):
- SkillRegistry: Central registry with lazy loading and resource management
- SkillLoader: Loads skills from SKILL.md files with Zod validation
- Skill types: name, description, instructions, metadata, resources
- Resource loading: assets/, reference/, scripts/ directories
- Anthropic Agent Skills Spec v1.0 compliance

System Integration (Phase 2):
- AgentSystemBuilder.withSkillsFrom() for configuration
- AgentLoader loads skills specified in agent frontmatter
- ContextSetupMiddleware injects skills into system prompt
- Skills metadata logged in agent_start events for reproducibility
- Full type safety with TypeScript interfaces

Migration & Validation (Phase 3):
- Created danish-tender-guidelines skill (marker system, formats, templates)
- Created complexity-calculator skill (estimation formulas, industry standards)
- Updated udbud agents to use skills (technical-analyst, go-no-go-analyzer)
- All 597 unit tests passing, build successful

Key Features:
- Lazy resource loading: Skills loaded on first access, resources cached
- Graceful degradation: Missing skills logged as warnings, don't block execution
- Convention-based: skills/ directory auto-discovered like agents/
- Pull architecture: Child agents load their own skills, minimal inheritance
- Session logging: Skills captured in JSONL for reproducibility

Files Added:
- packages/core/src/skills/ (5 files: loader, registry, types, validation, index)
- packages/core/tests/unit/skills/ (3 test suites: 67 tests)
- packages/core/tests/integration/skills-system/ (integration test)
- packages/core/tests/test-fixtures/skills/ (test fixtures)
- packages/examples/udbud/skills/ (2 skills with 7 reference/template files)

Files Modified:
- Config: system-builder.ts, types.ts (skills config, builder methods)
- Agents: loader.ts, executor.ts, validation.ts (skill loading logic)
- Middleware: context-setup, agent-loader (skill injection, logging)
- Logging: types.ts, all loggers (skills metadata in agent_start)
- Tests: system-builder.test.ts (5 new skills tests)
- Examples: udbud agents + udbud.ts (skills configuration)

Breaking Changes: None - backward compatible, skills are optional

Implementation follows PRD design decisions:
- Resource lazy loading for efficiency
- File collision warnings (not errors) for flexibility
- Quick wins: file filtering (.git, node_modules), informative logs
- Deferred: version constraints, hot reload (Phase 4)
…s core feature

Updated plan to reflect decision to implement full Claude Code-style
auto-triggering from the start, not as optional feature.

Changes:
- Auto-triggering now core feature (not optional)
- Updated effort estimate: 24-32 hours total
- Updated target architecture to show auto-trigger flow
- Confirmed complete implementation approach
- Updated timeline: 3-4 weeks part-time
- Branch name: feature/skills-dynamic-auto-trigger

Implementation includes all phases:
- Phase 1-3: Core dynamic loading (12-16 hours)
- Phase 4: Auto-triggering (12-16 hours)
- Phase 5: Testing/validation (3-4 hours)

Decision rationale: Not in production, can implement complete solution
with best UX (auto-trigger + manual fallback).
…plosion

Fixed critical issues preventing udbud tender analysis from running:

**Extended Thinking Support**:
- Added thinking capabilities to all Claude 4.x models in providers-config.json
- Implemented streaming for thinking-enabled requests to avoid 10-minute timeout
- All Claude 4.x models (Opus 4.1, Sonnet 4.5, Haiku 4.5) now support extended thinking

**Binary File Protection**:
- Added strict document handling rules to all udbud agents
- Prevents reading .docx, .pdf, .xlsx files that consume 100K+ tokens
- Enforces mandatory workflow: check → convert → read markdown
- Prevents "prompt is too long" errors (>200K tokens)

**Configuration Consolidation**:
- Deleted duplicate packages/examples/providers-config.json
- Consolidated to single repo root providers-config.json
- Fixed .env path resolution in udbud.ts to use repo root

**Missing Skill**:
- Created architecture-analyzer/SKILL.md (591 lines)
- Includes architecture patterns, quality assessment, integration analysis
- Follows Anthropic Agent Skills Spec v1.0

**Testing**:
- udbud example now runs successfully with extended thinking
- Cache efficiency showing 200,000%+ savings
- Proper agent delegation workflow functioning
- Skills auto-loading correctly

Closes issues with binary file token explosion and extended thinking timeouts.
…ompliance markers

Phase 3 Migration Complete:
- Migrated compliance-checker agent to use danish-tender-guidelines skill
- Added compliance-specific markers to skill ([SKAL], [BØR], [KAN], [UKLAR], [KONFLIKT])
- Removed duplicate knowledge from agent (file locations, marker definitions)
- Agent now references skill for Danish tender standards

Results:
- 4 agents now using danish-tender-guidelines skill
- Skills system fully operational and tested
- All 597 tests passing
- Build and lint clean

Skills enable:
- Single source of truth for Danish tender knowledge
- Consistent marker system across all agents
- Easier maintenance (update skill vs 4+ agents)
- Better separation of domain knowledge from agent logic
Created docs/skills.md as single comprehensive guide for skills system:

Documentation includes:
- Quick start (2-minute guide)
- Conceptual overview (what are skills, when to use)
- Creating skills (SKILL.md format, resources)
- Using skills in agents (frontmatter, system builder)
- Migration guide with real examples
- 3 working examples from codebase
- Best practices (8 detailed guidelines)
- API reference (TypeScript interfaces)
- Troubleshooting (10 common issues)

Skills implementation now complete:
- Phase 1: Core implementation ✅
- Phase 2: Testing ✅
- Phase 3: Migration & validation ✅
- Phase 4: User documentation ✅

Coverage: 97.95% (67 unit tests, 4 integration tests)
All 597 tests passing, build clean, lint clean
Comprehensive evaluation of static (current) vs dynamic (proposed) skills:
- Architecture comparison
- Code changes required (15 files, 24-32 hours)
- Token cost analysis (3 scenarios)
- Trade-offs and decision matrix
- Recommendation: Keep static for now, evaluate dynamic separately

Decision: Implement dynamic on separate branch to compare approaches
Two approaches documented:
1. DYNAMIC-SKILLS-IMPLEMENTATION.md - Full dual-mechanism (auto-trigger + manual)
2. DYNAMIC-SKILLS-CLAUDE-STYLE.md - Simplified Claude Code style (manual only)

Decision: Implement Claude Code style (manual only)
- Simpler architecture (16 hours vs 28 hours)
- Agent autonomy - agents decide what knowledge they need
- Explicit skill() tool calls visible in conversation
- Matches Anthropic's proven pattern exactly
- No pattern matching complexity

Ready to begin implementation with SkillCache as foundation
Removed 168K of outdated planning and analysis documents:
- docs/skills/ directory (4 old planning docs from October)
- docs/skills-dynamic-implementation-plan.md (outdated plan)
- SKILLS-MIGRATION-ANALYSIS.md (analysis completed)
- DYNAMIC-SKILLS-IMPLEMENTATION.md (rejected dual-mechanism approach)

Keeping:
- docs/skills.md - Comprehensive user guide
- DYNAMIC-SKILLS-CLAUDE-STYLE.md - Current implementation plan (Claude Code style)

Ready to proceed with simplified manual-only skills implementation
Key decisions:
- NO CACHE in MVP - eliminates all context/state complexity
- Simple SkillTool that just loads and returns skills
- Add hints to help agents remember to load skills
- Test with one agent before full migration
- 12 hours instead of 24-28 hours

Approach:
1. Phase 1: Basic working tool (4 hours)
2. Phase 2: Safety hints (2 hours)
3. Phase 3: Remove static loading (3 hours)
4. Phase 4: Test with technical-analyst (2 hours)
5. Phase 5: Migrate remaining agents (3 hours)

Why this works:
- Ships in 2 days, not 2 weeks
- No architectural blockers
- Real usage data before optimization
- Can add caching later if needed
- Matches Claude Code manual loading pattern

Removed obsolete plans - keeping only the simplified approach
Key improvements from review:
- Added simple time-based cache (5 min TTL, no context needed)
- Fixed resource paths (use full relative paths)
- Strengthened hint system (REQUIRED vs RECOMMENDED)
- Added comprehensive error handling
- Updated timeline to realistic 20-24 hours
- Added performance metrics tracking
- Included rollback strategy
- Complete test coverage plan

Implementation approach:
1. SkillTool with time-based cache (6h)
2. Strong hint system (2h)
3. Remove static loading (4h)
4. Test with technical-analyst (3h)
5. Migrate remaining agents (5h)

The time-based cache solves redundant loads without any conversation
context complexity - best of both worlds.

Ready to begin implementation with Phase 1
- Add dynamic skill loading via skill tool (replaces static frontmatter)
- Include skill tool in .default() builtin tools (no manual enablement needed)
- Implement conversation-history-as-cache (no separate cache mechanism)
- Add skills-demo example using real udbud skills (tender analysis workflow)
- Add skill hints system (suggests skills based on prompt keywords)
- Update all documentation for dynamic skills
- Migrate integration tests to test dynamic loading
- Remove obsolete planning docs (DYNAMIC-SKILLS-FINAL.md)

The skill tool loads domain expertise on-demand at runtime. Skills cached
in conversation history via Anthropic prompt caching (90% cost savings).

Breaking change: Static 'skills:' frontmatter removed from agents.
Migration: Use skill({name: "skill-name"}) tool call instead.
@claude
Copy link

claude bot commented Oct 26, 2025

Code Review: Dynamic Skills System

I've thoroughly reviewed this PR and I'm impressed with the implementation quality. This is a well-architected feature that follows the repository's conventions and best practices.


Strengths

1. Architecture & Design

  • Clean separation of concerns: SkillLoader, SkillRegistry, and createSkillTool() each have a single, well-defined responsibility
  • Lazy loading pattern: Resources are only loaded when accessed (not at skill registration time), which is excellent for memory efficiency
  • Conversation-as-cache philosophy: Brilliant insight that persistent conversation context eliminates the need for a separate caching layer
  • Consistent with existing patterns: Follows the same registry pattern as ToolRegistry, making the codebase more learnable

2. Code Quality

  • TypeScript best practices:
    • No any types in production code (only in test mocks where appropriate)
    • Proper use of readonly for immutability
    • Type guards used correctly in validation
  • Error handling: Comprehensive validation with helpful error messages (e.g., skill not found shows available skills)
  • Resource safety: File extension whitelisting prevents loading arbitrary files
  • Case-insensitive lookups: Normalizes skill names (lowercase, trimmed) for better UX

3. Testing

  • Excellent coverage: 12 unit tests + 5 integration tests covering all critical paths
  • Test quality: Tests are focused, well-named, and use proper mocking
  • Real-world integration tests: Tests actual agent execution with skill loading, not just unit behavior

4. Documentation

  • Comprehensive docs: The 1,435-line docs/skills.md is thorough, well-organized, and includes:
    • Quick start guide
    • When to use skills (with decision framework)
    • Migration guide for both static→dynamic and inline→skills
    • Real examples from the codebase
    • API reference
    • Troubleshooting section
  • Inline documentation: JSDoc comments are clear and include examples
  • Migration path: Clear guidance for upgrading from the old static system

🔍 Potential Issues & Suggestions

1. Security Concern: Path Traversal (Medium Priority)

Issue: In skill.tool.ts:41-56, the code constructs paths using path.join() with user-controlled skill names:

const skillDir = path.relative(process.cwd(), skill.path);
const resourcePath = path.join(skillDir, 'reference', file);

While the skill name is validated to be hyphen-case, and the registry only contains pre-loaded skills, there's a theoretical risk if a malicious skill is somehow registered.

Recommendation:

  • Add explicit path validation to ensure the constructed path doesn't escape the skills directory
  • Example:
    import { resolve, relative } from 'node:path';
    
    const skillDir = resolve(process.cwd(), 'skills', skill.name);
    const resourcePath = resolve(skillDir, 'reference', file);
    
    // Ensure resourcePath is within skillDir
    const rel = relative(skillDir, resourcePath);
    if (rel.startsWith('..') || path.isAbsolute(rel)) {
      throw new Error('Invalid resource path');
    }

Current Risk: Low (since SkillRegistry pre-validates all skills at load time)

2. Type Safety: Missing Type Guard (Low Priority)

Issue: In skill.tool.ts:14-17, the validateArgs function doesn't actually validate the structure:

function validateArgs(args: unknown): args is SkillArgs {
  if (!args || typeof args !== 'object') return false;
  return true; // name is optional, we'll check it in execute
}

This type guard always returns true for any object, which defeats the purpose of a type guard.

Recommendation:

  • Either implement proper validation or remove the type guard and do inline checks
  • Current implementation is safe because execute() validates args.name, but it's misleading

Example fix:

function validateArgs(args: unknown): args is SkillArgs {
  return args !== null && typeof args === 'object';
}

3. Hardcoded Skill Hints (Low Priority)

Issue: In context-setup.middleware.ts:18-28, skill hints are hardcoded:

if (prompt.match(/danish|dansk|udbud|tender/i)) {
  hints.push('danish-tender-guidelines');
}

Observation: This is domain-specific to the udbud example and may not be relevant for other projects using the agent system.

Recommendation:

  • Consider making skill hints configurable (e.g., via agent frontmatter or builder config)
  • OR extract to a configuration file that can be customized per project
  • OR document that users should modify this middleware for their domains

Current Impact: Low (users can fork/modify the middleware, and hints are just suggestions)

4. Resource Limit Documentation (Very Low Priority)

Issue: Resource listings are limited to 3 items (skill.tool.ts:45, 52), but this isn't documented in the skill spec.

Recommendation: Document this limit in docs/skills.md so skill authors know that only the first 3 resources will be shown to agents.

5. Error Message Consistency

Observation: Some errors use "Error: " prefix, others don't:

  • Line 141: 'Error: Skill name is required.\n\nUsage: skill({name: "skill-name"})'
  • Line 149: 'Error: Skill name cannot be empty.'
  • Line 170: Failed to load skill: ${message} (no "Error:" prefix)

Recommendation: Standardize error message format for consistency.


🎯 Best Practices Adherence

CLAUDE.md Compliance

Follows all guidelines:

  • No any types in production code
  • No type assertions (as Type)
  • Uses type guards correctly (mostly - see issue Test Issue: GitHub App Integration #2)
  • DRY principle applied (skills eliminate duplication)
  • Clear separation of concerns
  • Comprehensive tests
  • Mermaid diagrams in docs would be nice but not required

Engineering Principles:

  • Complete implementation (no partial work)
  • Systematic problem-solving
  • No unnecessary abstractions (YAGNI)

Code Style

TypeScript best practices followed
No backward compatibility needed (MVP/POC)
Tests before merging (608/609 passing, 1 pre-existing failure)


📊 Performance Considerations

Positive

  • Lazy loading: Resources only loaded when accessed
  • Caching: Conversation history + prompt caching = 90% cost savings
  • No memory leaks: Skills cached per-conversation, cleaned up when conversation ends
  • Efficient registry: Map-based lookups are O(1)

Observations

  • Registry size: All skills loaded at startup (not on-demand). This is fine for small skill sets but could be optimized if you have hundreds of skills.
  • Resource caching: Per-skill resource cache grows unbounded within a session. This is acceptable for typical usage but worth monitoring.

🔐 Security Assessment

Strengths

  • ✅ File extension whitelisting prevents arbitrary code execution
  • ✅ Skill name validation (hyphen-case only)
  • ✅ Skills pre-loaded from trusted directories
  • ✅ No eval() or dynamic code execution

Recommendations

  1. Add path traversal protection (see issue Add Claude Code GitHub Workflow #1)
  2. Consider adding skill integrity verification (checksums) for production use
  3. Document that skills should be treated as trusted code (they inject into system prompts)

📈 Test Coverage Assessment

Unit Tests (12 tests in skill.test.ts)

  • ✅ Happy path (load existing skill)
  • ✅ Error cases (non-existent skill, empty input)
  • ✅ Edge cases (empty registry, many resources, name normalization)
  • ✅ Tool metadata validation
  • ✅ Resource formatting

Integration Tests (5 tests in skills-integration.test.ts)

  • ✅ End-to-end skill loading via agent
  • ✅ Skill tool returns formatted content
  • ✅ Resources are listed correctly
  • ✅ Error handling with helpful messages
  • ✅ Conversation caching behavior

Coverage Gaps (Minor)

  • ⚠️ No tests for Skill.getReference(), getScript(), getAsset() (lazy loading methods)
  • ⚠️ No tests for SkillRegistry.findByTags()
  • ⚠️ No tests for skill hints in context-setup middleware

Recommendation: Add unit tests for lazy loading methods to ensure cache behavior is correct.


🚀 Migration Impact

Breaking Changes

  • Well documented: Migration guide is clear and comprehensive
  • Backward compatibility: Old skills: frontmatter field is simply ignored (no crashes)
  • Migration examples: Real examples from udbud agents provided

Migration Complexity

  • Low: Agents just need to add skill tool and update instructions
  • Clear path: Step-by-step guide for both static→dynamic and inline→skills migrations

💡 Suggestions for Future Enhancements

  1. Skill versioning: Currently tracked in metadata but not enforced. Could add semantic versioning checks.
  2. Skill dependencies: Allow skills to reference other skills (e.g., danish-tender-guidelines depends on base-tender-knowledge)
  3. Skill hot-reloading: For development, watch skill files and reload on changes
  4. Skill marketplace: Package and share skills across projects
  5. Skill telemetry: Track which skills are most used, identify unused skills

Final Verdict

Recommendation: APPROVE with Minor Suggestions

This is high-quality work that:

  • ✅ Implements a valuable feature that eliminates code duplication
  • ✅ Follows repository conventions and best practices
  • ✅ Includes comprehensive tests and documentation
  • ✅ Has clear migration path
  • ✅ Minimal security concerns (easily addressed)

Before Merging

  1. Address security issue Add Claude Code GitHub Workflow #1 (path traversal protection) - 10 minutes
  2. Consider fixing type guard issue Test Issue: GitHub App Integration #2 - 2 minutes
  3. Run npm run build and npm run lint to ensure CI will pass
  4. Optional: Add tests for lazy loading methods

After Merging

  1. Update project documentation to highlight skills as a core feature
  2. Consider extracting skill hints to configuration
  3. Monitor performance with larger skill sets

🎖️ Kudos

Special recognition for:

  • Elegant architecture: Conversation-as-cache insight eliminates entire complexity layer
  • Thorough documentation: 1,435 lines of clear, well-organized docs
  • Real-world example: skills-demo example is production-quality
  • Test quality: Both unit and integration tests are well-designed

This PR demonstrates strong software engineering practices. Excellent work! 🚀


Files Reviewed: 29 changed files (+7,927 / -3,278 lines)
Test Status: 608/609 passing (1 pre-existing unrelated failure)
Breaking Changes: Yes (well documented with migration guide)
Security: Minor concern (easily fixed)
Code Quality: High
Documentation: Excellent

Security improvements based on code review feedback:

1. Path Traversal Protection (loader.ts):
   - Add path.resolve() + containment validation in loadResource()
   - Prevents malicious skills from reading files outside skill directory
   - Example blocked: reference/../../etc/passwd

2. Type Guard Fix (skill.tool.ts):
   - Improve validateArgs() to properly exclude arrays and null
   - Previous version accepted any object (including arrays)
   - Now properly validates plain objects only

Both fixes address medium/low severity issues identified in PR review.
All existing tests pass (608/609, 1 pre-existing failure unrelated).
@nielspeter
Copy link
Owner Author

✅ Security Fixes Applied

Based on review feedback, I've addressed both identified issues:

1. Path Traversal Protection (Medium Priority) ✅ FIXED

Issue: Resource loader could potentially read files outside skill directory if malicious skills contained ../ sequences in filenames.

Fix (commit 837a737):

// Before:
const fullPath = path.join(skillPath, resourcePath);

// After:
const fullPath = path.resolve(skillPath, resourcePath);
const normalizedSkillPath = path.resolve(skillPath);
if (!fullPath.startsWith(normalizedSkillPath + path.sep)) {
  throw new Error(`Path traversal detected: ${resourcePath} escapes skill directory`);
}

Protection: Now validates that resolved paths remain within the skill directory. Any escape attempt throws an error.


2. Type Guard Improvement (Low Priority) ✅ FIXED

Issue: validateArgs() accepted any object (including arrays), defeating the type guard's purpose.

Fix (commit 837a737):

// Before:
if (!args || typeof args !== 'object') return false;
return true; // ← Accepted arrays, Date, etc.

// After:
return args !== null && args !== undefined && typeof args === 'object' && !Array.isArray(args);

Improvement: Now properly excludes null, undefined, and arrays from the type guard.


Testing

✅ All 608 unit tests passing (1 pre-existing failure unrelated)
✅ Skills-demo runs successfully with security fixes
✅ TypeScript compilation clean
✅ ESLint passing

Ready for re-review!

@claude
Copy link

claude bot commented Oct 26, 2025

Pull Request Review: Dynamic Skills System

Overview

This is a well-designed and well-executed implementation of a dynamic skills system. The PR successfully replaces the static frontmatter-based approach with runtime skill loading, providing significant architectural improvements.

Overall Assessment:Approve with Minor Suggestions


Strengths

1. Excellent Architecture

  • Clean separation of concerns: Skill, SkillLoader, SkillRegistry pattern mirrors the existing tool architecture
  • Lazy loading: Resources loaded on-demand with proper caching (types.ts:102-104)
  • Security-conscious: Path traversal protection in loader (loader.ts:70-74)
  • Type-safe: Comprehensive Zod validation for frontmatter (validation.ts:23-38)

2. Outstanding Documentation 📚

  • The 1,435-line docs/skills.md is comprehensive and well-structured
  • Includes migration guide, examples, API reference, and troubleshooting
  • Clear decision framework for when to use skills vs agent logic
  • Real-world examples from the udbud domain

3. Robust Testing

  • 12 unit tests for skill tool (187 LOC)
  • 245 LOC of loader tests, 369 LOC of registry tests
  • 248 LOC of validation tests
  • Integration tests demonstrate real usage
  • 608/609 tests passing (1 pre-existing failure)

4. Smart Design Decisions 🎯

  • Conversation history as cache: Brilliant simplification - no separate cache mechanism needed
  • Skill hints system: Non-invasive keyword-based suggestions (context-setup.middleware.ts:8-47)
  • Case-insensitive lookup: User-friendly skill name handling
  • Resource type separation: Clear distinction between reference/, assets/, scripts/

Code Quality Issues

1. Type Safety Violations 🔴 HIGH PRIORITY

Issue: Use of any types undermines TypeScript's value proposition

Location: skill.test.ts:7-8

let skillTool: any;
let mockRegistry: any;

Fix: Use proper types

let skillTool: BaseTool;
let mockRegistry: jest.Mocked<Pick<SkillRegistry, 'getSkill' | 'listSkills'>>;

Why it matters: Per CLAUDE.md - "NO any types - Use unknown and type guards if type is truly unknown"


2. Security Considerations ⚠️ MEDIUM PRIORITY

Issue 1: Path traversal protection is good, but could be more robust

Location: loader.ts:70-74

if (!fullPath.startsWith(normalizedSkillPath + path.sep)) {
  throw new Error(`Path traversal detected: ${resourcePath} escapes skill directory`);
}

Suggestion: Consider using path.relative() check instead:

const relative = path.relative(normalizedSkillPath, fullPath);
if (relative.startsWith('..') || path.isAbsolute(relative)) {
  throw new Error(`Path traversal detected: ${resourcePath} escapes skill directory`);
}

Issue 2: File extension validation is permissive

Location: loader.ts:18-22

Consideration: Scripts directory allows .sh, .bash, etc. Ensure agents understand these are not auto-executed (documentation mentions this, which is good).


3. Error Handling ⚠️ MEDIUM PRIORITY

Issue: Silent failure in skill registry loading

Location: registry.ts:63-67

try {
  const skill = await this.loader.loadSkill(skillPath);
  this.registerSkill(skill);
} catch (error) {
  const message = error instanceof Error ? error.message : String(error);
  this.logger?.logSystemMessage(`Error loading skill from ${dirName}: ${message}`);
}

Concern: Invalid skills are logged but silently skipped. This could hide configuration errors.

Suggestion: Consider adding a strict mode or validation phase:

// Option 1: Track failed skills for diagnostics
private failedSkills = new Map<string, string>();

// Option 2: Throw in development/test mode
if (process.env.NODE_ENV === 'test') {
  throw error;
}

4. Performance Considerations ℹ️ LOW PRIORITY

Issue: Registry loads ALL skills at build time

Location: system-builder.ts:899 (shown in diff)

Current behavior: await skillRegistry.loadSkills() loads all skills immediately

Consideration: For systems with many skills, this could slow startup. However:

  • ✅ Skills are only parsed, resources are lazy-loaded
  • ✅ Most systems likely have <20 skills
  • ✅ Premature optimization is anti-pattern

Verdict: Current approach is acceptable. Monitor as skills scale.


5. Naming Inconsistency ℹ️ LOW PRIORITY

Issue: Method naming could be more consistent

Location: types.ts:136-152

listReferences(): string[]  // Returns array
getReference(filename: string): Promise<string | undefined>  // Returns single item

Observation: listReferences() suggests it's a query, while getReference() suggests retrieval. Naming is clear enough in context.

Alternative naming (optional):

  • getReferenceFilenames() / loadReference(filename)
  • availableReferences() / fetchReference(filename)

Not blocking, but worth considering for API clarity.


Potential Bugs

1. Case Sensitivity in listSkills() Return ℹ️

Location: registry.ts:133-135

listSkills(): Skill[] {
  return Array.from(this.skills.values());
}

Question: The tool description uses listSkills() to show available skills:

// skill.tool.ts:100
const skills = skillRegistry.listSkills();

Skills are stored with normalized (lowercase) names, but returned as-is. If a skill's frontmatter has name: "My-Skill" but directory is my-skill, validation prevents this. However, consider whether the return type should be:

listSkills(): Array<{name: string, description: string}>

This would decouple the API from internal Skill class changes.

Current approach is fine, but consider for future API stability.


Best Practices Adherence

✅ Follows CLAUDE.md Guidelines

  1. Type safety: Generally good (except test file any usage)
  2. DRY principle: Excellent - skills eliminate duplication
  3. YAGNI: No unnecessary abstractions
  4. Testing: Comprehensive unit and integration tests
  5. Documentation: Mermaid diagrams would enhance docs (mentioned in CLAUDE.md)
  6. Security: Built-in path traversal protection

✅ Architecture Alignment

  • Middleware pipeline integration is clean
  • Tool registry pattern reused for skills
  • Pull architecture preserved (skills loaded by agents, not pushed)

Test Coverage

Excellent coverage overall, but consider adding:

Missing Test Scenarios

  1. Concurrent skill loading: Is SkillRegistry.loadSkills() safe if called multiple times?
  2. Resource caching edge cases: What if two skills have resources with same name?
  3. Large file handling: What if a skill's instructions is 1MB+?
  4. Malformed YAML: Test with invalid frontmatter formats beyond Zod validation
  5. Symlink handling: What if skill directory is a symlink?

Breaking Changes

Well-Documented Migration

The PR clearly documents the breaking change (removal of static skills: frontmatter).

Migration impact:

  • ✅ Migration guide provided in PR description
  • ✅ All example agents migrated
  • ✅ Old approach completely removed (clean break)

Suggestion: Consider a deprecation warning in a hypothetical transition period, but since this is an MVP (per CLAUDE.md), clean breaks are acceptable.


Performance Analysis

Skill Loading Performance

Question: How does this perform with 100+ skills?

Current behavior:

  1. Build time: Scans directory, loads all SKILL.md files
  2. Runtime: No loading, just registry lookup (O(1) with Map)
  3. Resource loading: Lazy, only when agent reads resource files

Optimization opportunities (future):

  1. Parallel skill loading during build (currently sequential)
  2. Skill preloading based on agent definition analysis
  3. Skill unloading after conversation ends (currently kept in registry)

Verdict: Current performance is likely fine for MVP scale.


Security Review

✅ Good Security Practices

  1. Path traversal protection (loader.ts:70-74)
  2. File extension whitelist (loader.ts:18-22)
  3. Zod validation prevents injection via frontmatter
  4. No arbitrary code execution (scripts are not auto-run)

⚠️ Additional Considerations

  1. Resource size limits: Should skill resources have a maximum size?

    const MAX_RESOURCE_SIZE = 10 * 1024 * 1024; // 10MB
    const content = await fs.readFile(fullPath, 'utf-8');
    if (content.length > MAX_RESOURCE_SIZE) {
      throw new Error(`Resource too large: ${filename}`);
    }
  2. Skill directory permissions: Should validate that skill directories are not world-writable?

  3. SKILL.md content sanitization: While Markdown is generally safe, consider if skills can include HTML or scripts (likely not an issue with current design).


Documentation Review

Excellent Documentation

docs/skills.md is production-quality:

  • Clear table of contents
  • Decision frameworks ("When to Use Skills")
  • Real examples with before/after comparisons
  • Migration guide
  • API reference
  • Troubleshooting section

Suggestions for Enhancement

  1. Add Mermaid diagrams (per CLAUDE.md best practices):

    sequenceDiagram
      Agent->>SkillTool: skill({name: "my-skill"})
      SkillTool->>SkillRegistry: getSkill("my-skill")
      SkillRegistry-->>SkillTool: Skill object
      SkillTool->>SkillLoader: Format content
      SkillTool-->>Agent: Formatted instructions + resource paths
      Agent->>ReadTool: Read resource file
    
    Loading
  2. Add performance characteristics section: Document O(1) lookup, lazy loading behavior

  3. Add security best practices: Guide on writing safe skills (e.g., don't include secrets in skills)


Integration Test Observations

Location: tests/integration/skills-system/skills-integration.test.ts

Concern: Integration tests require ANTHROPIC_API_KEY and make real API calls. This is expensive and slow.

Suggestion: Consider adding mock LLM provider for integration tests:

class MockLLMProvider implements LLMProvider {
  async callLLM(messages, tools) {
    // Return canned responses for skill tool calls
    if (/* detect skill tool request */) {
      return { tool_use: [{name: 'skill', input: {name: 'test-skill'}}] };
    }
  }
}

This would allow faster, cheaper integration testing while preserving test value.


Breaking Change Analysis

Impact Assessment

Before: Static skill loading via frontmatter

skills: [my-skill]

After: Dynamic skill loading via tool

skill({name: "my-skill"})

Migration effort:

  • Simple find-replace in agent files
  • Requires adding skill loading instructions to agents
  • All examples migrated successfully

Risk: Low - breaking change is well-documented and migration path is clear


Recommendations

Must Fix Before Merge 🔴

  1. Replace any types in tests (skill.test.ts:7-8)
    • Use proper TypeScript types
    • Follows CLAUDE.md guidelines

Should Fix Before Merge ⚠️

  1. Improve error reporting for failed skill loads

    • Consider collecting failed skills for diagnostics
    • Help users debug configuration issues
  2. Add resource size limits

    • Prevent OOM from maliciously large skill files
    • Suggested limit: 10MB per resource

Nice to Have (Future) ℹ️

  1. Add Mermaid diagrams to documentation
  2. Add mock LLM provider for faster integration tests
  3. Consider parallel skill loading optimization
  4. Add skill metrics (load time, usage frequency)

Final Verdict

Recommendation: APPROVE

This PR represents high-quality engineering:

  • Solid architecture with clear design rationale
  • Comprehensive testing (608/609 passing)
  • Excellent documentation
  • Successful migration of all examples
  • Security-conscious implementation

The identified issues are minor and mostly around improving robustness for edge cases. The core implementation is sound.

Before Merge Checklist

  • Fix any types in test file
  • Run npm run build to ensure TypeScript compiles
  • Run npm run lint to ensure ESLint passes
  • Consider adding resource size limits
  • Verify all agents work with dynamic skill loading

Great work on this implementation! The skills system will significantly improve code reusability and maintainability.


Statistics

  • Files changed: 75
  • Lines added: +7,934
  • Lines deleted: -3,278
  • Net change: +4,656 lines
  • Test coverage: 608/609 tests passing
  • Documentation: 1,435 lines of comprehensive docs

Code-to-test ratio: Approximately 1:1 (excellent)
Documentation-to-code ratio: Very high (excellent)

…pe guard

Added comprehensive tests for security fixes:

Path Traversal Protection (3 tests):
- Verifies non-listed files return undefined (safe behavior)
- Tests direct loadResource call rejects path traversal attempts
- Confirms legitimate paths work correctly

Type Guard Validation (7 tests):
- Rejects null, undefined, arrays, strings, numbers
- Accepts empty objects (fails later with missing name)
- Accepts valid objects with name property

All 40 tests passing. ESLint clean.
@nielspeter
Copy link
Owner Author

✅ Security Tests Added

Added comprehensive test coverage for both security fixes (commit 1636c06):

Path Traversal Protection Tests (3 tests)

  1. Safe handling of traversal attempts: Verifies files not in scanned list return undefined (prevents traversal from reaching filesystem)
  2. Direct loadResource validation: Tests the underlying path validation logic by directly calling the private loadResource function
  3. Legitimate paths work: Confirms normal file access still functions correctly

Type Guard Validation Tests (7 tests)

Tests that validateArgs() properly rejects:

  • null arguments
  • undefined arguments
  • Arrays (['test-skill'])
  • Strings ('test-skill')
  • Numbers (123)

And accepts:

  • Empty objects (fails later with "name required")
  • Valid objects with name property

Test Results

All 40 tests passing
ESLint clean
TypeScript compiles

Coverage added: +10 tests total (+90 lines of test code)

@claude
Copy link

claude bot commented Oct 26, 2025

PR Review: Dynamic Skills System Implementation

Overall Assessment

✅ APPROVE with minor suggestions

This is an excellent implementation of a dynamic skills system that represents a significant architectural improvement. The design is clean, well-tested, and follows the repository's coding standards. The "conversation history as cache" approach is elegant and eliminates unnecessary complexity.


Code Quality ⭐⭐⭐⭐⭐

Strengths

1. Excellent Architecture & Design

  • Clean separation of concerns: Registry, Loader, Validation, and Tool components are properly isolated
  • Lazy loading pattern: Resources loaded on-demand via ResourceLoader function with proper caching
  • Conversation-as-cache: Brilliant insight that eliminates need for separate caching mechanism
  • Type safety: Proper TypeScript types with no any or type assertions (follows CLAUDE.md guidelines ✅)

2. Security & Validation

  • Path traversal protection (loader.ts:70-74): Properly validates resource paths stay within skill directory
  • Input validation: Zod schemas with clear error messages
  • File extension whitelisting: Only allowed file types can be loaded as resources
  • Name validation: Enforces hyphen-case naming and directory/frontmatter matching

3. Testing Coverage

  • Unit tests: 12 tests for skill tool, comprehensive edge case coverage
  • Integration tests: 5 tests demonstrating real-world usage including conversation caching
  • Test results: 608/609 passing (excellent coverage)

4. Documentation

  • Comprehensive: 1,435 lines of documentation in docs/skills.md
  • Well-organized: Clear TOC, quick start, examples, troubleshooting
  • Migration guide: Helps users transition from static to dynamic skills

5. Follows Repository Standards

  • ✅ No type assertions (as Type)
  • ✅ No any types
  • ✅ Proper type guards
  • ✅ Error handling with clear messages
  • ✅ Consistent with existing patterns

Performance Considerations ⚡

Excellent Design Choices

  1. Lazy Loading: Resources not loaded until explicitly requested
  2. Caching: Once loaded, resources cached in Map<string, string>
  3. Conversation History: Skills loaded once per conversation, reused across turns
  4. Prompt Caching: Leverages Anthropic's prompt caching for 90% cost savings

Security Analysis 🔒

Well-Protected

Path traversal protection: Proper validation in loader.ts:67-77
Input validation: Zod schemas prevent malformed frontmatter
File type restrictions: Whitelisted extensions prevent arbitrary file execution
Case-insensitive lookup: Prevents case-sensitivity bypass attempts

Minor Suggestion: Resource Size Limits

Consider adding a safety check for resource file sizes to prevent potential memory issues with very large files.


Potential Issues & Suggestions

1. Type Annotation Inconsistency

File: skill.tool.ts:8

interface SkillArgs {
  name?: string;
  [key: string]: unknown;  // ⚠️ Allows arbitrary properties
}

Suggestion: Make it strict since extra properties are never used:

interface SkillArgs {
  name?: string;
}

2. Error Message Enhancement

File: skill.tool.ts:149

error: 'Error: Skill name cannot be empty.',

Suggestion: Include usage hint for consistency:

error: 'Error: Skill name cannot be empty.\n\nUsage: skill({name: "skill-name"})',

3. Resource Listing Limits Documentation

Files:

  • skill.tool.ts:45-56 limits resources to 3 in output
  • skill.tool.ts:72-77 limits skill list to 10 in error

Suggestion: Document these limits in docs/skills.md to set user expectations.


4. Skill Registry Overwrite Behavior

File: registry.ts:85-92

Currently logs a warning when overwriting skills. Consider if this should throw an error instead to prevent accidental overwrites, or add a force parameter for intentional overwrites.

Priority: Low (warning is acceptable for MVP)


Documentation Suggestions 📚

Minor Enhancements

  1. Add architecture diagram using Mermaid to visualize the skill loading flow
  2. Clarify skill versioning: The metadata.version field is mentioned but versioning strategy could be explained
  3. Document resource limits: Mention the display limits for resources and skills

Best Practices Alignment ✅

Follows CLAUDE.md Guidelines

  • No type assertions: All type checking uses proper guards
  • No any types: Uses unknown with validation
  • DRY principle: Skills eliminate duplicated knowledge
  • YAGNI: Simple implementation, no premature optimization
  • Security: Built-in protections
  • Testing: Good coverage

Consistent with Existing Patterns

  • Registry pattern matches ToolRegistry
  • Loader pattern matches AgentLoader
  • Validation uses Zod (consistent with agent validation)
  • Middleware integration follows existing patterns

Specific File Reviews

packages/core/src/tools/skill.tool.ts ⭐⭐⭐⭐⭐

Excellent implementation. Clean, well-documented, proper error handling.

packages/core/src/skills/registry.ts ⭐⭐⭐⭐⭐

Very clean. Good separation of concerns, proper error handling.

packages/core/src/skills/loader.ts ⭐⭐⭐⭐⭐

Excellent security practices. Path traversal protection, file extension validation, lazy loading.

packages/core/src/skills/types.ts ⭐⭐⭐⭐⭐

Well-designed. Proper use of readonly fields, good JSDoc comments.

packages/core/src/skills/validation.ts ⭐⭐⭐⭐⭐

Perfect. Clear validation with helpful error messages.

docs/skills.md ⭐⭐⭐⭐

Very comprehensive. Well-structured with good examples.


Summary & Recommendations

Strong Approvals ✅

  1. Architecture: The dynamic loading approach is superior to static skills
  2. Code quality: Follows all repository standards
  3. Security: Proper input validation and path protection
  4. Testing: Good coverage with meaningful tests (608/609 passing)
  5. Documentation: Comprehensive and well-organized

Suggested Improvements (Non-Blocking)

All suggestions above are minor and can be addressed in follow-up PRs if desired:

  1. Stricten SkillArgs interface
  2. Add resource size limits for safety
  3. Enhance error messages
  4. Document resource display limits
  5. Add architecture diagrams

Final Verdict

✅ APPROVE

This PR represents a significant improvement to the codebase. The implementation is clean, well-tested, secure, and follows best practices. The few suggestions above are minor enhancements and don't block merging.

Congratulations on an excellent implementation! 🎉

The "conversation history as cache" insight is particularly clever and eliminates a whole class of complexity. The security measures (path traversal protection, file extension whitelisting) show good defensive programming practices.


Stats Summary:

  • 75 files changed
  • +8,024/-3,278 lines
  • 608/609 tests passing
  • Comprehensive documentation (1,435 lines)
  • Breaking changes well-documented with migration guide

- Copy 3 skills from udbud into skills-demo/skills/
  - architecture-analyzer (591 LOC)
  - complexity-calculator (414 LOC)
  - danish-tender-guidelines (228 LOC + 5 resources)

- Update skills-demo.ts to load from local ./skills directory
  instead of ../udbud/skills

Benefits:
- Demo is now self-contained and portable
- Works independently of udbud example
- Users can modify demo skills without affecting udbud
- Clearer that skills are part of the demo package
@claude
Copy link

claude bot commented Oct 26, 2025

Pull Request Review: Dynamic Skills System

Overall Assessment

Verdict: APPROVE with minor suggestions

This is a well-architected, production-ready implementation of a dynamic skills system. The code quality is excellent, follows TypeScript best practices from CLAUDE.md, and the migration from static to dynamic loading is well-executed.


Code Quality & Best Practices

Excellent TypeScript Practices

  • NO type assertions - Uses proper type guards throughout
  • NO any types - Proper typing with type guards for unknown inputs
  • Clean separation of concerns - Registry, Loader, Tool, and Validation are distinct
  • Proper error handling - Comprehensive validation with helpful error messages
  • Consistent naming - hyphen-case validation

Architecture Strengths

  1. Conversation History as Cache - Brilliant approach! No separate caching mechanism needed
  2. Lazy Loading - Resources loaded on-demand, optimizes memory
  3. Registry Pattern - Consistent with ToolRegistry, clean API
  4. Middleware Integration - Skill hints are non-invasive

Code Organization

  • Well-sized files (all under 300 lines)
  • Single responsibility principle throughout
  • Clear interfaces and type definitions

Security Concerns

No Critical Issues Found

The security model is sound:

  1. File Access - Skills loaded from configured directories only
  2. Input Validation - SkillFrontmatterSchema prevents malformed skills
  3. Name Validation - Regex prevents path traversal
  4. Error Handling - No information leakage
  5. No Code Execution - Skills are markdown/data only

Minor Suggestions

  • Consider path sanitization checks
  • Consider max file size limits for resources

Test Coverage

Comprehensive Testing

  • skill.test.ts: 12 tests covering tool functionality (187 LOC)
  • Input validation (null, undefined, arrays, strings, numbers)
  • Skill loading and formatting
  • Error handling and helpful messages
  • Resource listing with limits

Suggestions

  1. Add tests for SkillRegistry (concurrent loading, tag search)
  2. Integration test for skill hints
  3. Performance test for large directories

Performance

Well-Optimized

Strengths:

  1. Lazy loading - Resources loaded only when accessed
  2. Caching - Prevents redundant I/O
  3. Conversation caching - 90% cost savings
  4. Resource listings capped at 3 items

Suggestions

  1. Consider parallel loading for large skill directories
  2. Consider memoization for skill listings
  3. Consider LRU cache for long-running agents

Documentation

Exceptional

The docs/skills.md is outstanding:

  • 1,435 lines of comprehensive documentation
  • Clear examples and migration guides
  • Best practices and troubleshooting
  • Real-world examples

Migration Impact

Clean Breaking Change

  • 786 net lines of code change
  • 228 lines of shared knowledge (down from 240 duplicated)
  • Cache efficiency: 246,120%!

Test Results

From PR description:

  • 608/609 tests passing
  • 1 pre-existing unrelated failure
  • TypeScript compilation clean
  • ESLint passing

Final Verdict

APPROVE

This is high-quality, production-ready code that:

  • Follows all guidelines from CLAUDE.md
  • Has excellent test coverage
  • Includes comprehensive documentation
  • Implements a clean, elegant architecture
  • Handles errors gracefully
  • Provides clear migration path

The dynamic skills system is a significant improvement. The conversation history as cache approach is particularly clever.

Great work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants