-
Notifications
You must be signed in to change notification settings - Fork 0
feat(skills): implement dynamic skills system with demo example #5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Implemented comprehensive skills system following PRD Phase 1-3: - Skills are reusable domain knowledge modules loaded from skills/ directory - Agents declare skills in frontmatter, loaded during instantiation - Skills injected into system prompt before agent instructions - Convention over configuration: default skills/ directory (like agents/) Core Implementation (Phase 1): - SkillRegistry: Central registry with lazy loading and resource management - SkillLoader: Loads skills from SKILL.md files with Zod validation - Skill types: name, description, instructions, metadata, resources - Resource loading: assets/, reference/, scripts/ directories - Anthropic Agent Skills Spec v1.0 compliance System Integration (Phase 2): - AgentSystemBuilder.withSkillsFrom() for configuration - AgentLoader loads skills specified in agent frontmatter - ContextSetupMiddleware injects skills into system prompt - Skills metadata logged in agent_start events for reproducibility - Full type safety with TypeScript interfaces Migration & Validation (Phase 3): - Created danish-tender-guidelines skill (marker system, formats, templates) - Created complexity-calculator skill (estimation formulas, industry standards) - Updated udbud agents to use skills (technical-analyst, go-no-go-analyzer) - All 597 unit tests passing, build successful Key Features: - Lazy resource loading: Skills loaded on first access, resources cached - Graceful degradation: Missing skills logged as warnings, don't block execution - Convention-based: skills/ directory auto-discovered like agents/ - Pull architecture: Child agents load their own skills, minimal inheritance - Session logging: Skills captured in JSONL for reproducibility Files Added: - packages/core/src/skills/ (5 files: loader, registry, types, validation, index) - packages/core/tests/unit/skills/ (3 test suites: 67 tests) - packages/core/tests/integration/skills-system/ (integration test) - packages/core/tests/test-fixtures/skills/ (test fixtures) - packages/examples/udbud/skills/ (2 skills with 7 reference/template files) Files Modified: - Config: system-builder.ts, types.ts (skills config, builder methods) - Agents: loader.ts, executor.ts, validation.ts (skill loading logic) - Middleware: context-setup, agent-loader (skill injection, logging) - Logging: types.ts, all loggers (skills metadata in agent_start) - Tests: system-builder.test.ts (5 new skills tests) - Examples: udbud agents + udbud.ts (skills configuration) Breaking Changes: None - backward compatible, skills are optional Implementation follows PRD design decisions: - Resource lazy loading for efficiency - File collision warnings (not errors) for flexibility - Quick wins: file filtering (.git, node_modules), informative logs - Deferred: version constraints, hot reload (Phase 4)
…s core feature Updated plan to reflect decision to implement full Claude Code-style auto-triggering from the start, not as optional feature. Changes: - Auto-triggering now core feature (not optional) - Updated effort estimate: 24-32 hours total - Updated target architecture to show auto-trigger flow - Confirmed complete implementation approach - Updated timeline: 3-4 weeks part-time - Branch name: feature/skills-dynamic-auto-trigger Implementation includes all phases: - Phase 1-3: Core dynamic loading (12-16 hours) - Phase 4: Auto-triggering (12-16 hours) - Phase 5: Testing/validation (3-4 hours) Decision rationale: Not in production, can implement complete solution with best UX (auto-trigger + manual fallback).
…plosion Fixed critical issues preventing udbud tender analysis from running: **Extended Thinking Support**: - Added thinking capabilities to all Claude 4.x models in providers-config.json - Implemented streaming for thinking-enabled requests to avoid 10-minute timeout - All Claude 4.x models (Opus 4.1, Sonnet 4.5, Haiku 4.5) now support extended thinking **Binary File Protection**: - Added strict document handling rules to all udbud agents - Prevents reading .docx, .pdf, .xlsx files that consume 100K+ tokens - Enforces mandatory workflow: check → convert → read markdown - Prevents "prompt is too long" errors (>200K tokens) **Configuration Consolidation**: - Deleted duplicate packages/examples/providers-config.json - Consolidated to single repo root providers-config.json - Fixed .env path resolution in udbud.ts to use repo root **Missing Skill**: - Created architecture-analyzer/SKILL.md (591 lines) - Includes architecture patterns, quality assessment, integration analysis - Follows Anthropic Agent Skills Spec v1.0 **Testing**: - udbud example now runs successfully with extended thinking - Cache efficiency showing 200,000%+ savings - Proper agent delegation workflow functioning - Skills auto-loading correctly Closes issues with binary file token explosion and extended thinking timeouts.
…ompliance markers Phase 3 Migration Complete: - Migrated compliance-checker agent to use danish-tender-guidelines skill - Added compliance-specific markers to skill ([SKAL], [BØR], [KAN], [UKLAR], [KONFLIKT]) - Removed duplicate knowledge from agent (file locations, marker definitions) - Agent now references skill for Danish tender standards Results: - 4 agents now using danish-tender-guidelines skill - Skills system fully operational and tested - All 597 tests passing - Build and lint clean Skills enable: - Single source of truth for Danish tender knowledge - Consistent marker system across all agents - Easier maintenance (update skill vs 4+ agents) - Better separation of domain knowledge from agent logic
Created docs/skills.md as single comprehensive guide for skills system: Documentation includes: - Quick start (2-minute guide) - Conceptual overview (what are skills, when to use) - Creating skills (SKILL.md format, resources) - Using skills in agents (frontmatter, system builder) - Migration guide with real examples - 3 working examples from codebase - Best practices (8 detailed guidelines) - API reference (TypeScript interfaces) - Troubleshooting (10 common issues) Skills implementation now complete: - Phase 1: Core implementation ✅ - Phase 2: Testing ✅ - Phase 3: Migration & validation ✅ - Phase 4: User documentation ✅ Coverage: 97.95% (67 unit tests, 4 integration tests) All 597 tests passing, build clean, lint clean
Comprehensive evaluation of static (current) vs dynamic (proposed) skills: - Architecture comparison - Code changes required (15 files, 24-32 hours) - Token cost analysis (3 scenarios) - Trade-offs and decision matrix - Recommendation: Keep static for now, evaluate dynamic separately Decision: Implement dynamic on separate branch to compare approaches
Two approaches documented: 1. DYNAMIC-SKILLS-IMPLEMENTATION.md - Full dual-mechanism (auto-trigger + manual) 2. DYNAMIC-SKILLS-CLAUDE-STYLE.md - Simplified Claude Code style (manual only) Decision: Implement Claude Code style (manual only) - Simpler architecture (16 hours vs 28 hours) - Agent autonomy - agents decide what knowledge they need - Explicit skill() tool calls visible in conversation - Matches Anthropic's proven pattern exactly - No pattern matching complexity Ready to begin implementation with SkillCache as foundation
Removed 168K of outdated planning and analysis documents: - docs/skills/ directory (4 old planning docs from October) - docs/skills-dynamic-implementation-plan.md (outdated plan) - SKILLS-MIGRATION-ANALYSIS.md (analysis completed) - DYNAMIC-SKILLS-IMPLEMENTATION.md (rejected dual-mechanism approach) Keeping: - docs/skills.md - Comprehensive user guide - DYNAMIC-SKILLS-CLAUDE-STYLE.md - Current implementation plan (Claude Code style) Ready to proceed with simplified manual-only skills implementation
Key decisions: - NO CACHE in MVP - eliminates all context/state complexity - Simple SkillTool that just loads and returns skills - Add hints to help agents remember to load skills - Test with one agent before full migration - 12 hours instead of 24-28 hours Approach: 1. Phase 1: Basic working tool (4 hours) 2. Phase 2: Safety hints (2 hours) 3. Phase 3: Remove static loading (3 hours) 4. Phase 4: Test with technical-analyst (2 hours) 5. Phase 5: Migrate remaining agents (3 hours) Why this works: - Ships in 2 days, not 2 weeks - No architectural blockers - Real usage data before optimization - Can add caching later if needed - Matches Claude Code manual loading pattern Removed obsolete plans - keeping only the simplified approach
Key improvements from review: - Added simple time-based cache (5 min TTL, no context needed) - Fixed resource paths (use full relative paths) - Strengthened hint system (REQUIRED vs RECOMMENDED) - Added comprehensive error handling - Updated timeline to realistic 20-24 hours - Added performance metrics tracking - Included rollback strategy - Complete test coverage plan Implementation approach: 1. SkillTool with time-based cache (6h) 2. Strong hint system (2h) 3. Remove static loading (4h) 4. Test with technical-analyst (3h) 5. Migrate remaining agents (5h) The time-based cache solves redundant loads without any conversation context complexity - best of both worlds. Ready to begin implementation with Phase 1
- Add dynamic skill loading via skill tool (replaces static frontmatter)
- Include skill tool in .default() builtin tools (no manual enablement needed)
- Implement conversation-history-as-cache (no separate cache mechanism)
- Add skills-demo example using real udbud skills (tender analysis workflow)
- Add skill hints system (suggests skills based on prompt keywords)
- Update all documentation for dynamic skills
- Migrate integration tests to test dynamic loading
- Remove obsolete planning docs (DYNAMIC-SKILLS-FINAL.md)
The skill tool loads domain expertise on-demand at runtime. Skills cached
in conversation history via Anthropic prompt caching (90% cost savings).
Breaking change: Static 'skills:' frontmatter removed from agents.
Migration: Use skill({name: "skill-name"}) tool call instead.
Code Review: Dynamic Skills SystemI've thoroughly reviewed this PR and I'm impressed with the implementation quality. This is a well-architected feature that follows the repository's conventions and best practices. ✅ Strengths1. Architecture & Design
2. Code Quality
3. Testing
4. Documentation
🔍 Potential Issues & Suggestions1. Security Concern: Path Traversal (Medium Priority)Issue: In const skillDir = path.relative(process.cwd(), skill.path);
const resourcePath = path.join(skillDir, 'reference', file);While the skill name is validated to be hyphen-case, and the registry only contains pre-loaded skills, there's a theoretical risk if a malicious skill is somehow registered. Recommendation:
Current Risk: Low (since SkillRegistry pre-validates all skills at load time) 2. Type Safety: Missing Type Guard (Low Priority)Issue: In function validateArgs(args: unknown): args is SkillArgs {
if (!args || typeof args !== 'object') return false;
return true; // name is optional, we'll check it in execute
}This type guard always returns Recommendation:
Example fix: function validateArgs(args: unknown): args is SkillArgs {
return args !== null && typeof args === 'object';
}3. Hardcoded Skill Hints (Low Priority)Issue: In if (prompt.match(/danish|dansk|udbud|tender/i)) {
hints.push('danish-tender-guidelines');
}Observation: This is domain-specific to the udbud example and may not be relevant for other projects using the agent system. Recommendation:
Current Impact: Low (users can fork/modify the middleware, and hints are just suggestions) 4. Resource Limit Documentation (Very Low Priority)Issue: Resource listings are limited to 3 items ( Recommendation: Document this limit in 5. Error Message ConsistencyObservation: Some errors use "Error: " prefix, others don't:
Recommendation: Standardize error message format for consistency. 🎯 Best Practices AdherenceCLAUDE.md Compliance✅ Follows all guidelines:
✅ Engineering Principles:
Code Style✅ TypeScript best practices followed 📊 Performance ConsiderationsPositive
Observations
🔐 Security AssessmentStrengths
Recommendations
📈 Test Coverage AssessmentUnit Tests (12 tests in
|
Security improvements based on code review feedback: 1. Path Traversal Protection (loader.ts): - Add path.resolve() + containment validation in loadResource() - Prevents malicious skills from reading files outside skill directory - Example blocked: reference/../../etc/passwd 2. Type Guard Fix (skill.tool.ts): - Improve validateArgs() to properly exclude arrays and null - Previous version accepted any object (including arrays) - Now properly validates plain objects only Both fixes address medium/low severity issues identified in PR review. All existing tests pass (608/609, 1 pre-existing failure unrelated).
✅ Security Fixes AppliedBased on review feedback, I've addressed both identified issues: 1. Path Traversal Protection (Medium Priority) ✅ FIXEDIssue: Resource loader could potentially read files outside skill directory if malicious skills contained Fix (commit 837a737): // Before:
const fullPath = path.join(skillPath, resourcePath);
// After:
const fullPath = path.resolve(skillPath, resourcePath);
const normalizedSkillPath = path.resolve(skillPath);
if (!fullPath.startsWith(normalizedSkillPath + path.sep)) {
throw new Error(`Path traversal detected: ${resourcePath} escapes skill directory`);
}Protection: Now validates that resolved paths remain within the skill directory. Any escape attempt throws an error. 2. Type Guard Improvement (Low Priority) ✅ FIXEDIssue: Fix (commit 837a737): // Before:
if (!args || typeof args !== 'object') return false;
return true; // ← Accepted arrays, Date, etc.
// After:
return args !== null && args !== undefined && typeof args === 'object' && !Array.isArray(args);Improvement: Now properly excludes Testing✅ All 608 unit tests passing (1 pre-existing failure unrelated) Ready for re-review! |
Pull Request Review: Dynamic Skills SystemOverviewThis is a well-designed and well-executed implementation of a dynamic skills system. The PR successfully replaces the static frontmatter-based approach with runtime skill loading, providing significant architectural improvements. Overall Assessment: ✅ Approve with Minor Suggestions Strengths1. Excellent Architecture ✨
2. Outstanding Documentation 📚
3. Robust Testing ✅
4. Smart Design Decisions 🎯
Code Quality Issues1. Type Safety Violations 🔴 HIGH PRIORITYIssue: Use of Location: let skillTool: any;
let mockRegistry: any;Fix: Use proper types let skillTool: BaseTool;
let mockRegistry: jest.Mocked<Pick<SkillRegistry, 'getSkill' | 'listSkills'>>;Why it matters: Per CLAUDE.md - "NO 2. Security Considerations
|
…pe guard Added comprehensive tests for security fixes: Path Traversal Protection (3 tests): - Verifies non-listed files return undefined (safe behavior) - Tests direct loadResource call rejects path traversal attempts - Confirms legitimate paths work correctly Type Guard Validation (7 tests): - Rejects null, undefined, arrays, strings, numbers - Accepts empty objects (fails later with missing name) - Accepts valid objects with name property All 40 tests passing. ESLint clean.
✅ Security Tests AddedAdded comprehensive test coverage for both security fixes (commit 1636c06): Path Traversal Protection Tests (3 tests)
Type Guard Validation Tests (7 tests)Tests that
And accepts:
Test Results✅ All 40 tests passing Coverage added: +10 tests total (+90 lines of test code) |
PR Review: Dynamic Skills System ImplementationOverall Assessment✅ APPROVE with minor suggestions This is an excellent implementation of a dynamic skills system that represents a significant architectural improvement. The design is clean, well-tested, and follows the repository's coding standards. The "conversation history as cache" approach is elegant and eliminates unnecessary complexity. Code Quality ⭐⭐⭐⭐⭐Strengths1. Excellent Architecture & Design
2. Security & Validation
3. Testing Coverage
4. Documentation
5. Follows Repository Standards
Performance Considerations ⚡Excellent Design Choices
Security Analysis 🔒Well-Protected✅ Path traversal protection: Proper validation in Minor Suggestion: Resource Size LimitsConsider adding a safety check for resource file sizes to prevent potential memory issues with very large files. Potential Issues & Suggestions1. Type Annotation InconsistencyFile: interface SkillArgs {
name?: string;
[key: string]: unknown; // ⚠️ Allows arbitrary properties
}Suggestion: Make it strict since extra properties are never used: interface SkillArgs {
name?: string;
}2. Error Message EnhancementFile: error: 'Error: Skill name cannot be empty.',Suggestion: Include usage hint for consistency: error: 'Error: Skill name cannot be empty.\n\nUsage: skill({name: "skill-name"})',3. Resource Listing Limits DocumentationFiles:
Suggestion: Document these limits in 4. Skill Registry Overwrite BehaviorFile: Currently logs a warning when overwriting skills. Consider if this should throw an error instead to prevent accidental overwrites, or add a Priority: Low (warning is acceptable for MVP) Documentation Suggestions 📚Minor Enhancements
Best Practices Alignment ✅Follows CLAUDE.md Guidelines
Consistent with Existing Patterns
Specific File Reviews
|
- Copy 3 skills from udbud into skills-demo/skills/ - architecture-analyzer (591 LOC) - complexity-calculator (414 LOC) - danish-tender-guidelines (228 LOC + 5 resources) - Update skills-demo.ts to load from local ./skills directory instead of ../udbud/skills Benefits: - Demo is now self-contained and portable - Works independently of udbud example - Users can modify demo skills without affecting udbud - Clearer that skills are part of the demo package
Pull Request Review: Dynamic Skills SystemOverall AssessmentVerdict: APPROVE with minor suggestions This is a well-architected, production-ready implementation of a dynamic skills system. The code quality is excellent, follows TypeScript best practices from CLAUDE.md, and the migration from static to dynamic loading is well-executed. Code Quality & Best PracticesExcellent TypeScript Practices
Architecture Strengths
Code Organization
Security ConcernsNo Critical Issues FoundThe security model is sound:
Minor Suggestions
Test CoverageComprehensive Testing
Suggestions
PerformanceWell-OptimizedStrengths:
Suggestions
DocumentationExceptionalThe docs/skills.md is outstanding:
Migration ImpactClean Breaking Change
Test ResultsFrom PR description:
Final VerdictAPPROVE This is high-quality, production-ready code that:
The dynamic skills system is a significant improvement. The conversation history as cache approach is particularly clever. Great work! |
Summary
This PR implements a dynamic skills system that loads domain expertise on-demand at runtime, replacing the previous static frontmatter-based approach.
Key Changes
🎯 Core Implementation
skill.tool.ts): New tool that loads skills at runtime viaskill({name: "skill-name"})calls.default()builtin tools (no manual setup required)📚 Documentation & Examples
docs/skills.mdupdated for dynamic skills systempackages/examples/skills-demo/) using real udbud skills:✅ Testing
🔄 Migration
Breaking Changes
Static
skills:frontmatter removed from agent definitions.Before:
After:
Migration Guide
skills:array from agent frontmatterskillto agent'stoolslist (if not using.default())Benefits
✅ Load only what you need - Skills loaded on-demand, not at build-time
✅ Zero configuration - Works out of the box with
.default()✅ Efficient caching - 90% cost savings via Anthropic prompt caching
✅ Simpler architecture - No separate cache mechanism needed
✅ Easy to extend - Just add markdown files to skills directory
Statistics
Demo Output
Cache efficiency demonstrates conversation-history-as-cache:
Checklist