@alkispoly-db
Collaborator

Summary

  • Add 4 new reference documentation files with content curated from PR #2 (Add MLflow agent-evaluation skill)
  • Enhance 3 existing reference files with advanced patterns
  • Update SKILL.md with organized references section

New Files Added

  • GOTCHAS.md: 15+ common mistakes that cause evaluation failures
  • CRITICAL-interfaces.md: Implementation details not in official docs (data schema, return types, filter syntax)
  • patterns-context-optimization.md: Token/latency optimization strategies for agents
  • user-journeys.md: High-level workflow guides (strategy alignment, regression detection, performance)

Enhanced Files

  • dataset-preparation.md: Added patterns for production traces, diverse sampling, per-row guidelines
  • scorers.md: Added multi-metric, trace-based, tool selection, and latency scorer patterns
  • tracing-integration.md: Added trace analysis patterns for debugging

Test plan

  • Verify all new files are properly formatted
  • Verify no duplicate content between files
  • Verify CRITICAL-interfaces.md doesn't hardcode info available via llms.txt
  • Verify all scripts still work
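
The duplicate-content check in the test plan could be partly automated. The following is a minimal sketch, not part of this PR: `find_duplicate_paragraphs` and its `min_chars` threshold are hypothetical names chosen for illustration, and it only catches paragraphs that match after whitespace and case normalization, not paraphrased overlap.

```python
from collections import defaultdict


def find_duplicate_paragraphs(docs, min_chars=40):
    """Map each paragraph that appears in more than one file to those files.

    docs: dict of {filename: markdown text}. Hypothetical helper for review.
    """
    seen = defaultdict(set)
    for name, text in docs.items():
        # Treat blank-line-separated chunks as paragraphs.
        for para in text.split("\n\n"):
            # Collapse whitespace and case so trivial reflows still match.
            key = " ".join(para.split()).lower()
            # Skip short chunks (headings, one-word lines) to limit noise.
            if len(key) >= min_chars:
                seen[key].add(name)
    return {para: sorted(names) for para, names in seen.items() if len(names) > 1}
```

Loading the reference files into the `docs` dict (for example with `pathlib.Path.read_text`) is left out, since the directory layout is not stated here. An empty result means no normalized paragraph is shared between files.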

🤖 Generated with Claude Code

Add new reference documentation:
- GOTCHAS.md: 15+ common mistakes that cause evaluation failures
- CRITICAL-interfaces.md: Implementation details not in official docs
- patterns-context-optimization.md: Token/latency optimization strategies
- user-journeys.md: High-level workflow guides

Enhance existing references with advanced patterns:
- dataset-preparation.md: Production traces, diverse sampling, per-row guidelines
- scorers.md: Multi-metric, trace-based, tool selection, latency scorers
- tracing-integration.md: Trace analysis patterns for debugging

Update SKILL.md with references to new documentation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>