Open
Labels
enhancement: New feature or request
Description
Part of #10
Build a realistic test corpus that mirrors how agents actually use memory in production.
Requirements
- Anonymized markdown files covering all the patterns agents use:
  - `MEMORY.md`: curated long-term facts, preferences, decisions
  - `memory/YYYY-MM-DD.md`: daily notes with events, checks, conversations
  - `memory/tasks/*.md`: structured tasks with schema fields
  - `memory/topics/*.md`: research notes, competitive analysis
  - `memory/people/*.md`: person notes with relations
- Realistic size: ~30-50 files, ~50-100KB total
- Must exercise all query categories: exact facts, semantic, temporal, relational, cross-note, needle-in-haystack
- Include evolving facts (same topic updated across multiple daily notes)
- Include BM-specific features: observations (`[key] value`), relations (`relates_to [[Entity]]`), frontmatter schemas
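
The requirements above could be checked mechanically before the corpus is committed. The following is a minimal sketch, not part of the issue: the function name `check_corpus`, the regular expressions, and the literal reading of the approximate size bounds (30-50 files, 50-100KB) are all assumptions. In particular, it assumes an observation is a line beginning with `[key] value` (optionally as a list item) and that frontmatter is a leading `---` block.

```python
import re
from fnmatch import fnmatchcase
from pathlib import Path

# Layout patterns copied from the requirements list; the daily-note
# pattern spells out YYYY-MM-DD as digits.
REQUIRED_PATTERNS = [
    "MEMORY.md",
    "memory/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9].md",
    "memory/tasks/*.md",
    "memory/topics/*.md",
    "memory/people/*.md",
]

# Assumed syntax: "[key] value" at line start, optionally after "- ".
# Markdown checkboxes such as "[ ]" and "[x]" are excluded.
OBSERVATION_RE = re.compile(
    r"^\s*(?:-\s+)?\[(?![ xX]\])[^\[\]]+\]\s+\S", re.MULTILINE
)
# Only the relation type named in the issue is checked here.
RELATION_RE = re.compile(r"\brelates_to\s+\[\[[^\]]+\]\]")


def check_corpus(root):
    """Return a list of human-readable problems; empty means all checks pass."""
    root = Path(root)
    files = sorted(p for p in root.rglob("*.md") if p.is_file())
    names = [p.relative_to(root).as_posix() for p in files]
    problems = []

    for pattern in REQUIRED_PATTERNS:
        if not any(fnmatchcase(name, pattern) for name in names):
            problems.append(f"no file matches {pattern}")

    if not 30 <= len(files) <= 50:
        problems.append(f"file count {len(files)} outside 30-50")

    total_kb = sum(p.stat().st_size for p in files) / 1024
    if not 50 <= total_kb <= 100:
        problems.append(f"total size {total_kb:.1f}KB outside 50-100KB")

    texts = [p.read_text(encoding="utf-8") for p in files]
    if not any(OBSERVATION_RE.search(t) for t in texts):
        problems.append("no observation lines found")
    if not any(RELATION_RE.search(t) for t in texts):
        problems.append("no relates_to relations found")
    if not any(t.startswith("---") for t in texts):
        problems.append("no file starts with frontmatter")

    return problems
```

Calling `check_corpus("benchmark/corpus")` would return an empty list once every check passes. Because the size targets in the issue are approximate, the hard bounds here may be better treated as warnings.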
Deliverable
- `benchmark/corpus/` with all files
- `benchmark/queries.json` with 50+ annotated queries
- Ground truth: which files/chunks contain the answer for each query
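
The issue does not fix a schema for `benchmark/queries.json`, so the sketch below assumes one for illustration: a JSON array of objects with hypothetical fields `id`, `query`, `category`, and `expected_files` (corpus-relative paths serving as ground truth). The category slugs are also assumed names for the six query categories listed in the requirements. Under those assumptions, a validator might look like this:

```python
import json
from pathlib import Path

# Assumed slugs for the six query categories named in the requirements.
CATEGORIES = {
    "exact",
    "semantic",
    "temporal",
    "relational",
    "cross-note",
    "needle-in-haystack",
}


def check_queries(queries_path, corpus_root, min_queries=50):
    """Return a list of problems found in a queries file (hypothetical schema)."""
    queries = json.loads(Path(queries_path).read_text(encoding="utf-8"))
    corpus_root = Path(corpus_root)
    problems = []

    if len(queries) < min_queries:
        problems.append(f"only {len(queries)} queries, need {min_queries}+")

    seen = set()
    for q in queries:
        qid = q.get("id", "<missing id>")
        category = q.get("category")
        if category in CATEGORIES:
            seen.add(category)
        else:
            problems.append(f"{qid}: unknown category {category!r}")

        expected = q.get("expected_files", [])
        if not expected:
            problems.append(f"{qid}: no ground truth files")
        for name in expected:
            # Every ground-truth path must point at a real corpus file.
            if not (corpus_root / name).is_file():
                problems.append(f"{qid}: missing corpus file {name}")

    for missing in sorted(CATEGORIES - seen):
        problems.append(f"category never exercised: {missing}")

    return problems
```

Chunk-level ground truth (for example line ranges or heading anchors) is left out here because the issue does not say how chunks are identified; an extra field could carry it once that is decided.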