Benchmark: build realistic test corpus from agent memory patterns #11

@bm-clawd

Part of #10

Build a realistic test corpus that mirrors how agents actually use memory in production.

Requirements

  • Anonymized markdown files covering all the patterns agents use:
    • MEMORY.md — curated long-term facts, preferences, decisions
    • memory/YYYY-MM-DD.md — daily notes with events, checks, conversations
    • memory/tasks/*.md — structured tasks with schema fields
    • memory/topics/*.md — research notes, competitive analysis
    • memory/people/*.md — person notes with relations
  • Realistic size: ~30-50 files, ~50-100KB total
  • Must exercise all query categories: exact facts, semantic, temporal, relational, cross-note, needle-in-haystack
  • Include evolving facts (same topic updated across multiple daily notes)
  • Include BM-specific features: observations [key] value, relations relates_to [[Entity]], frontmatter schemas
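As a rough illustration of what one generated corpus file might look like, here is a minimal sketch that renders a synthetic person note. The observation form (`[key] value`) and relation form (`relates_to [[Entity]]`) follow the bullet above; the frontmatter field names (`title`, `type`, `date`), the section headings, the helper `render_note`, and the sample person and project are all hypothetical placeholders, not an agreed schema.

```python
from datetime import date


def render_note(title, note_type, observations, relations, day=None):
    """Render one synthetic corpus note as markdown text.

    Field names and section headings are illustrative assumptions.
    """
    lines = ["---", f"title: {title}", f"type: {note_type}"]
    if day is not None:
        lines.append(f"date: {day.isoformat()}")
    lines += ["---", "", f"# {title}", "", "## Observations"]
    # Observations use the "[key] value" form from the requirements.
    for key, value in observations:
        lines.append(f"- [{key}] {value}")
    lines += ["", "## Relations"]
    # Relations use the "relates_to [[Entity]]" form from the requirements.
    for target in relations:
        lines.append(f"- relates_to [[{target}]]")
    return "\n".join(lines) + "\n"


# Hypothetical, fully synthetic sample data.
note = render_note(
    title="Jane Example",
    note_type="person",
    observations=[("role", "Staff engineer"), ("timezone", "UTC+1")],
    relations=["Project Atlas"],
    day=date(2025, 1, 15),
)
print(note)
```

Generating notes from structured tuples like this would also make it easy to record, at creation time, which file holds which fact, which helps when annotating ground truth later.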

Deliverable

  • benchmark/corpus/ with all files
  • benchmark/queries.json with 50+ annotated queries
  • Ground truth: which files/chunks contain the answer for each query
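One possible shape for the annotated queries and their ground truth is sketched below. Everything here is an assumption for discussion: the record keys (`id`, `category`, `query`, `expected_files`), the category labels, and the helpers `validate_queries` and `recall_at_k` are hypothetical, and the sketch tracks ground truth at file level only, not chunk level.

```python
# Hypothetical category labels derived from the query categories listed above.
QUERY_CATEGORIES = {
    "exact", "semantic", "temporal", "relational", "cross-note", "needle",
}


def validate_queries(queries, corpus_files):
    """Return a list of human-readable problems found in the query annotations."""
    problems = []
    for q in queries:
        if q["category"] not in QUERY_CATEGORIES:
            problems.append(f"{q['id']}: unknown category {q['category']!r}")
        if not q["expected_files"]:
            problems.append(f"{q['id']}: no ground-truth files listed")
        for path in q["expected_files"]:
            if path not in corpus_files:
                problems.append(f"{q['id']}: {path} is not in the corpus")
    return problems


def recall_at_k(expected_files, retrieved_files, k):
    """Fraction of ground-truth files that appear in the top-k retrieved files."""
    top_k = set(retrieved_files[:k])
    hits = sum(1 for path in expected_files if path in top_k)
    return hits / len(expected_files)


# Synthetic example records; in practice these would be loaded from queries.json.
corpus_files = {"MEMORY.md", "memory/2025-01-15.md", "memory/people/jane-example.md"}
queries = [
    {
        "id": "q001",
        "category": "temporal",
        "query": "What changed about the deployment plan in mid-January?",
        "expected_files": ["MEMORY.md", "memory/2025-01-15.md"],
    },
    {
        "id": "q002",
        "category": "relational",
        "query": "Who is connected to Project Atlas?",
        "expected_files": ["memory/people/missing.md"],
    },
]

problems = validate_queries(queries, corpus_files)
score = recall_at_k(
    queries[0]["expected_files"],
    ["MEMORY.md", "memory/people/jane-example.md", "memory/2025-01-15.md"],
    k=2,
)
```

A validation pass like this can run in CI so that renaming or deleting a corpus file never leaves a query pointing at a missing ground-truth path.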

Labels: enhancement (New feature or request)