Status: Open
Labels: enhancement (New feature or request)
Description
Part of #10
Build the eval harness that runs queries against a real Basic Memory instance and scores the results.
Requirements
- TypeScript, runs via `bun`
- Uses the real `bm search` CLI (not mocked)
- Indexes the test corpus into a BM project
- Runs all queries from `queries.json`
- Scores: Recall@5, Recall@10, MRR, Precision@5
- Groups results by query category
- Outputs results as JSON + a human-readable table
- `just benchmark` in the justfile
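A minimal sketch of the scoring step, assuming each entry in `queries.json` carries an `id`, a `category`, the query text, and an `expected` list of ground-truth document identifiers. These field names are hypothetical, not the file's confirmed schema. The sketch computes Recall@5, Recall@10, Precision@5, and reciprocal rank per query, then averages them per category; the mean of the reciprocal ranks is MRR.

```typescript
// Hypothetical shape of one queries.json entry (field names are assumptions).
interface Query {
  id: string;
  category: string;
  query: string;
  expected: string[]; // ground-truth document identifiers
}

interface QueryScore {
  id: string;
  category: string;
  recallAt5: number;
  recallAt10: number;
  precisionAt5: number;
  reciprocalRank: number;
}

// Number of relevant documents among the top k results.
function hitsAtK(ranked: string[], relevant: Set<string>, k: number): number {
  return ranked.slice(0, k).filter((id) => relevant.has(id)).length;
}

// Fraction of all relevant documents that appear in the top k.
function recallAtK(ranked: string[], relevant: Set<string>, k: number): number {
  return relevant.size === 0 ? 0 : hitsAtK(ranked, relevant, k) / relevant.size;
}

// Fraction of the k slots filled by relevant documents (always divides by k).
function precisionAtK(ranked: string[], relevant: Set<string>, k: number): number {
  return hitsAtK(ranked, relevant, k) / k;
}

// 1 / rank of the first relevant result, or 0 if none was returned.
function reciprocalRank(ranked: string[], relevant: Set<string>): number {
  const idx = ranked.findIndex((id) => relevant.has(id));
  return idx === -1 ? 0 : 1 / (idx + 1);
}

function scoreQuery(q: Query, ranked: string[]): QueryScore {
  const relevant = new Set(q.expected);
  const unique = [...new Set(ranked)]; // drop duplicate hits so they aren't double counted
  return {
    id: q.id,
    category: q.category,
    recallAt5: recallAtK(unique, relevant, 5),
    recallAt10: recallAtK(unique, relevant, 10),
    precisionAt5: precisionAtK(unique, relevant, 5),
    reciprocalRank: reciprocalRank(unique, relevant),
  };
}

// Averages every metric within each query category.
function summarizeByCategory(scores: QueryScore[]) {
  const groups = new Map<string, QueryScore[]>();
  for (const s of scores) {
    const list = groups.get(s.category) ?? [];
    list.push(s);
    groups.set(s.category, list);
  }
  const mean = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;
  return [...groups.entries()].map(([category, list]) => ({
    category,
    n: list.length,
    recallAt5: mean(list.map((s) => s.recallAt5)),
    recallAt10: mean(list.map((s) => s.recallAt10)),
    precisionAt5: mean(list.map((s) => s.precisionAt5)),
    mrr: mean(list.map((s) => s.reciprocalRank)),
  }));
}
```

Passing the output of `summarizeByCategory` to `console.table` gives a quick human-readable table, and `JSON.stringify(summary, null, 2)` covers the JSON output. Precision@5 here divides by 5 even when fewer than five results come back, which is the common convention but worth stating alongside the numbers.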
Implementation notes
- Set up a temporary BM project for the corpus
- Run `bm search` via a CLI subprocess for each query
- Parse the JSON output and compare it against ground truth
- Also test `bm context` for relational queries (unique to BM)
- Measure latency per query (secondary metric)
- Clean up the temp project after the run
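A sketch of the per-query subprocess call with latency timing, assuming `bm` is on `PATH` and prints JSON containing a `results` array whose items have a `permalink` field. Both that output shape and the argument list are assumptions, so the arguments come from a caller-supplied `buildArgs` function instead of being hard-coded, and the runner is injectable so the harness can be exercised without a real index. `withTempDir` is a hypothetical helper for the temporary corpus location; registering that directory as a BM project and indexing it are left out because the exact commands aren't specified here.

```typescript
import { execFileSync } from "node:child_process";
import { mkdtempSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Takes CLI arguments and returns the command's stdout.
type Runner = (args: string[]) => string;

// Default runner: invokes the real `bm` binary (assumed to be on PATH).
const bmRunner: Runner = (args) =>
  execFileSync("bm", args, { encoding: "utf8" });

interface SearchRun {
  ranked: string[]; // result identifiers in rank order
  latencyMs: number; // wall-clock time of the subprocess call only
}

// Assumed output shape: { "results": [{ "permalink": "..." }, ...] }.
// Items without a string permalink are skipped.
function parseResults(stdout: string): string[] {
  const parsed = JSON.parse(stdout) as { results?: { permalink?: unknown }[] };
  return (parsed.results ?? [])
    .map((r) => r.permalink)
    .filter((p): p is string => typeof p === "string");
}

// Runs one query and times the subprocess call (JSON parsing is excluded).
function runSearch(
  query: string,
  buildArgs: (query: string) => string[], // hypothetical: real bm flags not specified
  run: Runner = bmRunner,
): SearchRun {
  const start = performance.now();
  const stdout = run(buildArgs(query));
  const latencyMs = performance.now() - start;
  return { ranked: parseResults(stdout), latencyMs };
}

// Hypothetical helper: hands fn a fresh temp directory and always removes it,
// even if fn throws, so an aborted run doesn't leave a stale corpus behind.
function withTempDir<T>(fn: (dir: string) => T): T {
  const dir = mkdtempSync(join(tmpdir(), "bm-bench-"));
  try {
    return fn(dir);
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
}
```

In the real harness, `buildArgs` would produce whatever argument list the installed `bm search` (or `bm context`, for the relational queries) actually accepts, and the `ranked` identifiers feed directly into the scoring step.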
Stretch
- Compare a composite `memory_search` (MEMORY.md grep + BM search + task scan) against BM search alone
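For the stretch comparison, one option is a paired, per-query comparison: run both pipelines on the same queries, score each with the same metric, and report the mean difference plus how many queries each pipeline won. The sketch assumes per-query scores are already keyed by query id; the composite `memory_search` itself is not implemented here, and `comparePaired` is a hypothetical name.

```typescript
interface Comparison {
  n: number; // queries scored by both pipelines
  meanDelta: number; // mean of (composite - bmOnly)
  wins: number; // composite scored higher
  losses: number; // BM search alone scored higher
  ties: number;
}

// Paired comparison of one metric (e.g. Recall@5) keyed by query id.
function comparePaired(
  composite: Map<string, number>,
  bmOnly: Map<string, number>,
): Comparison {
  let sum = 0, n = 0, wins = 0, losses = 0, ties = 0;
  for (const [id, a] of composite) {
    const b = bmOnly.get(id);
    if (b === undefined) continue; // only compare queries both pipelines ran
    const d = a - b;
    sum += d;
    n++;
    if (d > 0) wins++;
    else if (d < 0) losses++;
    else ties++;
  }
  return { n, meanDelta: n > 0 ? sum / n : 0, wins, losses, ties };
}
```

Reporting wins and losses next to the mean keeps a few large per-query swings from hiding behind an average near zero.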