Benchmark: eval harness for Basic Memory search #12

## Part of #10

Build the eval harness that runs queries against real Basic Memory and scores results.

## Requirements

- TypeScript, runs via `bun`
- Uses real `bm search` CLI (not mocked)
- Indexes the test corpus into a BM project
- Runs all queries from `queries.json`
- Scores: Recall@5, Recall@10, MRR, Precision@5
- Groups results by query category
- Outputs results as JSON + human-readable table
- Exposed as a `just benchmark` recipe in the justfile
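
The scoring metrics listed above could be sketched roughly as follows. This is a minimal illustration, not the final implementation: the function names are hypothetical, document IDs are assumed to be strings, and each ranked list is assumed to contain no duplicates.

```typescript
// Hypothetical helpers for the scoring step. `ranked` is the ordered list of
// document IDs returned for one query; `relevant` is the ground-truth set.

/** Recall@k: fraction of relevant documents that appear in the top k results. */
function recallAtK(ranked: string[], relevant: Set<string>, k: number): number {
  if (relevant.size === 0) return 0;
  const topK = new Set(ranked.slice(0, k));
  let hits = 0;
  for (const id of relevant) {
    if (topK.has(id)) hits++;
  }
  return hits / relevant.size;
}

/** Precision@k: fraction of the top k slots filled by relevant documents. */
function precisionAtK(ranked: string[], relevant: Set<string>, k: number): number {
  if (k <= 0) return 0;
  const hits = ranked.slice(0, k).filter((id) => relevant.has(id)).length;
  // The denominator is k even when fewer than k results came back.
  return hits / k;
}

/** Reciprocal rank: 1 / (1-based position of the first relevant hit), or 0 if none. */
function reciprocalRank(ranked: string[], relevant: Set<string>): number {
  const index = ranked.findIndex((id) => relevant.has(id));
  return index === -1 ? 0 : 1 / (index + 1);
}

/** MRR: mean of the reciprocal ranks over all queries. */
function meanReciprocalRank(
  runs: { ranked: string[]; relevant: Set<string> }[],
): number {
  if (runs.length === 0) return 0;
  const total = runs.reduce(
    (sum, run) => sum + reciprocalRank(run.ranked, run.relevant),
    0,
  );
  return total / runs.length;
}
```

Dividing Precision@k by k rather than by the number of results returned penalizes queries that return fewer than k hits. Whether that is the desired behavior is a design choice the harness should make explicit.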

## Implementation notes

- Set up a temporary BM project for the corpus
- Run `bm search` via CLI subprocess for each query
- Parse JSON output, compare against ground truth
- Also test `bm context` for relational queries (unique to BM)
- Measure latency per query (secondary metric)
- Clean up temp project after run
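
The run loop described above might look something like the sketch below. It is illustrative only: the shape of a `queries.json` entry (`id`, `text`, `category`, `relevant`) is an assumption, as are all type and function names. The search call is injected as a function so the sketch does not guess at `bm search` flags or its JSON output format; the real harness would wrap the CLI subprocess behind `SearchFn`.

```typescript
// Assumed shape of one entry in queries.json (not specified in this issue).
interface BenchmarkQuery {
  id: string;
  text: string;
  category: string;
  relevant: string[]; // ground-truth document IDs
}

// Anything that returns a ranked list of document IDs for a query text.
// In the real harness this would run the `bm search` subprocess and parse its output.
type SearchFn = (queryText: string) => Promise<string[]>;

interface QueryResult {
  id: string;
  category: string;
  ranked: string[];
  relevant: string[];
  latencyMs: number;
}

interface CategorySummary {
  queries: number;
  recallAt5: number;
  meanLatencyMs: number;
}

/** Runs every query sequentially and records wall-clock latency per query. */
async function runQueries(
  queries: BenchmarkQuery[],
  search: SearchFn,
): Promise<QueryResult[]> {
  const results: QueryResult[] = [];
  for (const q of queries) {
    const start = performance.now();
    const ranked = await search(q.text);
    const latencyMs = performance.now() - start;
    results.push({ id: q.id, category: q.category, ranked, relevant: q.relevant, latencyMs });
  }
  return results;
}

/** Groups results by category and averages Recall@5 and latency within each group. */
function summarizeByCategory(results: QueryResult[]): Map<string, CategorySummary> {
  const groups = new Map<string, QueryResult[]>();
  for (const r of results) {
    const bucket = groups.get(r.category) ?? [];
    bucket.push(r);
    groups.set(r.category, bucket);
  }

  const summary = new Map<string, CategorySummary>();
  for (const [category, items] of groups) {
    let recallSum = 0;
    let latencySum = 0;
    for (const item of items) {
      const top5 = new Set(item.ranked.slice(0, 5));
      const hits = item.relevant.filter((id) => top5.has(id)).length;
      recallSum += item.relevant.length === 0 ? 0 : hits / item.relevant.length;
      latencySum += item.latencyMs;
    }
    summary.set(category, {
      queries: items.length,
      recallAt5: recallSum / items.length,
      meanLatencyMs: latencySum / items.length,
    });
  }
  return summary;
}
```

Running queries sequentially keeps the per-query latency numbers free of contention from parallel subprocesses, at the cost of a longer total run.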

## Stretch

- Compare composited memory_search (MEMORY.md grep + BM search + task scan) vs BM search alone

