Context
We already have a benchmark comparison against Mem0 that shows MenteDB winning on retrieval quality and latency. Graphiti is the other major player in the space and arguably the most architecturally sophisticated alternative (temporal knowledge graphs, episode-based ingestion).
Proposed Work
- Set up a reproducible benchmark suite that runs both MenteDB and Graphiti on the same workloads
- Key dimensions to measure:
  - Retrieval accuracy: given a conversation history, how relevant are the returned memories?
  - Temporal reasoning: can the system correctly answer "what changed?" and "what was true at time T?"
  - Ingestion latency: time to process a conversation turn
  - Query latency: time to retrieve relevant context
  - Storage efficiency: disk and memory usage for equivalent workloads
- Use realistic multi-turn conversation datasets (not synthetic ones)
- Publish results in the repo (benchmarks/ directory) with reproduction instructions
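A minimal Python sketch of how the ingestion latency, query latency, and retrieval accuracy dimensions could be measured in one backend-agnostic harness. It assumes each system is wrapped in a hypothetical adapter exposing ingest and query methods. MemoryBackend, KeywordBackend, run_benchmark, and the choice of recall@k as the accuracy metric are illustrative assumptions, not MenteDB or Graphiti APIs. Temporal reasoning and storage efficiency are not covered here.

```python
import math
import statistics
import time
from typing import Protocol, Sequence


class MemoryBackend(Protocol):
    """Hypothetical adapter; each system under test would get its own wrapper."""

    def ingest(self, turn: str) -> None: ...

    def query(self, question: str, k: int) -> list[str]: ...


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile of a non-empty list of latency samples."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def recall_at_k(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of the labelled relevant memories that appear in the top-k results."""
    if not relevant:
        raise ValueError("each query needs at least one relevant memory")
    return len(set(retrieved) & relevant) / len(relevant)


def run_benchmark(backend: MemoryBackend, turns: list[str],
                  queries: list[tuple[str, set[str]]], k: int = 5) -> dict[str, float]:
    # Ingestion latency: wall-clock time to process each conversation turn.
    ingest_times = []
    for turn in turns:
        start = time.perf_counter()
        backend.ingest(turn)
        ingest_times.append(time.perf_counter() - start)

    # Query latency and retrieval accuracy, measured per labelled query.
    query_times, recalls = [], []
    for question, relevant in queries:
        start = time.perf_counter()
        retrieved = backend.query(question, k)
        query_times.append(time.perf_counter() - start)
        recalls.append(recall_at_k(retrieved, relevant))

    return {
        "ingest_p50_s": percentile(ingest_times, 50),
        "ingest_p95_s": percentile(ingest_times, 95),
        "query_p50_s": percentile(query_times, 50),
        "query_p95_s": percentile(query_times, 95),
        "mean_recall_at_k": statistics.fmean(recalls),
    }


class KeywordBackend:
    """Toy in-memory stand-in, used only to exercise the harness."""

    def __init__(self) -> None:
        self.memories: list[str] = []

    def ingest(self, turn: str) -> None:
        self.memories.append(turn)

    def query(self, question: str, k: int) -> list[str]:
        # Rank stored turns by word overlap with the question.
        words = set(question.lower().split())
        ranked = sorted(self.memories,
                        key=lambda m: len(words & set(m.lower().split())),
                        reverse=True)
        return ranked[:k]


if __name__ == "__main__":
    turns = ["alice moved to berlin", "bob likes tea", "alice started a new job"]
    queries = [("which city did alice move to", {"alice moved to berlin"})]
    print(run_benchmark(KeywordBackend(), turns, queries, k=1))
```

In a real run, the toy backend would be replaced by adapters around each system's client, and both would be fed the same turns and labelled queries. Reporting p50 and p95 rather than a mean keeps a few slow turns from hiding typical latency; nearest-rank percentiles are used only for simplicity.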
Why
A public benchmark builds credibility and helps users make informed choices. It also highlights where MenteDB needs to improve (likely temporal reasoning until bi-temporal validity lands).