A performance benchmarking tool for evaluating response time characteristics across modern search APIs.
cp .env.example .env
# Edit .env with your API keys

Credentials needed:

- EXA_API_KEY - Exa search API
- BRAVE_API_KEY - Brave search API
- PPLX_API_KEY - Perplexity search API
- OPENAI_API_KEY - For query generation (optional)
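The CLI picks these keys up from the environment. For a quick manual check outside the tool, a minimal sketch assuming python-dotenv is installed (only the variable names above come from this repo):

```python
# Sketch: load keys from .env and report which ones are missing.
# Assumes python-dotenv; key names are the ones listed above.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

REQUIRED_KEYS = ["EXA_API_KEY", "BRAVE_API_KEY", "PPLX_API_KEY"]
OPTIONAL_KEYS = ["OPENAI_API_KEY"]  # only needed for query generation

missing = [key for key in REQUIRED_KEYS if not os.getenv(key)]
if missing:
    print(f"Missing required keys: {', '.join(missing)}")
else:
    print("All required API keys are set.")

for key in OPTIONAL_KEYS:
    if not os.getenv(key):
        print(f"Optional key {key} not set (query generation disabled).")
```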
The repository includes 250 sample queries from MS MARCO in sample_queries/msmarco.jsonl to get started quickly.
# Test with included MS MARCO queries
uv run bench local --file sample_queries/queries_msmarco.jsonl --api all
# Test a single API
uv run bench local --file queries.jsonl --api exa-auto
# Sample subset of queries
uv run bench local --file queries.jsonl --num-queries 50 --api all

# Parallel execution for higher throughput
uv run bench local --file queries.jsonl --api all --parallel --max-workers 20

# Generate synthetic queries with GPT-5-mini
uv run bench gen --count 100 --api all --parallel

# Benchmark with MS MARCO queries
uv run bench dataset --name microsoft/ms_marco --config v2.1 --num-queries 1000 --api all
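The --parallel and --max-workers flags bound how many requests are in flight at once. A minimal sketch of that idea with asyncio and a semaphore; timed_search is a hypothetical placeholder, not part of search_latency_bench, and this is not the tool's actual implementation:

```python
# Illustrative sketch of bounded-concurrency latency measurement.
import asyncio
import time


async def timed_search(query: str) -> float:
    """Hypothetical stand-in for a real API call; returns elapsed time in ms."""
    start = time.perf_counter()
    await asyncio.sleep(0.05)  # placeholder for the real HTTP request
    return (time.perf_counter() - start) * 1000


async def run_all(queries: list[str], max_workers: int = 20) -> list[float]:
    semaphore = asyncio.Semaphore(max_workers)  # cap in-flight requests

    async def bounded(query: str) -> float:
        async with semaphore:
            return await timed_search(query)

    return await asyncio.gather(*(bounded(q) for q in queries))


latencies = asyncio.run(run_all([f"query {i}" for i in range(100)]))
print(f"mean latency: {sum(latencies) / len(latencies):.1f}ms")
```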
# Any HuggingFace dataset
uv run bench dataset \
--name <dataset-name> \
--query-field <field-name> \
--num-queries 100 \
--api exa-auto

# Full set of options
uv run bench local \
--file queries.jsonl \
--api all \
--num-queries 100 \
--num-results 10 \
--parallel \
--max-workers 20 \
--output results

Supported values for --api:

- exa-auto - Exa with auto mode
- exa-fast - Exa with fast mode
- brave - Brave Search
- perplexity - Perplexity Search
- all - Run all APIs sequentially
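The dataset mode above pulls query strings out of a column of a HuggingFace dataset. Roughly what that step looks like with the datasets library, shown for MS MARCO (the split choice and column name here are assumptions; for other datasets, adjust the --query-field equivalent):

```python
# Sketch: extract query strings from a HuggingFace dataset column.
# Column names vary between datasets; the split choice here is arbitrary.
from datasets import load_dataset

dataset = load_dataset("microsoft/ms_marco", "v2.1", split="validation")

query_field = "query"   # analogous to --query-field on the CLI
num_queries = 100       # analogous to --num-queries

# Take only the first num_queries rows, then read the query column.
queries = dataset.select(range(num_queries))[query_field]
print(f"loaded {len(queries)} queries, e.g. {queries[0]!r}")
```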
Supports JSON and JSONL query files:
["query 1", "query 2", "query 3"]{"query": "query 1"}
{"query": "query 2"}Benchmarks generate timestamped JSON files with detailed performance metrics:
results/
├── exa-auto_results_20250110_143052.json
├── exa-fast_results_20250110_143052.json
├── brave_results_20250110_143052.json
└── perplexity_results_20250110_143052.json
Each result file includes:
- Latency percentiles (P50, P90, P95, P99), illustrated in the sketch after this list
- Aggregate statistics (min, max, mean)
- Individual query timings
- Success/failure counts
- Execution metadata
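For reference, a percentile such as P95 is the latency below which 95% of queries completed. A minimal nearest-rank sketch over raw per-query timings (illustrative only; the tool's own interpolation may produce slightly different values):

```python
# Illustrative nearest-rank percentile over raw latencies in milliseconds.
import math


def percentile(latencies: list[float], pct: float) -> float:
    """Smallest value that is >= pct percent of the samples."""
    ordered = sorted(latencies)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank - 1, 0)]


timings = [112.4, 98.7, 143.9, 121.0, 388.2, 105.3, 99.8, 131.5]
for pct in (50, 90, 95, 99):
    print(f"P{pct}: {percentile(timings, pct):.1f}ms")
```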
import asyncio

from search_latency_bench import ExaSearchEngine, run_benchmark
from search_latency_bench.engines.exa import SearchType


async def main() -> None:
    # Exa engine in auto mode; run_benchmark is a coroutine, so it
    # needs to be awaited inside an event loop.
    engine = ExaSearchEngine(type=SearchType.AUTO)

    result = await run_benchmark(
        engine=engine,
        queries=["quantum computing", "climate change solutions"],
        num_results=10,
        api_name="exa-auto",
        parallel=True,
    )

    print(f"P50 latency: {result.summary.latency.p50:.1f}ms")
    print(f"P95 latency: {result.summary.latency.p95:.1f}ms")
    print(f"Success rate: {result.summary.successful_queries}/{result.summary.total_queries}")


asyncio.run(main())