Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 6 additions & 5 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -268,11 +268,12 @@ tested on GCP c2-standard-8 (8 vCPU Intel Xeon @ 3.10GHz). see [bench/README.md]
\*redis-benchmark, 64B values, P=16, 8 threads. take these comparisons with a grain of salt — ember is a small indie project; Redis and Dragonfly are battle-tested systems built by large teams over many years. see [bench/README.md](bench/README.md) for important caveats.

**highlights**:
- sharded mode: 1.24M SET/sec, 1.56M GET/sec (redis-benchmark, P=16)
- concurrent mode: 1.77M SET/sec, 2.16M GET/sec (redis-benchmark, P=16)
- p99 latency: 0.61ms SET, 0.56ms GET (P=1, concurrent mode)
- vector queries: 1.2k queries/sec with 4-6x less memory than chromadb
- memory: ~166 bytes/key (redis: ~105 bytes/key)
- sharded mode: 1.19M SET/sec, 1.48M GET/sec (redis-benchmark, P=16)
- concurrent mode: 963k SET/sec, 1.10M GET/sec (redis-benchmark, P=16)
- p99 latency: 0.60ms SET, 0.60ms GET (P=1, concurrent mode)
- vector queries: 1.4k queries/sec (gRPC) with 3-5x less memory than alternatives
- pub/sub: 11k msg/s single subscriber, 27k msg/s fan-out at 100 subscribers
- encryption at rest: 16-44% SET overhead, near-zero GET overhead

```bash
./bench/bench-quick.sh # quick sanity check
Expand Down
63 changes: 32 additions & 31 deletions bench/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -104,28 +104,29 @@ persistence overhead is minimal with everysec fsync. `appendfsync always` has si

AES-256-GCM encryption at rest (AOF and snapshots). requires building with `--features encryption`:

| mode | SET throughput | overhead vs plaintext |
|------|----------------|----------------------|
| ember concurrent | — | — |
| ember sharded | — | — |
| test | plaintext | encrypted | overhead |
|------|-----------|-----------|----------|
| SET (P=16) | 848,245/s | 467,523/s | 44% |
| GET (P=16) | 1,496,716/s | 1,252,950/s | 16% |
| SET (P=1) | 160,000/s | 133,155/s | 16% |
| GET (P=1) | 159,872/s | 159,872/s | 0% |

*results pending — run `bench/bench-encryption.sh` on a dedicated VM to populate.*

note: encryption only affects persistence writes. GET throughput should be unchanged since reads come from the in-memory keyspace.
note: encryption only affects persistence writes. GET overhead at P=1 is effectively zero since reads come from the in-memory keyspace. the higher P=16 GET overhead comes from increased memory pressure during concurrent persistence flushes.

### vector similarity

ember vs chromadb vs pgvector. 100k random vectors, 128 dimensions, cosine metric, k=10 kNN search.
HNSW index: M=16, ef_construction=64 for all systems. tested on GCP c2-standard-8.

| metric | ember | chromadb | pgvector | qdrant |
|--------|-------|----------|----------|--------|
| insert (vectors/sec) | 917 | **3,738** | 1,562 | — |
| query (queries/sec) | **1,214** | 376 | 882 | — |
| query p99 (ms) | **1.09ms** | 2.90ms | 1.52ms | — |
| memory (MB) | **30 MB** | 122 MB | 178 MB | — |
| metric | ember (RESP) | ember (gRPC) | chromadb | pgvector | qdrant |
|--------|--------------|--------------|----------|----------|--------|
| insert (vectors/sec) | 945 | 995 | **3,837** | 1,528 | 7,616 |
| query (queries/sec) | 1,214 | **1,372** | 382 | 823 | 561 |
| query p50 (ms) | 0.82ms | **0.73ms** | 2.61ms | 1.19ms | 1.78ms |
| query p99 (ms) | 0.96ms | **0.89ms** | 2.86ms | 1.58ms | 2.02ms |
| memory (MB) | **36 MB** | — | 123 MB | 178 MB | 127 MB |

ember's query throughput is 3.2x chromadb and 1.4x pgvector, with 4-6x lower memory usage. insert throughput is lower due to per-vector RESP protocol overhead — batched pipelining helps but each VADD is still a separate command.
ember's query throughput is 3.2x chromadb, 1.5x pgvector, and 2.2x qdrant — with 3-5x lower memory usage. gRPC gives an additional 13% boost over RESP3 for queries. insert throughput is lower due to per-vector protocol overhead — qdrant's batch API gives it a significant edge on inserts.

#### SIFT1M recall accuracy (128-dim, 1M vectors, 10k queries)

Expand All @@ -152,31 +153,31 @@ standard SET/GET operations comparing RESP3 (redis-py) against gRPC (ember-py).

| test | ops/sec | p50 (ms) | p99 (ms) |
|------|---------|----------|----------|
| RESP3 SET (sequential) | | | |
| RESP3 GET (sequential) | | | |
| RESP3 SET (pipelined) | | | |
| RESP3 GET (pipelined) | | | |
| gRPC SET (unary) | | | |
| gRPC GET (unary) | | | |
| RESP3 SET (sequential) | 15,414 | 0.061 | 0.100 |
| RESP3 GET (sequential) | 17,573 | 0.055 | 0.085 |
| RESP3 SET (pipelined) | **83,069** | 0.012 | 0.014 |
| RESP3 GET (pipelined) | **108,781** | 0.009 | 0.010 |
| gRPC SET (unary) | 6,274 | 0.149 | 0.236 |
| gRPC GET (unary) | 6,282 | 0.150 | 0.237 |

*results pending — run `bench/bench-grpc.sh` to populate. requires `--features grpc` and ember-py client installed.*
RESP3 with pipelining dominates for standard key-value operations — the protocol is simpler and redis-py's pipeline batching is very efficient. gRPC unary calls have higher per-request overhead but provide typed APIs, streaming, and cross-language codegen. for vector operations, gRPC's advantage is more pronounced (see vector section above).

### pub/sub throughput

publish throughput and fan-out delivery rate across subscriber counts and message sizes. 10k messages per test.

| test | pub msg/s | fanout msg/s | p99 (ms) |
|------|-----------|--------------|----------|
| 1 sub, 64B, SUBSCRIBE | | | |
| 10 sub, 64B, SUBSCRIBE | | | |
| 100 sub, 64B, SUBSCRIBE | | | |
| 1 sub, 1KB, SUBSCRIBE | | | |
| 10 sub, 1KB, SUBSCRIBE | | | |
| 100 sub, 1KB, SUBSCRIBE | | | |
| 10 sub, 64B, PSUBSCRIBE | | | |
| 100 sub, 64B, PSUBSCRIBE | | | |

*results pending — run `bench/bench-pubsub.sh` to populate.*
| 1 sub, 64B, SUBSCRIBE | **11,006** | 11,006 | 0.19 |
| 10 sub, 64B, SUBSCRIBE | 1,893 | 18,929 | 3.07 |
| 100 sub, 64B, SUBSCRIBE | 388 | **27,202** | 26.82 |
| 1 sub, 1KB, SUBSCRIBE | **10,235** | 10,235 | 0.21 |
| 10 sub, 1KB, SUBSCRIBE | 1,832 | 18,320 | 3.19 |
| 100 sub, 1KB, SUBSCRIBE | 389 | **26,189** | 27.54 |
| 10 sub, 64B, PSUBSCRIBE | 1,837 | 18,370 | 3.19 |
| 100 sub, 64B, PSUBSCRIBE | 391 | 26,221 | 27.33 |

single-subscriber throughput is 11k msg/s with sub-millisecond p99 latency. fan-out scales well — 100 subscribers deliver 27k total messages/sec. message size (64B vs 1KB) has minimal impact. PSUBSCRIBE pattern matching adds negligible overhead vs exact SUBSCRIBE.

### protobuf storage overhead

Expand Down