Merged
6 changes: 5 additions & 1 deletion README.md
@@ -271,12 +271,16 @@ tested on GCP c2-standard-8 (8 vCPU Intel Xeon @ 3.10GHz). see [bench/README.md]
- sharded mode: 1.24M SET/sec, 1.56M GET/sec (redis-benchmark, P=16)
- concurrent mode: 1.77M SET/sec, 2.16M GET/sec (redis-benchmark, P=16)
- p99 latency: 0.61ms SET, 0.56ms GET (P=1, concurrent mode)
- vector queries: 1.2k queries/sec with 4-6x less memory than chromadb
- memory: ~166 bytes/key (redis: ~105 bytes/key)

```bash
./bench/bench-quick.sh # quick sanity check
./bench/compare-redis.sh # redis-benchmark comparison
./bench/bench-memtier.sh # memtier_benchmark comparison
./bench/bench-vector.sh # vector similarity (ember vs chromadb vs pgvector vs qdrant)
./bench/bench-grpc.sh # gRPC vs RESP3
./bench/bench-all.sh # run everything
```

## architecture
@@ -299,7 +303,7 @@ contributions welcome — see [CONTRIBUTING.md](CONTRIBUTING.md).
| 4 | clustering (raft, gossip, slots, migration) | ✅ complete |
| 5 | developer experience (observability, CLI, clients) | 🚧 in progress |

**current**: 101 commands, 796+ tests, ~22k lines of code (excluding tests)
**current**: 106 commands, 796+ tests, ~31k lines of code (excluding tests)

## security

125 changes: 108 additions & 17 deletions bench/README.md
@@ -52,7 +52,7 @@ tested on GCP c2-standard-8 (8 vCPU Intel Xeon @ 3.10GHz), Ubuntu 22.04, 2.4M to

dragonfly in particular offers features ember simply doesn't have:

- full Redis API compatibility (200+ commands vs ember's ~101)
- full Redis API compatibility (200+ commands vs ember's ~106)
- sophisticated memory management (dashtable for ~25% of Redis memory usage)
- transactional semantics (MULTI/EXEC, Lua scripting)
- fork-free snapshotting
@@ -118,24 +118,24 @@ note: encryption only affects persistence writes. GET throughput should be uncha
ember vs chromadb vs pgvector. 100k random vectors, 128 dimensions, cosine metric, k=10 kNN search.
HNSW index: M=16, ef_construction=64 for all systems. tested on GCP c2-standard-8.

| metric | ember | chromadb | pgvector |
|--------|-------|----------|----------|
| insert (vectors/sec) | 917 | **3,738** | 1,562 |
| query (queries/sec) | **1,214** | 376 | 882 |
| query p99 (ms) | **1.09ms** | 2.90ms | 1.52ms |
| memory (MB) | **30 MB** | 122 MB | 178 MB |
| metric | ember | chromadb | pgvector | qdrant |
|--------|-------|----------|----------|--------|
| insert (vectors/sec) | 917 | **3,738** | 1,562 | — |
| query (queries/sec) | **1,214** | 376 | 882 | — |
| query p99 (ms) | **1.09ms** | 2.90ms | 1.52ms | — |
| memory (MB) | **30 MB** | 122 MB | 178 MB | — |

ember's query throughput is 3.2x chromadb and 1.4x pgvector, with 4-6x lower memory usage. insert throughput is lower due to per-vector RESP protocol overhead — batched pipelining helps but each VADD is still a separate command.
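
the ratios quoted above follow directly from the 100k-vector table. a quick check in python, using only the figures from that table (the dictionary layout and the `ratio` helper are illustrative and not part of the benchmark scripts):

```python
# figures copied from the 100k-vector table above
results = {
    "ember":    {"query_qps": 1214, "memory_mb": 30},
    "chromadb": {"query_qps": 376,  "memory_mb": 122},
    "pgvector": {"query_qps": 882,  "memory_mb": 178},
}


def ratio(a: float, b: float) -> float:
    """return a / b rounded to one decimal place"""
    return round(a / b, 1)


ember = results["ember"]

# how many times faster ember answers queries than each competitor
query_speedup = {
    name: ratio(ember["query_qps"], r["query_qps"])
    for name, r in results.items()
    if name != "ember"
}

# how many times more memory each competitor uses than ember
memory_saving = {
    name: ratio(r["memory_mb"], ember["memory_mb"])
    for name, r in results.items()
    if name != "ember"
}

print(query_speedup)
print(memory_saving)
```

the memory ratios land at roughly 4.1x and 5.9x, which is where the "4-6x" range comes from.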

#### SIFT1M recall accuracy (128-dim, 1M vectors, 10k queries)

| metric | ember | chromadb | pgvector |
|--------|-------|----------|----------|
| recall@10 | — | — | — |
| insert (vectors/sec) | — | — | — |
| query p99 (ms) | — | — | — |
| metric | ember | chromadb | pgvector | qdrant |
|--------|-------|----------|----------|--------|
| recall@10 | — | — | — | — |
| insert (vectors/sec) | — | — | — | — |
| query p99 (ms) | — | — | — | — |

*results pending — requires a larger VM (c2-standard-16 or higher) since the 1M-vector HNSW index exceeds 16GB RAM during construction. run `bench/bench-vector.sh --sift` to populate.*
*results pending — run `bench/bench-vector.sh --sift --qdrant` on a c2-standard-8 (32GB) to populate. SIFT1M is 1M × 128 × 4B = 512MB raw data; ember HNSW peaks around 1.5GB RSS, well within 32GB.*
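
the raw-size arithmetic in the note can be reproduced with a few lines of python. this is only a sketch: `raw_vector_bytes` is a hypothetical helper, it assumes float32 components (4 bytes each), and it ignores HNSW index overhead entirely.

```python
def raw_vector_bytes(count: int, dims: int, bytes_per_component: int = 4) -> int:
    """size of the raw vector data only, excluding any index overhead"""
    return count * dims * bytes_per_component


# SIFT1M: 1M vectors, 128 dimensions, float32 components (assumed)
sift1m = raw_vector_bytes(count=1_000_000, dims=128)
print(f"{sift1m / 1e6:.0f} MB raw")  # decimal megabytes

# headroom check against the figures quoted in the note
vm_ram_gb = 32      # c2-standard-8
peak_rss_gb = 1.5   # approximate ember HNSW peak from the note
fits = peak_rss_gb < vm_ram_gb
print("fits in RAM:", fits)
```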

### scaling efficiency

@@ -146,6 +146,66 @@ ember's query throughput is 3.2x chromadb and 1.4x pgvector, with 4-6x lower mem

sharded mode scales super-linearly with cores for pipelined workloads thanks to the dispatch-collect pipeline pattern. concurrent mode uses a global DashMap and doesn't scale with core count but has lower per-request overhead.
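
one common way to put a number on "super-linear" is scaling efficiency: the speedup over the single-core run divided by the core count, where anything above 1.0 is super-linear. the sketch below uses synthetic throughput figures, not measurements from this benchmark, and `scaling_efficiency` is a hypothetical helper.

```python
def scaling_efficiency(base_ops: float, scaled_ops: float, cores: int) -> float:
    """speedup relative to the 1-core run, divided by the number of cores"""
    if base_ops <= 0 or cores <= 0:
        raise ValueError("base_ops and cores must be positive")
    return (scaled_ops / base_ops) / cores


# synthetic numbers for illustration only: 100k ops/sec on 1 core, 1M on 8 cores
eff = scaling_efficiency(base_ops=100_000, scaled_ops=1_000_000, cores=8)
print(f"efficiency: {eff:.2f}")  # above 1.0 means super-linear scaling
```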

### gRPC vs RESP3

standard SET/GET operations comparing RESP3 (redis-py) against gRPC (ember-py). 100k requests, 64B values.

| test | ops/sec | p50 (ms) | p99 (ms) |
|------|---------|----------|----------|
| RESP3 SET (sequential) | — | — | — |
| RESP3 GET (sequential) | — | — | — |
| RESP3 SET (pipelined) | — | — | — |
| RESP3 GET (pipelined) | — | — | — |
| gRPC SET (unary) | — | — | — |
| gRPC GET (unary) | — | — | — |

*results pending — run `bench/bench-grpc.sh` to populate. requires `--features grpc` and ember-py client installed.*
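
the three columns in the table can be derived from a list of per-request latencies plus the total wall-clock time. the sketch below shows one way to do that with the nearest-rank percentile method; it is an assumption about how such numbers are typically computed, not a copy of what `bench-grpc.sh` does, and the samples are synthetic.

```python
def summarize(latencies_ms: list[float], wall_seconds: float) -> dict[str, float]:
    """ops/sec plus nearest-rank p50 and p99 for a list of per-request latencies"""
    if not latencies_ms or wall_seconds <= 0:
        raise ValueError("need at least one sample and a positive duration")
    ordered = sorted(latencies_ms)
    n = len(ordered)

    def percentile(p: int) -> float:
        # nearest-rank: smallest sample with at least p% of samples at or below it
        rank = max(1, (p * n + 99) // 100)  # integer ceil(p * n / 100)
        return ordered[rank - 1]

    return {
        "ops_per_sec": n / wall_seconds,
        "p50_ms": percentile(50),
        "p99_ms": percentile(99),
    }


# synthetic example: 100 requests taking 0.01ms .. 1.00ms, finished in 0.5s
samples = [i / 100 for i in range(1, 101)]
stats = summarize(samples, wall_seconds=0.5)
print(stats)
```

for sequential tests each latency would be one round trip; for pipelined tests a per-batch latency is a more meaningful sample.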

### pub/sub throughput

publish throughput and fan-out delivery rate across subscriber counts and message sizes. 10k messages per test.

| test | pub msg/s | fanout msg/s | p99 (ms) |
|------|-----------|--------------|----------|
| 1 sub, 64B, SUBSCRIBE | — | — | — |
| 10 sub, 64B, SUBSCRIBE | — | — | — |
| 100 sub, 64B, SUBSCRIBE | — | — | — |
| 1 sub, 1KB, SUBSCRIBE | — | — | — |
| 10 sub, 1KB, SUBSCRIBE | — | — | — |
| 100 sub, 1KB, SUBSCRIBE | — | — | — |
| 10 sub, 64B, PSUBSCRIBE | — | — | — |
| 100 sub, 64B, PSUBSCRIBE | — | — | — |

*results pending — run `bench/bench-pubsub.sh` to populate.*

### protobuf storage overhead

PROTO.* commands vs raw SET/GET with identical data. measures the cost of server-side schema validation and field-level access. 100k requests, bench.User message (~30 bytes).

| test | ops/sec | p50 (ms) | p99 (ms) |
|------|---------|----------|----------|
| raw SET | — | — | — |
| PROTO.SET | — | — | — |
| raw GET | — | — | — |
| PROTO.GET | — | — | — |
| PROTO.GETFIELD | — | — | — |
| PROTO.SETFIELD | — | — | — |

*results pending — run `bench/bench-proto.sh` to populate. requires `--features protobuf` and `protoc` on PATH.*

### memory by data type

per-key memory overhead across data types. string: 1M keys, 64B values. hash/zset: 100k keys. vector: 100k 128-dim vectors.

| data type | ember concurrent | ember sharded | redis |
|-----------|------------------|---------------|-------|
| string (64B) | — | — | — |
| hash (5 fields) | — | — | — |
| sorted set | — | — | — |
| vector (128-dim) | — | — | — |

*results pending — run `bench/bench-memory.sh --vector` to populate.*
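
per-key overhead is usually obtained by measuring server memory before and after loading a known number of keys and dividing the difference by the key count. the sketch below assumes that approach; `bytes_per_key` is a hypothetical helper and the readings are synthetic, not results from `bench-memory.sh`.

```python
def bytes_per_key(baseline_bytes: int, loaded_bytes: int, key_count: int) -> float:
    """average memory attributable to each key, including value and index overhead"""
    if key_count <= 0:
        raise ValueError("key_count must be positive")
    return (loaded_bytes - baseline_bytes) / key_count


# synthetic readings for illustration: 1 MB idle, 101 MB after loading 1M keys
per_key = bytes_per_key(
    baseline_bytes=1_000_000,
    loaded_bytes=101_000_000,
    key_count=1_000_000,
)
print(f"~{per_key:.0f} bytes/key")
```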

## execution modes

ember offers two modes with different tradeoffs:
@@ -198,8 +258,25 @@ cargo build --release -p ember-server --features jemalloc,vector
# vector benchmark (ember only, no docker required)
./bench/bench-vector.sh --ember-only

# vector benchmark with qdrant
./bench/bench-vector.sh --qdrant

# SIFT1M recall accuracy
./bench/bench-vector.sh --sift

# gRPC vs RESP3 comparison (requires --features grpc + ember-py)
cargo build --release -p ember-server --features jemalloc,grpc
./bench/bench-grpc.sh

# pub/sub throughput
./bench/bench-pubsub.sh

# protobuf storage overhead (requires --features protobuf + protoc)
cargo build --release -p ember-server --features jemalloc,protobuf
./bench/bench-proto.sh

# run everything (builds with all features automatically)
./bench/bench-all.sh
```

### cloud VM benchmarking
@@ -217,12 +294,22 @@ gcloud compute instances create ember-bench \
# bootstrap (installs rust, redis, memtier_benchmark, dragonfly)
gcloud compute ssh ember-bench --zone=us-central1-a -- 'bash -s' < ./bench/setup-vm.sh

# run benchmarks
# setup for vector benchmarks (docker, python deps, qdrant)
gcloud compute ssh ember-bench --zone=us-central1-a -- 'bash -s' < ./bench/setup-vm-vector.sh

# run all benchmarks
gcloud compute ssh ember-bench --zone=us-central1-a
cd ember
./bench/bench-all.sh # runs everything sequentially

# or run individual suites
./bench/compare-redis.sh # redis-benchmark suite
./bench/bench-memtier.sh # memtier_benchmark suite
./bench/bench-memory.sh # memory comparison
./bench/bench-vector.sh --qdrant # vector comparison
./bench/bench-grpc.sh # gRPC vs RESP3
./bench/bench-pubsub.sh # pub/sub
./bench/bench-proto.sh # protobuf overhead

# cleanup
gcloud compute instances delete ember-bench --zone=us-central1-a
@@ -232,13 +319,17 @@ gcloud compute instances delete ember-bench --zone=us-central1-a

| script | description |
|--------|-------------|
| `bench-all.sh` | run all benchmarks sequentially (builds with all features) |
| `bench.sh` | full benchmark: ember (sharded + concurrent) vs redis |
| `bench-quick.sh` | quick sanity check (~10 seconds) |
| `bench-memory.sh` | memory usage with 1M keys |
| `bench-memory.sh` | memory usage across data types (string, hash, zset, vector) |
| `compare-redis.sh` | comprehensive comparison using redis-benchmark |
| `bench-memtier.sh` | comprehensive comparison using memtier_benchmark |
| `bench-encryption.sh` | encryption at rest overhead (plaintext vs AES-256-GCM) |
| `bench-vector.sh` | vector similarity: ember vs chromadb vs pgvector |
| `bench-vector.sh` | vector similarity: ember vs chromadb vs pgvector vs qdrant |
| `bench-grpc.sh` | gRPC vs RESP3 standard operations |
| `bench-pubsub.sh` | pub/sub throughput and fan-out latency |
| `bench-proto.sh` | protobuf storage overhead (PROTO.* vs raw SET/GET) |
| `setup-vm.sh` | bootstrap dependencies on fresh ubuntu VM |
| `setup-vm-vector.sh` | additional dependencies for vector benchmarks |

@@ -252,7 +343,7 @@ BENCH_REQUESTS=1000000 BENCH_THREADS=16 ./bench/compare-redis.sh
MEMTIER_THREADS=8 MEMTIER_CLIENTS=16 MEMTIER_REQUESTS=20000 ./bench/bench-memtier.sh

# customize memory test
KEY_COUNT=5000000 VALUE_SIZE=128 ./bench/bench-memory.sh
STRING_KEYS=5000000 VALUE_SIZE=128 ./bench/bench-memory.sh
```

## environment variables