
perf: vector insert performance optimization#271

Merged
kacy merged 2 commits into main from
perf/vector-insert-optimization
Feb 24, 2026

Conversation

@kacy (Owner) commented Feb 24, 2026

summary

  • fix O(n²) memory tracking in VectorSet::memory_usage() — cached data_bytes field replaces per-call iteration over all element names. with 100k vectors across 200 batches, this eliminates ~10M hashmap iterations
  • switch VectorSet hashmaps from std HashMap (SipHash) to AHashMap for faster hashing on the insert hot path
  • add add_pre_validated() insert path — vadd_batch validates all vectors upfront, then uses the fast path that skips redundant per-element NaN/infinity checks (eliminates 38.4M redundant float checks for 100k × 128-dim)
  • incremental memory tracking in vadd_batch via grow_by() instead of rescanning the full set after each batch
  • benchmark methodology: increase default batch size 500→2000, add redis pipelining support (--pipeline-depth), add multi-key sharding (--shards), remove unnecessary decode_responses=True
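the pre-validated insert path can be sketched roughly as follows. this is a simplified illustration, not the actual implementation: `vadd_batch` and `add_pre_validated` are named in the PR, but the struct internals and validation rule here are assumptions (the real `VectorSet` also tracks memory and uses `AHashMap` rather than std `HashMap`):

```rust
use std::collections::HashMap;

// single-pass validity check: is_finite() rejects both NaN and ±infinity
fn is_valid(v: &[f32]) -> bool {
    v.iter().all(|x| x.is_finite())
}

#[derive(Default)]
struct VectorSet {
    elements: HashMap<String, Vec<f32>>,
}

impl VectorSet {
    // fast path: the caller guarantees `v` has already been validated,
    // so no per-element float checks happen here
    fn add_pre_validated(&mut self, name: &str, v: Vec<f32>) {
        self.elements.insert(name.to_string(), v);
    }
}

// validate the whole batch upfront, then insert without re-checking
fn vadd_batch(set: &mut VectorSet, batch: &[(&str, Vec<f32>)]) -> bool {
    if !batch.iter().all(|(_, v)| is_valid(v)) {
        return false; // reject the whole batch on any NaN/infinity
    }
    for (name, v) in batch {
        set.add_pre_validated(name, v.clone());
    }
    true
}
```

the point is that validation cost becomes one pass over the batch instead of one pass per insert call site.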

what was tested

  • all 24 vector unit tests pass (cargo test -p emberkv-core --features vector -- vector)
  • clippy clean (cargo clippy -p emberkv-core --features vector -- -D warnings)
  • full server build verified (cargo check -p ember-server --features vector)
  • new tests added: data_bytes_tracks_names, add_pre_validated_works, memory_usage_consistent

design considerations

the dominant bottleneck was memory_usage() being O(n) and called after every batch insert — making the total cost O(n²) across batches. the fix follows the same data_bytes caching pattern already used by SortedSet (sorted_set.rs:75). the cached field is maintained incrementally in add_inner() and remove() using saturating_sub for safety. per_element_bytes() helper enables callers to compute deltas without calling memory_usage() at all.
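the caching pattern above can be sketched like this. field and method names (`data_bytes`, `per_element_bytes`, `add_inner`, `remove`, `memory_usage`) follow the PR description, but the accounting formula and surrounding internals are simplified assumptions for illustration:

```rust
use std::collections::HashMap;

struct VectorSet {
    elements: HashMap<String, Vec<f32>>,
    data_bytes: usize, // cached bytes for element names + vector payloads
}

impl VectorSet {
    fn new() -> Self {
        Self { elements: HashMap::new(), data_bytes: 0 }
    }

    // lets callers compute deltas without ever calling memory_usage()
    fn per_element_bytes(name: &str, dim: usize) -> usize {
        name.len() + dim * std::mem::size_of::<f32>()
    }

    fn add_inner(&mut self, name: &str, v: Vec<f32>) {
        let added = Self::per_element_bytes(name, v.len());
        if let Some(old) = self.elements.insert(name.to_string(), v) {
            // overwrite: drop the replaced vector's contribution first
            self.data_bytes = self
                .data_bytes
                .saturating_sub(Self::per_element_bytes(name, old.len()));
        }
        self.data_bytes += added;
    }

    fn remove(&mut self, name: &str) -> bool {
        match self.elements.remove(name) {
            Some(v) => {
                // saturating_sub guards against underflow if accounting drifts
                self.data_bytes = self
                    .data_bytes
                    .saturating_sub(Self::per_element_bytes(name, v.len()));
                true
            }
            None => false,
        }
    }

    // O(1): returns the cached figure instead of iterating all element names
    fn memory_usage(&self) -> usize {
        self.data_bytes
    }
}
```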

kacy added 2 commits February 24, 2026 09:43
- cache data_bytes in VectorSet for O(1) memory_usage() instead of
  iterating all element names on every call (was O(n²) across batches)
- switch VectorSet hashmaps from std HashMap (SipHash) to AHashMap
- add pre-validated insert path (add_pre_validated) that skips redundant
  NaN/infinity checks — vadd_batch validates upfront then uses fast path
- incremental memory tracking in vadd_batch via grow_by() instead of
  rescanning the full set with memory::value_size() after each batch

the O(n²) memory_usage() was the dominant bottleneck: with 200 batches
of 500 vectors, it accumulated ~10M hashmap iterations. now it's a
single arithmetic expression regardless of set size.
- increase default batch size from 500 to 2000 (configurable via
  BATCH_SIZE env var) to reduce round-trip overhead per vector
- add redis pipeline support (--pipeline-depth N) to send multiple
  batches concurrently, better saturating ember's async pipeline
- add multi-key sharding (--shards N) to distribute vectors across
  N keys, leveraging ember's thread-per-core architecture
- remove decode_responses=True from ember client — avoids unnecessary
  UTF-8 decoding overhead on RESP responses during inserts
- add BATCH_SIZE and PIPELINE_DEPTH env vars to bench-vector.sh
@kacy kacy merged commit 7314722 into main Feb 24, 2026
6 of 7 checks passed
@kacy kacy deleted the perf/vector-insert-optimization branch February 24, 2026 14:46