
perf: vector insert performance optimization#271

Merged
kacy merged 2 commits into main from
perf/vector-insert-optimization
Feb 24, 2026

Conversation

@kacy (Owner) commented Feb 24, 2026

summary

  • fix O(n²) memory tracking in VectorSet::memory_usage() — cached data_bytes field replaces per-call iteration over all element names. with 100k vectors across 200 batches, this eliminates ~10M hashmap iterations
  • switch VectorSet hashmaps from std HashMap (SipHash) to AHashMap for faster hashing on the insert hot path
  • add add_pre_validated() insert path — vadd_batch validates all vectors upfront, then uses the fast path that skips redundant per-element NaN/infinity checks (eliminates 38.4M redundant float checks for 100k × 128-dim)
  • incremental memory tracking in vadd_batch via grow_by() instead of rescanning the full set after each batch
  • benchmark methodology: increase default batch size 500→2000, add redis pipelining support (--pipeline-depth), add multi-key sharding (--shards), remove unnecessary decode_responses=True
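the pre-validated insert path can be sketched roughly as follows. this is a simplified illustration, not the actual implementation: `vadd_batch` and `add_pre_validated` are named in the PR, but the struct internals and validation rule here are assumptions (the real `VectorSet` also tracks memory and uses `AHashMap` rather than std `HashMap`):

```rust
use std::collections::HashMap;

// single-pass validity check: is_finite() rejects both NaN and ±infinity
fn is_valid(v: &[f32]) -> bool {
    v.iter().all(|x| x.is_finite())
}

#[derive(Default)]
struct VectorSet {
    elements: HashMap<String, Vec<f32>>,
}

impl VectorSet {
    // fast path: the caller guarantees `v` has already been validated,
    // so no per-element float checks happen here
    fn add_pre_validated(&mut self, name: &str, v: Vec<f32>) {
        self.elements.insert(name.to_string(), v);
    }
}

// validate the whole batch upfront, then insert without re-checking
fn vadd_batch(set: &mut VectorSet, batch: &[(&str, Vec<f32>)]) -> bool {
    if !batch.iter().all(|(_, v)| is_valid(v)) {
        return false; // reject the whole batch on any NaN/infinity
    }
    for (name, v) in batch {
        set.add_pre_validated(name, v.clone());
    }
    true
}
```

the point is that validation cost becomes one pass over the batch instead of one pass per insert call site.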

what was tested

  • all 24 vector unit tests pass (cargo test -p emberkv-core --features vector -- vector)
  • clippy clean (cargo clippy -p emberkv-core --features vector -- -D warnings)
  • full server build verified (cargo check -p ember-server --features vector)
  • new tests added: data_bytes_tracks_names, add_pre_validated_works, memory_usage_consistent

design considerations

the dominant bottleneck was memory_usage() being O(n) and called after every batch insert — making the total cost O(n²) across batches. the fix follows the same data_bytes caching pattern already used by SortedSet (sorted_set.rs:75). the cached field is maintained incrementally in add_inner() and remove() using saturating_sub for safety. per_element_bytes() helper enables callers to compute deltas without calling memory_usage() at all.
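the caching pattern above can be sketched like this. field and method names (`data_bytes`, `per_element_bytes`, `add_inner`, `remove`, `memory_usage`) follow the PR description, but the accounting formula and surrounding internals are simplified assumptions for illustration:

```rust
use std::collections::HashMap;

struct VectorSet {
    elements: HashMap<String, Vec<f32>>,
    data_bytes: usize, // cached bytes for element names + vector payloads
}

impl VectorSet {
    fn new() -> Self {
        Self { elements: HashMap::new(), data_bytes: 0 }
    }

    // lets callers compute deltas without ever calling memory_usage()
    fn per_element_bytes(name: &str, dim: usize) -> usize {
        name.len() + dim * std::mem::size_of::<f32>()
    }

    fn add_inner(&mut self, name: &str, v: Vec<f32>) {
        let added = Self::per_element_bytes(name, v.len());
        if let Some(old) = self.elements.insert(name.to_string(), v) {
            // overwrite: drop the replaced vector's contribution first
            self.data_bytes = self
                .data_bytes
                .saturating_sub(Self::per_element_bytes(name, old.len()));
        }
        self.data_bytes += added;
    }

    fn remove(&mut self, name: &str) -> bool {
        match self.elements.remove(name) {
            Some(v) => {
                // saturating_sub guards against underflow if accounting drifts
                self.data_bytes = self
                    .data_bytes
                    .saturating_sub(Self::per_element_bytes(name, v.len()));
                true
            }
            None => false,
        }
    }

    // O(1): returns the cached figure instead of iterating all element names
    fn memory_usage(&self) -> usize {
        self.data_bytes
    }
}
```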

kacy added 2 commits February 24, 2026 09:43
- cache data_bytes in VectorSet for O(1) memory_usage() instead of
  iterating all element names on every call (was O(n²) across batches)
- switch VectorSet hashmaps from std HashMap (SipHash) to AHashMap
- add pre-validated insert path (add_pre_validated) that skips redundant
  NaN/infinity checks — vadd_batch validates upfront then uses fast path
- incremental memory tracking in vadd_batch via grow_by() instead of
  rescanning the full set with memory::value_size() after each batch

the O(n²) memory_usage() was the dominant bottleneck: with 200 batches
of 500 vectors, it accumulated ~10M hashmap iterations. now it's a
single arithmetic expression regardless of set size.
- increase default batch size from 500 to 2000 (configurable via
  BATCH_SIZE env var) to reduce round-trip overhead per vector
- add redis pipeline support (--pipeline-depth N) to send multiple
  batches concurrently, better saturating ember's async pipeline
- add multi-key sharding (--shards N) to distribute vectors across
  N keys, leveraging ember's thread-per-core architecture
- remove decode_responses=True from ember client — avoids unnecessary
  UTF-8 decoding overhead on RESP responses during inserts
- add BATCH_SIZE and PIPELINE_DEPTH env vars to bench-vector.sh
@kacy kacy merged commit 7314722 into main Feb 24, 2026
6 of 7 checks passed
@kacy kacy deleted the perf/vector-insert-optimization branch February 24, 2026 14:46