perf: vector insert performance optimization #271
Merged
Conversation
- cache `data_bytes` in `VectorSet` for O(1) `memory_usage()` instead of iterating all element names on every call (was O(n²) across batches)
- switch `VectorSet` hashmaps from std `HashMap` (SipHash) to `AHashMap`
- add pre-validated insert path (`add_pre_validated`) that skips redundant NaN/infinity checks — `vadd_batch` validates upfront then uses the fast path
- incremental memory tracking in `vadd_batch` via `grow_by()` instead of rescanning the full set with `memory::value_size()` after each batch

the O(n²) `memory_usage()` was the dominant bottleneck: with 200 batches of 500 vectors, it accumulated ~10M hashmap iterations. now it's a single arithmetic expression regardless of set size.
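the validate-upfront / fast-insert split can be sketched like this — a minimal sketch with hypothetical signatures, where `validate` and a plain `Vec` store stand in for the real emberkv types:

```rust
// hypothetical sketch of the pre-validated insert split: the batch is
// checked for NaN/infinity exactly once, then inserted through a path
// that performs no per-element float checks.
fn validate(vector: &[f32]) -> Result<(), String> {
    if vector.iter().all(|x| x.is_finite()) {
        Ok(())
    } else {
        Err("vector contains NaN or infinity".to_string())
    }
}

fn vadd_batch(
    set: &mut Vec<(String, Vec<f32>)>,
    batch: Vec<(String, Vec<f32>)>,
) -> Result<usize, String> {
    // single upfront pass: each float is inspected exactly once per batch
    for (_, vector) in &batch {
        validate(vector)?;
    }
    // fast path: stands in for add_pre_validated(), which skips re-checking
    let added = batch.len();
    set.extend(batch);
    Ok(added)
}
```

the upfront pass also gives all-or-nothing semantics: a batch with one bad vector is rejected before anything is inserted.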
- increase default batch size from 500 to 2000 (configurable via `BATCH_SIZE` env var) to reduce round-trip overhead per vector
- add redis pipeline support (`--pipeline-depth N`) to send multiple batches concurrently, better saturating ember's async pipeline
- add multi-key sharding (`--shards N`) to distribute vectors across N keys, leveraging ember's thread-per-core architecture
- remove `decode_responses=True` from the ember client — avoids unnecessary UTF-8 decoding overhead on RESP responses during inserts
- add `BATCH_SIZE` and `PIPELINE_DEPTH` env vars to `bench-vector.sh`
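the sharding idea boils down to deterministic routing — a sketch, where `shard_key` is a made-up helper rather than the bench script's actual code: hashing the element name picks one of N keys, so inserts spread across ember's per-core workers while lookups for the same element always route to the same shard.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// derive a stable shard suffix from the element name: the same element
// always maps to the same key, and distinct elements spread across N keys
fn shard_key(base: &str, element: &str, shards: u64) -> String {
    let mut hasher = DefaultHasher::new();
    element.hash(&mut hasher);
    format!("{}:{}", base, hasher.finish() % shards)
}
```

any stable hash works here; the only requirement is that routing is a pure function of the element name, so reads and writes agree on the shard.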
summary

- `VectorSet::memory_usage()` — cached `data_bytes` field replaces per-call iteration over all element names. with 100k vectors across 200 batches, this eliminates ~10M hashmap iterations
- `HashMap` (SipHash) to `AHashMap` for faster hashing on the insert hot path
- `add_pre_validated()` insert path — `vadd_batch` validates all vectors upfront, then uses the fast path that skips redundant per-element NaN/infinity checks (eliminates 38.4M redundant float checks for 100k × 128-dim)
- incremental memory tracking in `vadd_batch` via `grow_by()` instead of rescanning the full set after each batch
- bench client: pipelining (`--pipeline-depth`), multi-key sharding (`--shards`), remove unnecessary `decode_responses=True`

what was tested

- unit tests (`cargo test -p emberkv-core --features vector -- vector`)
- clippy (`cargo clippy -p emberkv-core --features vector -- -D warnings`)
- server build check (`cargo check -p ember-server --features vector`)
- new tests: `data_bytes_tracks_names`, `add_pre_validated_works`, `memory_usage_consistent`

design considerations

the dominant bottleneck was `memory_usage()` being O(n) and called after every batch insert — making the total cost O(n²) across batches. the fix follows the same `data_bytes` caching pattern already used by `SortedSet` (sorted_set.rs:75). the cached field is maintained incrementally in `add_inner()` and `remove()` using `saturating_sub` for safety. the `per_element_bytes()` helper enables callers to compute deltas without calling `memory_usage()` at all.
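a minimal sketch of that caching pattern, assuming hypothetical field and method names (std `HashMap` stands in for the real `AHashMap`):

```rust
use std::collections::HashMap;

// cached-size pattern: data_bytes is maintained on every mutation, so
// memory_usage() becomes a single arithmetic expression instead of a scan
struct VectorSet {
    elements: HashMap<String, Vec<f32>>,
    dim: usize,
    data_bytes: usize, // cached sum of element-name lengths
}

impl VectorSet {
    fn new(dim: usize) -> Self {
        VectorSet { elements: HashMap::new(), dim, data_bytes: 0 }
    }

    fn insert(&mut self, name: String, vector: Vec<f32>) {
        let name_len = name.len();
        // count the name only when the key is new: replacing an existing
        // element does not change the cached byte total
        if self.elements.insert(name, vector).is_none() {
            self.data_bytes += name_len;
        }
    }

    fn remove(&mut self, name: &str) {
        if self.elements.remove(name).is_some() {
            // saturating_sub so accounting drift can never underflow
            self.data_bytes = self.data_bytes.saturating_sub(name.len());
        }
    }

    // cost of one element: lets callers compute deltas without a full recount
    fn per_element_bytes(&self, name: &str) -> usize {
        name.len() + self.dim * std::mem::size_of::<f32>()
    }

    // O(1): a single arithmetic expression, no iteration over names
    fn memory_usage(&self) -> usize {
        self.data_bytes + self.elements.len() * self.dim * std::mem::size_of::<f32>()
    }
}
```

the invariant is that `memory_usage()` always equals the sum of `per_element_bytes()` over the current elements, with `insert`/`remove` keeping the cached field in step.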