perf: hot-path quick wins — fuse lookups, cache sizes, pack Entry #226
Merged
add auth, election, and raft_transport to the modules table and features list. fix emberkv-core typo → ember-core in the related crates table.
hot-path quick wins from performance audit (phase A):
- fuse double hash probe on GET/GET_STRING into single lookup that checks expiry inline, eliminating ~2M redundant probes/sec
- cache value_size in Entry struct so memory accounting is O(1) instead of walking entire collections on every mutation
- shrink last_access from u64 ms to u32 secs, saving 4 bytes/entry and improving cache-line packing for the hot Entry struct
- replace allocating peek_command_name (3 heap allocs per MULTI/EXEC command) with zero-allocation eq_ignore_ascii_case comparisons
summary
phase A of the performance audit: four hot-path optimizations shipped together since they touch overlapping code (especially the Entry struct). estimated +15–30% throughput improvement on GET-heavy workloads.
1. **fuse double hash lookup on GET** — every GET previously did `remove_if_expired(key)` (one hash probe) then `entries.get_mut(key)` (second probe). now uses a single `get_mut()` that checks expiry inline. at 2M+ ops/sec this eliminates millions of redundant probes per second.
2. **cache value_size in Entry** — added `cached_value_size: usize` to Entry. mutations update it incrementally instead of recomputing via `value_size()`, which walks entire collections. HSET on a 1000-field hash no longer iterates all fields twice for memory accounting.
3. **Entry cache-line packing** — shrunk `last_access_ms: u64` to `last_access_secs: u32` (seconds since process start, wraps at ~136 years). saves 4 bytes per entry and keeps hot fields within the first cache line.
4. **zero-alloc peek_command_name** — replaced 3 heap allocations per MULTI/EXEC command (`to_vec` → `String::from_utf8` → `to_ascii_uppercase`) with inline `eq_ignore_ascii_case` comparisons returning `&'static str`.

item 5 (SmallVec for parsed arrays) was investigated and intentionally skipped — Frame is ~32 bytes, so `SmallVec<[Frame; 6]>` would add ~200 bytes inline, bloating the Frame enum for all variants. net negative.

what was tested
- `cached_value_size` updates correctly

design considerations

- `cached_value_size` is maintained incrementally on all mutation paths (list push/pop, hash set/del/incrby, set add/rem, sorted set add/rem, vector add/rem) and in `remove_expired_entry`, rather than lazily recomputed
- `now_secs()` uses a process-start epoch via `OnceLock<Instant>` to avoid syscalls — monotonic and wraps at ~136 years, which is acceptable for LRU ordering
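as a rough sketch of the single-probe GET shape: the PR fuses the expiry check into `get_mut()`, while this standalone version uses std's `entry` API to get the same one-lookup behavior on both the live and expired paths. `Store` and `Stored` are made-up stand-ins, not the crate's real types.

```rust
use std::collections::hash_map::Entry as Slot;
use std::collections::HashMap;
use std::time::Instant;

// Hypothetical stand-ins for the store types; names are illustrative.
struct Stored {
    value: String,
    expires_at: Option<Instant>, // None = no TTL
}

impl Stored {
    fn is_expired(&self, now: Instant) -> bool {
        self.expires_at.map_or(false, |t| t <= now)
    }
}

#[derive(Default)]
struct Store {
    entries: HashMap<String, Stored>,
}

impl Store {
    /// Single-probe GET: the hash lookup that finds the entry also
    /// checks expiry, and an expired entry is removed through the same
    /// `OccupiedEntry` slot, so neither path pays a second probe.
    fn get(&mut self, key: String) -> Option<&str> {
        match self.entries.entry(key) {
            Slot::Occupied(occ) => {
                if occ.get().is_expired(Instant::now()) {
                    occ.remove(); // expired: reclaim via the slot we already hold
                    None
                } else {
                    Some(occ.into_mut().value.as_str())
                }
            }
            Slot::Vacant(_) => None,
        }
    }
}
```

note that std's `entry` takes the key by value; a real hot path would avoid that copy, which is one reason the PR does the inline check inside `get_mut()` instead.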
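the incremental accounting can be illustrated like this; `HashEntry`, `hset`, and `hdel` are hypothetical stand-ins written from the description above, not the crate's actual types.

```rust
use std::collections::HashMap;

// Illustrative sketch of O(1) memory accounting for a hash value.
#[derive(Default)]
struct HashEntry {
    fields: HashMap<String, Vec<u8>>,
    cached_value_size: usize, // bytes of all field names + values
}

impl HashEntry {
    /// Apply the delta of this mutation instead of re-walking every
    /// field the way a full `value_size()` scan would.
    fn hset(&mut self, field: String, value: Vec<u8>) {
        let field_len = field.len();
        let new_len = value.len();
        match self.fields.insert(field, value) {
            // overwrite: key bytes already counted, swap value bytes
            Some(old) => {
                self.cached_value_size = self.cached_value_size - old.len() + new_len;
            }
            // fresh field: count key and value bytes
            None => self.cached_value_size += field_len + new_len,
        }
    }

    fn hdel(&mut self, field: &str) -> bool {
        match self.fields.remove(field) {
            Some(old) => {
                self.cached_value_size -= field.len() + old.len();
                true
            }
            None => false,
        }
    }
}
```

the trade-off is that every mutation path must remember to apply its delta, which is why the PR calls out testing it across all collection types.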
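a minimal sketch of the process-start epoch behind `last_access_secs`, written from the description rather than copied from the crate:

```rust
use std::sync::OnceLock;
use std::time::Instant;

// Lazily captured once, on the first call.
static PROCESS_START: OnceLock<Instant> = OnceLock::new();

/// Seconds since process start, truncated to u32. Monotonic (backed by
/// `Instant`), wraps after ~2^32 s ≈ 136 years, and fits an LRU stamp
/// in 4 bytes instead of the old u64 millisecond timestamp.
fn now_secs() -> u32 {
    let start = *PROCESS_START.get_or_init(Instant::now);
    Instant::now().duration_since(start).as_secs() as u32
}
```

because only the ordering of stamps matters for LRU, a relative epoch and second granularity are enough.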
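the zero-allocation command peek can be sketched as follows; the candidate command list here is illustrative, not the dispatcher's real table.

```rust
/// Compare the raw argument bytes case-insensitively against each
/// candidate instead of building an uppercased String (the old
/// to_vec -> String::from_utf8 -> to_ascii_uppercase chain).
/// Returning `&'static str` means callers never borrow the input buffer.
fn peek_command_name(first_arg: &[u8]) -> Option<&'static str> {
    const COMMANDS: &[&str] = &["MULTI", "EXEC", "DISCARD", "WATCH"];
    COMMANDS
        .iter()
        .copied()
        .find(|name| first_arg.eq_ignore_ascii_case(name.as_bytes()))
}
```

`<[u8]>::eq_ignore_ascii_case` walks both slices byte by byte, so the hot path does zero heap allocations and short-circuits on the first length or byte mismatch.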