Remove internal server object pointer overhead in small strings #2516
madolson merged 8 commits into valkey-io:unstable from
Conversation
Pull Request Overview
This PR implements a memory optimization for small strings by reusing the 8-byte robj->ptr memory to store embedded data. This change reduces per-item memory overhead by approximately 20-30% for affected value sizes while maintaining performance.
Key changes include:
- Replacing direct `robj->ptr` access with getter/setter methods throughout the codebase
- Modifying the embedding logic to overwrite the ptr location with embedded data
- Increasing embedding size thresholds to maximize memory savings
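The mechanism described above can be sketched roughly as follows. This is illustrative C only: the struct fields, the `OBJ_FLAG_EMBEDDED` flag, and the function bodies are simplified stand-ins, not the real `serverObject` code from the PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for serverObject: the real struct packs
 * type/encoding/lru bits; only the layout idea matters here. */
typedef struct robj {
    uint32_t flags; /* hypothetical; holds an "embedded" bit */
    int refcount;
    void *ptr;      /* for embedded values this 8B slot is reused
                     * as the first bytes of the embedded data */
} robj;

#define OBJ_FLAG_EMBEDDED 0x1u

/* The value lives either behind ptr, or starting at ptr's own
 * address when the data is embedded. */
static void *objectGetVal(robj *o) {
    if (o->flags & OBJ_FLAG_EMBEDDED)
        return &o->ptr; /* data overlays the pointer slot */
    return o->ptr;
}

/* Assigning a pointer into an embedded object would corrupt its
 * data, so the setter refuses (mirrors the PR's assertion). */
static void objectSetVal(robj *o, void *val) {
    assert(!(o->flags & OBJ_FLAG_EMBEDDED));
    o->ptr = val;
}

/* Allocate header + string, writing the string over the ptr field
 * and thereby reclaiming sizeof(void *) == 8 bytes per object. */
static robj *createEmbeddedString(const char *s, size_t len) {
    robj *o = malloc(offsetof(robj, ptr) + len + 1);
    o->flags = OBJ_FLAG_EMBEDDED;
    o->refcount = 1;
    memcpy((char *)&o->ptr, s, len + 1);
    return o;
}
```

In the real code the embedded bytes are an sds string with its own small header, but the saving is the same: the object header shrinks by one pointer for every embedded value.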
Reviewed Changes
Copilot reviewed 47 out of 47 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/server.h | Updates serverObject struct to support embedded values and adds accessor function declarations |
| src/unit/test_object.c | Adds comprehensive tests for embedded string functionality with keys, expiration, and value operations |
| tests/unit/type/string.tcl | Updates memory usage tests to reflect new embedding behavior and memory layout |
| src/unit/test_files.h | Registers new test functions for embedded string functionality |
| Multiple t_*.c files | Replaces direct ptr access with objectGetVal/objectSetVal calls throughout data type implementations |
| src/server.c | Updates hash and comparison functions to use new object accessor methods |
| src/tracking.c | Updates tracking functionality to use new object accessor methods |
| src/valkey-check-rdb.c | Updates RDB checking code to use new object accessor methods |
| src/sort.c | Updates sorting functionality to use new object accessor methods |
We probably also want a performance test for IO threading as well, or else a deep pipeline; it will tease out a performance regression more clearly.
zuiderkwast
left a comment
Great! I'm aligned with this change. We have been chatting about it before.
I posted a few random comments.
Force-pushed from 4710276 to db2e800
Codecov Report ❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## unstable #2516 +/- ##
============================================
+ Coverage 74.27% 74.31% +0.04%
============================================
Files 129 129
Lines 70923 70974 +51
============================================
+ Hits 52675 52745 +70
+ Misses 18248 18229 -19
Force-pushed from db2e800 to 91568dc
I updated the test results using a pipeline depth of 100, and I've also rebased the branch, appended some minor fixes, and addressed PR feedback. :)
zuiderkwast
left a comment
Not a full review; just follow-up on previous comments.
Also take a look at the failing test on 32-bit x86:
*** [err]: Memory usage of embedded string value in tests/unit/type/string.tcl
Expected '6' to be equal to '10' (context: type eval line 24 cmd {assert_equal 6 $sds_overhead} proc ::test)
Force-pushed from fcc304a to 7d41e05
Force-pushed from 7d41e05 to d8400e3
Benchmark ran on this commit: Benchmark Comparison by Command
GET | cluster disabled | tls disabled
GET | cluster enabled | tls disabled
MGET | cluster disabled | tls disabled
MSET | cluster disabled | tls disabled
SET | cluster disabled | tls disabled
SET | cluster enabled | tls disabled
"Reduce small string memory overhead by ~20-30%": this PR title isn't perfect. It describes the benefit, which is important, but a PR title should instead primarily summarize the change itself. I mean, reducing memory overhead can be done in other ways too, so it's not immediately clear from the title what this change is. Naming isn't easy. We also want to avoid going into code details, because these titles will end up in release notes and be read by users. How about something like "Remove internal pointer overhead in small strings"?
Small note: we should definitely take this, but it would probably impact our ability to simply backport fixes to older versions. Maybe we could just backport the function objectGetVal as a macro or something which will use the ptr in older versions? (These are the cases where I miss C++ operator overloading. :))
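The shim the comment suggests could look something like this. It is a hypothetical sketch, not actual Valkey code: the `robj` definition is reduced to the one field that matters, and the macros simply mirror what the old direct access did.

```c
#include <assert.h>

/* Minimal stand-in for the older robj layout (illustrative only;
 * the real struct also carries type/encoding/lru/refcount fields). */
typedef struct robj {
    void *ptr;
} robj;

/* Hypothetical backport shim: on older branches, where values are
 * never embedded over the pointer slot, the new accessors can
 * degrade to plain ptr access, so patches written against the new
 * API compile unchanged when cherry-picked back. */
#define objectGetVal(o) ((o)->ptr)
#define objectSetVal(o, v) ((o)->ptr = (v))
```

With this in place, a backported fix calling `objectGetVal(o)` behaves exactly like the old `o->ptr` read on the older branch.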
For anyone wondering what I'm waiting for: in my latest tests I'm seeing a 4-5% regression, with and without LTO, for SET with 512B values on ARM. So I'm investigating further to see whether this came from adding the pure attribute, and how smaller data sizes were affected.
madolson
left a comment
Curious about the performance delta, but besides that I think all the code is fine.
Signed-off-by: Rain Valentine <rsg000@gmail.com>
…ds. No manual edits. Signed-off-by: Rain Valentine <rsg000@gmail.com>
…->ptr to val_ptr to ensure any code not updated fails to compile Signed-off-by: Rain Valentine <rsg000@gmail.com>
Force-pushed from f8bc389 to e737021
I've rebased again. This time I've omitted the
My conclusions from the performance tests:
This reverts commit e737021.
After discussion with @madolson I've reverted the increase in size threshold for embedding values for now, since for the sizes affected we see 25% worse SET performance, which might break some customer use cases. (Though we do see 20% better GET.) I'll investigate the difference in SET between O2 and O3/LTO compiler options and perhaps we can revisit the threshold change in the future. :)
Signed-off-by: Rain Valentine <rsg000@gmail.com>
Merging this so that it stops accumulating merge conflicts. There was a comment about doing this in 2 commits, but I think 1 is OK.
enjoy-binbin
left a comment
Another memory optimization! Just leaving a comment; good job!
Great, finally!
So the memory usage is the green line in the "Memory efficiency comparison" diagram. Makes sense. For a 16B key, it means we only embed values up to 33 bytes, instead of up to 97 bytes. Many string values fall in the range 34-97 bytes, so it is a little sad that we can't apply this memory saving to those values. Hopefully we can solve the performance issue in the future and increase the threshold later.
…ey-io#2516)

Achieved 20-30% reduction in overhead. This was accomplished by reusing the 8B robj->ptr memory to store embedded data. For a 16B key and 16B value, this reduces the per-item memory overhead by ~20%. Overall performance was not measurably changed. Raising the threshold for embedding strings produces ~30% reduction in per-item overhead for affected value sizes.

Here's what I did:
1. Modify all `robj->ptr` access to use get/set methods. Most of this was done with a script (code listed below), with a few manual touches. Manual and programmatic changes are in separate commits to make review easier.
2. Next I changed `objectGetVal()` to calculate the location instead of using `robj->ptr`, and modified the embedding code to start writing embedded data 8B earlier, overwriting the ptr location. I changed `objectSetVal()` to assert that no value is embedded; all of the code that assigned `o->ptr` would have been incorrect for embedded strings anyway. (Except for one instance in module.c which I made a separate method for.)

---------

Signed-off-by: Rain Valentine <rsg000@gmail.com>
…urn type (#3184)

This fixes a regression where PFADD returned 1 instead of INVALIDOBJ for corrupted sparse HLL payloads. In the small strings refactor (#2516) we incorrectly changed the datatype from int to bool. That function still uses -1 as its error signal.

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
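The regression fixed in #3184 is the classic -1-sentinel-versus-bool pitfall. A minimal illustration (the function names and the `SPARSE_INVALID` macro are hypothetical stand-ins, not the actual hyperloglog code):

```c
#include <assert.h>
#include <stdbool.h>

#define SPARSE_INVALID (-1) /* hypothetical stand-in for the -1 error sentinel */

/* Correct shape: an int return can carry the -1 error sentinel. */
static int validate_sparse_int(int corrupted) {
    if (corrupted) return SPARSE_INVALID;
    return 0;
}

/* Buggy shape: narrowing the return type to bool converts -1 to
 * true (i.e. 1), so callers comparing against -1 never see the
 * error -- exactly the PFADD regression described above. */
static bool validate_sparse_bool(int corrupted) {
    if (corrupted) return SPARSE_INVALID; /* -1 collapses to true */
    return 0;
}
```

In C, any nonzero value converted to `_Bool` becomes 1, which is why the sentinel silently disappears at the return statement rather than at the call site.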


Achieved 20-30% reduction in overhead.
This was accomplished by reusing the 8B robj->ptr memory to store embedded data. For 16B key and 16B value, this reduces the per-item memory overhead by ~20%. Overall performance was not measurably changed. Raising the threshold for embedding strings produces ~30% reduction in per-item overhead for affected value sizes.
Here's what I did:
1. Modify all `robj->ptr` access to use get/set methods. Most of this was done with a script (code listed below), with a few manual touches. Manual and programmatic changes are in separate commits to make review easier.
2. Next I changed `objectGetVal()` to calculate the location instead of using `robj->ptr`, and modified the embedding code to start writing embedded data 8B earlier, overwriting the ptr location. I changed `objectSetVal()` to assert that no value is embedded; all of the code that assigned `o->ptr` would have been incorrect for embedded strings anyway. (Except for one instance in module.c which I made a separate method for.)

Memory efficiency comparison:

Performance is not significantly changed (0-2% faster):

For each test I collected 65 minutes of data and discarded the first 5 minutes as warmup. Cluster mode was disabled, iothreads=1, pipelining=100, and key size was 16B. Note that for 64B values valkey:unstable did not embed and my branch did, because I raised the size threshold.
Appendix 1 - refactoring script
Script output (I addressed all of these manually in the following manual commit):
Appendix 2 - micro benchmarking results
Appendix 3 - rebasing
I cherry-picked and merged each commit in sequence. For the programmatic commit I regenerated it instead of cherry-picking. (So it still contains no manual edits.)