-
Notifications
You must be signed in to change notification settings - Fork 667
iobuf::append
improvements part 2
#25540
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
iobuf::append
improvements p2iobuf::append
improvements part 2
} | ||
|
||
append(std::move(f).unoptimized_release()); | ||
append(f); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that since we're not trimming f
here test_next_chunk_allocation_append_iobuf
will fail as is since the appended data will now map to 322 fragments instead of 323. If we trim f
the test will pass and the behavior will be identical to what it is today. However, I really don't think that behavior is optimal since if we are taking ownership of the fragments we may as well use any free space rather than just trimming it away. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think yes we should change the test instead of realloc here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Brandon do you know why we were trimming before here? Seems weird.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe the trimming was ever intentional. However, before this PR appending an existing io_fragment
to an iobuf
would entail;
- The underlying
temporary_buffer
in theio_fragment
to be appended was shared up to the used bytes. This is basically what is acting as a trim. - A new
io_fragment
was allocated and given the sharedtemporary_buffer
. - This new fragment was appended to the
iobuf
CI test resultstest results on build#63618
test results on build#64009
test results on build#64059
|
It seems to me that the iobuf microbench space is too small for all the various cases that exist to append things to an iobuf. Feels like we are potentially missing out on cases. |
This is selected to backport to 25.1.x? Is that right? |
@@ -251,6 +251,7 @@ class iobuf { | |||
size_t last_allocation_size() const; | |||
|
|||
bool try_copy_append(const char* ptr, size_t size); | |||
void append(fragment*); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing comment, especially necessary here as there is a raw pointer involved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, will add one.
No, I believe we decide against backporting it in our last meeting. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM outside of the comment about docs. Not sure if it was discussed out of band about adding more benchmark patterns per Stephan's comment
src/v/bytes/iobuf.h
Outdated
} | ||
o.clear(); | ||
o._size = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This drops the the update_generation()
call which we shouldn't do (I guess there is no meta test for this since one didn't fail here).
Is this noticeably faster than clear()
? If so maybe we just add the update_generation()
call back in explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me check if it's any faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The effects appear to be negligible, will switch back to using iobuf::clear()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this seems more robust to future changes also.
Instead of allocating a new fragment this PR re-uses the allocated fragments from the iobuf being appended. This reduces the iobuf::append benchmark from; | test | iterations | median | mad | min | max | allocs | tasks | inst | | - | - | - | - | - | - | - | - | - | | iobuf.append_bench_small | 33010000 | 8.845ns | 0.009ns | 8.834ns | 8.879ns | 0.008 | 0.000 | 180.1 | | iobuf.append_bench_medium | 9000 | 2.321us | 15.924ns | 2.305us | 2.595us | 0.627 | 0.000 | 2257.0 | | iobuf.append_bench_large | 1000 | 7.376us | 32.411ns | 7.344us | 7.454us | 4.000 | 0.000 | 6345.5 | To; | test | iterations | median | mad | min | max | allocs | tasks | inst | | - | - | - | - | - | - | - | - | - | | iobuf.append_bench_small | 32347000 | 8.838ns | 0.005ns | 8.829ns | 8.854ns | 0.008 | 0.000 | 174.1 | | iobuf.append_bench_medium | 9000 | 2.151us | 3.486ns | 2.132us | 2.269us | 0.626 | 0.000 | 2251.5 | | iobuf.append_bench_large | 1000 | 7.313us | 13.044ns | 7.240us | 7.368us | 1.750 | 0.000 | 6039.5 |
iobuf::append(fragment) already handles trimming the last fragment before appending. Therefore there is no need to repeat this work in iobuf::append(iobuf&&) or iobuf::append(temporary_buffer). This reduces the append microbench from; | test | iterations | median | mad | min | max | allocs | tasks | inst | | - | - | - | - | - | - | - | - | - | | iobuf.append_bench_small | 32347000 | 8.838ns | 0.005ns | 8.829ns | 8.854ns | 0.008 | 0.000 | 174.1 | | iobuf.append_bench_medium | 9000 | 2.151us | 3.486ns | 2.132us | 2.269us | 0.626 | 0.000 | 2251.5 | | iobuf.append_bench_large | 1000 | 7.313us | 13.044ns | 7.240us | 7.368us | 1.750 | 0.000 | 6039.5 | To; | test | iterations | median | mad | min | max | allocs | tasks | inst | | - | - | - | - | - | - | - | - | - | | iobuf.append_bench_small | 31733000 | 8.226ns | 0.003ns | 8.217ns | 8.253ns | 0.008 | 0.000 | 160.1 | | iobuf.append_bench_medium | 9000 | 2.146us | 18.402ns | 2.124us | 2.247us | 0.626 | 0.000 | 2237.9 | | iobuf.append_bench_large | 1000 | 7.346us | 14.547ns | 7.320us | 7.362us | 1.750 | 0.000 | 6017.9 |
66e0b98
to
e67f565
Compare
This is true (that we lack test cases), but is it a problem for this change? It looks to me like this isn't really a "tradeoff" type change, but just strictly better. Are you thinking about the change from trim to no-trim in |
I mean I am no expert in this code but that's immediately obvious from the change? Especially given that with all the iobuf stuff we have run into a lot of unexpected things already. Also more fundamentally I think we should write the benchmarks/tests first before we start optimizing things. |
Was chatting with Stephan about these points in DMs. My understanding is as follows;
|
/backport v25.1.x |
Changes
iobuf::append(iobuf&&)
to reduce fragment allocations by re-using fragments from the moved iobuf. Also reduces the number of duplicate checks. See commit messages for benchmarking results.Backports Required
Release Notes