-
Notifications
You must be signed in to change notification settings - Fork 5k
http trace: improve perf to avoid str copy when building response #39541
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Boteng Yao <boteng@google.com>
Signed-off-by: Boteng Yao <boteng@google.com>
Signed-off-by: Boteng Yao <boteng@google.com>
/retest |
/assign-from @envoyproxy/envoy-maintainers |
@envoyproxy/envoy-maintainers assignee is @ravenblackx |
switch (code) { | ||
case 200: | ||
return HttpResponseCode200; | ||
case 404: | ||
return HttpResponseCode404; | ||
case 500: | ||
return HttpResponseCode500; | ||
case 502: | ||
return HttpResponseCode502; | ||
case 503: | ||
return HttpResponseCode503; | ||
default: | ||
// Only allocate if code is uncommon | ||
out_buffer = std::to_string(code); | ||
return out_buffer; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would like to see a benchmark to justify this. I would expect std::string wouldn't actually do any allocations during this because std::string has an internal buffer for short strings, and these are all short strings.
(Existing conversation on this topic.)
Edit: realizing the thing is "to avoid str copy" not "to avoid allocations", but is a 6-branch switch actually faster than a 4-byte copy or a number-format? Might be heavier on branch prediction misses. Main concern is it's probably not that different, and it's a bunch of extra code and special cases and weirdness of dual return values, that seems unlikely to be worth it. And the original perf comment was "avoid string creations".
Edit2: another possible contender for a good performance/simplicity balance might be still returning std::string, with a single special case for 200 (return "200"
) to make it just a memcpy rather than a number-to-string-conversion in the hottest path. But a quick benchmark there suggests even without a branch, to_string
is actually faster than a simple copy! Even with a string_view input.
Edit3: those benchmarks were over-optimized, this is more realistic, so to_string is indeed slightly worse, and both are doing some amount of operation. I don't think there's a good way to judge the real impact short of benchmarking on the actual function. I think my preference would be to balance readability and performance by doing only the "hottest path" optimization, inline, rather than having the dual-buffer abstraction that causes cognitive overhead about buffer lifespan wrt. string_view.
i.e. get rid of buildResponseCode
entirely, and just have in the main function
if (!stream_info.responseCode()) {
span.setTag(Tracing::Tags::get().HttpStatusCode, HttpResponseCode0);
} else {
const uint16_t code = stream_info.responseCode().value();
if (code == 200) {
// Optimization for the hottest path to avoid string conversion.
span.setTag(Tracing::Tags::get().HttpStatusCode, HttpResponseCode200);
} else {
span.setTag(Tracing::Tags::get().HttpStatusCode, std::to_string(code));
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or, if you want to keep it in a helper function, you could avoid the buffer lifespan horridness by making it a static void setSpanStatusCode(Span& span, const StreamInfo& info)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ravenblackx for all the thoughts! Yes, agree, I was doing some micro benchmarks in quick-bench, and the multiple branch with miss is indeed much worse. Changed to the if else
approach with only 0
and 200
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StreamInfo
arg needs namespacing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Also interesting is how extremely differently these benchmarks play out if you change which STL is in use - not calling to_string
is way more valuable against libc++
. And to really benchmark it you'd want to have like 80% 200s and 20% mixed other things in a single benchmark run, to account for CPU predictive adaptation, haha.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, I’ve seen std::to_string
hit significantly harder on libc++ too. I think it mainly comes down to how libc++
handles small string optimization and heap allocations. And you are right for the prediction set-up.
Signed-off-by: Boteng Yao <boteng@google.com>
Signed-off-by: Boteng Yao <boteng@google.com>
/retest |
Commit Message:
Additional Description:
Risk Level:
Testing:
Docs Changes:
Release Notes: