Skip to content

http trace: improve perf to avoid str copy when building response #39541

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
May 27, 2025

Conversation

botengyao
Copy link
Member

@botengyao botengyao commented May 20, 2025

Commit Message:
Additional Description:
Risk Level:
Testing:
Docs Changes:
Release Notes:

botengyao added 2 commits May 20, 2025 04:26
Signed-off-by: Boteng Yao <boteng@google.com>
Signed-off-by: Boteng Yao <boteng@google.com>
Copy link

As a reminder, PRs marked as draft will not be automatically assigned reviewers,
or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

🐱

Caused by: #39541 was opened by botengyao.

see: more, trace.

Signed-off-by: Boteng Yao <boteng@google.com>
@botengyao
Copy link
Member Author

/retest

@botengyao botengyao marked this pull request as ready for review May 20, 2025 17:27
@botengyao
Copy link
Member Author

/assign-from @envoyproxy/envoy-maintainers

Copy link

@envoyproxy/envoy-maintainers assignee is @ravenblackx

🐱

Caused by: a #39541 (comment) was created by @botengyao.

see: more, trace.

Comment on lines 48 to 63
switch (code) {
case 200:
return HttpResponseCode200;
case 404:
return HttpResponseCode404;
case 500:
return HttpResponseCode500;
case 502:
return HttpResponseCode502;
case 503:
return HttpResponseCode503;
default:
// Only allocate if code is uncommon
out_buffer = std::to_string(code);
return out_buffer;
}
Copy link
Contributor

@ravenblackx ravenblackx May 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would like to see a benchmark to justify this. I would expect std::string wouldn't actually do any allocations during this because std::string has an internal buffer for short strings, and these are all short strings.
(Existing conversation on this topic.)
Edit: realizing the thing is "to avoid str copy" not "to avoid allocations", but is a 6-branch switch actually faster than a 4-byte copy or a number-format? Might be heavier on branch prediction misses. Main concern is it's probably not that different, and it's a bunch of extra code and special cases and weirdness of dual return values, that seems unlikely to be worth it. And the original perf comment was "avoid string creations".

Edit2: another possible contender for a good performance/simplicity balance might be still returning std::string, with a single special case for 200 (return "200") to make it just a memcpy rather than a number-to-string-conversion in the hottest path. But a quick benchmark there suggests even without a branch, to_string is actually faster than a simple copy! Even with a string_view input.

Edit3: those benchmarks were over-optimized, this is more realistic, so to_string is indeed slightly worse, and both are doing some amount of operation. I don't think there's a good way to judge the real impact short of benchmarking on the actual function. I think my preference would be to balance readability and performance by doing only the "hottest path" optimization, inline, rather than having the dual-buffer abstraction that causes cognitive overhead about buffer lifespan wrt. string_view.

i.e. get rid of buildResponseCode entirely, and just have in the main function

if (!stream_info.responseCode()) {
  span.setTag(Tracing::Tags::get().HttpStatusCode, HttpResponseCode0);
} else {
  const uint16_t code = stream_info.responseCode().value();
  if (code == 200) {
    // Optimization for the hottest path to avoid string conversion.
    span.setTag(Tracing::Tags::get().HttpStatusCode, HttpResponseCode200);
  } else {
    span.setTag(Tracing::Tags::get().HttpStatusCode, std::to_string(code));
  }
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or, if you want to keep it in a helper function, you could avoid the buffer lifespan horridness by making it a static void setSpanStatusCode(Span& span, const StreamInfo& info)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ravenblackx for all the thoughts! Yes, agree, I was doing some micro benchmarks in quick-bench, and the multiple branch with miss is indeed much worse. Changed to the if else approach with only 0 and 200.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StreamInfo arg needs namespacing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Also interesting is how extremely differently these benchmarks play out if you change which STL is in use - not calling to_string is way more valuable against libc++. And to really benchmark it you'd want to have like 80% 200s and 20% mixed other things in a single benchmark run, to account for CPU predictive adaptation, haha.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, I’ve seen std::to_string hit significantly harder on libc++ too. I think it mainly comes down to how libc++ handles small string optimization and heap allocations. And you are right for the prediction set-up.

Signed-off-by: Boteng Yao <boteng@google.com>
Signed-off-by: Boteng Yao <boteng@google.com>
Signed-off-by: Boteng Yao <boteng@google.com>
ravenblackx
ravenblackx previously approved these changes May 23, 2025
Signed-off-by: Boteng Yao <boteng@google.com>
@botengyao
Copy link
Member Author

/retest

@botengyao botengyao merged commit d741713 into envoyproxy:main May 27, 2025
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants