http trace: improve perf to avoid str copy when building response #39541

botengyao · 2025-05-20T04:30:55Z

Commit Message:
Additional Description:
Risk Level:
Testing:
Docs Changes:
Release Notes:

Signed-off-by: Boteng Yao <boteng@google.com>

repokitteh-read-only · 2025-05-20T04:31:00Z

As a reminder, PRs marked as draft will not be automatically assigned reviewers,
or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

🐱

Caused by: #39541 was opened by botengyao.

botengyao · 2025-05-20T12:51:25Z

/retest

botengyao · 2025-05-20T17:30:11Z

/assign-from @envoyproxy/envoy-maintainers

repokitteh-read-only · 2025-05-20T17:30:17Z

@envoyproxy/envoy-maintainers assignee is @ravenblackx

🐱

Caused by: a #39541 (comment) was created by @botengyao.

ravenblackx · 2025-05-20T20:48:10Z

source/common/tracing/http_tracer_impl.cc

+  switch (code) {
+  case 200:
+    return HttpResponseCode200;
+  case 404:
+    return HttpResponseCode404;
+  case 500:
+    return HttpResponseCode500;
+  case 502:
+    return HttpResponseCode502;
+  case 503:
+    return HttpResponseCode503;
+  default:
+    // Only allocate if code is uncommon
+    out_buffer = std::to_string(code);
+    return out_buffer;
+  }


Would like to see a benchmark to justify this. I would expect std::string wouldn't actually do any allocations during this because std::string has an internal buffer for short strings, and these are all short strings.
Edit: realizing the thing is "to avoid str copy" not "to avoid allocations", but is a 6-branch switch actually faster than a 4-byte copy or a number-format? Might be heavier on branch prediction misses. Main concern is it's probably not that different, and it's a bunch of extra code and special cases and weirdness of dual return values, that seems unlikely to be worth it. And the original perf comment was "avoid string creations".
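The small-string point above can be probed directly. The following is a rough sketch, not part of the PR: `storedInline` is a hypothetical helper that checks whether a string's character buffer lies inside the `std::string` object itself. Whether a three-character string such as "200" is stored inline depends on the standard library in use, so the result is something to observe rather than assume.

```cpp
#include <functional>
#include <string>

// Hypothetical helper (not from the PR): returns true when the character
// buffer of `s` lives inside the std::string object itself, i.e. no separate
// heap buffer is in use. The answer is implementation-specific.
bool storedInline(const std::string& s) {
  const char* begin = reinterpret_cast<const char*>(&s);
  const char* end = begin + sizeof(std::string);
  // std::less / std::less_equal give a well-defined order for unrelated pointers.
  return std::less_equal<const char*>{}(begin, s.data()) &&
         std::less<const char*>{}(s.data(), end);
}

// Convenience wrapper for the case discussed in the review: a formatted
// three-digit status code.
bool statusCodeStoredInline(int code) { return storedInline(std::to_string(code)); }
```

Calling `statusCodeStoredInline(200)` on a given toolchain shows whether formatting a status code needs a heap buffer there. It says nothing about the cost of the number-to-string conversion itself, which is the other half of the question.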

Edit2: another possible contender for a good performance/simplicity balance might be still returning std::string, with a single special case for 200 (return "200") to make it just a memcpy rather than a number-to-string-conversion in the hottest path. But a quick benchmark there suggests even without a branch, to_string is actually faster than a simple copy! Even with a string_view input.

Edit3: those benchmarks were over-optimized, this is more realistic, so to_string is indeed slightly worse, and both are doing some amount of operation. I don't think there's a good way to judge the real impact short of benchmarking on the actual function. I think my preference would be to balance readability and performance by doing only the "hottest path" optimization, inline, rather than having the dual-buffer abstraction that causes cognitive overhead about buffer lifespan wrt. string_view.

i.e. get rid of buildResponseCode entirely, and just have in the main function

if (!stream_info.responseCode()) {
  span.setTag(Tracing::Tags::get().HttpStatusCode, HttpResponseCode0);
} else {
  const uint16_t code = stream_info.responseCode().value();
  if (code == 200) {
    // Optimization for the hottest path to avoid string conversion.
    span.setTag(Tracing::Tags::get().HttpStatusCode, HttpResponseCode200);
  } else {
    span.setTag(Tracing::Tags::get().HttpStatusCode, std::to_string(code));
  }
}

Or, if you want to keep it in a helper function, you could avoid the buffer lifespan horridness by making it a static void setSpanStatusCode(Span& span, const StreamInfo& info)
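A minimal sketch of that helper shape, for illustration only. The `Span` and `StreamInfo::StreamInfo` types below are simplified stand-ins rather than the Envoy interfaces, and the tag name and constants are assumptions made to keep the example self-contained. The point is that the helper sets the tag itself, so no `string_view` ever outlives the buffer it refers to.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <string_view>

// Stand-in for the stream info type (assumption: only responseCode() matters here).
namespace StreamInfo {
struct StreamInfo {
  std::optional<uint32_t> response_code;
  std::optional<uint32_t> responseCode() const { return response_code; }
};
} // namespace StreamInfo

// Stand-in for a tracing span that copies tag values when they are set.
struct Span {
  std::map<std::string, std::string> tags;
  void setTag(std::string_view name, std::string_view value) {
    tags[std::string(name)] = std::string(value);
  }
};

// Assumed constants for the sketch.
constexpr std::string_view HttpStatusCodeTag = "http.status_code";
constexpr std::string_view HttpResponseCode0 = "0";
constexpr std::string_view HttpResponseCode200 = "200";

static void setSpanStatusCode(Span& span, const StreamInfo::StreamInfo& info) {
  if (!info.responseCode()) {
    span.setTag(HttpStatusCodeTag, HttpResponseCode0);
    return;
  }
  const uint16_t code = static_cast<uint16_t>(info.responseCode().value());
  if (code == 200) {
    // Hottest path: reuse a constant instead of formatting the number.
    span.setTag(HttpStatusCodeTag, HttpResponseCode200);
  } else {
    // The temporary std::string lives until setTag returns, which is long enough.
    span.setTag(HttpStatusCodeTag, std::to_string(code));
  }
}

// Usage: the caller never handles a buffer or a string_view.
inline void exampleUsage() {
  Span span;
  StreamInfo::StreamInfo info;
  info.response_code = 404;
  setSpanStatusCode(span, info);
  assert(span.tags.at("http.status_code") == "404");
}
```

Compared with a function that returns a `string_view` plus an out-buffer, this keeps the lifetime question local to one function body.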

Thanks @ravenblackx for all the thoughts! Yes, agreed. I ran some micro-benchmarks in quick-bench, and the multi-branch version with branch mispredictions is indeed much worse. Changed to the if/else approach that special-cases only 0 and 200.

StreamInfo arg needs namespacing.

(Also interesting is how extremely differently these benchmarks play out if you change which STL is in use - not calling to_string is way more valuable against libc++. And to really benchmark it you'd want to have like 80% 200s and 20% mixed other things in a single benchmark run, to account for CPU predictive adaptation, haha.)
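A rough sketch of the mixed-input setup described above, assuming a simple `std::chrono` timing loop rather than a benchmarking framework. All names here (`makeMixedCodes`, `formatAlways`, `formatFastPath`, `timeFormatting`) are hypothetical, and the 80/20 split and the set of non-200 codes are illustrative choices. Absolute numbers from a loop like this are noisy and depend heavily on the compiler, flags, and standard library.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>
#include <vector>

// Builds a workload of roughly 80% 200s and 20% other status codes, so the
// branch predictor sees a realistic mix within a single run.
std::vector<uint16_t> makeMixedCodes(std::size_t n, unsigned seed) {
  static constexpr uint16_t kOther[] = {204, 301, 404, 500, 503};
  std::mt19937 rng(seed);
  std::bernoulli_distribution is_ok(0.8);
  std::uniform_int_distribution<std::size_t> pick(0, 4);
  std::vector<uint16_t> codes;
  codes.reserve(n);
  for (std::size_t i = 0; i < n; ++i) {
    codes.push_back(is_ok(rng) ? uint16_t{200} : kOther[pick(rng)]);
  }
  return codes;
}

// Baseline: always format the number.
std::string formatAlways(uint16_t code) { return std::to_string(code); }

// Candidate: special-case the hottest value only.
std::string formatFastPath(uint16_t code) {
  if (code == 200) {
    return "200";
  }
  return std::to_string(code);
}

// Times one pass over the workload; the summed sizes are written to a
// volatile sink so the formatting work is not optimized away.
template <typename Fn>
std::chrono::nanoseconds timeFormatting(Fn format, const std::vector<uint16_t>& codes) {
  std::size_t total = 0;
  const auto start = std::chrono::steady_clock::now();
  for (uint16_t code : codes) {
    total += format(code).size();
  }
  const auto stop = std::chrono::steady_clock::now();
  volatile std::size_t sink = total;
  (void)sink;
  return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
}
```

Running `timeFormatting(formatAlways, codes)` and `timeFormatting(formatFastPath, codes)` over the same `codes` vector, repeated several times and against both libstdc++ and libc++, gives a comparison closer to the scenario described than a single-value micro-benchmark does.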

Yeah, I’ve seen std::to_string hit significantly harder on libc++ too. I think it mainly comes down to how libc++ handles small string optimization and heap allocations. And you are right about the prediction setup.

botengyao · 2025-05-27T02:43:38Z

/retest

botengyao added 2 commits May 20, 2025 04:26

improve perf

0bf2e6d

Signed-off-by: Boteng Yao <boteng@google.com>

fix format

9ea6de2

Signed-off-by: Boteng Yao <boteng@google.com>

use uint16

0f1b3ff

Signed-off-by: Boteng Yao <boteng@google.com>

botengyao marked this pull request as ready for review May 20, 2025 17:27

repokitteh-read-only bot assigned ravenblackx May 20, 2025

ravenblackx reviewed May 20, 2025

ravenblackx added the waiting:any label May 20, 2025

avoid branch

eba818e

Signed-off-by: Boteng Yao <boteng@google.com>

repokitteh-read-only bot removed the waiting:any label May 22, 2025

ravenblackx added the waiting:any label May 22, 2025

use direct http status span

7476390

Signed-off-by: Boteng Yao <boteng@google.com>

repokitteh-read-only bot removed the waiting:any label May 22, 2025

fix build

f65d64e

Signed-off-by: Boteng Yao <boteng@google.com>

ravenblackx previously approved these changes May 23, 2025

fix build

fee6a2f

Signed-off-by: Boteng Yao <boteng@google.com>

botengyao dismissed ravenblackx’s stale review via fee6a2f May 23, 2025 19:42

ravenblackx approved these changes May 27, 2025

botengyao merged commit d741713 into envoyproxy:main May 27, 2025
25 checks passed
