Skip to content

Conversation

@hawkw
Copy link
Member

@hawkw hawkw commented Oct 21, 2020

Motivation

We've tried very hard to make sure that entering and exiting spans is
lightweight...in the tracing-core core machinery. Unfortunately, we
haven't done any benchmarking of how subscriber implementations actually
handle enter/exit events. It turns out that in tracing-subscriber's
Registry, there's actually significant overhead for entering a span:
calling span.enter() may take as long as 69 ns (on my machine).

Solution

I've written some microbenchmarks for entering and exiting enabled spans
using tracing-subscriber::fmt, comparing them with the overhead of
calling enter on an enabled span. Based on this, I've made some
performance improvements. These optimizations include:

  • Removing the HashSet that's used for tracking previously entered
    span IDs, in favor of linear search. Span stacks probably never deep
    enough for a hashmap to be faster than iterating over a couple of
    vec indices.
  • Preallocating the vec used for the span stack to hold at least 64
    elements. This means we'll never have a lag spike from reallocating,
    as I think it'll almost never be deeper than 64 IDs.
  • Only cloning/dropping an ID's ref count for the first ID in the stack.

This makes entering and exiting enabled spans significantly faster:
image

It would be nice to continue optimizing this, but this might be about
the best it gets given the other requirements that we're now making
assertions about.

hawkw added 5 commits October 21, 2020 14:40
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
@hawkw hawkw added crate/subscriber Related to the `tracing-subscriber` crate kind/perf Performance improvements labels Oct 21, 2020
@hawkw hawkw requested a review from a team as a code owner October 21, 2020 22:59
@hawkw
Copy link
Member Author

hawkw commented Oct 21, 2020

I'll backport this to v0.1.x when it lands.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Copy link
Member

@davidbarsky davidbarsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love how tiny this change is and how large the impact is.

@hawkw hawkw merged commit 67340e9 into master Oct 21, 2020
hawkw added a commit that referenced this pull request Oct 21, 2020
## Motivation

We've tried very hard to make sure that entering and exiting spans is
lightweight...in the `tracing-core` core machinery. Unfortunately, we
haven't done any benchmarking of how subscriber implementations actually
handle enter/exit events. It turns out that in `tracing-subscriber`'s
`Registry`, there's actually significant overhead for entering a span:
calling `span.enter()` may take as long as 69 ns (on my machine).

## Solution

I've written some microbenchmarks for entering and exiting enabled spans
using `tracing-subscriber::fmt`, comparing them with the overhead of
calling `enter` on an enabled span. Based on this, I've made some
performance improvements. These optimizations include:

- Removing the `HashSet` that's used for tracking previously entered
  span IDs, in favor of linear search. Span stacks probably never deep
  enough for a hashmap to be faster than iterating over a couple of
  vec indices.
- Preallocating the vec used for the span stack to hold at least 64
  elements. This means we'll never have a lag spike from reallocating,
  as I think it'll almost never be deeper than 64 IDs.
- Only cloning/dropping an ID's ref count for the _first_ ID in the stack.

This makes entering and exiting enabled spans significantly faster:
![image](https://user-images.githubusercontent.com/2796466/96798681-3fc85000-13b6-11eb-9e85-7602d918ee09.png)

It would be nice to continue optimizing this, but this might be about
the best it gets given the other requirements that we're now making
assertions about.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
dvdplm added a commit to dvdplm/tracing that referenced this pull request Oct 22, 2020
…spatch-as-owned-tokio-rs#455

* upstream/master: (34 commits)
  subscriber: remove TraceLogger (tokio-rs#1052)
  subscriber: make Registry::enter/exit much faster (tokio-rs#1058)
  chore(deps): update env_logger requirement from 0.7 to 0.8 (tokio-rs#1050)
  chore: fix tracing-macros::dbg (tokio-rs#1054)
  chore(deps): update pin-project requirement from 0.4 to 1.0 (tokio-rs#1038)
  chore: remove duplicated section from tracing/README.md (tokio-rs#1046)
  opentelemetry: prepare for 0.8.0 release (tokio-rs#1036)
  docs: add favicon for extra pretty docs (tokio-rs#1033)
  subscriber: fix `reload` ergonomics (tokio-rs#1035)
  chore(deps): update crossbeam-channel requirement from 0.4.2 to 0.5.0 (tokio-rs#1031)
  opentelemetry: Assign default ids if missing (tokio-rs#1027)
  chore: remove deprecated add-path from CI (tokio-rs#1026)
  attributes:  fix `#[instrument(err)]` in case of early returns (tokio-rs#1006)
  core: remove mandatory liballoc dependency with no-std (tokio-rs#1017)
  chore(deps): update cfg-if requirement from 0.1.10 to 1.0.0 (tokio-rs#1023)
  subscriber: warn if trying to enable a statically disabled level (tokio-rs#990)
  subscriber: use macros for module declarations (tokio-rs#1009)
  chore: remove `stdlib.rs` (tokio-rs#1008)
  core: fix linked list tests reusing `Registration`s (tokio-rs#1016)
  subscriber: support dash in target names (tokio-rs#1012)
  ...
dvdplm added a commit to dvdplm/tracing that referenced this pull request Oct 22, 2020
…spatch-as-ref-tokio-rs#455

* upstream/master:
  subscriber: remove TraceLogger (tokio-rs#1052)
  subscriber: make Registry::enter/exit much faster (tokio-rs#1058)
  chore(deps): update env_logger requirement from 0.7 to 0.8 (tokio-rs#1050)
  chore: fix tracing-macros::dbg (tokio-rs#1054)
  chore(deps): update pin-project requirement from 0.4 to 1.0 (tokio-rs#1038)
  chore: remove duplicated section from tracing/README.md (tokio-rs#1046)
dvdplm added a commit to dvdplm/tracing that referenced this pull request Oct 22, 2020
* upstream/master:
  subscriber: remove TraceLogger (tokio-rs#1052)
  subscriber: make Registry::enter/exit much faster (tokio-rs#1058)
  chore(deps): update env_logger requirement from 0.7 to 0.8 (tokio-rs#1050)
dvdplm added a commit to dvdplm/tracing that referenced this pull request Oct 22, 2020
* dp-target-is-cow:
  subscriber: remove TraceLogger (tokio-rs#1052)
  subscriber: make Registry::enter/exit much faster (tokio-rs#1058)
  Use impl Into<Cow<'a, str>
  chore(deps): update env_logger requirement from 0.7 to 0.8 (tokio-rs#1050)
  No need for extra lifetime
  Add constructor for dynamic data
  chore: fix tracing-macros::dbg (tokio-rs#1054)
  Feature gate usage of Cow in Metadata
  chore(deps): update pin-project requirement from 0.4 to 1.0 (tokio-rs#1038)
  chore: remove duplicated section from tracing/README.md (tokio-rs#1046)
hawkw added a commit that referenced this pull request Oct 22, 2020
This backports #1058 to v0.1.x. This was already approved on master.

## Motivation

We've tried very hard to make sure that entering and exiting spans is
lightweight...in the `tracing-core` core machinery. Unfortunately, we
haven't done any benchmarking of how subscriber implementations actually
handle enter/exit events. It turns out that in `tracing-subscriber`'s
`Registry`, there's actually significant overhead for entering a span:
calling `span.enter()` may take as long as 69 ns (on my machine).

## Solution

I've written some microbenchmarks for entering and exiting enabled spans
using `tracing-subscriber::fmt`, comparing them with the overhead of
calling `enter` on an enabled span. Based on this, I've made some
performance improvements. These optimizations include:

- Removing the `HashSet` that's used for tracking previously entered
  span IDs, in favor of linear search. Span stacks probably never deep
  enough for a hashmap to be faster than iterating over a couple of
  vec indices.
- Preallocating the vec used for the span stack to hold at least 64
  elements. This means we'll never have a lag spike from reallocating,
  as I think it'll almost never be deeper than 64 IDs.
- Only cloning/dropping an ID's ref count for the _first_ ID in the stack.

This makes entering and exiting enabled spans significantly faster:
![image](https://user-images.githubusercontent.com/2796466/96798681-3fc85000-13b6-11eb-9e85-7602d918ee09.png)

It would be nice to continue optimizing this, but this might be about
the best it gets given the other requirements that we're now making
assertions about.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
hawkw added a commit that referenced this pull request Oct 22, 2020
Fixed

- **registry**: Fixed `Registry::new` allocating an excessively large
  amount of memory, most of which would never be used ([#1064])

Changed

- **registry**: Improved `new_span` performance by reusing `HashMap`
  allocations for `Extensions` ([#1064])
- **registry**: Significantly improved the performance of
  `Registry::enter` and `Registry::exit` ([#1058])

[#1064]: #1064
[#1058]: #1058
hawkw added a commit that referenced this pull request Oct 22, 2020
Fixed

- **registry**: Fixed `Registry::new` allocating an excessively large
  amount of memory, most of which would never be used ([#1064])

Changed

- **registry**: Improved `new_span` performance by reusing `HashMap`
  allocations for `Extensions` ([#1064])
- **registry**: Significantly improved the performance of
  `Registry::enter` and `Registry::exit` ([#1058])

[#1064]: #1064
[#1058]: #1058
hawkw added a commit to linkerd/linkerd2-proxy that referenced this pull request Oct 22, 2020
This picks up the following upstream changes:

* tokio-rs/tracing#1064, which fixes a bug where creating a subscriber
  would allocate a large amount of memory that was not actually used

* tokio-rs/tracing#1058, which significantly reduces the overhead of
  entering and exiting spans

Together, these changes should improve proxy performance when tracing is
enabled.
olix0r pushed a commit to linkerd/linkerd2-proxy that referenced this pull request Oct 23, 2020
This picks up the following upstream changes:

* tokio-rs/tracing#1064, which fixes a bug where creating a subscriber
  would allocate a large amount of memory that was not actually used

* tokio-rs/tracing#1058, which significantly reduces the overhead of
  entering and exiting spans

Together, these changes should improve proxy performance when tracing is
enabled.
kaffarell pushed a commit to kaffarell/tracing that referenced this pull request May 22, 2024
Fixed

- **registry**: Fixed `Registry::new` allocating an excessively large
  amount of memory, most of which would never be used ([tokio-rs#1064])

Changed

- **registry**: Improved `new_span` performance by reusing `HashMap`
  allocations for `Extensions` ([tokio-rs#1064])
- **registry**: Significantly improved the performance of
  `Registry::enter` and `Registry::exit` ([tokio-rs#1058])

[tokio-rs#1064]: tokio-rs#1064
[tokio-rs#1058]: tokio-rs#1058
@jplatte jplatte deleted the eliza/fmt-span-enter-bench branch June 3, 2025 09:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

crate/subscriber Related to the `tracing-subscriber` crate kind/perf Performance improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants