
More profiler pprof encoding performance improvements #1512

Merged
merged 5 commits on May 27, 2021

Conversation

ivoanjo (Member) commented May 21, 2021

A second round of performance improvements to pprof encoding. Compared to dd-trace-rb 0.49.0, there's a 3.7x speedup, or 2.27x speedup when comparing with current master.

This takes care of the low-hanging fruit I could identify in pprof encoding.

```
exporter objectset-yield
                         10.028  (± 0.0%) i/s -    101.000  in  10.079207s

Comparison:
exporter objectset-yield:       10.0 i/s
exporter locations-avoid-method-missing:        9.9 i/s - 1.01x  (± 0.00) slower
exporter locations-hashmap:        9.2 i/s - 1.09x  (± 0.00) slower
exporter cached-build-location-v2 (current master):        4.4 i/s - 2.30x  (± 0.00) slower
exporter baseline (before any work):        2.7 i/s - 3.72x  (± 0.00) slower
```

…hash

One of the things I've learned since I started doing performance
work in the profiler is how expensive it is to hash arrays of items,
as `MessageSet` (`ObjectSet`) does.

While looking for improvements to pprof encoding, I realized that the
locations `MessageSet` was implementing a hashmap of
`Profiling::BacktraceLocation`s to pprof `Location` objects in a
rather roundabout way -- it used the full contents of the
`BacktraceLocation` objects as a key, rather than the object itself.

By replacing this arrangement with a regular Ruby hash that uses
the `BacktraceLocation` itself as a key, I was able to significantly
improve the performance of pprof encoding.
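The caching pattern can be sketched like this (a standalone illustration with hypothetical names, not the actual dd-trace-rb source): Struct-based value objects hash and compare by value, so a `BacktraceLocation` can serve directly as a Hash key, and the expensive pprof `Location` construction only runs on a cache miss.

```ruby
# Hypothetical stand-ins for the profiler's internals -- not the real classes.
BacktraceLocation = Struct.new(:path, :lineno, :function)

build_calls = 0
build_pprof_location = lambda do |loc|
  build_calls += 1
  { id: build_calls, function: loc.function } # stand-in for a pprof Location
end

# Regular Ruby hash with a default block: the build only runs on a cache miss
locations = Hash.new { |cache, loc| cache[loc] = build_pprof_location.call(loc) }

a = BacktraceLocation.new('foo.rb', 10, 'bar')
b = BacktraceLocation.new('foo.rb', 10, 'bar') # equal by value, distinct object

first  = locations[a]
second = locations[b] # cache hit: Struct#hash and #eql? compare by value
first.equal?(second)  # => true -- the Location was built exactly once
```

Because Struct already defines `#hash` and `#eql?` over its members, no composite key needs to be assembled from the object's contents on each lookup.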

In the <benchmarks/profiler_submission.rb> benchmark
(Ruby 2.7.3/macOS 10.15.7/i7-1068NG7) I'm getting a > 2x speedup
when compared with current master (or 3.4x speedup over dd-trace-rb 0.49.0):

```
exporter locations-hashmap
                          9.241  (± 0.0%) i/s -     93.000  in  10.081019s

Comparison:
exporter locations-hashmap:        9.2 i/s
exporter cached-build-location-v2 (current master):        4.4 i/s - 2.12x  (± 0.00) slower
exporter baseline (before any work):        2.7 i/s - 3.42x  (± 0.00) slower
```
When reading values out of protobuf objects, there are no
regular getters -- only `[]` and `method_missing`.
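The dispatch difference can be illustrated with a toy class (invented for this example; it is not the protobuf gem's implementation): an accessor routed through `method_missing` only runs after normal method lookup has already failed, while `[]` is a single ordinary method call.

```ruby
class TinyMessage
  def initialize(fields)
    @fields = fields
  end

  # Direct lookup: one ordinary method call
  def [](name)
    @fields[name]
  end

  # Fallback lookup: only reached after regular method dispatch fails
  def method_missing(name, *args)
    key = name.to_s
    @fields.key?(key) ? @fields[key] : super
  end

  def respond_to_missing?(name, include_private = false)
    @fields.key?(name.to_s) || super
  end
end

location = TinyMessage.new('id' => 7)
location.id    # => 7, via the slower method_missing path
location['id'] # => 7, via the direct [] path
```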

Using `location['id']` instead of `location.id` yields a small but
noticeable speedup in the <benchmarks/profiler_submission.rb> benchmark
(Ruby 2.7.3/macOS 10.15.7/i7-1068NG7):

```
exporter locations-avoid-method-missing
                          9.944  (± 0.0%) i/s -    100.000  in  10.065687s

Comparison:
exporter locations-avoid-method-missing:        9.9 i/s
exporter locations-hashmap:        9.2 i/s - 1.08x  (± 0.00) slower
exporter cached-build-location-v2 (current master):        4.4 i/s - 2.28x  (± 0.00) slower
exporter baseline (before any work):        2.7 i/s - 3.68x  (± 0.00) slower
```
Performance improvement is negligible, but why not:

```
exporter objectset-yield
                         10.028  (± 0.0%) i/s -    101.000  in  10.079207s

Comparison:
exporter objectset-yield:       10.0 i/s
exporter locations-avoid-method-missing:        9.9 i/s - 1.01x  (± 0.00) slower
exporter locations-hashmap:        9.2 i/s - 1.09x  (± 0.00) slower
exporter cached-build-location-v2 (current master):        4.4 i/s - 2.30x  (± 0.00) slower
exporter baseline (before any work):        2.7 i/s - 3.72x  (± 0.00) slower
```
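The `yield` change presumably looks something like this (a sketch based on the commit name, not the actual `ObjectSet` code): declaring an explicit `&block` parameter forces Ruby to materialize a `Proc` object on every call, while `yield` invokes the passed block without that allocation.

```ruby
# Explicit block parameter: each call allocates a Proc wrapping the block
def fetch_with_block_param(cache, key, &block)
  cache[key] ||= block.call(key)
end

# yield invokes the block in place, without materializing a Proc
def fetch_with_yield(cache, key)
  cache[key] ||= yield(key)
end

cache = {}
fetch_with_yield(cache, :answer) { |_key| 42 }                          # => 42
fetch_with_yield(cache, :answer) { |_key| raise 'never called on hit' } # => 42
```

Both behave identically; the `yield` version just avoids one small allocation per call, which matches the "negligible but why not" result above.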
@ivoanjo ivoanjo requested a review from a team May 21, 2021 13:58
@ivoanjo
Copy link
Member Author

ivoanjo commented May 24, 2021

I've benchmarked these improvements (together with my earlier PR #1511) and their impact on p100 latency on discourse. On this benchmark, pprof encoding was having a non-trivial impact on latency. The spikes you see on the left occur once per minute -- the request(s) that are concurrent with pprof encoding.

With this PR + #1511 (right), there still are measurable latency spikes, but their magnitude is much reduced. (Datadog-internal links to full results: before, after).

[Screenshot, 2021-05-24: p100 latency comparison -- before (left) vs after (right)]

```ruby
"#{omitted} #{desc}",
&@build_location
)
locations << @locations[Profiling::BacktraceLocation.new(''.freeze, 0, "#{omitted} #{desc}")]
```
Member

I recommend removing `#freeze` and instead adding the `# frozen_string_literal: true` magic comment atop the file.

My experience is that the magic comment is measurably faster than invoking `#freeze`, due to its compilation to a specialized instruction.

I understand that this only works with Ruby 2.3+, but I don't think it makes sense to penalize performance for 2.3+ users, and it's also too hard to maintain two versions of this method (one with `#freeze` and the other without).
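For reference, a standalone sketch of the two spellings under discussion (not the PR's file):

```ruby
# Inside a file that starts with the magic comment, every string literal is
# frozen at parse time and the explicit #freeze becomes redundant:
#
#   # frozen_string_literal: true
#   EMPTY = ''   # already frozen, no per-call work
#
# Without the pragma, freezing is requested at each literal:
empty = ''.freeze
empty.frozen? # => true

# Note: 'literal'.freeze is itself compiled to a specialized instruction
# (opt_str_freeze) on modern CRuby, so the two spellings can benchmark
# identically.
```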

ivoanjo (Member Author) replied May 26, 2021

Changed in c01d6dd. At least for the empty string, it seems to not make any difference in the <benchmarks/profiler_submission.rb> benchmark (Ruby 2.7.3/macOS 10.15.7/i7-1068NG7):

```
exporter inline-freeze
                          9.208  (± 0.0%) i/s -     92.000  in  10.001706s
exporter pragma-freeze
                          9.181  (± 0.0%) i/s -     92.000  in  10.030500s

Comparison:
exporter inline-freeze:        9.2 i/s
exporter pragma-freeze:        9.2 i/s - 1.00x  (± 0.00) slower
```

but I don't mind either way -- I like to use the more modern approach anyway :)
@ivoanjo ivoanjo force-pushed the ivoanjo/more-pprof-performance-improvements branch from c01d6dd to 7c29190 Compare May 26, 2021 09:34
@ivoanjo ivoanjo merged commit 22ff990 into master May 27, 2021
@ivoanjo ivoanjo deleted the ivoanjo/more-pprof-performance-improvements branch May 27, 2021 07:47
@github-actions github-actions bot added this to the 0.50.0 milestone May 27, 2021