More profiler pprof encoding performance improvements #1512
Conversation
One of the things I've learned since I started doing performance work in the profiler is how expensive it is to hash arrays of items, as `MessageSet` (`ObjectSet`) does. While looking for improvements to pprof encoding, I realized that the locations `MessageSet` was implementing a hashmap of `Profiling::BacktraceLocation`s to pprof `Location` objects in a rather roundabout way -- it used the full contents of the `BacktraceLocation` objects as a key, rather than the object itself. By replacing this arrangement with a regular Ruby hash that uses the `BacktraceLocation` as a key, I was able to considerably improve the performance of pprof encoding.

In the <benchmarks/profiler_submission.rb> benchmark (Ruby 2.7.3/macOS 10.15.7/i7-1068NG7) I'm getting a > 2x speedup when compared with current master (or a 3.4x speedup over dd-trace-rb 0.49.0):

```
exporter locations-hashmap 9.241 (± 0.0%) i/s - 93.000 in 10.081019s

Comparison:
exporter locations-hashmap: 9.2 i/s
exporter cached-build-location-v2 (current master): 4.4 i/s - 2.12x (± 0.00) slower
exporter baseline (before any work): 2.7 i/s - 3.42x (± 0.00) slower
```
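As a minimal sketch of the idea (the `Struct` stand-in and the `build_pprof_location` lambda below are illustrative, not the actual dd-trace-rb internals):

```ruby
# BacktraceLocation stand-in; Struct gives value-based #hash/#eql? for free.
BacktraceLocation = Struct.new(:path, :lineno, :base_label)

# Hypothetical stand-in for the real pprof Location builder.
build_pprof_location = ->(bl) { { path: bl.path, line: bl.lineno, function: bl.base_label } }

# A plain Ruby Hash keyed on the BacktraceLocation object itself: the
# default block lazily builds and caches the pprof Location on first
# lookup, with no MessageSet-style hashing of the full contents as a key.
locations = Hash.new { |hash, bl| hash[bl] = build_pprof_location.call(bl) }

loc = BacktraceLocation.new('foo.rb', 42, 'bar')
locations[loc]     # builds the pprof Location and caches it
locations[loc.dup] # equal contents => same hash key => cache hit
```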
When trying to read values out of protobuf objects, there are no regular getters -- only `[]` and `method_missing`. Using `location['id']` instead of `location.id` yields a small but noticeable speedup in the <benchmarks/profiler_submission.rb> benchmark (Ruby 2.7.3/macOS 10.15.7/i7-1068NG7):

```
exporter locations-avoid-method-missing 9.944 (± 0.0%) i/s - 100.000 in 10.065687s

Comparison:
exporter locations-avoid-method-missing: 9.9 i/s
exporter locations-hashmap: 9.2 i/s - 1.08x (± 0.00) slower
exporter cached-build-location-v2 (current master): 4.4 i/s - 2.28x (± 0.00) slower
exporter baseline (before any work): 2.7 i/s - 3.68x (± 0.00) slower
```
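In code, the difference is just the accessor (a sketch; assumes `location` is a generated pprof `Location` message from the google-protobuf gem):

```ruby
location.id    # dispatched through method_missing on these generated objects
location['id'] # direct [] lookup, skipping the method_missing hop
```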
The performance improvement here is negligible, but why not:

```
exporter objectset-yield 10.028 (± 0.0%) i/s - 101.000 in 10.079207s

Comparison:
exporter objectset-yield: 10.0 i/s
exporter locations-avoid-method-missing: 9.9 i/s - 1.01x (± 0.00) slower
exporter locations-hashmap: 9.2 i/s - 1.09x (± 0.00) slower
exporter cached-build-location-v2 (current master): 4.4 i/s - 2.30x (± 0.00) slower
exporter baseline (before any work): 2.7 i/s - 3.72x (± 0.00) slower
```
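The "objectset-yield" name suggests the usual Ruby micro-optimization of using an implicit block with `yield` rather than capturing `&block`; a generic sketch (not the literal dd-trace-rb change):

```ruby
# Capturing &block materializes a Proc object on every call:
def fetch_with_block_capture(cache, key, &block)
  cache[key] ||= block.call(key)
end

# An implicit block with yield avoids that allocation:
def fetch_with_yield(cache, key)
  cache[key] ||= yield(key)
end

cache = {}
fetch_with_yield(cache, :a) { |k| "built #{k}" } # => "built a"
```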
I've benchmarked these improvements (together with my earlier PR #1511) and their impact on p100 latency on Discourse. On this benchmark, pprof encoding was having a non-trivial impact on latency. The spikes you see on the left occur once per minute -- on the request(s) that are concurrent with pprof encoding. With this PR + #1511 (right), there are still measurable latency spikes, but their magnitude is greatly reduced. (Datadog-internal links to full results: before, after.)

The review comment below is on this change:

```
-          "#{omitted} #{desc}",
-          &@build_location
-        )
+        locations << @locations[Profiling::BacktraceLocation.new(''.freeze, 0, "#{omitted} #{desc}")]
```
I recommend trying to remove `#freeze`, and instead adding the `frozen_string_literal` magic comment atop the file.
My experience is that the magic comment is measurably faster than invoking `#freeze`, due to its compilation to a specialized instruction.
I understand that this only works on Ruby 2.3+, but I don't think it makes sense to penalize performance for 2.3+ users, and it's also too hard to maintain two versions of this method (one with `#freeze` and the other without).
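Concretely, the suggestion amounts to something like this (a sketch mirroring the line from the diff above; the surrounding names come from that snippet, not from a self-contained program):

```ruby
# frozen_string_literal: true

# With the magic comment above (Ruby 2.3+), plain string literals such
# as '' are frozen automatically, so the explicit #freeze can be dropped:
locations << @locations[Profiling::BacktraceLocation.new('', 0, "#{omitted} #{desc}")]
```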
Changed in c01d6dd. At least for the empty string, there's no change in the benchmark:
```
exporter inline-freeze
    9.208 (± 0.0%) i/s - 92.000 in 10.001706s
exporter pragma-freeze
    9.181 (± 0.0%) i/s - 92.000 in 10.030500s

Comparison:
exporter inline-freeze: 9.2 i/s
exporter pragma-freeze: 9.2 i/s - 1.00x (± 0.00) slower
```
but I don't mind either :)
Seems to not make any difference in the <benchmarks/profiler_submission.rb> benchmark (Ruby 2.7.3/macOS 10.15.7/i7-1068NG7):

```
exporter inline-freeze 9.208 (± 0.0%) i/s - 92.000 in 10.001706s
exporter pragma-freeze 9.181 (± 0.0%) i/s - 92.000 in 10.030500s

Comparison:
exporter inline-freeze: 9.2 i/s
exporter pragma-freeze: 9.2 i/s - 1.00x (± 0.00) slower
```

but I like to use the more modern approach anyway.
Force-pushed from c01d6dd to 7c29190.
A second round of performance improvements to pprof encoding. Compared to dd-trace-rb 0.49.0, there's a 3.7x speedup, or a 2.27x speedup when compared with current master.
This takes care of the low-hanging fruit I could identify in pprof encoding.