[Metrics SDK] Performance improvement in measurement processing #1993

lalitb · 2023-02-21T08:13:17Z

This PR attempts to improve measurement-processing throughput. The benchmark is the time the SDK takes to process 1 million measurements with multiple dimensions and high cardinality.

The change removes unnecessary memory copies performed for every incoming measurement. A measurement's attributes are now copied into an OrderedAttributeMap only if they are not already present in the AttributesHashMap.

Existing flow:
instrument::Add(value, attributes) -> Copy attributes to OrderedAttributeMap -> Calculate hash for attributes in OrderedAttributeMap -> Add attributes to AttributesHashMap if not already present (by comparing the hash calculated in step 3) -> Aggregate(value)

New flow:
instrument::Add(value, attributes) -> Calculate hash for attributes in KeyValueIterable -> If not already present in AttributesHashMap (by comparing the hash calculated in step 2), copy attributes to OrderedAttributeMap and add them to AttributesHashMap -> Aggregate(value)
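The new flow can be sketched roughly as follows. This is a simplified, hypothetical stand-in for the SDK types. The incoming attributes are a `std::vector` of string pairs instead of a KeyValueIterable, and the stored copy is a `std::map` instead of OrderedAttributeMap. The hash sums per-pair hashes so that it does not depend on iteration order; that is an assumption, since the PR does not say how ordering is handled. Unlike a lookup keyed by the hash alone, the sketch also compares attributes when hashes match, so two distinct sets that share a hash are not merged (the risk later raised in #3060).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for KeyValueIterable and OrderedAttributeMap.
using InputAttributes   = std::vector<std::pair<std::string, std::string>>;
using OrderedAttributes = std::map<std::string, std::string>;

// Boost-style hash_combine.
inline void HashCombine(std::size_t &seed, const std::string &v)
{
  seed ^= std::hash<std::string>{}(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Hash each key/value pair, then add the per-pair hashes so the result does
// not depend on the order in which the input is iterated.
inline std::size_t HashInput(const InputAttributes &attrs)
{
  std::size_t total = 0;
  for (const auto &kv : attrs)
  {
    std::size_t pair_seed = 0;
    HashCombine(pair_seed, kv.first);
    HashCombine(pair_seed, kv.second);
    total += pair_seed;
  }
  return total;
}

// Compare the input against a stored copy without materializing a new map.
// Assumes the input has unique keys.
inline bool SameAttributes(const InputAttributes &in, const OrderedAttributes &stored)
{
  if (in.size() != stored.size()) return false;
  for (const auto &kv : in)
  {
    auto it = stored.find(kv.first);
    if (it == stored.end() || it->second != kv.second) return false;
  }
  return true;
}

struct Entry
{
  OrderedAttributes attributes;
  long sum;
};

class AttributesHashMapSketch
{
public:
  // Hash first; copy into OrderedAttributes only when the set is new.
  void Add(long value, const InputAttributes &attrs)
  {
    auto &bucket = buckets_[HashInput(attrs)];
    for (auto &entry : bucket)
    {
      if (SameAttributes(attrs, entry.attributes))
      {
        entry.sum += value;  // hit: aggregate without copying attributes
        return;
      }
    }
    // miss: this is the only place attributes are copied
    bucket.push_back(Entry{OrderedAttributes(attrs.begin(), attrs.end()), value});
    ++copies_;
  }

  std::size_t copies() const { return copies_; }

private:
  std::unordered_map<std::size_t, std::vector<Entry>> buckets_;
  std::size_t copies_ = 0;
};
```

With 1000 measurements spread over 10 distinct attribute sets, only 10 copies are made; the old flow would copy on every call.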

The processing time for 1 million measurements drops from ~3.79 s to ~1.99 s (CPU: Intel(R) Core(TM) i9-10900 CPU @ 2.80GHz, 20 processors; memory: 32 GB).

The test results are for the following scenario:

Number of dimensions/attributes: 3
Cardinality of each dimension: 10
Total possible attribute combinations: 10 * 10 * 10 = 1000
Total number of measurements: 1 million
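For this scenario, copy-on-miss copies attributes at most once per distinct attribute set: 10^3 = 1000 copies, instead of one copy per measurement (1,000,000) in the existing flow. Each set is therefore reused about 1000 times. The sketch below enumerates a hypothetical deterministic workload and counts the distinct sets. Measurement `i` uses the three lowest base-10 digits of `i` as its attribute values; this is an assumption for illustration and not necessarily how measurements_benchmark.cc picks values.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <set>

constexpr std::size_t kDimensions   = 3;
constexpr std::size_t kCardinality  = 10;
constexpr std::size_t kMeasurements = 1000000;

// Value index of dimension d for measurement i (the d-th base-10 digit of i).
inline std::size_t ValueIndex(std::size_t i, std::size_t d)
{
  std::size_t divisor = 1;
  for (std::size_t k = 0; k < d; ++k) divisor *= kCardinality;
  return (i / divisor) % kCardinality;
}

// Number of distinct attribute sets seen across all measurements, i.e. the
// number of copies the copy-on-miss flow would make.
inline std::size_t CountDistinctCombinations()
{
  std::set<std::array<std::size_t, kDimensions>> seen;
  for (std::size_t i = 0; i < kMeasurements; ++i)
  {
    std::array<std::size_t, kDimensions> combo{};
    for (std::size_t d = 0; d < kDimensions; ++d) combo[d] = ValueIndex(i, d);
    seen.insert(combo);
  }
  return seen.size();
}
```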

(The benchmark code is added as part of this PR: measurements_benchmark.cc.)

Benchmark with this PR changes:

Running ./measurements_benchmark
Run on (20 X 2808.01 MHz CPU s)
CPU Caches:
  L1 Data 32K (x10)
  L1 Instruction 32K (x10)
  L2 Unified 256K (x10)
  L3 Unified 20480K (x1)
Load Average: 8.23, 3.09, 1.54
--------------------------------------------------------------
Benchmark                    Time             CPU   Iterations
--------------------------------------------------------------
BM_MeasurementsTest 1988578400 ns   1988543700 ns            1

Benchmark with existing code:

2023-02-20 22:23:53
Running ./measurements_benchmark
Run on (20 X 2808.01 MHz CPU s)
CPU Caches:
  L1 Data 32K (x10)
  L1 Instruction 32K (x10)
  L2 Unified 256K (x10)
  L3 Unified 20480K (x1)
Load Average: 2.84, 2.51, 1.45
--------------------------------------------------------------
Benchmark                    Time             CPU   Iterations
--------------------------------------------------------------
BM_MeasurementsTest 3794032100 ns   3794023000 ns            1

Changes


For significant contributions please make sure you have completed the following items:

CHANGELOG.md updated for non-trivial changes
Unit tests have been added
Changes in public API reviewed

codecov · 2023-02-21T08:32:12Z

Codecov Report

Merging #1993 (7d7755b) into main (649829f) will increase coverage by 0.01%.
The diff coverage is 89.42%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1993      +/-   ##
==========================================
+ Coverage   87.32%   87.32%   +0.01%     
==========================================
  Files         166      166              
  Lines        4673     4723      +50     
==========================================
+ Hits         4080     4124      +44     
- Misses        593      599       +6

Impacted Files	Coverage Δ
...ntelemetry/sdk/metrics/view/attributes_processor.h	`90.00% <50.00%> (-10.00%)`	⬇️
...clude/opentelemetry/sdk/common/attributemap_hash.h	`83.88% <84.22%> (+7.41%)`	⬆️
sdk/src/metrics/state/temporal_metric_storage.cc	`98.31% <87.50%> (+0.06%)`	⬆️
...ntelemetry/sdk/metrics/state/sync_metric_storage.h	`86.37% <88.89%> (-1.13%)`	⬇️
...entelemetry/sdk/metrics/state/attributes_hashmap.h	`95.24% <96.67%> (-0.59%)`	⬇️
...telemetry/sdk/metrics/state/async_metric_storage.h	`86.49% <100.00%> (+0.38%)`	⬆️


lalitb · 2023-02-22T06:00:29Z

This is ready for review now. @ThomsonTan, @esigo, please help review this PR. We can enable exemplars once this is merged, as exemplars have their own performance cost.


lalitb · 2023-02-25T07:49:13Z

Added a few more changes to reduce lock contention (by moving the AttributesHashMap hash calculation outside the mutex lock). This saved a few hundred more milliseconds single-threaded (from about 1.99 s to about 1.58 s) and scales better with threads:

Number of threads: 1

$ ./measurements_benchmark
2023-02-24 23:09:48
Running ./measurements_benchmark
Run on (20 X 2808.01 MHz CPU s)
CPU Caches:
  L1 Data 32K (x10)
  L1 Instruction 32K (x10)
  L2 Unified 256K (x10)
  L3 Unified 20480K (x1)
Load Average: 0.33, 1.90, 1.20
--------------------------------------------------------------
Benchmark                    Time             CPU   Iterations
--------------------------------------------------------------
BM_MeasurementsTest 1582235960 ns       104430 ns           10

Number of threads: 5

$ ./measurements_benchmark
2023-02-24 23:15:19
Running ./measurements_benchmark
Run on (20 X 2808.01 MHz CPU s)
CPU Caches:
  L1 Data 32K (x10)
  L1 Instruction 32K (x10)
  L2 Unified 256K (x10)
  L3 Unified 20480K (x1)
Load Average: 0.20, 0.70, 0.87
--------------------------------------------------------------
Benchmark                    Time             CPU   Iterations
--------------------------------------------------------------
BM_MeasurementsTest  991755670 ns       306610 ns           10

Number of threads: 19

$ ./measurements_benchmark
2023-02-24 23:18:25
Running ./measurements_benchmark
Run on (20 X 2808.01 MHz CPU s)
CPU Caches:
  L1 Data 32K (x10)
  L1 Instruction 32K (x10)
  L2 Unified 256K (x10)
  L3 Unified 20480K (x1)
Load Average: 0.65, 0.67, 0.81
--------------------------------------------------------------
Benchmark                    Time             CPU   Iterations
--------------------------------------------------------------
BM_MeasurementsTest  926599920 ns       878030 ns           10

So: 1.58 s to process 1 million measurements from a single thread, 0.99 s from 5 threads, and 0.93 s from 19 threads, on a machine with 20 processors. While throughput improves from 1 to 5 threads, the library reaches its performance limit beyond that. As valgrind/callgrind CPU profiling shows, the limit comes from synchronized access to the std::unordered_map used inside AttributesHashMap, which is shared across all these threads. I think we can live with the current performance for now, unless there is a cross-platform lock-free hash map we can easily use in our code.
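The lock-scope change above (hash computed before taking the mutex, so only the map lookup and update are serialized) can be sketched as follows. This is a hypothetical, simplified storage class and not the SDK's SyncMetricStorage. Attributes are reduced to a single `std::string`, and the precomputed hash is carried in the key so the hasher that `std::unordered_map` calls under the lock just returns the stored value. Whatever stays inside the lock is still serialized across all threads, which matches the plateau the profiling points to.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

// Key carrying a hash computed before the lock is taken.
struct HashedKey
{
  std::size_t hash;
  std::string attributes;
  bool operator==(const HashedKey &other) const
  {
    // Compare full attributes too, so equal hashes alone never merge series.
    return hash == other.hash && attributes == other.attributes;
  }
};

// Hasher that returns the precomputed value (cheap to call under the lock).
struct PrecomputedHash
{
  std::size_t operator()(const HashedKey &key) const noexcept { return key.hash; }
};

class SyncStorageSketch
{
public:
  void Add(long value, const std::string &attributes)
  {
    // Hashing runs outside the critical section. The string copy here is an
    // artifact of this simplified sketch.
    HashedKey key{std::hash<std::string>{}(attributes), attributes};
    std::lock_guard<std::mutex> guard(mutex_);
    sums_[std::move(key)] += value;  // only lookup/insert + aggregation are serialized
  }

  long Total() const
  {
    std::lock_guard<std::mutex> guard(mutex_);
    long total = 0;
    for (const auto &entry : sums_) total += entry.second;
    return total;
  }

  std::size_t Size() const
  {
    std::lock_guard<std::mutex> guard(mutex_);
    return sums_.size();
  }

private:
  mutable std::mutex mutex_;
  std::unordered_map<HashedKey, long, PrecomputedHash> sums_;
};

// Records `total` measurements split across `threads` workers, cycling
// through 10 hypothetical attribute values.
inline void RunWorkload(SyncStorageSketch &storage, std::size_t total, std::size_t threads)
{
  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < threads; ++t)
  {
    // Spread the remainder so exactly `total` measurements are recorded.
    const std::size_t count = total / threads + (t < total % threads ? 1 : 0);
    workers.emplace_back([&storage, count] {
      for (std::size_t i = 0; i < count; ++i)
        storage.Add(1, "key=" + std::to_string(i % 10));
    });
  }
  for (auto &worker : workers) worker.join();
}
```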


lalitb · 2023-03-01T07:41:24Z

The results are more promising when the application is built in Release mode (similar results on Windows and Linux).
To process 1 million measurements:
Time taken (1 thread): ~130 ms
Time taken (5 threads): ~147 ms
Time taken (19 threads): ~165 ms

Number of threads: 1

$ ./measurements_benchmark
2023-02-28 23:32:31
Running ./measurements_benchmark
Run on (20 X 2808.01 MHz CPU s)
CPU Caches:
  L1 Data 32K (x10)
  L1 Instruction 32K (x10)
  L2 Unified 256K (x10)
  L3 Unified 20480K (x1)
Load Average: 0.16, 0.98, 0.88
--------------------------------------------------------------
Benchmark                    Time             CPU   Iterations
--------------------------------------------------------------
BM_MeasurementsTest  133399288 ns       132125 ns          100

Number of threads: 5

$ ./measurements_benchmark
2023-02-28 23:36:06
Running ./measurements_benchmark
Run on (20 X 2808.01 MHz CPU s)
CPU Caches:
  L1 Data 32K (x10)
  L1 Instruction 32K (x10)
  L2 Unified 256K (x10)
  L3 Unified 20480K (x1)
Load Average: 0.62, 0.71, 0.78
--------------------------------------------------------------
Benchmark                    Time             CPU   Iterations
--------------------------------------------------------------
BM_MeasurementsTest  147010309 ns       359316 ns          100

Number of threads: 19

$ ./measurements_benchmark
2023-02-28 23:37:36
Running ./measurements_benchmark
Run on (20 X 2808.01 MHz CPU s)
CPU Caches:
  L1 Data 32K (x10)
  L1 Instruction 32K (x10)
  L2 Unified 256K (x10)
  L3 Unified 20480K (x1)
Load Average: 0.33, 0.60, 0.73
--------------------------------------------------------------
Benchmark                    Time             CPU   Iterations
--------------------------------------------------------------
BM_MeasurementsTest  165881775 ns      1205977 ns          100
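A few lines of arithmetic on the wall-clock times reported above make the scaling behavior explicit. In the earlier runs (assumed to be non-Release builds, as the Mar 1 comment implies), 5 and 19 threads are roughly 1.6x and 1.7x faster than one thread. In Release mode, adding threads makes the total slightly slower (about 0.91x and 0.80x), consistent with the contention on the shared map described earlier. The `Run` struct and helper names are illustrative only.

```cpp
#include <cassert>
#include <cstdio>

// Wall-clock times (ns) copied from the benchmark output above; each run
// processes 1,000,000 measurements in total.
struct Run
{
  const char *build;
  int threads;
  double wall_ns;
};

constexpr double kMeasurements = 1e6;
constexpr Run kEarlier[] = {{"earlier", 1, 1582235960.0},
                            {"earlier", 5, 991755670.0},
                            {"earlier", 19, 926599920.0}};
constexpr Run kRelease[] = {{"Release", 1, 133399288.0},
                            {"Release", 5, 147010309.0},
                            {"Release", 19, 165881775.0}};

// Average wall-clock cost of one measurement.
inline double NsPerMeasurement(const Run &r) { return r.wall_ns / kMeasurements; }

// Speedup relative to a single-threaded baseline (> 1 means faster).
inline double Speedup(const Run &baseline, const Run &r) { return baseline.wall_ns / r.wall_ns; }

inline void PrintScaling(const Run (&runs)[3])
{
  for (const Run &r : runs)
    std::printf("%-8s %2d threads: %6.0f ns/measurement, speedup %.2fx\n", r.build,
                r.threads, NsPerMeasurement(r), Speedup(runs[0], r));
}
```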

esigo

LGTM
Thanks for the PR :)
sorry for the delay

lalitb added 9 commits February 20, 2023 18:26

fix

625f306

fix

b243da6

fix

5000317

fix

a724644

fix

44f7b44

fix build error

9e80106

fix

635df61

fix

bb8f6b5

fix

8e87139

lalitb requested a review from a team February 21, 2023 08:13

Merge branch 'main' into att-copy-fix

6ec205c

lalitb and others added 10 commits February 21, 2023 08:12

fix

9b3f8cd

Merge branch 'att-copy-fix' of github.com:lalitb/opentelemetry-cpp in…

6dec0c7

…to att-copy-fix

fix

08b0da0

fix

552a7f5

fix

0faff5b

Fix

e4a29f6

fix

11a57c8

fix CI timeout

29f730b

Merge branch 'att-copy-fix' of github.com:lalitb/opentelemetry-cpp in…

025da35

…to att-copy-fix

Merge branch 'main' into att-copy-fix

72da98b

esigo assigned ThomsonTan and esigo Feb 22, 2023

Merge branch 'main' into att-copy-fix

879e0ad

ThomsonTan reviewed Feb 22, 2023

sdk/include/opentelemetry/sdk/common/attributemap_hash.h (outdated)

lalitb and others added 3 commits February 22, 2023 18:51

Update sdk/include/opentelemetry/sdk/common/attributemap_hash.h

5c4daee

Co-authored-by: Tom Tan <lilotom@gmail.com>

improve mutex lock

f76b7c1

Merge branch 'att-copy-fix' of github.com:lalitb/opentelemetry-cpp in…

1fe461d

…to att-copy-fix

lalitb added 3 commits February 24, 2023 16:20

fix

4153560

fix warning

0c0e28f

fix

43b9ea2

lalitb added 2 commits February 24, 2023 23:56

fix

60d6161

fix

d9653a3

esigo added the size/L Denotes a PR that changes 100-499 lines. label Feb 26, 2023

ThomsonTan reviewed Feb 27, 2023

sdk/include/opentelemetry/sdk/common/attributemap_hash.h (outdated)

ThomsonTan reviewed Feb 27, 2023

sdk/include/opentelemetry/sdk/common/attributemap_hash.h

ThomsonTan reviewed Feb 27, 2023

sdk/include/opentelemetry/sdk/metrics/state/attributes_hashmap.h (outdated)

fix

4f227af

lalitb mentioned this pull request Feb 27, 2023

Prepare release v.1.9.0 (tentative: 3rd March) #2007

Closed

lalitb added this to the OpenTelemetry C++ Release v1.9.2 (Feb Release) milestone Feb 27, 2023

Merge branch 'main' into att-copy-fix

7fccdc0

esigo approved these changes Mar 3, 2023

esigo added the ok-to-merge The PR is ok to merge (has two approves or raised by a maintainer/approver and has one approve) label Mar 3, 2023

lalitb added 2 commits March 3, 2023 15:34

Merge branch 'main' into att-copy-fix

ea4cac4

Merge branch 'main' into att-copy-fix

7d7755b

lalitb enabled auto-merge (squash) March 4, 2023 04:11

lalitb merged commit da333f8 into open-telemetry:main Mar 4, 2023

alangy98 mentioned this pull request Sep 12, 2024

Hash collision risk of metric data aggregation #3060

Open
