Skip to content

Conversation

dashpole
Copy link
Contributor

@dashpole dashpole commented Aug 14, 2025

Related: #7037

Currently, when we record a measurement, we Lock a single lock (non-rw) for the whole instrument, and then we update the value and record exemplars.

This PR uses a RWMutex for the map[attribute]->value, and splits this into 4 lock stages to reduce contention:

  • Use a RW lock and RLock when looking up the value for a measurement.
  • Lock when creating the value when we see an attribute set for the first time (rare).
  • Use atomics to update values.
  • Aquire a lock for exemplars only when the exemplar is not filtered out.

It also borrows an optimization from the prometheus client, since incrementing an integer is possible with atomics, but floats can't be incremented atomically as easily. See https://github.com/prometheus/client_golang/blob/14ccb93091c00f86b85af7753100aa372d63602b/prometheus/counter.go#L108. This is added in atomic.go.

I experimented with using atomic.Pointer to be able to use CompareAndSwap with histogram and exponential histogram types, but found that the result was slower than locking.

Multi-threaded benchmarks:

goos: linux
goarch: amd64
pkg: go.opentelemetry.io/otel/sdk/metric
cpu: Intel(R) Xeon(R) CPU @ 2.20GHz
                                                                │   main.txt    │               new.txt               │
                                                                │    sec/op     │    sec/op     vs base               │
SyncMeasure/NoView/Int64Counter/Attributes/0-24                    307.3n ± 23%   106.6n ±  6%  -65.31% (p=0.002 n=6)
SyncMeasure/NoView/Int64Counter/Attributes/1-24                    354.5n ± 11%   101.9n ± 13%  -71.24% (p=0.002 n=6)
SyncMeasure/NoView/Int64Counter/Attributes/10-24                   372.6n ± 10%   101.8n ± 27%  -72.67% (p=0.002 n=6)
SyncMeasure/NoView/Float64Counter/Attributes/0-24                  310.7n ± 32%   102.2n ± 17%  -67.10% (p=0.002 n=6)
SyncMeasure/NoView/Float64Counter/Attributes/1-24                  323.2n ± 18%   102.4n ±  5%  -68.31% (p=0.002 n=6)
SyncMeasure/NoView/Float64Counter/Attributes/10-24                 315.5n ± 12%   103.4n ±  3%  -67.24% (p=0.002 n=6)
SyncMeasure/NoView/Int64UpDownCounter/Attributes/0-24              271.6n ± 14%   103.4n ±  7%  -61.94% (p=0.002 n=6)
SyncMeasure/NoView/Int64UpDownCounter/Attributes/1-24              289.8n ±  3%   106.5n ±  9%  -63.25% (p=0.002 n=6)
SyncMeasure/NoView/Int64UpDownCounter/Attributes/10-24             312.7n ±  9%   103.8n ±  2%  -66.79% (p=0.002 n=6)
SyncMeasure/NoView/Float64UpDownCounter/Attributes/0-24            302.2n ± 12%   102.8n ±  5%  -66.00% (p=0.002 n=6)
SyncMeasure/NoView/Float64UpDownCounter/Attributes/1-24            291.1n ± 22%   104.8n ± 15%  -63.99% (p=0.002 n=6)
SyncMeasure/NoView/Float64UpDownCounter/Attributes/10-24          314.85n ± 11%   99.79n ±  3%  -68.31% (p=0.002 n=6)
SyncMeasure/NoView/Int64Histogram/Attributes/0-24                  295.8n ± 18%   124.5n ± 19%  -57.93% (p=0.002 n=6)
SyncMeasure/NoView/Int64Histogram/Attributes/1-24                  294.6n ± 11%   110.5n ±  6%  -62.48% (p=0.002 n=6)
SyncMeasure/NoView/Int64Histogram/Attributes/10-24                 277.5n ±  7%   142.1n ±  6%  -48.78% (p=0.002 n=6)
SyncMeasure/NoView/Float64Histogram/Attributes/0-24                311.9n ± 18%   127.4n ±  3%  -59.17% (p=0.002 n=6)
SyncMeasure/NoView/Float64Histogram/Attributes/1-24                317.2n ± 14%   108.5n ±  3%  -65.80% (p=0.002 n=6)
SyncMeasure/NoView/Float64Histogram/Attributes/10-24               292.2n ±  5%   141.9n ±  4%  -51.43% (p=0.002 n=6)
SyncMeasure/NoView/ExponentialInt64Histogram/Attributes/0-24       337.4n ± 12%   212.4n ±  4%  -37.05% (p=0.002 n=6)
SyncMeasure/NoView/ExponentialInt64Histogram/Attributes/1-24       379.4n ± 19%   158.0n ±  4%  -58.36% (p=0.002 n=6)
SyncMeasure/NoView/ExponentialInt64Histogram/Attributes/10-24      365.6n ± 15%   207.6n ±  5%  -43.23% (p=0.002 n=6)
SyncMeasure/NoView/ExponentialFloat64Histogram/Attributes/0-24     324.2n ± 16%   154.5n ± 29%  -52.34% (p=0.002 n=6)
SyncMeasure/NoView/ExponentialFloat64Histogram/Attributes/1-24     325.3n ±  5%   209.9n ±  3%  -35.47% (p=0.002 n=6)
SyncMeasure/NoView/ExponentialFloat64Histogram/Attributes/10-24    397.4n ±  8%   152.8n ±  8%  -61.54% (p=0.002 n=6)
geomean                                                            318.7n         124.6n        -60.91%

Single-threaded benchmarks:

goos: linux
goarch: amd64
pkg: go.opentelemetry.io/otel/sdk/metric
cpu: Intel(R) Xeon(R) CPU @ 2.20GHz
                                                             │   main1.txt   │               new1.txt               │
                                                             │    sec/op     │    sec/op      vs base               │
SyncMeasure/NoView/Int64Counter/Attributes/0                    98.53n ±  1%    93.79n ±  2%   -4.82% (p=0.002 n=6)
SyncMeasure/NoView/Int64Counter/Attributes/1                    103.9n ±  0%    101.2n ± 67%        ~ (p=0.372 n=6)
SyncMeasure/NoView/Int64Counter/Attributes/10                   105.0n ±  1%    102.0n ±  2%   -2.90% (p=0.002 n=6)
SyncMeasure/NoView/Float64Counter/Attributes/0                  98.12n ±  2%    90.41n ±  9%   -7.86% (p=0.026 n=6)
SyncMeasure/NoView/Float64Counter/Attributes/1                 107.00n ±  1%    97.45n ±  2%   -8.93% (p=0.002 n=6)
SyncMeasure/NoView/Float64Counter/Attributes/10                106.10n ± 80%    96.66n ±  1%   -8.89% (p=0.002 n=6)
SyncMeasure/NoView/Int64UpDownCounter/Attributes/0              97.47n ±  8%    93.42n ± 12%        ~ (p=0.180 n=6)
SyncMeasure/NoView/Int64UpDownCounter/Attributes/1              104.5n ±  2%    100.7n ± 15%        ~ (p=0.065 n=6)
SyncMeasure/NoView/Int64UpDownCounter/Attributes/10             105.4n ±  1%    101.0n ±  1%   -4.18% (p=0.002 n=6)
SyncMeasure/NoView/Float64UpDownCounter/Attributes/0            97.87n ±  1%    89.00n ± 10%   -9.06% (p=0.015 n=6)
SyncMeasure/NoView/Float64UpDownCounter/Attributes/1           106.50n ±  8%    99.08n ± 11%   -6.97% (p=0.026 n=6)
SyncMeasure/NoView/Float64UpDownCounter/Attributes/10          106.25n ±  2%    97.79n ±  1%   -7.97% (p=0.002 n=6)
SyncMeasure/NoView/Int64Histogram/Attributes/0                  88.89n ±  2%   124.10n ±  2%  +39.62% (p=0.002 n=6)
SyncMeasure/NoView/Int64Histogram/Attributes/1                  96.98n ±  1%   135.25n ±  2%  +39.46% (p=0.002 n=6)
SyncMeasure/NoView/Int64Histogram/Attributes/10                 97.39n ±  1%   132.70n ±  1%  +36.26% (p=0.002 n=6)
SyncMeasure/NoView/Float64Histogram/Attributes/0                89.51n ±  4%   121.20n ±  2%  +35.40% (p=0.002 n=6)
SyncMeasure/NoView/Float64Histogram/Attributes/1                95.25n ±  1%   128.00n ±  2%  +34.38% (p=0.002 n=6)
SyncMeasure/NoView/Float64Histogram/Attributes/10               95.38n ±  1%   128.75n ± 10%  +34.99% (p=0.002 n=6)
SyncMeasure/NoView/ExponentialInt64Histogram/Attributes/0       141.3n ±  7%    173.3n ±  8%  +22.60% (p=0.002 n=6)
SyncMeasure/NoView/ExponentialInt64Histogram/Attributes/1       150.7n ±  1%    184.6n ±  4%  +22.50% (p=0.002 n=6)
SyncMeasure/NoView/ExponentialInt64Histogram/Attributes/10      151.4n ±  1%    189.5n ±  1%  +25.13% (p=0.002 n=6)
SyncMeasure/NoView/ExponentialFloat64Histogram/Attributes/0     139.4n ±  1%    194.3n ±  2%  +39.33% (p=0.002 n=6)
SyncMeasure/NoView/ExponentialFloat64Histogram/Attributes/1     150.1n ± 11%    182.3n ±  2%  +21.49% (p=0.002 n=6)
SyncMeasure/NoView/ExponentialFloat64Histogram/Attributes/10    150.1n ±  2%    179.4n ±  1%  +19.52% (p=0.002 n=6)
geomean                                                         110.0n          121.9n        +10.81%

@dashpole dashpole changed the title Optimize locking for aggregations Optimize locking for metric aggregations Aug 14, 2025
Copy link

codecov bot commented Aug 14, 2025

Codecov Report

❌ Patch coverage is 88.84615% with 29 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.5%. Comparing base (b866e36) to head (9457877).
⚠️ Report is 40 commits behind head on main.

Files with missing lines Patch % Lines
...metric/internal/aggregate/exponential_histogram.go 83.3% 9 Missing and 2 partials ⚠️
sdk/metric/internal/aggregate/atomic.go 88.4% 4 Missing and 2 partials ⚠️
sdk/metric/internal/aggregate/histogram.go 92.5% 3 Missing and 1 partial ⚠️
sdk/metric/internal/aggregate/lastvalue.go 88.8% 3 Missing and 1 partial ⚠️
sdk/metric/internal/aggregate/sum.go 90.6% 3 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@          Coverage Diff           @@
##            main   #7189    +/-   ##
======================================
  Coverage   85.4%   85.5%            
======================================
  Files        271     275     +4     
  Lines      24391   24763   +372     
======================================
+ Hits       20845   21181   +336     
- Misses      3168    3197    +29     
- Partials     378     385     +7     
Files with missing lines Coverage Δ
...dk/metric/internal/aggregate/filtered_reservoir.go 100.0% <100.0%> (ø)
sdk/metric/internal/aggregate/limit.go 100.0% <100.0%> (ø)
sdk/metric/internal/aggregate/histogram.go 97.6% <92.5%> (-2.4%) ⬇️
sdk/metric/internal/aggregate/lastvalue.go 96.9% <88.8%> (-3.1%) ⬇️
sdk/metric/internal/aggregate/sum.go 97.9% <90.6%> (-2.1%) ⬇️
sdk/metric/internal/aggregate/atomic.go 88.4% <88.4%> (ø)
...metric/internal/aggregate/exponential_histogram.go 95.2% <83.3%> (-1.4%) ⬇️

... and 5 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@dashpole
Copy link
Contributor Author

dashpole commented Aug 15, 2025

Don't review this yet. I've tested it with #7175, and the gains seem much less after that optimization. I'll retest after #7175 lands.

The benchmarks in the description now are the improvement on top of the change to attribute.Distinct.

@dashpole dashpole force-pushed the optimize_parallel branch 6 times, most recently from 60c1c88 to 4adb38d Compare September 3, 2025 19:14
dmathieu pushed a commit that referenced this pull request Sep 8, 2025
This is helpful for my work optimizing locking in
#7189

```
goos: linux
goarch: amd64
pkg: go.opentelemetry.io/otel/sdk/metric
cpu: Intel(R) Xeon(R) CPU @ 2.20GHz
BenchmarkSyncMeasure/NoView/ExponentialFloat64Histogram/Attributes/0-24         	   43419	       306.0 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/NoView/ExponentialFloat64Histogram/Attributes/0-24         	   40675	       316.1 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/NoView/ExponentialFloat64Histogram/Attributes/0-24         	   53746	       305.6 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/NoView/ExponentialFloat64Histogram/Attributes/0-24         	   38224	       296.1 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/NoView/ExponentialFloat64Histogram/Attributes/0-24         	   53796	       330.3 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/NoView/ExponentialFloat64Histogram/Attributes/0-24         	   47823	       316.4 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/NoView/ExponentialFloat64Histogram/Attributes/1-24         	   32172	       373.5 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/NoView/ExponentialFloat64Histogram/Attributes/1-24         	   33084	       379.0 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/NoView/ExponentialFloat64Histogram/Attributes/1-24         	   29956	       360.4 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/NoView/ExponentialFloat64Histogram/Attributes/1-24         	   33921	       370.1 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/NoView/ExponentialFloat64Histogram/Attributes/1-24         	   34449	       387.7 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/NoView/ExponentialFloat64Histogram/Attributes/1-24         	   29156	       393.6 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/NoView/ExponentialFloat64Histogram/Attributes/10-24        	   10768	      1040 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/NoView/ExponentialFloat64Histogram/Attributes/10-24        	   11190	      1056 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/NoView/ExponentialFloat64Histogram/Attributes/10-24        	   11078	      1176 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/NoView/ExponentialFloat64Histogram/Attributes/10-24        	   10929	      1219 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/NoView/ExponentialFloat64Histogram/Attributes/10-24        	   10454	      1066 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/NoView/ExponentialFloat64Histogram/Attributes/10-24        	   10501	      1064 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/DropView/ExponentialFloat64Histogram/Attributes/0-24       	 5919898	         1.879 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/DropView/ExponentialFloat64Histogram/Attributes/0-24       	 5834685	         1.990 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/DropView/ExponentialFloat64Histogram/Attributes/0-24       	 5782334	         2.433 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/DropView/ExponentialFloat64Histogram/Attributes/0-24       	 5320924	         2.404 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/DropView/ExponentialFloat64Histogram/Attributes/0-24       	 5742800	         2.109 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/DropView/ExponentialFloat64Histogram/Attributes/0-24       	 5885475	         2.056 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/DropView/ExponentialFloat64Histogram/Attributes/1-24       	 4331547	         2.758 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/DropView/ExponentialFloat64Histogram/Attributes/1-24       	 3848244	         2.725 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/DropView/ExponentialFloat64Histogram/Attributes/1-24       	 4542189	         2.636 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/DropView/ExponentialFloat64Histogram/Attributes/1-24       	 4201015	         2.641 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/DropView/ExponentialFloat64Histogram/Attributes/1-24       	 4597094	         2.608 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/DropView/ExponentialFloat64Histogram/Attributes/1-24       	 4869811	         2.491 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/DropView/ExponentialFloat64Histogram/Attributes/10-24      	 4826077	         2.329 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/DropView/ExponentialFloat64Histogram/Attributes/10-24      	 4866612	         2.382 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/DropView/ExponentialFloat64Histogram/Attributes/10-24      	 5149305	         2.310 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/DropView/ExponentialFloat64Histogram/Attributes/10-24      	 5101305	         2.384 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/DropView/ExponentialFloat64Histogram/Attributes/10-24      	 4854940	         2.440 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/DropView/ExponentialFloat64Histogram/Attributes/10-24      	 4874742	         2.501 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/AttrFilterView/ExponentialFloat64Histogram/Attributes/0-24 	   55272	       359.0 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/AttrFilterView/ExponentialFloat64Histogram/Attributes/0-24 	   52195	       318.4 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/AttrFilterView/ExponentialFloat64Histogram/Attributes/0-24 	   52324	       330.8 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/AttrFilterView/ExponentialFloat64Histogram/Attributes/0-24 	   49879	       321.4 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/AttrFilterView/ExponentialFloat64Histogram/Attributes/0-24 	   56212	       351.3 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/AttrFilterView/ExponentialFloat64Histogram/Attributes/0-24 	   43362	       301.4 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/AttrFilterView/ExponentialFloat64Histogram/Attributes/1-24 	   30296	       461.9 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/AttrFilterView/ExponentialFloat64Histogram/Attributes/1-24 	   24178	       469.0 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/AttrFilterView/ExponentialFloat64Histogram/Attributes/1-24 	   24901	       477.0 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/AttrFilterView/ExponentialFloat64Histogram/Attributes/1-24 	   21748	       477.6 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/AttrFilterView/ExponentialFloat64Histogram/Attributes/1-24 	   23972	       512.9 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/AttrFilterView/ExponentialFloat64Histogram/Attributes/1-24 	   28039	       499.5 ns/op	       0 B/op	       0 allocs/op
BenchmarkSyncMeasure/AttrFilterView/ExponentialFloat64Histogram/Attributes/10-24         	    7598	      1646 ns/op	     704 B/op	       2 allocs/op
BenchmarkSyncMeasure/AttrFilterView/ExponentialFloat64Histogram/Attributes/10-24         	    7452	      1660 ns/op	     704 B/op	       2 allocs/op
BenchmarkSyncMeasure/AttrFilterView/ExponentialFloat64Histogram/Attributes/10-24         	    7620	      1644 ns/op	     704 B/op	       2 allocs/op
BenchmarkSyncMeasure/AttrFilterView/ExponentialFloat64Histogram/Attributes/10-24         	    6459	      1853 ns/op	     704 B/op	       2 allocs/op
BenchmarkSyncMeasure/AttrFilterView/ExponentialFloat64Histogram/Attributes/10-24         	    7950	      1575 ns/op	     704 B/op	       2 allocs/op
BenchmarkSyncMeasure/AttrFilterView/ExponentialFloat64Histogram/Attributes/10-24         	    8948	      1421 ns/op	     704 B/op	       2 allocs/op
```
@MrAlias MrAlias added this to the v1.39.0 milestone Sep 11, 2025
@dashpole
Copy link
Contributor Author

We could also look into adopting https://www.youtube.com/watch?v=VmrEG-3bWyM&t=253s.

@dashpole dashpole force-pushed the optimize_parallel branch 2 times, most recently from dd1c720 to fd8961b Compare September 17, 2025 19:52
@dashpole
Copy link
Contributor Author

dashpole commented Sep 17, 2025

I'm mostly happy with the current results. The single-threaded performance gets worse for histograms and exp histograms, but it seems worth it overall. In an ideal world, we would implement "lockless" histograms and exponential histograms like Prometheus does, but that is hard... Using RLock() for the common case seems like a reasonable first approach.

The main blocker is that we don't have good concurrency testing around aggregations. I'll add that in a separate PR before I mark this ready for review.

I'll also try and pull some of the small refactor changes out to be reviewed as separate changes.

@dashpole dashpole force-pushed the optimize_parallel branch 2 times, most recently from 23411a4 to 126c5cf Compare September 17, 2025 20:18
dashpole added a commit that referenced this pull request Sep 26, 2025
Prerequisite for
#7189.

Add tests to try to catch race conditions for concurrent measurements.
@dashpole
Copy link
Contributor Author

Closing in favor of #7427

@dashpole dashpole closed this Sep 30, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants