Conversation

Contributor

@dashpole dashpole commented Aug 12, 2025

Re-opening #5028 with new benchmarks. For cases with 10 attributes, this reduces the overhead of metric measurements by ~80-90% (depending on lock contention). It introduces a small probability of collision for attribute sets in the metrics SDK. For an instrument with 1 million different attribute sets, the probability of a collision is approximately 2 * 10^-8. For a more "normal" cardinality of 1000 on an instrument, it is approximately 2 * 10^-17.
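The collision figures quoted above are in the ballpark of the standard birthday-bound approximation for a 64-bit hash, p ≈ 1 − exp(−n(n−1)/(2·2^64)). A minimal sketch of that calculation (the function name is mine, not from the PR):

```go
package main

import (
	"fmt"
	"math"
)

// collisionProbability approximates the chance that at least two of n
// distinct attribute sets map to the same 64-bit hash value, using the
// birthday-bound approximation p ≈ 1 - exp(-n(n-1) / (2 * 2^64)).
func collisionProbability(n float64) float64 {
	return -math.Expm1(-n * (n - 1) / (2 * math.Exp2(64)))
}

func main() {
	// For one million distinct sets this yields roughly 2.7e-08,
	// in line with the ~2e-8 figure cited above.
	fmt.Printf("%.1e\n", collisionProbability(1e6)) // 2.7e-08
}
```

math.Expm1 keeps the result accurate for the tiny exponents involved, where 1 − exp(−x) would round to zero in float64.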

goos: linux
goarch: amd64
pkg: go.opentelemetry.io/otel/attribute
cpu: Intel(R) Xeon(R) CPU @ 2.20GHz
                                            │   main.txt    │              hash.txt               │
                                            │    sec/op     │    sec/op     vs base               │
EquivalentMapAccess/Empty-24                    32.01n ± 2%   10.12n ±  4%  -68.37% (p=0.002 n=6)
EquivalentMapAccess/1_string_attribute-24      106.25n ± 2%   10.01n ±  5%  -90.58% (p=0.002 n=6)
EquivalentMapAccess/10_string_attributes-24   826.250n ± 1%   9.982n ± 11%  -98.79% (p=0.002 n=6)
EquivalentMapAccess/1_int_attribute-24         106.65n ± 2%   10.13n ±  3%  -90.50% (p=0.002 n=6)
EquivalentMapAccess/10_int_attributes-24       833.25n ± 2%   10.04n ±  5%  -98.80% (p=0.002 n=6)
geomean                                         190.3n        10.06n        -94.72%

Parallel benchmarks:

goos: linux
goarch: amd64
pkg: go.opentelemetry.io/otel/sdk/metric
cpu: Intel(R) Xeon(R) CPU @ 2.20GHz
                                                         │  main24.txt   │              new24.txt              │
                                                         │    sec/op     │    sec/op     vs base               │
SyncMeasure/NoView/Int64Counter/Attributes/0-24             288.4n ± 13%   267.0n ± 16%        ~ (p=0.180 n=6)
SyncMeasure/NoView/Int64Counter/Attributes/1-24             372.7n ± 24%   303.3n ±  6%  -18.61% (p=0.002 n=6)
SyncMeasure/NoView/Int64Counter/Attributes/10-24           1862.5n ± 11%   302.2n ±  6%  -83.77% (p=0.002 n=6)
SyncMeasure/NoView/Float64Counter/Attributes/0-24           288.2n ±  5%   291.8n ± 14%        ~ (p=0.589 n=6)
SyncMeasure/NoView/Float64Counter/Attributes/1-24           374.8n ± 22%   326.2n ± 15%  -12.98% (p=0.002 n=6)
SyncMeasure/NoView/Float64Counter/Attributes/10-24         1984.0n ± 10%   277.9n ± 15%  -85.99% (p=0.002 n=6)
SyncMeasure/NoView/Int64UpDownCounter/Attributes/0-24       286.8n ± 13%   279.4n ± 14%        ~ (p=0.818 n=6)
SyncMeasure/NoView/Int64UpDownCounter/Attributes/1-24       415.4n ± 14%   309.5n ± 11%  -25.47% (p=0.002 n=6)
SyncMeasure/NoView/Int64UpDownCounter/Attributes/10-24     1923.0n ± 19%   294.1n ± 17%  -84.71% (p=0.002 n=6)
SyncMeasure/NoView/Float64UpDownCounter/Attributes/0-24     284.9n ±  5%   271.6n ± 11%        ~ (p=0.240 n=6)
SyncMeasure/NoView/Float64UpDownCounter/Attributes/1-24     382.9n ± 23%   295.7n ± 13%  -22.78% (p=0.002 n=6)
SyncMeasure/NoView/Float64UpDownCounter/Attributes/10-24   1787.0n ± 28%   289.2n ± 12%  -83.81% (p=0.002 n=6)
SyncMeasure/NoView/Int64Histogram/Attributes/0-24           283.4n ±  8%   269.9n ±  9%        ~ (p=0.589 n=6)
SyncMeasure/NoView/Int64Histogram/Attributes/1-24           300.7n ±  8%   270.1n ± 15%  -10.16% (p=0.026 n=6)
SyncMeasure/NoView/Int64Histogram/Attributes/10-24         1046.8n ± 24%   299.2n ± 16%  -71.42% (p=0.002 n=6)
SyncMeasure/NoView/Float64Histogram/Attributes/0-24         264.3n ± 12%   295.9n ±  5%  +11.93% (p=0.026 n=6)
SyncMeasure/NoView/Float64Histogram/Attributes/1-24         321.0n ±  8%   269.4n ± 11%  -16.09% (p=0.002 n=6)
SyncMeasure/NoView/Float64Histogram/Attributes/10-24       1052.2n ± 10%   274.6n ±  5%  -73.90% (p=0.002 n=6)
geomean                                                     540.0n         287.7n        -46.72%

Single-threaded benchmarks:

goos: linux
goarch: amd64
pkg: go.opentelemetry.io/otel/sdk/metric
cpu: Intel(R) Xeon(R) CPU @ 2.20GHz
                                                      │   main1.txt   │              new1.txt               │
                                                      │    sec/op     │    sec/op     vs base               │
SyncMeasure/NoView/Int64Counter/Attributes/0            130.95n ±  1%   97.99n ± 21%  -25.17% (p=0.002 n=6)
SyncMeasure/NoView/Int64Counter/Attributes/1             300.8n ±  7%   104.6n ±  3%  -65.21% (p=0.002 n=6)
SyncMeasure/NoView/Int64Counter/Attributes/10           1646.0n ±  2%   105.8n ±  2%  -93.58% (p=0.002 n=6)
SyncMeasure/NoView/Float64Counter/Attributes/0          132.65n ±  1%   99.28n ±  4%  -25.16% (p=0.002 n=6)
SyncMeasure/NoView/Float64Counter/Attributes/1           295.4n ±  3%   107.7n ±  3%  -63.54% (p=0.002 n=6)
SyncMeasure/NoView/Float64Counter/Attributes/10         1620.0n ±  1%   109.6n ±  4%  -93.23% (p=0.002 n=6)
SyncMeasure/NoView/Int64UpDownCounter/Attributes/0      132.85n ± 80%   99.34n ±  1%  -25.22% (p=0.002 n=6)
SyncMeasure/NoView/Int64UpDownCounter/Attributes/1       300.4n ±  1%   106.0n ±  1%  -64.71% (p=0.002 n=6)
SyncMeasure/NoView/Int64UpDownCounter/Attributes/10     1622.0n ±  1%   105.8n ±  1%  -93.48% (p=0.002 n=6)
SyncMeasure/NoView/Float64UpDownCounter/Attributes/0    134.90n ± 51%   99.16n ±  4%  -26.49% (p=0.002 n=6)
SyncMeasure/NoView/Float64UpDownCounter/Attributes/1     312.4n ± 34%   107.8n ±  2%  -65.51% (p=0.002 n=6)
SyncMeasure/NoView/Float64UpDownCounter/Attributes/10   1613.0n ± 23%   106.1n ±  1%  -93.43% (p=0.002 n=6)
SyncMeasure/NoView/Int64Histogram/Attributes/0          103.50n ± 17%   88.53n ±  1%  -14.46% (p=0.002 n=6)
SyncMeasure/NoView/Int64Histogram/Attributes/1          199.50n ± 16%   95.44n ±  2%  -52.16% (p=0.002 n=6)
SyncMeasure/NoView/Int64Histogram/Attributes/10         878.70n ±  2%   95.78n ±  2%  -89.10% (p=0.002 n=6)
SyncMeasure/NoView/Float64Histogram/Attributes/0        108.55n ± 54%   88.45n ±  1%  -18.51% (p=0.002 n=6)
SyncMeasure/NoView/Float64Histogram/Attributes/1        257.30n ± 14%   95.05n ±  2%  -63.06% (p=0.002 n=6)
SyncMeasure/NoView/Float64Histogram/Attributes/10       882.70n ± 18%   96.28n ±  1%  -89.09% (p=0.002 n=6)
geomean                                                  355.2n         100.3n        -71.77%


codecov bot commented Aug 12, 2025

Codecov Report

❌ Patch coverage is 90.90909% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.4%. Comparing base (666f95c) to head (41b0468).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| attribute/hash.go | 87.2% | 7 Missing ⚠️ |
| attribute/set.go | 89.1% | 3 Missing and 1 partial ⚠️ |
Additional details and impacted files


@@          Coverage Diff          @@
##            main   #7175   +/-   ##
=====================================
  Coverage   85.4%   85.4%           
=====================================
  Files        269     271    +2     
  Lines      24297   24391   +94     
=====================================
+ Hits       20762   20843   +81     
- Misses      3158    3170   +12     
- Partials     377     378    +1     
| Files with missing lines | Coverage Δ |
|---|---|
| attribute/internal/fnv/fnv.go | 100.0% <100.0%> (ø) |
| attribute/set.go | 83.0% <89.1%> (-1.2%) ⬇️ |
| attribute/hash.go | 87.2% <87.2%> (ø) |

... and 3 files with indirect coverage changes


Contributor

@MrAlias MrAlias left a comment


We need to consider what is an acceptable level of hash collision here. We may want to use a larger bit size for the hash or look into ways to avoid the collision if we determine the collision rate too high. We may also consider this approach acceptable.

Contributor

MrAlias commented Aug 14, 2025

For reference, Prometheus has used a similar-size hash for their implementation. There haven't been any known issues related to collisions there, and it has been out for ~10 years.

Contributor

juliusv commented Aug 15, 2025

Some input from the sidelines (and yay on the performance improvements! 🥳):

> For reference, Prometheus has used a similar-size hash for their implementation. There haven't been any known issues related to collisions there, and it has been out for ~10 years.

It looks like collision handling was added to client_golang around a decade ago: prometheus/client_golang#221

Besides the client library, we have seen some real collisions over the years in the Prometheus server that have all been addressed:

  • Over a decade ago, a server with millions of series at SoundCloud ran into hash collisions. I think we used an FNV-1a hash back then to differentiate series within the TSDB. All of this has completely changed since then, and the TSDB is now resistant to such collisions.
  • Another real hash collision happened more recently in the PromQL engine in Prometheus: hash collision in engine leads to incorrect "multiple matches for labels" error prometheus/prometheus#5724. But as far as I can tell, this was due to a bad custom way of how we hashed labels together.

We still use hashes without collision checking in some places, which are hopefully unlikely enough to ever cause a collision. See also prometheus/prometheus#12519 for more discussion on hash collision potential in Prometheus.


@bwplotka bwplotka left a comment


Amazing, thanks! Looks good modulo previous comments around runes/bytes on hash string and unmarshalling questions 👍🏽

@dashpole dashpole marked this pull request as ready for review August 19, 2025 17:56
Contributor

MrAlias commented Aug 26, 2025

@dashpole are we going to look into collision remediation techniques based on the Prometheus group's experience?

@dashpole
Contributor Author

Yes, I will look into what was done in prometheus/client_golang#221.

@dashpole dashpole marked this pull request as draft August 27, 2025 13:30
@dashpole
Contributor Author

The TL;DR is that handling collisions just means using a map[uint64][]metricWithLabelValues instead of a map[uint64]metricWithLabelValues to store aggregations (note the slice), and iterating over elements to find the exact one. Looks fairly straightforward. See https://github.com/prometheus/client_golang/blob/84a4734fe73aecc9ec36e1fbcb09c9a2d679f392/prometheus/vec.go#L319

I think the biggest difference for the attributes package is that we would need to ensure that Equals() correctly handles hash collisions (i.e. it should check more than that the sets' Distinct values are the same). Alternatively, we could switch to the "Add a new Hash() function" approach, since that keeps a correct Equals() implementation. I'm not sure if the current Distinct implementation has poor == performance or not...
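Following the client_golang pattern linked above, a collision-tolerant store keeps a slice per hash bucket and falls back to a full equality check before appending. A rough sketch; the types and names here are illustrative stand-ins, not the SDK's actual ones:

```go
package main

import "fmt"

// attrSet stands in for attribute.Set; Equals must compare the full
// attribute contents, not just the 64-bit hash.
type attrSet struct {
	hash  uint64
	attrs string // stand-in for the sorted key=value encoding
}

func (s attrSet) Equals(o attrSet) bool { return s.attrs == o.attrs }

type entry struct {
	set   attrSet
	value int64
}

// store buckets entries by hash; a slice per bucket absorbs collisions,
// mirroring map[uint64][]metricWithLabelValues in client_golang.
type store struct {
	buckets map[uint64][]entry
}

// add records a measurement, scanning the bucket for an exact match
// before appending a new entry.
func (st *store) add(set attrSet, v int64) {
	bucket := st.buckets[set.hash]
	for i := range bucket {
		if bucket[i].set.Equals(set) {
			bucket[i].value += v
			return
		}
	}
	st.buckets[set.hash] = append(bucket, entry{set: set, value: v})
}

func main() {
	st := &store{buckets: map[uint64][]entry{}}
	a := attrSet{hash: 42, attrs: "k=a"}
	b := attrSet{hash: 42, attrs: "k=b"} // deliberate hash collision
	st.add(a, 1)
	st.add(b, 2)
	st.add(a, 3)
	fmt.Println(len(st.buckets[42]), st.buckets[42][0].value) // 2 4
}
```

The linear scan only costs anything when a bucket actually holds more than one set, which at 64-bit hash sizes is vanishingly rare.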

dashpole added a commit that referenced this pull request Aug 28, 2025
Forked from #7175

```
$ go test -timeout 60s -run=xxxxxMatchNothingxxxxx -test.benchtime=10ms -count 6 -cpu 1 -bench=BenchmarkEquals ./...
goos: linux
goarch: amd64
pkg: go.opentelemetry.io/otel/attribute
cpu: Intel(R) Xeon(R) CPU @ 2.20GHz
BenchmarkEquals/Empty         	 1314884	         9.008 ns/op
BenchmarkEquals/Empty         	 1875915	         8.461 ns/op
BenchmarkEquals/Empty         	 1461613	         7.749 ns/op
BenchmarkEquals/Empty         	 1636912	         9.359 ns/op
BenchmarkEquals/Empty         	 1863039	         6.355 ns/op
BenchmarkEquals/Empty         	 1789053	         6.336 ns/op
BenchmarkEquals/1_string_attribute         	  674168	        16.92 ns/op
BenchmarkEquals/1_string_attribute         	  701983	        16.42 ns/op
BenchmarkEquals/1_string_attribute         	  692001	        16.52 ns/op
BenchmarkEquals/1_string_attribute         	  687970	        16.29 ns/op
BenchmarkEquals/1_string_attribute         	  751766	        16.58 ns/op
BenchmarkEquals/1_string_attribute         	  703534	        16.88 ns/op
BenchmarkEquals/10_string_attributes       	   85400	       137.1 ns/op
BenchmarkEquals/10_string_attributes       	   91045	       136.1 ns/op
BenchmarkEquals/10_string_attributes       	   90973	       150.7 ns/op
BenchmarkEquals/10_string_attributes       	   62877	       177.5 ns/op
BenchmarkEquals/10_string_attributes       	   90780	       194.2 ns/op
BenchmarkEquals/10_string_attributes       	   91058	       144.6 ns/op
BenchmarkEquals/1_int_attribute            	  624625	        18.72 ns/op
BenchmarkEquals/1_int_attribute            	  689478	        16.03 ns/op
BenchmarkEquals/1_int_attribute            	  719173	        15.68 ns/op
BenchmarkEquals/1_int_attribute            	  707005	        16.18 ns/op
BenchmarkEquals/1_int_attribute            	  752048	        15.94 ns/op
BenchmarkEquals/1_int_attribute            	  752034	        16.23 ns/op
BenchmarkEquals/10_int_attributes          	   90302	       132.5 ns/op
BenchmarkEquals/10_int_attributes          	   89929	       131.9 ns/op
BenchmarkEquals/10_int_attributes          	   86578	       135.2 ns/op
BenchmarkEquals/10_int_attributes          	   90482	       133.1 ns/op
BenchmarkEquals/10_int_attributes          	   90255	       132.0 ns/op
BenchmarkEquals/10_int_attributes          	   87615	       134.6 ns/op
PASS
ok  	go.opentelemetry.io/otel/attribute	0.578s
PASS
ok  	go.opentelemetry.io/otel/attribute/internal	0.017s
```

---------

Co-authored-by: Flc゛ <four_leaf_clover@foxmail.com>
Co-authored-by: Robert Pająk <pellared@hotmail.com>
@dashpole dashpole force-pushed the hash_distinct branch 3 times, most recently from a69ecb9 to 431c5ed on August 28, 2025 19:07
dashpole added a commit that referenced this pull request Aug 29, 2025
Forked from
#7175 (comment)

This adds a test for JSON marshaling of attribute sets.

---------

Co-authored-by: Flc゛ <four_leaf_clover@foxmail.com>
Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com>
@dashpole dashpole force-pushed the hash_distinct branch 3 times, most recently from a3fb9c7 to ebbb889 on August 29, 2025 18:27
Contributor Author

dashpole commented Aug 29, 2025

There is a significant performance cost (the cost of set.Equals from #7262) to checking for duplicates, but it is still a huge improvement over the current state. I'm trying to add a unit test with a collision, but it is taking a while for the fuzzer to generate a collision for me... Even with a cardinality of 100 million attributes, I didn't hit one :). I updated the benchmark results in the description.

Contributor

bboreham commented Sep 11, 2025

> For reference, Prometheus has used a similar-size hash for their implementation. There haven't been any known issues related to collisions there, and it has been out for ~10 years.

Julius already cited this, but I too want to note there were issues:

Prometheus has generally moved from FNV to xxHash, which is much better behaved.

@dashpole
Contributor Author

I plan to investigate xxHash in a separate issue. From my brief research, it is primarily designed for efficient hashing of large amounts of input data (e.g. checksums for files). It should still be faster than 64-bit FNV-1a for our inputs, but the difference may or may not be significant. I've seen conflicting statements about which algorithm has a better distribution of hash values, although most seem to favor xxHash as well. I don't think that should block this PR, considering the substantial performance benefits it brings.

@dashpole dashpole merged commit 9d52bde into open-telemetry:main Sep 16, 2025
33 checks passed
@dashpole dashpole deleted the hash_distinct branch September 16, 2025 20:04