
Conversation

@julietteO julietteO commented Nov 26, 2025

What this PR does

This PR aims to handle the case where several replicas/clusters push metrics in a single request.
Without these changes, deduplication is done on a per-request basis; with them, it is handled per sample.

We rebased @dimitarvdimitrov's branch onto current main, updated the tests, and handled some remaining TODOs.

Which issue(s) this PR fixes or relates to

Fixes #3199

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]. If changelog entry is not needed, please add the changelog-not-needed label to the PR.
  • about-versioning.md updated with experimental features.

Note

Refactors HA deduplication to operate per-series instead of per-request, updating distributor middleware, HA label handling, docs, tests, and benchmarks.

  • Distributor (HA dedup):
    • Implement per-series deduplication in prePushHaDedupeMiddleware with replica state tracking, request slicing, and label removal (replicaState, haReplica, sortByAccepted(), removeHAReplicaLabels(), etc.); a rough sketch follows after this note.
    • Use earliest sample timestamp per request when configured and aggregate errors via multierror.
    • Update HA metrics accounting to reflect per-series decisions.
  • HA Tracker:
    • Change findHALabels() to return a haReplica struct; update call sites accordingly.
  • Tests:
    • Add comprehensive HA scenarios (mixed replicas/clusters, partial dedup, non-HA series, all deduped) and adjust assertions; include new HA cases in benchmarks; increase numSeriesPerRequest to 1024.
    • Add health check stub to noopIngester for test stability.
  • Docs:
    • Update HA deduplication docs to state labels are checked on every series (not only the first).
  • Benchmarks/CHANGELOG:
    • Add before/after benchmark outputs and summary stats for HA dedup scenarios.
    • Add CHANGELOG entry: “HA: Deduplicate per sample instead of per batch.”

Written by Cursor Bugbot for commit 7739ac3.
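
As a rough illustration of the per-series flow described in this note, here is a minimal sketch. It is not the actual Mimir code: the struct shape, the default HA label names ("cluster" and "__replica__"), and the accepted callback standing in for the HA tracker are assumptions.

package hadedup

import "github.com/prometheus/prometheus/model/labels"

// haReplica identifies the (cluster, replica) pair a series was pushed from.
type haReplica struct {
	cluster, replica string
}

// dedupePerSeries keeps only the series whose replica is accepted, consulting
// the decision function once per (cluster, replica) pair rather than once per
// request or once per series.
func dedupePerSeries(series []labels.Labels, accepted func(haReplica) bool) []labels.Labels {
	kept := series[:0]                // filter in place
	decisions := map[haReplica]bool{} // per-request cache of tracker decisions
	for _, s := range series {
		r := haReplica{cluster: s.Get("cluster"), replica: s.Get("__replica__")}
		if r.cluster == "" || r.replica == "" {
			kept = append(kept, s) // non-HA series are always forwarded
			continue
		}
		ok, seen := decisions[r]
		if !seen {
			ok = accepted(r)
			decisions[r] = ok
		}
		if ok {
			kept = append(kept, s) // the real middleware also strips the replica label here
		}
	}
	return kept
}

The actual change additionally tracks per-replica state for metrics accounting and, when configured, uses the earliest sample timestamp in the request, which this sketch leaves out.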

@julietteO julietteO requested a review from a team as a code owner November 26, 2025 16:07

CLAassistant commented Nov 26, 2025

CLA assistant check
All committers have signed the CLA.

@julietteO julietteO force-pushed the dimitar/ha-dedup-on-every-sample-bench branch from 9a5373e to 656ea69 on November 26, 2025 16:13
Collaborator

vaxvms commented Nov 26, 2025

pkg/distributor$ go test -bench=.
level=debug msg=Get key=prefixuser/cluster wait_index=0
level=debug msg="Get - not found" key=prefixuser/cluster
level=debug msg=CAS key=prefixuser/cluster modify_index=0 value="\"\\x15P\\n\\x05first\\x10\\xeb\\xc1\\xf1\\x86\\xac3 \\xeb\\xc1\\xf1\\x86\\xac3\""
level=debug msg=Get key=prefixuser/cluster wait_index=0
level=debug msg=Get key=prefixuser/cluster modify_index=2 value="\"\\x15P\\n\\x05first\\x10\\xeb\\xc1\\xf1\\x86\\xac3 \\xeb\\xc1\\xf1\\x86\\xac3\""
level=debug msg=CAS key=prefixuser/cluster modify_index=2 value="\"\\x18\\\\\\n\\x06second\\x10\\xfb\\x8f\\xf2\\x86\\xac3 \\xfb\\x8f\\xf2\\x86\\xac3(\\x01\""
level=debug msg=Get key=prefixuser/cluster wait_index=0
level=debug msg=Get key=prefixuser/cluster modify_index=3 value="\"\\x18\\\\\\n\\x06second\\x10\\xfb\\x8f\\xf2\\x86\\xac3 \\xfb\\x8f\\xf2\\x86\\xac3(\\x01\""
level=debug msg=CAS key=prefixuser/cluster modify_index=3 value="\"\\x17X\\n\\x05first\\x10\\xa3\\xd6\\xf2\\x86\\xac3 \\x8b\\xde\\xf2\\x86\\xac3(\\x02\""
2025/11/26 15:51:00 label __name__ is overwritten. Check if Prometheus reserved labels are used.
2025/11/26 15:51:00 label __name__ is overwritten. Check if Prometheus reserved labels are used.
2025/11/26 15:51:00 label __name__ is overwritten. Check if Prometheus reserved labels are used.
2025/11/26 15:51:00 label __name__ is overwritten. Check if Prometheus reserved labels are used.
2025/11/26 15:51:00 label __name__ is overwritten. Check if Prometheus reserved labels are used.
level=info msg="server listening on addresses" http=127.0.0.1:39481 grpc=127.0.0.1:42189
level=warn method=/httpgrpc.HTTP/Handle duration=279.635µs request="&HTTPRequest{Method:POST,Url:/otlp,Headers:[]*Header{&Header{Key:Content-Type,Values:[application/json],},},Body:[123 34 114 101 115 111 117 114 99 101 77 101 116 114 105 99 115 34 58 32 91 123 34 115 99 111 112 101 77 101 116 114 105 99 115 34 58 32 91 123 34 109 101 116 114 105 99 115 34 58 32 91 123 34 110 97 109 101 34 58 32 34 114 101 112 111 114 116 95 115 101 114 118 101 114 95 101 114 114 111 114 34 44 32 34 103 97 117 103 101 34 58 32 123 34 100 97 116 97 80 111 105 110 116 115 34 58 32 91 123 34 116 105 109 101 85 110 105 120 78 97 110 111 34 58 32 34 49 54 55 57 57 49 50 52 54 51 51 52 48 48 48 48 48 48 48 34 44 32 34 97 115 68 111 117 98 108 101 34 58 32 49 48 46 54 54 125 93 125 125 93 125 93 125 93 125],}" msg=gRPC err="rpc error: code = Code(503) desc = some random push error"
level=warn method=/httpgrpc.HTTP/Handle duration=100.554µs request="&HTTPRequest{Method:POST,Url:/otlp,Headers:[]*Header{},Body:[104 101 108 108 111],}" msg=gRPC err="rpc error: code = Code(415) desc = unsupported content type: , supported: [application/json, application/x-protobuf]"
level=warn method=/httpgrpc.HTTP/Handle duration=208.888µs request="&HTTPRequest{Method:POST,Url:/otlp,Headers:[]*Header{&Header{Key:Content-Type,Values:[application/json],},},Body:[105 110 118 97 108 105 100],}" msg=gRPC err="rpc error: code = Code(400) desc = ReadObject: expect { or , or } or n, but found i, error found in #1 byte of ...|invalid|..., bigger context ...|invalid|..."
level=warn method=/httpgrpc.HTTP/Handle duration=79.776µs request="&HTTPRequest{Method:POST,Url:/otlp,Headers:[]*Header{&Header{Key:Content-Type,Values:[application/x-protobuf],},},Body:[105 110 118 97 108 105 100],}" msg=gRPC err="rpc error: code = Code(400) desc = unexpected EOF"
level=warn method=/httpgrpc.HTTP/Handle duration=209.455µs request="&HTTPRequest{Method:POST,Url:/otlp,Headers:[]*Header{&Header{Key:Content-Type,Values:[application/json],},},Body:[10 246 22 10 211 2 10 29 10 17 99 111 110 116 97 105 110 101 114 46 114 117 110 116 105 109 101 18 8 10 6 100 111 99 107 101 114 10 39 10 18 99 111 110 116 97 105 110 101 114 46 104],}" msg=gRPC err="rpc error: code = Code(400) desc = ReadObject: expect { or , or } or n, but found \ufffd, error found in #2 byte of ...|\n\ufffd\u0016\n\ufffd\u0002\n\u001d\n\u0011co|..., bigger context ...|\n\ufffd\u0016\n\ufffd\u0002\n\u001d\n\u0011container.runtime\u0012\u0008\n\u0006docker\n'\n\u0012container.h|..."
level=info msg="=== Handler.Stop()'d ==="
goos: linux
goarch: amd64
pkg: github.com/grafana/mimir/pkg/distributor
cpu: Intel(R) Xeon(R) CPU E5-2689 v4 @ 3.10GHz
BenchmarkDistributor_prePushMaxSeriesLimitMiddleware/no_series_rejected-5         	   13635	    106898 ns/op
BenchmarkDistributor_prePushMaxSeriesLimitMiddleware/10%_series_rejected-5        	   10000	    116378 ns/op
BenchmarkDistributor_prePushMaxSeriesLimitMiddleware/50%_series_rejected-5        	    8655	    147131 ns/op
BenchmarkDistributor_prePushMaxSeriesLimitMiddleware/all_series_rejected-5        	    5775	    220958 ns/op
BenchmarkDistributor_SampleDuplicateTimestamp/one_timeseries_with_one_histogram-5 	 6430651	       202.5 ns/op	       0 B/op	       0 allocs/op
BenchmarkDistributor_SampleDuplicateTimestamp/one_timeseries_with_one_sample_and_one_histogram-5         	 3267134	       329.6 ns/op	       0 B/op	       0 allocs/op
BenchmarkDistributor_SampleDuplicateTimestamp/one_timeseries_with_two_samples-5                          	 2995648	       394.0 ns/op	       0 B/op	       0 allocs/op
BenchmarkDistributor_SampleDuplicateTimestamp/one_timeseries_with_two_samples_and_two_histograms-5       	 1628072	       742.4 ns/op	       0 B/op	       0 allocs/op
BenchmarkDistributor_SampleDuplicateTimestamp/one_timeseries_with_80_000_samples_with_duplicated_timestamps-5         	    1423	    923709 ns/op	    2579 B/op	       5 allocs/op
BenchmarkDistributor_SampleDuplicateTimestamp/one_timeseries_with_80_000_histograms_with_duplicated_timestamps-5      	     416	   3416069 ns/op	    3804 B/op	      18 allocs/op
BenchmarkDistributor_SampleDuplicateTimestamp/one_timeseries_with_one_sample-5                                        	 6098230	       193.1 ns/op	       0 B/op	       0 allocs/op
BenchmarkDistributor_SampleDuplicateTimestamp/one_timeseries_with_two_histograms-5                                    	 2914910	       408.8 ns/op	       0 B/op	       0 allocs/op
BenchmarkDistributor_SampleDuplicateTimestamp/one_timeseries_with_80_000_samples_and_80_000_histograms_with_duplicated_timestamps-5         	     308	   4245952 ns/op	    5749 B/op	      18 allocs/op
BenchmarkDistributor_Push/cost_attribution=disabled/scenario=too_many_labels_limit_reached-5                                                	     250	   5047816 ns/op	 1199984 B/op	    5068 allocs/op
BenchmarkDistributor_Push/cost_attribution=disabled/scenario=max_label_name_length_limit_reached-5                                          	     363	   3586454 ns/op	 1135857 B/op	    5063 allocs/op
--- FAIL: BenchmarkDistributor_Push/cost_attribution=disabled/scenario=max_label_value_length_limit_reached
    distributor_test.go:2476: expected received a series whose label value length exceeds the limit error but got rpc error: code = InvalidArgument desc = received a series whose label value length of 204 exceeds the limit of 200, label: 'xxx', value: 'xxx_0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000' (truncated) series: 'foo{name_0="value_0", name_1="value_1", name_2="value_2", name_3="value_3", name_4="value_4", name_5="value_5", name_6="value_6", name_7="value_7", name_8="value_8", name_9="value_9", team="0", xxx="x' (err-mimir-label-value-too-long). To adjust the related per-tenant limit, configure -validation.max-length-label-value, or contact your service administrator.
BenchmarkDistributor_Push/cost_attribution=disabled/scenario=timestamp_too_new-5                                                            	     699	   1634618 ns/op	  324731 B/op	    4059 allocs/op
BenchmarkDistributor_Push/cost_attribution=disabled/scenario=all_samples_go_to_metric_relabel_configs-5                                     	     348	   3630785 ns/op	  242820 B/op	    5097 allocs/op
BenchmarkDistributor_Push/cost_attribution=disabled/scenario=all_samples_successfully_pushed-5                                              	     682	   1893318 ns/op	  160855 B/op	      85 allocs/op
BenchmarkDistributor_Push/cost_attribution=disabled/scenario=ingestion_rate_limit_reached-5                                                 	    1204	    995819 ns/op	    2664 B/op	      47 allocs/op
--- FAIL: BenchmarkDistributor_Push/cost_attribution=disabled
--- FAIL: BenchmarkDistributor_Push/cost_attribution=enabled/scenario=max_label_value_length_limit_reached
    distributor_test.go:2476: expected received a series whose label value length exceeds the limit error but got rpc error: code = InvalidArgument desc = received a series whose label value length of 204 exceeds the limit of 200, label: 'xxx', value: 'xxx_0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000' (truncated) series: 'foo{name_0="value_0", name_1="value_1", name_2="value_2", name_3="value_3", name_4="value_4", name_5="value_5", name_6="value_6", name_7="value_7", name_8="value_8", name_9="value_9", team="0", xxx="x' (err-mimir-label-value-too-long). To adjust the related per-tenant limit, configure -validation.max-length-label-value, or contact your service administrator.
BenchmarkDistributor_Push/cost_attribution=enabled/scenario=timestamp_too_new-5                                                             	     706	   1827770 ns/op	  324554 B/op	    4059 allocs/op
BenchmarkDistributor_Push/cost_attribution=enabled/scenario=all_samples_go_to_metric_relabel_configs-5                                      	     324	   3617407 ns/op	  242308 B/op	    5089 allocs/op
BenchmarkDistributor_Push/cost_attribution=enabled/scenario=all_samples_successfully_pushed-5                                               	     614	   1883868 ns/op	  160519 B/op	      84 allocs/op
BenchmarkDistributor_Push/cost_attribution=enabled/scenario=ingestion_rate_limit_reached-5                                                  	    1227	   1043338 ns/op	    2606 B/op	      47 allocs/op
BenchmarkDistributor_Push/cost_attribution=enabled/scenario=too_many_labels_limit_reached-5                                                 	     202	   5516525 ns/op	 1201387 B/op	    5082 allocs/op
BenchmarkDistributor_Push/cost_attribution=enabled/scenario=max_label_name_length_limit_reached-5                                           	     294	   4150221 ns/op	 1136113 B/op	    5069 allocs/op
--- FAIL: BenchmarkDistributor_Push/cost_attribution=enabled
--- FAIL: BenchmarkDistributor_Push
BenchmarkDistributor_ActiveSeries-5                                                                                                         	      27	  41962331 ns/op	 6857461 B/op	   86267 allocs/op
BenchmarkOTLPHandler/protobuf-5                                                                                                             	      20	  55924069 ns/op
BenchmarkOTLPHandler/JSON-5                                                                                                                 	      16	  83105382 ns/op
BenchmarkPushHandler-5                                                                                                                      	  302606	      4289 ns/op
BenchmarkMergeExemplars-5                                                                                                                   	     386	   3169380 ns/op
BenchmarkMergingAndSortingSeries/1_ingesters_per_zone,_1_zones,_1_series_per_ingester-5                                                     	 3971000	       294.3 ns/op
BenchmarkMergingAndSortingSeries/1_ingesters_per_zone,_1_zones,_10_series_per_ingester-5                                                    	  863239	      1909 ns/op
BenchmarkMergingAndSortingSeries/1_ingesters_per_zone,_1_zones,_100_series_per_ingester-5                                                   	   73368	     16180 ns/op
BenchmarkMergingAndSortingSeries/1_ingesters_per_zone,_1_zones,_1000_series_per_ingester-5                                                  	   10000	    128066 ns/op
BenchmarkMergingAndSortingSeries/1_ingesters_per_zone,_1_zones,_10000_series_per_ingester-5                                                 	     734	   1747952 ns/op
BenchmarkMergingAndSortingSeries/1_ingesters_per_zone,_2_zones,_1_series_per_ingester-5                                                     	 2951730	       400.7 ns/op
BenchmarkMergingAndSortingSeries/1_ingesters_per_zone,_2_zones,_10_series_per_ingester-5                                                    	  561928	      2928 ns/op
BenchmarkMergingAndSortingSeries/1_ingesters_per_zone,_2_zones,_100_series_per_ingester-5                                                   	   52220	     22283 ns/op
BenchmarkMergingAndSortingSeries/1_ingesters_per_zone,_2_zones,_1000_series_per_ingester-5                                                  	    6922	    227013 ns/op
BenchmarkMergingAndSortingSeries/1_ingesters_per_zone,_2_zones,_10000_series_per_ingester-5                                                 	     483	   2568923 ns/op
BenchmarkMergingAndSortingSeries/1_ingesters_per_zone,_3_zones,_1_series_per_ingester-5                                                     	 1905758	       632.9 ns/op
BenchmarkMergingAndSortingSeries/1_ingesters_per_zone,_3_zones,_10_series_per_ingester-5                                                    	  362325	      4091 ns/op
BenchmarkMergingAndSortingSeries/1_ingesters_per_zone,_3_zones,_100_series_per_ingester-5                                                   	   32584	     33655 ns/op
BenchmarkMergingAndSortingSeries/1_ingesters_per_zone,_3_zones,_1000_series_per_ingester-5                                                  	    4477	    306673 ns/op
BenchmarkMergingAndSortingSeries/1_ingesters_per_zone,_3_zones,_10000_series_per_ingester-5                                                 	     318	   3569789 ns/op
BenchmarkMergingAndSortingSeries/2_ingesters_per_zone,_1_zones,_1_series_per_ingester-5                                                     	 2140617	       524.4 ns/op
BenchmarkMergingAndSortingSeries/2_ingesters_per_zone,_1_zones,_10_series_per_ingester-5                                                    	  308349	      4128 ns/op
BenchmarkMergingAndSortingSeries/2_ingesters_per_zone,_1_zones,_100_series_per_ingester-5                                                   	   34064	     36377 ns/op
BenchmarkMergingAndSortingSeries/2_ingesters_per_zone,_1_zones,_1000_series_per_ingester-5                                                  	    3536	    370762 ns/op
BenchmarkMergingAndSortingSeries/2_ingesters_per_zone,_1_zones,_10000_series_per_ingester-5                                                 	     250	   4601581 ns/op
BenchmarkMergingAndSortingSeries/2_ingesters_per_zone,_2_zones,_1_series_per_ingester-5                                                     	 1216155	      1061 ns/op
BenchmarkMergingAndSortingSeries/2_ingesters_per_zone,_2_zones,_10_series_per_ingester-5                                                    	  200676	      5892 ns/op
BenchmarkMergingAndSortingSeries/2_ingesters_per_zone,_2_zones,_100_series_per_ingester-5                                                   	   19245	     53103 ns/op
BenchmarkMergingAndSortingSeries/2_ingesters_per_zone,_2_zones,_1000_series_per_ingester-5                                                  	    1965	    542698 ns/op
BenchmarkMergingAndSortingSeries/2_ingesters_per_zone,_2_zones,_10000_series_per_ingester-5                                                 	     175	   6517588 ns/op
BenchmarkMergingAndSortingSeries/2_ingesters_per_zone,_3_zones,_1_series_per_ingester-5                                                     	 1000000	      1333 ns/op
BenchmarkMergingAndSortingSeries/2_ingesters_per_zone,_3_zones,_10_series_per_ingester-5                                                    	  149478	      8512 ns/op
BenchmarkMergingAndSortingSeries/2_ingesters_per_zone,_3_zones,_100_series_per_ingester-5                                                   	   13296	     86891 ns/op
BenchmarkMergingAndSortingSeries/2_ingesters_per_zone,_3_zones,_1000_series_per_ingester-5                                                  	    1360	    864098 ns/op
BenchmarkMergingAndSortingSeries/2_ingesters_per_zone,_3_zones,_10000_series_per_ingester-5                                                 	     124	   9328728 ns/op
BenchmarkMergingAndSortingSeries/4_ingesters_per_zone,_1_zones,_1_series_per_ingester-5                                                     	 1000000	      1287 ns/op
BenchmarkMergingAndSortingSeries/4_ingesters_per_zone,_1_zones,_10_series_per_ingester-5                                                    	  134592	      9140 ns/op
BenchmarkMergingAndSortingSeries/4_ingesters_per_zone,_1_zones,_100_series_per_ingester-5                                                   	   14137	     86357 ns/op
BenchmarkMergingAndSortingSeries/4_ingesters_per_zone,_1_zones,_1000_series_per_ingester-5                                                  	    1222	    860586 ns/op
BenchmarkMergingAndSortingSeries/4_ingesters_per_zone,_1_zones,_10000_series_per_ingester-5                                                 	     100	  12191916 ns/op
BenchmarkMergingAndSortingSeries/4_ingesters_per_zone,_2_zones,_1_series_per_ingester-5                                                     	  729943	      2058 ns/op
BenchmarkMergingAndSortingSeries/4_ingesters_per_zone,_2_zones,_10_series_per_ingester-5                                                    	   86208	     14824 ns/op
BenchmarkMergingAndSortingSeries/4_ingesters_per_zone,_2_zones,_100_series_per_ingester-5                                                   	   10000	    136939 ns/op
BenchmarkMergingAndSortingSeries/4_ingesters_per_zone,_2_zones,_1000_series_per_ingester-5                                                  	     786	   1483957 ns/op
BenchmarkMergingAndSortingSeries/4_ingesters_per_zone,_2_zones,_10000_series_per_ingester-5                                                 	      64	  17215064 ns/op
BenchmarkMergingAndSortingSeries/4_ingesters_per_zone,_3_zones,_1_series_per_ingester-5                                                     	  384852	      3060 ns/op
BenchmarkMergingAndSortingSeries/4_ingesters_per_zone,_3_zones,_10_series_per_ingester-5                                                    	   55456	     21917 ns/op
BenchmarkMergingAndSortingSeries/4_ingesters_per_zone,_3_zones,_100_series_per_ingester-5                                                   	    6297	    213793 ns/op
BenchmarkMergingAndSortingSeries/4_ingesters_per_zone,_3_zones,_1000_series_per_ingester-5                                                  	     553	   2148239 ns/op
BenchmarkMergingAndSortingSeries/4_ingesters_per_zone,_3_zones,_10000_series_per_ingester-5                                                 	      58	  23428732 ns/op
BenchmarkMergingAndSortingSeries/10_ingesters_per_zone,_1_zones,_1_series_per_ingester-5                                                    	  358521	      3606 ns/op
BenchmarkMergingAndSortingSeries/10_ingesters_per_zone,_1_zones,_10_series_per_ingester-5                                                   	   51464	     24992 ns/op
BenchmarkMergingAndSortingSeries/10_ingesters_per_zone,_1_zones,_100_series_per_ingester-5                                                  	    5948	    238229 ns/op
BenchmarkMergingAndSortingSeries/10_ingesters_per_zone,_1_zones,_1000_series_per_ingester-5                                                 	     423	   2854911 ns/op
BenchmarkMergingAndSortingSeries/10_ingesters_per_zone,_1_zones,_10000_series_per_ingester-5                                                	      45	  32631640 ns/op
BenchmarkMergingAndSortingSeries/10_ingesters_per_zone,_2_zones,_1_series_per_ingester-5                                                    	  203570	      6372 ns/op
BenchmarkMergingAndSortingSeries/10_ingesters_per_zone,_2_zones,_10_series_per_ingester-5                                                   	   26821	     44175 ns/op
BenchmarkMergingAndSortingSeries/10_ingesters_per_zone,_2_zones,_100_series_per_ingester-5                                                  	    3067	    439770 ns/op
BenchmarkMergingAndSortingSeries/10_ingesters_per_zone,_2_zones,_1000_series_per_ingester-5                                                 	     244	   4736829 ns/op
BenchmarkMergingAndSortingSeries/10_ingesters_per_zone,_2_zones,_10000_series_per_ingester-5                                                	      25	  53132221 ns/op
BenchmarkMergingAndSortingSeries/10_ingesters_per_zone,_3_zones,_1_series_per_ingester-5                                                    	  140487	      8475 ns/op
BenchmarkMergingAndSortingSeries/10_ingesters_per_zone,_3_zones,_10_series_per_ingester-5                                                   	   18651	     67852 ns/op
BenchmarkMergingAndSortingSeries/10_ingesters_per_zone,_3_zones,_100_series_per_ingester-5                                                  	    1998	    639591 ns/op
BenchmarkMergingAndSortingSeries/10_ingesters_per_zone,_3_zones,_1000_series_per_ingester-5                                                 	     170	   6842800 ns/op
BenchmarkMergingAndSortingSeries/10_ingesters_per_zone,_3_zones,_10000_series_per_ingester-5                                                	      16	  84026045 ns/op
BenchmarkMergingAndSortingSeries/100_ingesters_per_zone,_1_zones,_1_series_per_ingester-5                                                   	   28773	     41593 ns/op
BenchmarkMergingAndSortingSeries/100_ingesters_per_zone,_1_zones,_10_series_per_ingester-5                                                  	    3343	    334955 ns/op
BenchmarkMergingAndSortingSeries/100_ingesters_per_zone,_1_zones,_100_series_per_ingester-5                                                 	     284	   4167464 ns/op
BenchmarkMergingAndSortingSeries/100_ingesters_per_zone,_1_zones,_1000_series_per_ingester-5                                                	      30	  48596008 ns/op
BenchmarkMergingAndSortingSeries/100_ingesters_per_zone,_1_zones,_10000_series_per_ingester-5                                               	       3	 470896851 ns/op
BenchmarkMergingAndSortingSeries/100_ingesters_per_zone,_2_zones,_1_series_per_ingester-5                                                   	   16784	     71272 ns/op
BenchmarkMergingAndSortingSeries/100_ingesters_per_zone,_2_zones,_10_series_per_ingester-5                                                  	    2108	    684121 ns/op
BenchmarkMergingAndSortingSeries/100_ingesters_per_zone,_2_zones,_100_series_per_ingester-5                                                 	     164	   7167500 ns/op
BenchmarkMergingAndSortingSeries/100_ingesters_per_zone,_2_zones,_1000_series_per_ingester-5                                                	      14	  78230054 ns/op
BenchmarkMergingAndSortingSeries/100_ingesters_per_zone,_2_zones,_10000_series_per_ingester-5                                               	       2	 831123503 ns/op
BenchmarkMergingAndSortingSeries/100_ingesters_per_zone,_3_zones,_1_series_per_ingester-5                                                   	   10000	    108579 ns/op
BenchmarkMergingAndSortingSeries/100_ingesters_per_zone,_3_zones,_10_series_per_ingester-5                                                  	    1386	    968323 ns/op
BenchmarkMergingAndSortingSeries/100_ingesters_per_zone,_3_zones,_100_series_per_ingester-5                                                 	     100	  10143713 ns/op
BenchmarkMergingAndSortingSeries/100_ingesters_per_zone,_3_zones,_1000_series_per_ingester-5                                                	      10	 111290189 ns/op
BenchmarkMergingAndSortingSeries/100_ingesters_per_zone,_3_zones,_10000_series_per_ingester-5                                               	       1	1002488383 ns/op
FAIL
exit status 1
FAIL	github.com/grafana/mimir/pkg/distributor	321.002s

On the working branch:

level=debug msg=Get key=prefixuser/cluster wait_index=0
level=debug msg="Get - not found" key=prefixuser/cluster
level=debug msg=CAS key=prefixuser/cluster modify_index=0 value="\"\\x15P\\n\\x05first\\x10\\xf6\\x80\\x8d\\x87\\xac3 \\xf6\\x80\\x8d\\x87\\xac3\""
level=debug msg=Get key=prefixuser/cluster wait_index=0
level=debug msg=Get key=prefixuser/cluster modify_index=2 value="\"\\x15P\\n\\x05first\\x10\\xf6\\x80\\x8d\\x87\\xac3 \\xf6\\x80\\x8d\\x87\\xac3\""
level=debug msg=CAS key=prefixuser/cluster modify_index=2 value="\"\\x18\\\\\\n\\x06second\\x10\\x86ύ\\x87\\xac3 \\x86ύ\\x87\\xac3(\\x01\""
level=debug msg=Get key=prefixuser/cluster wait_index=0
level=debug msg=Get key=prefixuser/cluster modify_index=3 value="\"\\x18\\\\\\n\\x06second\\x10\\x86ύ\\x87\\xac3 \\x86ύ\\x87\\xac3(\\x01\""
level=debug msg=CAS key=prefixuser/cluster modify_index=3 value="\"\\x17X\\n\\x05first\\x10\\xae\\x95\\x8e\\x87\\xac3 \\x96\\x9d\\x8e\\x87\\xac3(\\x02\""
2025/11/26 15:58:30 label __name__ is overwritten. Check if Prometheus reserved labels are used.
2025/11/26 15:58:30 label __name__ is overwritten. Check if Prometheus reserved labels are used.
2025/11/26 15:58:30 label __name__ is overwritten. Check if Prometheus reserved labels are used.
2025/11/26 15:58:30 label __name__ is overwritten. Check if Prometheus reserved labels are used.
2025/11/26 15:58:30 label __name__ is overwritten. Check if Prometheus reserved labels are used.
level=info msg="server listening on addresses" http=127.0.0.1:38259 grpc=127.0.0.1:35187
level=warn method=/httpgrpc.HTTP/Handle duration=231.995µs request="&HTTPRequest{Method:POST,Url:/otlp,Headers:[]*Header{},Body:[104 101 108 108 111],}" msg=gRPC err="rpc error: code = Code(415) desc = unsupported content type: , supported: [application/json, application/x-protobuf]"
level=warn method=/httpgrpc.HTTP/Handle duration=148.203µs request="&HTTPRequest{Method:POST,Url:/otlp,Headers:[]*Header{&Header{Key:Content-Type,Values:[application/json],},},Body:[105 110 118 97 108 105 100],}" msg=gRPC err="rpc error: code = Code(400) desc = ReadObject: expect { or , or } or n, but found i, error found in #1 byte of ...|invalid|..., bigger context ...|invalid|..."
level=warn method=/httpgrpc.HTTP/Handle duration=67.843µs request="&HTTPRequest{Method:POST,Url:/otlp,Headers:[]*Header{&Header{Key:Content-Type,Values:[application/x-protobuf],},},Body:[105 110 118 97 108 105 100],}" msg=gRPC err="rpc error: code = Code(400) desc = unexpected EOF"
level=warn method=/httpgrpc.HTTP/Handle duration=160.202µs request="&HTTPRequest{Method:POST,Url:/otlp,Headers:[]*Header{&Header{Key:Content-Type,Values:[application/json],},},Body:[10 246 22 10 211 2 10 29 10 17 99 111 110 116 97 105 110 101 114 46 114 117 110 116 105 109 101 18 8 10 6 100 111 99 107 101 114 10 39 10 18 99 111 110 116 97 105 110 101 114 46 104],}" msg=gRPC err="rpc error: code = Code(400) desc = ReadObject: expect { or , or } or n, but found \ufffd, error found in #2 byte of ...|\n\ufffd\u0016\n\ufffd\u0002\n\u001d\n\u0011co|..., bigger context ...|\n\ufffd\u0016\n\ufffd\u0002\n\u001d\n\u0011container.runtime\u0012\u0008\n\u0006docker\n'\n\u0012container.h|..."
level=warn method=/httpgrpc.HTTP/Handle duration=162.557µs request="&HTTPRequest{Method:POST,Url:/otlp,Headers:[]*Header{&Header{Key:Content-Type,Values:[application/json],},},Body:[123 34 114 101 115 111 117 114 99 101 77 101 116 114 105 99 115 34 58 32 91 123 34 115 99 111 112 101 77 101 116 114 105 99 115 34 58 32 91 123 34 109 101 116 114 105 99 115 34 58 32 91 123 34 110 97 109 101 34 58 32 34 114 101 112 111 114 116 95 115 101 114 118 101 114 95 101 114 114 111 114 34 44 32 34 103 97 117 103 101 34 58 32 123 34 100 97 116 97 80 111 105 110 116 115 34 58 32 91 123 34 116 105 109 101 85 110 105 120 78 97 110 111 34 58 32 34 49 54 55 57 57 49 50 52 54 51 51 52 48 48 48 48 48 48 48 34 44 32 34 97 115 68 111 117 98 108 101 34 58 32 49 48 46 54 54 125 93 125 125 93 125 93 125 93 125],}" msg=gRPC err="rpc error: code = Code(503) desc = some random push error"
level=info msg="=== Handler.Stop()'d ==="
goos: linux
goarch: amd64
pkg: github.com/grafana/mimir/pkg/distributor
cpu: Intel(R) Xeon(R) CPU E5-2689 v4 @ 3.10GHz
BenchmarkDistributor_prePushMaxSeriesLimitMiddleware/50%_series_rejected-5         	    8342	    138097 ns/op
BenchmarkDistributor_prePushMaxSeriesLimitMiddleware/all_series_rejected-5         	    7670	    211625 ns/op
BenchmarkDistributor_prePushMaxSeriesLimitMiddleware/no_series_rejected-5          	   12925	     92424 ns/op
BenchmarkDistributor_prePushMaxSeriesLimitMiddleware/10%_series_rejected-5         	   10000	    119222 ns/op
BenchmarkDistributor_SampleDuplicateTimestamp/one_timeseries_with_one_sample-5     	 6506082	       184.7 ns/op	       0 B/op	       0 allocs/op
BenchmarkDistributor_SampleDuplicateTimestamp/one_timeseries_with_one_sample_and_one_histogram-5         	 3854205	       318.3 ns/op	       0 B/op	       0 allocs/op
BenchmarkDistributor_SampleDuplicateTimestamp/one_timeseries_with_two_samples-5                          	 3193756	       370.0 ns/op	       0 B/op	       0 allocs/op
BenchmarkDistributor_SampleDuplicateTimestamp/one_timeseries_with_two_histograms-5                       	 3014504	       394.2 ns/op	       0 B/op	       0 allocs/op
BenchmarkDistributor_SampleDuplicateTimestamp/one_timeseries_with_two_samples_and_two_histograms-5       	 1732006	       702.7 ns/op	       0 B/op	       0 allocs/op
BenchmarkDistributor_SampleDuplicateTimestamp/one_timeseries_with_80_000_histograms_with_duplicated_timestamps-5         	     387	   3138024 ns/op	    3147 B/op	      12 allocs/op
BenchmarkDistributor_SampleDuplicateTimestamp/one_timeseries_with_80_000_samples_and_80_000_histograms_with_duplicated_timestamps-5         	     332	   3762606 ns/op	    5744 B/op	      18 allocs/op
BenchmarkDistributor_SampleDuplicateTimestamp/one_timeseries_with_one_histogram-5                                                           	 6180811	       193.2 ns/op	       0 B/op	       0 allocs/op
BenchmarkDistributor_SampleDuplicateTimestamp/one_timeseries_with_80_000_samples_with_duplicated_timestamps-5                               	    1450	    835834 ns/op	    2551 B/op	       5 allocs/op
BenchmarkDistributor_Push/cost_attribution=disabled/scenario=ingestion_rate_limit_reached-5                                                 	    1294	    907845 ns/op	    2601 B/op	      47 allocs/op
BenchmarkDistributor_Push/cost_attribution=disabled/scenario=too_many_labels_limit_reached-5                                                	     231	   4975058 ns/op	 1228233 B/op	    5188 allocs/op
BenchmarkDistributor_Push/cost_attribution=disabled/scenario=max_label_name_length_limit_reached-5                                          	     350	   3496453 ns/op	 1162151 B/op	    5184 allocs/op
BenchmarkDistributor_Push/cost_attribution=disabled/scenario=timestamp_too_new-5                                                            	     712	   1652125 ns/op	  331583 B/op	    4154 allocs/op
--- FAIL: BenchmarkDistributor_Push/cost_attribution=disabled/scenario=HA_dedup;_4_clusters_8_replicas_evenly_split
    distributor_test.go:2554: expected replicas did not mach, rejecting sample: error but got <nil>
BenchmarkDistributor_Push/cost_attribution=disabled/scenario=all_samples_successfully_pushed-5                                              	     721	   1723923 ns/op	  163520 B/op	      91 allocs/op
--- FAIL: BenchmarkDistributor_Push/cost_attribution=disabled/scenario=max_label_value_length_limit_reached
    distributor_test.go:2554: expected received a series whose label value length exceeds the limit error but got rpc error: code = InvalidArgument desc = received a series whose label value length of 204 exceeds the limit of 200, label: 'xxx', value: 'xxx_0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000' (truncated) series: 'foo{name_0="value_0", name_1="value_1", name_2="value_2", name_3="value_3", name_4="value_4", name_5="value_5", name_6="value_6", name_7="value_7", name_8="value_8", name_9="value_9", team="0", xxx="x' (err-mimir-label-value-too-long). To adjust the related per-tenant limit, configure -validation.max-length-label-value, or contact your service administrator.
BenchmarkDistributor_Push/cost_attribution=disabled/scenario=all_samples_go_to_metric_relabel_configs-5                                     	     363	   3598620 ns/op	  246038 B/op	    5209 allocs/op
BenchmarkDistributor_Push/cost_attribution=disabled/scenario=HA_dedup;_all_samples_same_replica-5                                           	     637	   1875061 ns/op	  162854 B/op	      87 allocs/op
--- FAIL: BenchmarkDistributor_Push/cost_attribution=disabled
--- FAIL: BenchmarkDistributor_Push/cost_attribution=enabled/scenario=HA_dedup;_4_clusters_8_replicas_evenly_split
    distributor_test.go:2554: expected replicas did not mach, rejecting sample: error but got <nil>
BenchmarkDistributor_Push/cost_attribution=enabled/scenario=all_samples_successfully_pushed-5                                               	     716	   1756858 ns/op	  162317 B/op	      82 allocs/op
--- FAIL: BenchmarkDistributor_Push/cost_attribution=enabled/scenario=max_label_value_length_limit_reached
    distributor_test.go:2554: expected received a series whose label value length exceeds the limit error but got rpc error: code = InvalidArgument desc = received a series whose label value length of 204 exceeds the limit of 200, label: 'xxx', value: 'xxx_0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000' (truncated) series: 'foo{name_0="value_0", name_1="value_1", name_2="value_2", name_3="value_3", name_4="value_4", name_5="value_5", name_6="value_6", name_7="value_7", name_8="value_8", name_9="value_9", team="0", xxx="x' (err-mimir-label-value-too-long). To adjust the related per-tenant limit, configure -validation.max-length-label-value, or contact your service administrator.
BenchmarkDistributor_Push/cost_attribution=enabled/scenario=all_samples_go_to_metric_relabel_configs-5                                      	     343	   3383869 ns/op	  246187 B/op	    5209 allocs/op
BenchmarkDistributor_Push/cost_attribution=enabled/scenario=HA_dedup;_all_samples_same_replica-5                                            	     591	   1987398 ns/op	  179272 B/op	    1112 allocs/op
BenchmarkDistributor_Push/cost_attribution=enabled/scenario=ingestion_rate_limit_reached-5                                                  	    1086	   1138501 ns/op	    2646 B/op	      47 allocs/op
BenchmarkDistributor_Push/cost_attribution=enabled/scenario=too_many_labels_limit_reached-5                                                 	     236	   5386907 ns/op	 1230956 B/op	    5215 allocs/op
BenchmarkDistributor_Push/cost_attribution=enabled/scenario=max_label_name_length_limit_reached-5                                           	     324	   3681592 ns/op	 1162325 B/op	    5185 allocs/op
BenchmarkDistributor_Push/cost_attribution=enabled/scenario=timestamp_too_new-5                                                             	     636	   1934861 ns/op	  331768 B/op	    4156 allocs/op
--- FAIL: BenchmarkDistributor_Push/cost_attribution=enabled
--- FAIL: BenchmarkDistributor_Push
BenchmarkDistributor_ActiveSeries-5                                                                                                         	      30	  35713108 ns/op	 6740101 B/op	   85547 allocs/op
BenchmarkOTLPHandler/protobuf-5                                                                                                             	      19	  53516641 ns/op
BenchmarkOTLPHandler/JSON-5                                                                                                                 	      16	  73355678 ns/op
BenchmarkPushHandler-5                                                                                                                      	  318514	      3675 ns/op
BenchmarkMergeExemplars-5                                                                                                                   	     368	   3013160 ns/op
BenchmarkMergingAndSortingSeries/1_ingesters_per_zone,_1_zones,_1_series_per_ingester-5                                                     	 4479130	       286.2 ns/op
BenchmarkMergingAndSortingSeries/1_ingesters_per_zone,_1_zones,_10_series_per_ingester-5                                                    	  830332	      1917 ns/op
BenchmarkMergingAndSortingSeries/1_ingesters_per_zone,_1_zones,_100_series_per_ingester-5                                                   	   74971	     16014 ns/op
BenchmarkMergingAndSortingSeries/1_ingesters_per_zone,_1_zones,_1000_series_per_ingester-5                                                  	   10000	    131402 ns/op
BenchmarkMergingAndSortingSeries/1_ingesters_per_zone,_1_zones,_10000_series_per_ingester-5                                                 	     686	   1932363 ns/op
BenchmarkMergingAndSortingSeries/1_ingesters_per_zone,_2_zones,_1_series_per_ingester-5                                                     	 3064905	       416.6 ns/op
BenchmarkMergingAndSortingSeries/1_ingesters_per_zone,_2_zones,_10_series_per_ingester-5                                                    	  580822	      2858 ns/op
BenchmarkMergingAndSortingSeries/1_ingesters_per_zone,_2_zones,_100_series_per_ingester-5                                                   	   51538	     24122 ns/op
BenchmarkMergingAndSortingSeries/1_ingesters_per_zone,_2_zones,_1000_series_per_ingester-5                                                  	    6093	    201202 ns/op
BenchmarkMergingAndSortingSeries/1_ingesters_per_zone,_2_zones,_10000_series_per_ingester-5                                                 	     480	   2526390 ns/op
BenchmarkMergingAndSortingSeries/1_ingesters_per_zone,_3_zones,_1_series_per_ingester-5                                                     	 1995748	       583.3 ns/op
BenchmarkMergingAndSortingSeries/1_ingesters_per_zone,_3_zones,_10_series_per_ingester-5                                                    	  426860	      3707 ns/op
BenchmarkMergingAndSortingSeries/1_ingesters_per_zone,_3_zones,_100_series_per_ingester-5                                                   	   36622	     33224 ns/op
BenchmarkMergingAndSortingSeries/1_ingesters_per_zone,_3_zones,_1000_series_per_ingester-5                                                  	    4892	    303580 ns/op
BenchmarkMergingAndSortingSeries/1_ingesters_per_zone,_3_zones,_10000_series_per_ingester-5                                                 	     339	   3715665 ns/op
BenchmarkMergingAndSortingSeries/2_ingesters_per_zone,_1_zones,_1_series_per_ingester-5                                                     	 2108652	       597.9 ns/op
BenchmarkMergingAndSortingSeries/2_ingesters_per_zone,_1_zones,_10_series_per_ingester-5                                                    	  286135	      3943 ns/op
BenchmarkMergingAndSortingSeries/2_ingesters_per_zone,_1_zones,_100_series_per_ingester-5                                                   	   31014	     37858 ns/op
BenchmarkMergingAndSortingSeries/2_ingesters_per_zone,_1_zones,_1000_series_per_ingester-5                                                  	    3289	    349008 ns/op
BenchmarkMergingAndSortingSeries/2_ingesters_per_zone,_1_zones,_10000_series_per_ingester-5                                                 	     267	   4531328 ns/op
BenchmarkMergingAndSortingSeries/2_ingesters_per_zone,_2_zones,_1_series_per_ingester-5                                                     	 1272732	      1011 ns/op
BenchmarkMergingAndSortingSeries/2_ingesters_per_zone,_2_zones,_10_series_per_ingester-5                                                    	  200030	      6006 ns/op
BenchmarkMergingAndSortingSeries/2_ingesters_per_zone,_2_zones,_100_series_per_ingester-5                                                   	   20175	     55196 ns/op
BenchmarkMergingAndSortingSeries/2_ingesters_per_zone,_2_zones,_1000_series_per_ingester-5                                                  	    2104	    547591 ns/op
BenchmarkMergingAndSortingSeries/2_ingesters_per_zone,_2_zones,_10000_series_per_ingester-5                                                 	     171	   6661000 ns/op
BenchmarkMergingAndSortingSeries/2_ingesters_per_zone,_3_zones,_1_series_per_ingester-5                                                     	 1000000	      1365 ns/op
BenchmarkMergingAndSortingSeries/2_ingesters_per_zone,_3_zones,_10_series_per_ingester-5                                                    	  132756	      8795 ns/op
BenchmarkMergingAndSortingSeries/2_ingesters_per_zone,_3_zones,_100_series_per_ingester-5                                                   	   13796	     82681 ns/op
BenchmarkMergingAndSortingSeries/2_ingesters_per_zone,_3_zones,_1000_series_per_ingester-5                                                  	    1408	    823438 ns/op
BenchmarkMergingAndSortingSeries/2_ingesters_per_zone,_3_zones,_10000_series_per_ingester-5                                                 	     120	   9492899 ns/op
BenchmarkMergingAndSortingSeries/4_ingesters_per_zone,_1_zones,_1_series_per_ingester-5                                                     	 1000000	      1180 ns/op
BenchmarkMergingAndSortingSeries/4_ingesters_per_zone,_1_zones,_10_series_per_ingester-5                                                    	  107215	      9389 ns/op
BenchmarkMergingAndSortingSeries/4_ingesters_per_zone,_1_zones,_100_series_per_ingester-5                                                   	   14493	     85570 ns/op
BenchmarkMergingAndSortingSeries/4_ingesters_per_zone,_1_zones,_1000_series_per_ingester-5                                                  	    1437	    876474 ns/op
BenchmarkMergingAndSortingSeries/4_ingesters_per_zone,_1_zones,_10000_series_per_ingester-5                                                 	     100	  10599159 ns/op
BenchmarkMergingAndSortingSeries/4_ingesters_per_zone,_2_zones,_1_series_per_ingester-5                                                     	  808478	      1975 ns/op
BenchmarkMergingAndSortingSeries/4_ingesters_per_zone,_2_zones,_10_series_per_ingester-5                                                    	   86056	     15178 ns/op
BenchmarkMergingAndSortingSeries/4_ingesters_per_zone,_2_zones,_100_series_per_ingester-5                                                   	    8662	    142906 ns/op
BenchmarkMergingAndSortingSeries/4_ingesters_per_zone,_2_zones,_1000_series_per_ingester-5                                                  	     728	   1402686 ns/op
BenchmarkMergingAndSortingSeries/4_ingesters_per_zone,_2_zones,_10000_series_per_ingester-5                                                 	      73	  16094121 ns/op
BenchmarkMergingAndSortingSeries/4_ingesters_per_zone,_3_zones,_1_series_per_ingester-5                                                     	  421899	      2864 ns/op
BenchmarkMergingAndSortingSeries/4_ingesters_per_zone,_3_zones,_10_series_per_ingester-5                                                    	   57614	     21412 ns/op
BenchmarkMergingAndSortingSeries/4_ingesters_per_zone,_3_zones,_100_series_per_ingester-5                                                   	    5595	    203138 ns/op
BenchmarkMergingAndSortingSeries/4_ingesters_per_zone,_3_zones,_1000_series_per_ingester-5                                                  	     594	   2269783 ns/op
BenchmarkMergingAndSortingSeries/4_ingesters_per_zone,_3_zones,_10000_series_per_ingester-5                                                 	      56	  24281851 ns/op
BenchmarkMergingAndSortingSeries/10_ingesters_per_zone,_1_zones,_1_series_per_ingester-5                                                    	  371364	      3387 ns/op
BenchmarkMergingAndSortingSeries/10_ingesters_per_zone,_1_zones,_10_series_per_ingester-5                                                   	   47996	     24618 ns/op
BenchmarkMergingAndSortingSeries/10_ingesters_per_zone,_1_zones,_100_series_per_ingester-5                                                  	    5904	    233670 ns/op
BenchmarkMergingAndSortingSeries/10_ingesters_per_zone,_1_zones,_1000_series_per_ingester-5                                                 	     416	   2896071 ns/op
BenchmarkMergingAndSortingSeries/10_ingesters_per_zone,_1_zones,_10000_series_per_ingester-5                                                	      39	  34400046 ns/op
BenchmarkMergingAndSortingSeries/10_ingesters_per_zone,_2_zones,_1_series_per_ingester-5                                                    	  182092	      6210 ns/op
BenchmarkMergingAndSortingSeries/10_ingesters_per_zone,_2_zones,_10_series_per_ingester-5                                                   	   28605	     42079 ns/op
BenchmarkMergingAndSortingSeries/10_ingesters_per_zone,_2_zones,_100_series_per_ingester-5                                                  	    2960	    430583 ns/op
BenchmarkMergingAndSortingSeries/10_ingesters_per_zone,_2_zones,_1000_series_per_ingester-5                                                 	     244	   5038108 ns/op
BenchmarkMergingAndSortingSeries/10_ingesters_per_zone,_2_zones,_10000_series_per_ingester-5                                                	      20	  55193958 ns/op
BenchmarkMergingAndSortingSeries/10_ingesters_per_zone,_3_zones,_1_series_per_ingester-5                                                    	  140899	      8812 ns/op
BenchmarkMergingAndSortingSeries/10_ingesters_per_zone,_3_zones,_10_series_per_ingester-5                                                   	   18117	     63684 ns/op
BenchmarkMergingAndSortingSeries/10_ingesters_per_zone,_3_zones,_100_series_per_ingester-5                                                  	    1992	    630498 ns/op
BenchmarkMergingAndSortingSeries/10_ingesters_per_zone,_3_zones,_1000_series_per_ingester-5                                                 	     162	   7437378 ns/op
BenchmarkMergingAndSortingSeries/10_ingesters_per_zone,_3_zones,_10000_series_per_ingester-5                                                	      16	  75063980 ns/op
BenchmarkMergingAndSortingSeries/100_ingesters_per_zone,_1_zones,_1_series_per_ingester-5                                                   	   31634	     39688 ns/op
BenchmarkMergingAndSortingSeries/100_ingesters_per_zone,_1_zones,_10_series_per_ingester-5                                                  	    3256	    360712 ns/op
BenchmarkMergingAndSortingSeries/100_ingesters_per_zone,_1_zones,_100_series_per_ingester-5                                                 	     306	   4101863 ns/op
BenchmarkMergingAndSortingSeries/100_ingesters_per_zone,_1_zones,_1000_series_per_ingester-5                                                	      26	  46588290 ns/op
BenchmarkMergingAndSortingSeries/100_ingesters_per_zone,_1_zones,_10000_series_per_ingester-5                                               	       3	 470779559 ns/op
BenchmarkMergingAndSortingSeries/100_ingesters_per_zone,_2_zones,_1_series_per_ingester-5                                                   	   15381	     74551 ns/op
BenchmarkMergingAndSortingSeries/100_ingesters_per_zone,_2_zones,_10_series_per_ingester-5                                                  	    2068	    649477 ns/op
BenchmarkMergingAndSortingSeries/100_ingesters_per_zone,_2_zones,_100_series_per_ingester-5                                                 	     170	   7126433 ns/op
BenchmarkMergingAndSortingSeries/100_ingesters_per_zone,_2_zones,_1000_series_per_ingester-5                                                	      15	  76587684 ns/op
BenchmarkMergingAndSortingSeries/100_ingesters_per_zone,_2_zones,_10000_series_per_ingester-5                                               	       2	 769336730 ns/op
BenchmarkMergingAndSortingSeries/100_ingesters_per_zone,_3_zones,_1_series_per_ingester-5                                                   	   10000	    109348 ns/op
BenchmarkMergingAndSortingSeries/100_ingesters_per_zone,_3_zones,_10_series_per_ingester-5                                                  	    1306	   1017795 ns/op
BenchmarkMergingAndSortingSeries/100_ingesters_per_zone,_3_zones,_100_series_per_ingester-5                                                 	     100	  10304919 ns/op
BenchmarkMergingAndSortingSeries/100_ingesters_per_zone,_3_zones,_1000_series_per_ingester-5                                                	      10	 119098741 ns/op
BenchmarkMergingAndSortingSeries/100_ingesters_per_zone,_3_zones,_10000_series_per_ingester-5                                               	       1	1068979279 ns/op
FAIL
exit status 1
FAIL	github.com/grafana/mimir/pkg/distributor	324.543s

@julietteO julietteO force-pushed the dimitar/ha-dedup-on-every-sample-bench branch from 656ea69 to cbc8c2d on November 26, 2025 16:21
Contributor

colega commented Nov 26, 2025

@vaxvms thank you for running the benchmarks! Can I ask you to run them with -count=6 | tee old_or_new and then run benchstat old new to compare the results?
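
For reference, a full comparison could look something like the commands below (output file names are placeholders; -run='^$' skips the unit tests so the failing tests don't abort the benchmark run, and benchstat can be installed with go install golang.org/x/perf/cmd/benchstat@latest):

git checkout main
go test -bench=BenchmarkDistributor_Push -run='^$' -count=6 ./pkg/distributor | tee old
git checkout dimitar/ha-dedup-on-every-sample-bench
go test -bench=BenchmarkDistributor_Push -run='^$' -count=6 ./pkg/distributor | tee new
benchstat old new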

@dimitarvdimitrov
Contributor

I see the benchmarks/tests are failing. Is it possible to run the benchmarks without them failing? Also, can you run them with -count=6 and then post the result after you've run them through benchstat?

Contributor

colega commented Nov 26, 2025

Oh, @dimitarvdimitrov :)

I also see that there are benchmark results with -count=6 attached in the PR (which should be removed but they're the thing we want to see)

@colega colega closed this Nov 26, 2025
@colega colega reopened this Nov 26, 2025
Contributor

colega commented Nov 26, 2025

(Sorry I was so excited that I clicked the wrong button, maybe enough caffeine for me today)

BenchmarkDistributor_Push/too_many_labels_limit_reached-10 2209 540531 ns/op 135548 B/op 3234 allocs/op
PASS

Process finished with the exit code 0

Bug: Benchmark output files accidentally committed to repository

The files benchmarks/after.txt and benchmarks/before.txt contain local benchmark output with machine-specific paths (e.g., /Users/dimitar/Library/Caches/JetBrains/GoLand2023.2/...). These appear to be development artifacts that were unintentionally included in the commit.


Collaborator


committed on purpose

Collaborator

vaxvms commented Nov 26, 2025

@colega Thanks for pointing me to the procedure for running the benchmarks.

The failures seem to be different between main and the feature branch; fixing.

Contributor

colega commented Nov 26, 2025

I'll take a look once you've fixed the CI & checked Cursor's comments (at least one of them, about cost attribution, seems legit). Of course feel free to challenge Cursor's statements :)

@julietteO julietteO force-pushed the dimitar/ha-dedup-on-every-sample-bench branch from cbc8c2d to 7c6a812 on November 27, 2025 08:58
if len(req.Timeseries) > 0 {
err = next(ctx, pushReq)
}
errs.Add(err)

Bug: Duplicate error added to multierror when all samples rejected

When all samples in a request are rejected (all replicas deduped or rejected), the last error from replicaObserved gets added to the multierror twice. Inside the loop at line 1292-1294, err is assigned from replicaObserved and immediately added to errs. After the loop, if len(req.Timeseries) == 0 (all samples rejected), the next() call is skipped, leaving err holding the last loop error. Line 1331 then adds this same error again via errs.Add(err). This causes duplicate errors in the returned multierror, potentially resulting in confusing error messages and logs.
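
A minimal sketch of the shape of the fix, assuming dskit's multierror package (function and parameter names here are hypothetical, not the actual middleware): collect each per-replica rejection exactly once inside the loop, and give the downstream push its own Add call, so a leftover loop error can never be added a second time when every series was deduplicated.

package hadedup

import "github.com/grafana/dskit/multierror"

// forwardAfterDedupe aggregates per-replica rejection errors and the error
// from the downstream push without counting any error twice.
func forwardAfterDedupe(rejections []error, remainingSeries int, next func() error) error {
	var errs multierror.MultiError
	for _, err := range rejections {
		errs.Add(err) // each rejection recorded here, once; Add skips nil errors
	}
	if remainingSeries > 0 {
		errs.Add(next()) // forwarding gets its own Add; nothing from the loop is reused
	}
	return errs.Err()
}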


Author


Should be fixed now

@julietteO julietteO force-pushed the dimitar/ha-dedup-on-every-sample-bench branch from 7c6a812 to 513a4a2 on November 27, 2025 16:20
@julietteO julietteO force-pushed the dimitar/ha-dedup-on-every-sample-bench branch from 513a4a2 to bc2e330 on November 27, 2025 16:31
@colega colega self-requested a review November 27, 2025 16:35
CHANGELOG.md Outdated
* [ENHANCEMENT] OTLP: Add metric `cortex_distributor_otlp_requests_by_content_type_total` to track content type (json or proto) of OTLP packets. #13525
* [ENHANCEMENT] OTLP: Add experimental metric `cortex_distributor_otlp_array_lengths` to better understand the layout of OTLP packets in practice. #13525
* [ENHANCEMENT] Ruler: gRPC errors without details are classified as `operator` errors, and rule evaluation failures (such as duplicate labelsets) are classified as `user` errors. #13586
* [ENHANCEMENT] HA: Deduplication per sample instead of per batch. #13665
Contributor


Suggested change
* [ENHANCEMENT] HA: Deduplication per sample instead of per batch. #13665
* [ENHANCEMENT] HA: Deduplicate per sample instead of per batch. #13665

Contributor

@tacole02 tacole02 left a comment


Changelog looks good! I made a small suggestion.

colega pushed a commit that referenced this pull request Nov 28, 2025
…tributor push benchmark (#13688)

change introduced by PR #12583

#### What this PR does

Fix benchmark failure

#### Which issue(s) this PR fixes or relates to

Fixes the distributor benchmark so it can be run for #13665

#### Checklist

- [X] Tests updated.
- [ ] Documentation added.
- [ ] `CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`. If changelog entry is not needed, please add the `changelog-not-needed` label to the PR.
- [ ] [`about-versioning.md`](https://github.com/grafana/mimir/blob/main/docs/sources/mimir/configure/about-versioning.md) updated with experimental features.

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Updates the expected error string for max label value length to include actual (204) and limit (200) values in `BenchmarkDistributor_Push`.
> 
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 743cb98. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
@julietteO julietteO force-pushed the dimitar/ha-dedup-on-every-sample-bench branch from bc2e330 to b4dd196 Compare November 28, 2025 16:25
@julietteO julietteO force-pushed the dimitar/ha-dedup-on-every-sample-bench branch from b4dd196 to 0a7033a Compare November 28, 2025 16:44
@colega
Contributor

colega commented Dec 1, 2025

I would suggest you run make lint (and maybe make test, although that takes longer) on your machine, so we don't have to approve the CI run every time you push.

@dimitarvdimitrov
Contributor

Cursor also left a few comments which look legit; can you take a look at those too?

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
dimitarvdimitrov and others added 12 commits December 1, 2025 16:43
goos: darwin
goarch: arm64
pkg: github.com/grafana/mimir/pkg/distributor
BenchmarkDistributor_Push
BenchmarkDistributor_Push/max_label_name_length_limit_reached
BenchmarkDistributor_Push/max_label_name_length_limit_reached-10         	     121	   9844396 ns/op	  124584 B/op	    2158 allocs/op
BenchmarkDistributor_Push/max_label_value_length_limit_reached
BenchmarkDistributor_Push/max_label_value_length_limit_reached-10        	     153	   7800602 ns/op	  109439 B/op	    2158 allocs/op
BenchmarkDistributor_Push/timestamp_too_new
BenchmarkDistributor_Push/timestamp_too_new-10                           	     243	   5008468 ns/op	   89449 B/op	    2085 allocs/op
BenchmarkDistributor_Push/HA_dedup;_all_samples_same_replica
BenchmarkDistributor_Push/HA_dedup;_all_samples_same_replica-10          	    1263	    924608 ns/op	  150549 B/op	      43 allocs/op
BenchmarkDistributor_Push/HA_dedup;_4_clusters_8_replicas_evenly_split
BenchmarkDistributor_Push/HA_dedup;_4_clusters_8_replicas_evenly_split-10         	    1407	    806527 ns/op	   85525 B/op	      99 allocs/op
BenchmarkDistributor_Push/all_samples_successfully_pushed
BenchmarkDistributor_Push/all_samples_successfully_pushed-10                      	    1398	    809442 ns/op	  150418 B/op	      42 allocs/op
BenchmarkDistributor_Push/ingestion_rate_limit_reached
BenchmarkDistributor_Push/ingestion_rate_limit_reached-10                         	    1965	    603205 ns/op	   25324 B/op	      47 allocs/op
BenchmarkDistributor_Push/too_many_labels_limit_reached
BenchmarkDistributor_Push/too_many_labels_limit_reached-10                        	     213	   5744598 ns/op	   80979 B/op	    2188 allocs/op
PASS

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: juliette.orain <juliette.orain@ovhcloud.com>
Co-authored-by: Nicolas DUPEUX <nicolas.dupeux@corp.ovh.com>
Signed-off-by: juliette.orain <juliette.orain@ovhcloud.com>
Signed-off-by: juliette.orain <juliette.orain@ovhcloud.com>
Co-authored-by: Nicolas DUPEUX <nicolas.dupeux@ovhcloud.com>
Signed-off-by: juliette.orain <juliette.orain@ovhcloud.com>
Signed-off-by: juliette.orain <juliette.orain@ovhcloud.com>
Signed-off-by: juliette.orain <juliette.orain@ovhcloud.com>
@julietteO julietteO force-pushed the dimitar/ha-dedup-on-every-sample-bench branch from 12b4d15 to 61e13ea Compare December 1, 2025 15:44
Comment on lines +114 to +117
Distributor_Push/cost_attribution=disabled/scenario=HA_dedup;_all_samples_same_replica-5 1.934m ± 4%
Distributor_Push/cost_attribution=disabled/scenario=HA_dedup;_4_clusters_8_replicas_evenly_split-5 1.971m ± 4%
Distributor_Push/cost_attribution=enabled/scenario=HA_dedup;_all_samples_same_replica-5 2.078m ± 4%
Distributor_Push/cost_attribution=enabled/scenario=HA_dedup;_4_clusters_8_replicas_evenly_split-5 2.167m ± 4%
Contributor

Is it possible to run these with the previous implementation too? I understand they won't do the same work, but it would show us how much slower the new path is.

Comment on lines 2416 to 2435
switch i % 8 {
case 0:
	cluster, replica = "c1", "r1"
case 1:
	cluster, replica = "c1", "r2"
case 2:
	cluster, replica = "c2", "r1"
case 3:
	cluster, replica = "c2", "r2"
case 4:
	cluster, replica = "c3", "r1"
case 5:
	cluster, replica = "c3", "r2"
case 6:
	cluster, replica = "c4", "r1"
case 7:
	cluster, replica = "c4", "r2"
default:
	panic("in the disco")
}
Contributor

Nitpick: make this cluster, replica = i/2 + 1, i%2 + 1.
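
A quick sketch of the arithmetic version of that switch; since the benchmark uses label strings such as "c1" and "r1", the integer expression presumably still needs to be formatted back into strings (my assumption):

```go
package main

import "fmt"

// haLabelsFor maps sample index i to the same (cluster, replica) pairs as the
// 8-case switch above: clusters c1..c4, each with replicas r1 and r2.
func haLabelsFor(i int) (cluster, replica string) {
	i = i % 8
	return fmt.Sprintf("c%d", i/2+1), fmt.Sprintf("r%d", i%2+1)
}

func main() {
	for i := 0; i < 8; i++ {
		c, r := haLabelsFor(i)
		fmt.Println(i, c, r) // 0 c1 r1, 1 c1 r2, 2 c2 r1, ...
	}
}
```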

Contributor

Can you delete these before merging?

Contributor

Thanks for running the benchmarks. It looks like there's nothing to worry about, but IMO it's worth running the HA tracker cases without your diff too.

Normally it's enough to post this as a comment, but committing it works too. Just don't forget to delete it before this PR is merged.

// replicaObserved checks if a sample from a given replica should be accepted for ingestion based on HA deduplication rules.
//
// Returns a replicaState indicating the acceptance status and classification of the replica:
// - replicaIsPrimary: sample is from the elected primary replica and should be accepted
Contributor

I would move these descriptions into a godoc comment on each enum value.
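
A sketch of what that could look like; only replicaIsPrimary appears in the excerpt above, so the other value names and the exact wording are assumptions:

```go
package distributor

// replicaState classifies how HA deduplication handled a series' replica.
type replicaState int

const (
	// replicaIsPrimary: the sample is from the elected primary replica and
	// should be accepted for ingestion.
	replicaIsPrimary replicaState = iota
	// replicaIsSecondary (hypothetical name): the sample is from a non-elected
	// replica of a tracked cluster and should be deduplicated.
	replicaIsSecondary
	// replicaIsUnknown (hypothetical name): the series carries no HA labels,
	// so deduplication does not apply.
	replicaIsUnknown
)
```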

haReplicaLabel := d.limits.HAReplicaLabel(userID)
cluster, replica := findHALabels(haReplicaLabel, d.limits.HAClusterLabel(userID), req.Timeseries[0].Labels)
haClusterLabel := d.limits.HAClusterLabel(userID)
cluster, replica := findHALabels(haReplicaLabel, haClusterLabel, req.Timeseries[0].Labels)
Contributor

I think it's a bit confusing that we log only the first series' replica even though we'll now ingest data from multiple replicas. Can we add the tracing instrumentation further down, as we're looping over the replica states?
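
As a rough sketch of "instrumentation further down", one span event per observed replica could look like the following; the OpenTelemetry calls are a stand-in for whatever tracing helper Mimir actually uses, and the map types only loosely mirror the PR:

```go
package distributor

import (
	"context"

	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/trace"
)

// haReplica and replicaInfo are placeholders mirroring the PR's types.
type haReplica struct {
	cluster string
	replica string
}

type replicaInfo struct {
	accepted   bool
	numSamples int
}

// recordReplicaDecisions attaches one span event per observed replica, instead
// of recording only the first series' labels.
func recordReplicaDecisions(ctx context.Context, infos map[haReplica]*replicaInfo) {
	span := trace.SpanFromContext(ctx)
	for rep, info := range infos {
		span.AddEvent("ha_dedup_replica", trace.WithAttributes(
			attribute.String("cluster", rep.cluster),
			attribute.String("replica", rep.replica),
			attribute.Bool("accepted", info.accepted),
			attribute.Int("samples", info.numSamples),
		))
	}
}
```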

// These samples have been deduped.
d.dedupedSamples.WithLabelValues(userID, cluster).Add(float64(numSamples))
var errs multierror.MultiError
replicaInfos := make(map[haReplica]*replicaInfo)
Contributor

Can you add a helper function that returns replicaInfos?
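
A sketch of the kind of helper being asked for; the field names, the labelledSeries stand-in, and the grouping key are assumptions based on the snippets in this thread:

```go
package distributor

// haReplica and replicaInfo mirror the PR's types (field names assumed).
type haReplica struct {
	cluster string
	replica string
}

type replicaInfo struct {
	sampleCount int
	seriesIdx   []int // indexes into req.Timeseries owned by this replica
}

// labelledSeries is a stand-in for mimirpb.PreallocTimeseries.
type labelledSeries struct {
	labels  map[string]string
	samples int
}

// buildReplicaInfos groups series by their HA (cluster, replica) labels so the
// dedup loop can work per replica rather than per request.
func buildReplicaInfos(series []labelledSeries, clusterLabel, replicaLabel string) map[haReplica]*replicaInfo {
	infos := make(map[haReplica]*replicaInfo)
	for i, s := range series {
		key := haReplica{cluster: s.labels[clusterLabel], replica: s.labels[replicaLabel]}
		info := infos[key]
		if info == nil {
			info = &replicaInfo{}
			infos[key] = info
		}
		info.sampleCount += s.samples
		info.seriesIdx = append(info.seriesIdx, i)
	}
	return infos
}
```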

d.dedupedSamples.WithLabelValues(userID, cluster).Add(float64(numSamples))
var errs multierror.MultiError
replicaInfos := make(map[haReplica]*replicaInfo)
samplesPerState := make(map[replicaState]int)
Contributor

Can you move all of the logic for building samplesPerState (purely based on replicaInfos, as far as I can tell) and then using it (updating metrics, from what I can see) into a function of its own? Or maybe add the creation of samplesPerState (say, call it countSamplesPerState()) as a first step in updateHADedupeMetrics. You'd need to change the if that now contains the two calls to updateHADedupeMetrics; you may be able to do that based on whether lastAccepted is a valid index or not. Will that work?
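
A sketch of the countSamplesPerState() step being proposed, assuming replicaInfo records the state decided for each replica and the number of samples it contributed (both assumptions):

```go
package distributor

// replicaState, haReplica and replicaInfo are placeholders for the PR's types.
type replicaState int

type haReplica struct {
	cluster string
	replica string
}

type replicaInfo struct {
	state      replicaState
	numSamples int
}

// countSamplesPerState aggregates the per-replica decisions into the per-state
// counts that updateHADedupeMetrics needs, so the metric updates can be derived
// purely from replicaInfos.
func countSamplesPerState(infos map[haReplica]*replicaInfo) map[replicaState]int {
	counts := make(map[replicaState]int, len(infos))
	for _, info := range infos {
		counts[info.state] += info.numSamples
	}
	return counts
}
```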

samplesPerState := make(map[replicaState]int)
// Check if all timeseries belong to the same replica
firstReplica := getReplicaForSample(0)
isOneReplica := true
Contributor

Do you think we can move all the logic of getReplicaForSample, replicaInfos, and getReplicaState into a struct of its own? Something like:

type replicaInfos struct {
	isOneReplica     bool
	theOneReplica    *haReplica
	multipleReplicas map[haReplica]*replicaInfo
	// maybe more things?
}

func newReplicaInfos(timeseries []PreallocTimeseries) replicaInfos { ... }

func (replicaInfos) replicaState() { ... }

func (replicaInfos) replicaForSample() { ... }

maybe even add the countSamplesPerState I suggested above

func (replicaInfos) countSamplesPerState() map[replicaState]int { ... }

},
expectDetails: []*mimirpb.ErrorDetails{nil, replicasDidNotMatchDetails, tooManyClusterDetails, tooManyClusterDetails},
}, {
name: "perform partial HA deduplication",
Contributor

Can you add a few more test cases? For example, one where a single request mixes series from multiple primary replicas, and one where HA series are mixed with series that don't have a cluster label.
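
For concreteness, a hypothetical shape for those extra table-driven entries; only the names are meant literally, and the surrounding struct is illustrative:

```go
package distributor

// haDedupTestCase is an illustrative stand-in for the existing test table's
// entry type; request construction and expectations are elided.
type haDedupTestCase struct {
	name string
}

var additionalHADedupCases = []haDedupTestCase{
	{name: "mixed primary replicas from multiple clusters in a single request"},
	{name: "HA series mixed with series that have no cluster label"},
}
```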

@dimitarvdimitrov
Contributor

Thanks for rebasing this! I'm sorry if most of this code is code I already wrote, but I think it needs a bit of cleanup before we can merge it 😅

@dimitarvdimitrov
Contributor

I forgot to mention: can you also update docs/sources/mimir/configure/configure-high-availability-deduplication.md:48 to mention that we now check all samples in a batch?

vaxvms and others added 9 commits December 3, 2025 15:21
Signed-off-by: juliette.orain <juliette.orain@ovhcloud.com>
Signed-off-by: juliette.orain <juliette.orain@ovhcloud.com>
We lose the cluster/replica pairing; this might be a bad idea.
Signed-off-by: juliette.orain <juliette.orain@ovhcloud.com>
Signed-off-by: juliette.orain <juliette.orain@ovhcloud.com>
Signed-off-by: juliette.orain <juliette.orain@ovhcloud.com>
@dimitarvdimitrov
Contributor

@vaxvms @julietteO is this ready for another pass, or are there still things you want to address?


Development

Successfully merging this pull request may close these issues.

HA deduplication per-sample instead of per-batch

6 participants