* Add documentation for HA tracker args.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Fix misspelling of Prometheus in HA replica/cluster flags.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Rename accept-ha-samples to enable-ha-tracker
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Update args docs based on review and addition of etcd.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Add note in architecture.md about HA tracking.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Specify that certain flags can/should be prefixed with ring/ha-tracker.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Move -ha-tracker.* to -distributor.ha-tracker.*
Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
* Make the flags part of ha-tracker and explicit
Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
docs/architecture.md (3 additions & 1 deletion)
@@ -24,6 +24,8 @@ The **distributor** service is responsible for handling samples written by Prome
 
 Distributors communicate with ingesters via [gRPC](https://grpc.io). They are stateless and can be scaled up and down as needed.
 
+If the HA Tracker is enabled, the Distributor will deduplicate incoming samples that contain both a cluster and replica label. It talks to a KVStore to store state about which replica per cluster it's accepting samples from for a given user ID. Samples with one or neither of these labels will be accepted by default.
+
 #### Hashing
 
 Distributors use consistent hashing, in conjunction with the (configurable) replication factor, to determine *which* instances of the ingester service receive each sample.
@@ -147,4 +149,4 @@ The interface works somewhat differently across the supported databases:
 
 A set of schemas are used to map the matchers and label sets used on reads and writes to the chunk store into appropriate operations on the index. Schemas have been added as Cortex has evolved, mainly in an attempt to better load balance writes and improve query performance.
 
-> The current schema recommendation is the **v10 schema**.
+> The current schema recommendation is the **v10 schema**.
docs/arguments.md (62 additions & 1 deletion)
@@ -92,6 +92,67 @@ The ingester query API was improved over time, but defaults to the old behaviour
 - `-distributor.extra-query-delay`
 This is used by a component with an embedded distributor (Querier and Ruler) to control how long to wait until sending more than the minimum amount of queries needed for a successful response.
 
+- `distributor.ha-tracker.enable-for-all-users`
+Flag to enable, for all users, handling of samples with external labels identifying replicas in an HA Prometheus setup. This defaults to false, and is technically defined in the Distributor limits.
+
+- `distributor.ha-tracker.enable`
+Enable the distributor's HA tracker so that it can accept samples from Prometheus HA replicas gracefully (requires labels). This is a global setting for distributors: it ensures that the internal data structures needed for HA handling are created. The option `enable-for-all-users` is still needed to enable ingestion of HA samples for all users.
+
+### Ring/HA Tracker Store
+
+The KVStore client is used by both the Ring and HA Tracker.
+- `{ring,distributor.ha-tracker}.prefix`
+The prefix for the keys in the store. Should end with a /. For example, with a prefix of foo/, the key bar would be stored under foo/bar.
+- `{ring,distributor.ha-tracker}.store`
+Backend storage to use for the ring (consul, etcd, inmemory).
+
+#### Consul
+
+By default these flags configure the Consul instance used for the ring. To configure Consul for the HA tracker,
+prefix these flags with `distributor.ha-tracker.`
+
+- `consul.hostname`
+Hostname and port of Consul.
+- `consul.acltoken`
+ACL token used to interact with Consul.
+- `consul.client-timeout`
+HTTP timeout when talking to Consul.
+- `consul.consistent-reads`
+Enable consistent reads to Consul.
+
+#### etcd
+
+By default these flags configure the etcd cluster used for the ring. To configure etcd for the HA tracker,
+prefix these flags with `distributor.ha-tracker.`
+
+- `etcd.endpoints`
+The etcd endpoints to connect to.
+- `etcd.dial-timeout`
+The timeout for the etcd connection.
+- `etcd.max-retries`
+The maximum number of retries for failed operations.
+
+### HA Tracker
+
+HA tracking has two of its own flags:
+- `distributor.ha-tracker.cluster`
+Prometheus label to look for in samples to identify a Prometheus HA cluster. (default "cluster")
+- `distributor.ha-tracker.replica`
+Prometheus label to look for in samples to identify a Prometheus HA replica. (default "__replica__")
+
+Many setups probably already have a `cluster` label, or something similar. If not, add one along with `__replica__`
+via external labels in the Prometheus config.
+
+HA tracking looks for these two labels (the label names can be overridden per user).
+
+It also talks to a KVStore and has its own copies of the flags the Distributor uses to connect to the ring's KVStore.
+- `distributor.ha-tracker.failover-timeout`
+If we don't receive any samples from the accepted replica for a cluster in this amount of time, we will fail over to the next replica we receive a sample from. This value must be greater than the update timeout. (default 30s)
+- `distributor.ha-tracker.store`
+Backend storage to use for the HA tracker (consul, etcd, inmemory). (default "consul")
+- `distributor.ha-tracker.update-timeout`
+Update the timestamp in the KV store for a given cluster/replica only after this amount of time has passed since the current stored timestamp. (default 15s)
+
 ## Ingester
 
 - `-ingester.normalise-tokens`
@@ -185,4 +246,4 @@ Valid fields are (with their corresponding flags for default values):
 
 - `s3.force-path-style`
 
-Set this to `true` to force the request to use path-style addressing (`http://s3.amazonaws.com/BUCKET/KEY`). By default, the S3 client will use virtual hosted bucket addressing when possible (`http://BUCKET.s3.amazonaws.com/KEY`).
+Set this to `true` to force the request to use path-style addressing (`http://s3.amazonaws.com/BUCKET/KEY`). By default, the S3 client will use virtual hosted bucket addressing when possible (`http://BUCKET.s3.amazonaws.com/KEY`).
 f.BoolVar(&cfg.EnableBilling, "distributor.enable-billing", false, "Report number of ingested samples to billing system.")
-f.BoolVar(&cfg.EnableHAReplicas, "distributor.accept-ha-labels", false, "Accept samples from Prometheus HA replicas gracefully (requires labels).")
+f.BoolVar(&cfg.EnableHATracker, "distributor.ha-tracker.enable", false, "Enable the distributors HA tracker so that it can accept samples from Prometheus HA replicas gracefully (requires labels).")
 f.DurationVar(&cfg.RemoteTimeout, "distributor.remote-timeout", 2*time.Second, "Timeout for downstream ingesters.")
 f.DurationVar(&cfg.ExtraQueryDelay, "distributor.extra-query-delay", 0, "Time to wait before sending more than the minimum successful query requests.")
 f.DurationVar(&cfg.LimiterReloadPeriod, "distributor.limiter-reload-period", 5*time.Minute, "Period at which to reload user ingestion limits.")
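This hunk renames the distributor-wide switch to `-distributor.ha-tracker.enable` (field `EnableHATracker`), while the per-user switch `AcceptHASamples` lives in the limits. A minimal sketch of how the two might combine, assuming deduplication applies only when both are true, as the arguments documentation describes; `distributorConfig`, `limits`, `registerFlags`, `parse`, and `shouldDedupe` are hypothetical names. It also shows that, after the rename, passing the old `-distributor.accept-ha-labels` name makes Go's `flag` parsing fail.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// distributorConfig is a hypothetical subset holding the global switch.
type distributorConfig struct {
	EnableHATracker bool // -distributor.ha-tracker.enable
}

// limits is a hypothetical subset holding the per-user switch.
type limits struct {
	AcceptHASamples bool // -distributor.ha-tracker.enable-for-all-users
}

func registerFlags(f *flag.FlagSet, cfg *distributorConfig, l *limits) {
	f.BoolVar(&cfg.EnableHATracker, "distributor.ha-tracker.enable", false, "Enable the distributor's HA tracker.")
	f.BoolVar(&l.AcceptHASamples, "distributor.ha-tracker.enable-for-all-users", false, "Enable HA sample handling for all users.")
}

// shouldDedupe assumes deduplication needs the tracker to exist globally AND
// the tenant's limits to accept HA samples.
func shouldDedupe(cfg distributorConfig, l limits) bool {
	return cfg.EnableHATracker && l.AcceptHASamples
}

// parse uses a fresh FlagSet per call so each example is independent.
func parse(args []string) (distributorConfig, limits, error) {
	var cfg distributorConfig
	var l limits
	fs := flag.NewFlagSet("distributor", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // suppress usage text printed on parse errors
	registerFlags(fs, &cfg, &l)
	err := fs.Parse(args)
	return cfg, l, err
}

func main() {
	cfg, l, _ := parse([]string{"-distributor.ha-tracker.enable"})
	fmt.Println(shouldDedupe(cfg, l)) // false: tracker enabled, per-user limit not
	cfg, l, _ = parse([]string{"-distributor.ha-tracker.enable", "-distributor.ha-tracker.enable-for-all-users"})
	fmt.Println(shouldDedupe(cfg, l)) // true
	_, _, err := parse([]string{"-distributor.accept-ha-labels"})
	fmt.Println(err != nil) // true: the pre-rename flag is no longer registered
}
```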
"Update the timestamp in the KV store for a given cluster/replica only after this amount of time has passed since the current stored timestamp.")
 f.DurationVar(&cfg.FailoverTimeout,
-"ha-tracker.failover-timeout",
+"distributor.ha-tracker.failover-timeout",
 30*time.Second,
 "If we don't receive any samples from the accepted replica for a cluster in this amount of time we will failover to the next replica we receive a sample from. This value must be greater than the update timeout")
 // We want the ability to use different Consul instances for the ring and for HA cluster tracking.
pkg/util/validation/limits.go (3 additions & 3 deletions)
@@ -43,9 +43,9 @@ type Limits struct {
 func (l *Limits) RegisterFlags(f *flag.FlagSet) {
 f.Float64Var(&l.IngestionRate, "distributor.ingestion-rate-limit", 25000, "Per-user ingestion rate limit in samples per second.")
 f.IntVar(&l.IngestionBurstSize, "distributor.ingestion-burst-size", 50000, "Per-user allowed ingestion burst size (in number of samples). Warning, very high limits will be reset every -distributor.limiter-reload-period.")
-f.BoolVar(&l.AcceptHASamples, "distributor.accept-ha-samples", false, "Per-user flag to enable handling of samples with external labels for identifying replicas in an HA Prometheus setup.")
-f.StringVar(&l.HAReplicaLabel, "ha-tracker.replica", "__replica__", "Prometheus label to look for in samples to identify a Proemtheus HA replica.")
-f.StringVar(&l.HAClusterLabel, "ha-tracker.cluster", "cluster", "Prometheus label to look for in samples to identify a Poemtheus HA cluster.")
+f.BoolVar(&l.AcceptHASamples, "distributor.ha-tracker.enable-for-all-users", false, "Flag to enable, for all users, handling of samples with external labels identifying replicas in an HA Prometheus setup.")
+f.StringVar(&l.HAReplicaLabel, "distributor.ha-tracker.replica", "__replica__", "Prometheus label to look for in samples to identify a Prometheus HA replica.")
+f.StringVar(&l.HAClusterLabel, "distributor.ha-tracker.cluster", "cluster", "Prometheus label to look for in samples to identify a Prometheus HA cluster.")
 f.IntVar(&l.MaxLabelNameLength, "validation.max-length-label-name", 1024, "Maximum length accepted for label names")
 f.IntVar(&l.MaxLabelValueLength, "validation.max-length-label-value", 2048, "Maximum length accepted for label value. This setting also applies to the metric name")
 f.IntVar(&l.MaxLabelNamesPerSeries, "validation.max-label-names-per-series", 30, "Maximum number of label names per series.")