-
Notifications
You must be signed in to change notification settings - Fork 820
Add documentation for HA tracker flags. #1465
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Can you add a couple words about what is necessary beside the available flags? E. g. setting the external labels in prometheus. What happens if you enable the HA config in Cortex but your tenant/user does not set the replica flag or sets it incorrectly (wrong property name)? |
One thing that struck me. Why is it not nested as |
docs/arguments.md
Outdated
@@ -92,6 +92,38 @@ The ingester query API was improved over time, but defaults to the old behaviour | |||
- `-distributor.extra-query-delay` | |||
This is used by a component with an embedded distributor (Querier and Ruler) to control how long to wait until sending more than the minimum amount of queries needed for a successful response. | |||
|
|||
- `distributor.accept-ha-samples` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this a per-user flag, it's global for all user no? Can we rename it to be something more descriptive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's defined in limits, so you can set an override per user. Is the description not accurate?
We can rename to anything, any suggestions as to what would be more descriptive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think calling it "per user" is very misleading. We can call it "The default value to enable handling of samples with external labels for identifying replicas in an HA Prometheus setup. This defaults to false, and is technically defined in the Distributor limits"
Also, list it in the limits section like distributor.ingestion-rate-limit
I don't remember any reason for it not being nested from the review, it might have just slipped through then. |
@cstyan Looks good for me now, if you want a really good documentation you might want to explain at a high level how the HA feature works (maybe in the architecture.md). Does it dedupe the replicas in the distributor/ingester? Should we expect an increased RAM usage in the distributor or ingester (e. g. at the factor of n where n is the number of replicas)? |
- `ha-tracker.replica` | ||
Prometheus label to look for in samples to identify a Prometheus HA replica. (default "__replica__") | ||
|
||
It's reasonable to assume people probably already have a `cluster` label, or something similar. If not, they should add one along with `__replica__` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't use external labels in our prometheus config, as each prometheus gets a different UserID / TenantID. Does Cortex still require the "cluster" external label then if I only have one prometheus cluster per tenant or does it use the default "cluster" if no label is specified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both labels have to be present for Cortex to dedup samples from HA prometheus instances. Whether the labels are already in your series or added via external labels doesn't matter. Cortex just needs a way of identifying each HA cluster, and each replica within that cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming I am just about to enable the HA tracker feature and I have a single prometheus instance for each tenant/user ID which does not set either of these two labels. Does this cause any problems, will it reject samples without these labels?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of small last comments then LGTM
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Could use another review @csmarchbanks Added a note about HA tracking in the architecture doc, and I also added info about the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple small spelling/wording fixes, then this LGTM!
Signed-off-by: Callum Styan <callumstyan@gmail.com>
bump @csmarchbanks @gouthamve |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, missed the spelling updates! This LGTM.
Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
This LGTM too. I see enough LGTMs on this to merge now. |
Do we want to include the kvstore flags that are nested under
ha-tracker
?@gouthamve @tomwilkie