Skip to content

Add documentation for HA tracker flags. #1465

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jul 31, 2019
Merged

Add documentation for HA tracker flags. #1465

merged 8 commits into from
Jul 31, 2019

Conversation

cstyan
Copy link
Contributor

@cstyan cstyan commented Jun 13, 2019

Do we want to include the kvstore flags that are nested under ha-tracker?

@gouthamve @tomwilkie

@weeco
Copy link
Contributor

weeco commented Jun 13, 2019

Can you add a couple words about what is necessary beside the available flags?

E. g. setting the external labels in prometheus. What happens if you enable the HA config in Cortex but your tenant/user does not set the replica flag or sets it incorrectly (wrong property name)?

@gouthamve
Copy link
Contributor

One thing that struck me. Why is it not nested as distributor.ha-tracker?

@@ -92,6 +92,38 @@ The ingester query API was improved over time, but defaults to the old behaviour
- `-distributor.extra-query-delay`
This is used by a component with an embedded distributor (Querier and Ruler) to control how long to wait until sending more than the minimum amount of queries needed for a successful response.

- `distributor.accept-ha-samples`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this a per-user flag, it's global for all user no? Can we rename it to be something more descriptive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's defined in limits, so you can set an override per user. Is the description not accurate?

We can rename to anything, any suggestions as to what would be more descriptive?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think calling it "per user" is very misleading. We can call it "The default value to enable handling of samples with external labels for identifying replicas in an HA Prometheus setup. This defaults to false, and is technically defined in the Distributor limits"

Also, list it in the limits section like distributor.ingestion-rate-limit

@cstyan
Copy link
Contributor Author

cstyan commented Jun 13, 2019

Why is it not nested as distributor.ha-tracker?

I don't remember any reason for it not being nested from the review, it might have just slipped through then.

@weeco
Copy link
Contributor

weeco commented Jun 20, 2019

@cstyan Looks good for me now, if you want a really good documentation you might want to explain at a high level how the HA feature works (maybe in the architecture.md). Does it dedupe the replicas in the distributor/ingester? Should we expect an increased RAM usage in the distributor or ingester (e. g. at the factor of n where n is the number of replicas)?

- `ha-tracker.replica`
Prometheus label to look for in samples to identify a Prometheus HA replica. (default "__replica__")

It's reasonable to assume people probably already have a `cluster` label, or something similar. If not, they should add one along with `__replica__`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't use external labels in our prometheus config, as each prometheus gets a different UserID / TenantID. Does Cortex still require the "cluster" external label then if I only have one prometheus cluster per tenant or does it use the default "cluster" if no label is specified?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both labels have to be present for Cortex to dedup samples from HA prometheus instances. Whether the labels are already in your series or added via external labels doesn't matter. Cortex just needs a way of identifying each HA cluster, and each replica within that cluster.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming I am just about to enable the HA tracker feature and I have a single prometheus instance for each tenant/user ID which does not set either of these two labels. Does this cause any problems, will it reject samples without these labels?

Copy link
Contributor

@csmarchbanks csmarchbanks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of small last comments then LGTM

cstyan added 5 commits July 11, 2019 18:32
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Callum Styan <callumstyan@gmail.com>
@cstyan
Copy link
Contributor Author

cstyan commented Jul 12, 2019

Could use another review @csmarchbanks
cc @tomwilkie

Added a note about HA tracking in the architecture doc, and I also added info about the store flags in arguments.md since there weren't any in there.

Copy link
Contributor

@csmarchbanks csmarchbanks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple small spelling/wording fixes, then this LGTM!

Signed-off-by: Callum Styan <callumstyan@gmail.com>
@cstyan
Copy link
Contributor Author

cstyan commented Jul 29, 2019

bump @csmarchbanks @gouthamve

csmarchbanks
csmarchbanks previously approved these changes Jul 29, 2019
Copy link
Contributor

@csmarchbanks csmarchbanks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, missed the spelling updates! This LGTM.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
@csmarchbanks csmarchbanks self-requested a review July 30, 2019 15:14
@csmarchbanks csmarchbanks dismissed their stale review July 30, 2019 15:15

Changes since review

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
@tomwilkie
Copy link
Contributor

This LGTM too. I see enough LGTMs on this to merge now.

@tomwilkie tomwilkie merged commit 89116fc into cortexproject:master Jul 31, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants