Skip to content

Fail to startup Cortex if provided runtime config is invalid #3707

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

pracucci
Copy link
Contributor

What this PR does:
Currently, Cortex can successfully startup if the provided runtime config (typically limits overrides) is invalid. We recently had an incident because of this: we rolled out ingesters with an invalid runtime config, ingesters start up correctly, but overrides were missing causing an impact on customers. Other alerts fired pretty soon, so the impact of the incident was limited.

As a retrospective, I believe we should fast fail Cortex at startup if the provided runtime config is invalid. For example, in the case of a rolling update, ingesters would stop to rollout until fixed.

I haven't changed the behaviour of an invalid config loaded while Cortex is running because, in that case, we keep the previous config and we definitely don't want to shutdown all ingesters because of this.

Which issue(s) this PR fixes:
N/A

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Marco Pracucci <marco@pracucci.com>
Copy link
Contributor

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think changing it to fail makes sense. This behaviour was originally introduced by @bboreham in 55697b5.

Edit: apologies for poor wording. I wanted to make sure that there are no objections to the PR.

Copy link
Contributor

@jtlisi jtlisi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bboreham
Copy link
Contributor

This behaviour was originally introduced by @bboreham

Disagree.

@bboreham
Copy link
Contributor

The removal of an error check on startup happens here: 870fe73#diff-f7e299904986266ff7106a24f1f542c4cab3fc980a401e416ab714627ba6585aL56-L58

@bboreham
Copy link
Contributor

Fixes #3031

@pstibrany
Copy link
Contributor

The removal of an error check on startup happens here: 870fe73#diff-f7e299904986266ff7106a24f1f542c4cab3fc980a401e416ab714627ba6585aL56-L58

👍 You're right. In that case I don't think there is any disagreement that we should merge this.

@pstibrany pstibrany merged commit 995542a into cortexproject:master Jan 19, 2021
@pracucci pracucci deleted the fast-fail-on-invalid-runtime-config-at-startup branch January 19, 2021 10:09
gouthamve pushed a commit to grafana/cortex that referenced this pull request Jan 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants