Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Mixins] Option for a custom cluster label #1651

Merged
merged 10 commits into from
Apr 13, 2022

Conversation

Whyeasy
Copy link
Contributor

@Whyeasy Whyeasy commented Apr 7, 2022

What this PR does

This PR makes it possible to provide another label that indicates your cluster. The default is set to cluster and it shouldn't break any of the current rules or dashboards.

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Whyeasy <Whyeasy@users.noreply.github.com>
Signed-off-by: Whyeasy <Whyeasy@users.noreply.github.com>
Signed-off-by: Whyeasy <Whyeasy@users.noreply.github.com>
Signed-off-by: Whyeasy <Whyeasy@users.noreply.github.com>
Signed-off-by: Whyeasy <Whyeasy@users.noreply.github.com>
Signed-off-by: Whyeasy <Whyeasy@users.noreply.github.com>
@CLAassistant
Copy link

CLAassistant commented Apr 7, 2022

CLA assistant check
All committers have signed the CLA.

Copy link
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! I left a couple of high-level comments on the direction where I think we should go to, instead of adding another config option which could add extra burden to maintain over time. Thanks!

@@ -479,7 +479,7 @@ local utils = import 'mixin-utils/utils.libsonnet';
rules: [
{
// cortex_ingester_ingested_samples_total is per user, in this rule we want to see the sum per cluster/namespace/instance
record: 'cluster_namespace_%s:cortex_ingester_ingested_samples_total:rate1m' % $._config.per_instance_label,
record: '%s_namespace_%s:cortex_ingester_ingested_samples_total:rate1m' % [$._config.clusterLabel, $._config.per_instance_label],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actual aggregation is dynamic and based on alert_aggregation_labels. It's only the recording rule name that reference cluster_namespace hardcoded. Instead of using $._config.clusterLabel here, we should build the recording rule prefix based on alert_aggregation_labels. See this related issue: #1632

@@ -35,9 +35,12 @@
overrides_exporter: 'overrides-exporter',
},

// Custom Cluster label to use in dashboards.
clusterLabel: 'cluster',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should introduce this, but we should just keep using job_labels and cluster_labels everywhere. For example, cluster_labels is already used to build alert_aggregation_labels which is then used in alerts and recording rules. However, not all of them are using it so I think we should just keep working to make sure they're used everywhere (see few other comments I left).

Copy link
Contributor Author

@Whyeasy Whyeasy Apr 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will start working on using alert_aggregation_labels.

But what's your idea on the approach we should take here:

.addMultiTemplate('cluster', 'cortex_build_info', '%s' % $._config.clusterLabel)
. Because this selector should be updated accordingly.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the difficulty of getting rid of this config field at all. Ok, let's keep it. Can I just ask you to rename clusterLabel to per_cluster_label to keep it consistent with per_instance_label and per_node_label, please?

The comment could be something like:

// The label used to differentiate between different Kubernetes clusters.

Signed-off-by: Whyeasy <Whyeasy@users.noreply.github.com>
@Whyeasy
Copy link
Contributor Author

Whyeasy commented Apr 8, 2022

Added alert_aggregation_rule_prefix at the places where it was previously cluster_namespace. Same goes for the group by cases.

What would you preferred direction be for cases like these:

.addMultiTemplate('cluster', 'cortex_build_info', '%s' % $._config.clusterLabel)

sum(%s_job:cortex_alertmanager_alerts_invalid_total:rate5m{%s})

%(alert_aggregation_rule_prefix)s_deployment_reason:required_replicas:count{%(clusterLabel)s=~"$cluster", namespace=~"$namespace"}

https://github.com/Whyeasy/mimir/blob/7b32d0a0d97e44c3ca828005c3de4cd443b5167e/operations/mimir-mixin/recording_rules.libsonnet#L15

Copy link
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job!

Could you update the doc at docs/sources/operators-guide/visualizing-metrics/requirements.md? It currently says the cluster label can't be customized.

@@ -41,11 +41,11 @@ local utils = import 'mixin-utils/utils.libsonnet';
$.tablePanel([
|||
sort_desc(
cluster_namespace_deployment_reason:required_replicas:count{cluster=~"$cluster", namespace=~"$namespace"}
%(alert_aggregation_rule_prefix)s_deployment_reason:required_replicas:count{%(clusterLabel)s=~"$cluster", namespace=~"$namespace"}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use namespaceMatcher() utility instead of {%(clusterLabel)s=~"$cluster", namespace=~"$namespace"}.

Same comment applies below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -35,9 +35,12 @@
overrides_exporter: 'overrides-exporter',
},

// Custom Cluster label to use in dashboards.
clusterLabel: 'cluster',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the difficulty of getting rid of this config field at all. Ok, let's keep it. Can I just ask you to rename clusterLabel to per_cluster_label to keep it consistent with per_instance_label and per_node_label, please?

The comment could be something like:

// The label used to differentiate between different Kubernetes clusters.

CHANGELOG.md Outdated
Comment on lines 36 to 38
* [ENHANCEMENT] Alerting rules: Add possibility to use a custom cluster label.
* [ENHANCEMENT] Dashboards: Add possibility to use a custom cluster label.
* [ENHANCEMENT] Recording rules: Add possibility to use a custom cluster label.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* [ENHANCEMENT] Alerting rules: Add possibility to use a custom cluster label.
* [ENHANCEMENT] Dashboards: Add possibility to use a custom cluster label.
* [ENHANCEMENT] Recording rules: Add possibility to use a custom cluster label.
* [ENHANCEMENT] Added `per_cluster_label` support to allow to change the label name used to differentiate between Kubernetes clusters. #1651

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@pracucci
Copy link
Collaborator

Could you also run make check-mixin? We expect no diff in the compiled output (there are just few differences to fix).

CHANGELOG.md Outdated Show resolved Hide resolved
Copy link
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants