
[kube-prometheus-stack] Long term fix for kube-prometheus-stack release size issues #3548

Closed
jkroepke opened this issue Jul 3, 2023 · 21 comments · Fixed by #3547
Labels: enhancement (New feature or request), lifecycle/frozen

Comments

@jkroepke
Member

jkroepke commented Jul 3, 2023

Hi @prometheus-community/helm-charts-maintainers,

I would like to invite some of you to collect ideas for solving the release-size issues of kube-prometheus-stack.

#3083 introduced the issue again; it could be resolved once more by minifying the dashboards again.

However, if Prometheus Operator increases the size of the CRDs again, we may hit this issue again soon.

I did some analysis of the current release data, based on #3546. Here are the steps I took to get reproducible results:

helm upgrade -i kube-prometheus-stack . --set windowsMonitoring.enabled=true
kubectl get secrets sh.helm.release.v1.kube-prometheus-stack.v2 -o jsonpath='{.data.release}' | base64 -d > release
ls -l release                                                                                                      
-rw-r--r--  1 jkr  staff  1031276 Jul  4 00:03 release

With the Windows monitoring feature from #3083 enabled, the release size is 1031276 bytes (release name: kube-prometheus-stack / namespace: default). Without Windows monitoring, it is 1025044 bytes.

Both values are close to the hard limit of 1048576 bytes (1 MiB), which the Kubernetes API server enforces on Secret objects and therefore on Helm releases stored as Secrets.

To get the full raw data of the Helm release, run:

kubectl get secrets sh.helm.release.v1.kube-prometheus-stack.v2 -o jsonpath='{.data.release}' | base64 -d | base64 -d | zcat - > release.json

This gives us the raw JSON data of the Helm release. I then uploaded the data to a JSON size analyzer:

[screenshot: size breakdown of the decoded release JSON]
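
A similar breakdown can also be produced locally instead of uploading the data anywhere. A rough sketch (it approximates each top-level section's size by the length of its compact JSON text):

# Print the approximate byte size of each top-level key in the decoded release JSON
jq -r 'to_entries[] | "\(.value | tostring | length) bytes: \(.key)"' release.json | sort -n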

Most of the release size comes from the files section:

% jq -r '.chart.files[] | (.data | length | tostring) + ": " + .name' release.json | sort -n
532: .helmignore
876: CONTRIBUTING.md
8556: crds/crd-prometheusrules.yaml
24124: crds/crd-scrapeconfigs.yaml
47600: crds/crd-podmonitors.yaml
49656: crds/crd-probes.yaml
50024: crds/crd-servicemonitors.yaml
80636: README.md
360328: crds/crd-alertmanagerconfigs.yaml
565424: crds/crd-thanosrulers.yaml
600740: crds/crd-alertmanagers.yaml
674428: crds/crd-prometheusagents.yaml
773992: crds/crd-prometheuses.yaml

Add Markdown files to .helmignore.

As we can see, the README.md is included even though it is not necessary. I did a test excluding the two README files (adding them to the .helmignore file), which only amounts to about 11 KB here: 1031276 -> 1010524.

0.9% savings
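
A minimal sketch of that change (the entries match the file names in the listing above; the exact patterns may need checking against Helm's .helmignore matching rules):

# Exclude the documentation files from the packaged chart so they never end up in the release payload
cat >> .helmignore <<'EOF'
README.md
CONTRIBUTING.md
EOF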


Minify CRD files.

The CRDs are pretty huge. In theory, a lot of data could be saved by storing them as minified JSON, too.

Minify all YAML files (run inside the crds/ directory):

# Convert every CRD YAML file to minified single-line JSON and remove the original
for i in *.yaml; do
  yq -o=json -I=0 "$i" > "${i/yaml/json}"
  rm "$i"
done

This results in a much lower release size: 1031276 -> 851536 bytes.

[screenshot: size breakdown of the release after minifying the CRDs]
jq -r '.chart.files[] | (.data | length | tostring) + ": " + .name' release.json | sort -n
568: .helmignore
5656: crds/crd-prometheusrules.json
15260: crds/crd-scrapeconfigs.json
26808: crds/crd-podmonitors.json
28424: crds/crd-servicemonitors.json
29220: crds/crd-probes.json
159628: crds/crd-alertmanagerconfigs.json
321280: crds/crd-thanosrulers.json
337412: crds/crd-alertmanagers.json
385084: crds/crd-prometheusagents.json
442848: crds/crd-prometheuses.json

17.4% savings.

Risks: Unknown behavior in ArgoCD/FluxCD.
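
To rule out at least the Helm side, a quick sanity check after minifying could look like this (a sketch, run from the chart root; it assumes Helm loads .json manifests from the crds/ directory the same way it loads .yaml):

# All CRDs should still be rendered from the minified JSON files
helm template . --include-crds | grep -c 'kind: CustomResourceDefinition'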


Move CRD files into a sub-chart.

After reading helm/helm#11493, it seems that Helm behaves differently for sub-charts. We all love Helm.

I tried this out by moving the CRDs to an unmanaged sub-chart, named kube-prometheus-stack-crds:

See #3547 for a POC.
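
Roughly, the layout would look like this (an illustrative sketch only; the directory and file names are assumptions, the authoritative structure is in #3547):

kube-prometheus-stack/
├── Chart.yaml                          # parent chart; lists the CRD sub-chart under dependencies
├── charts/
│   └── kube-prometheus-stack-crds/     # unmanaged sub-chart, shipped only inside this chart
│       ├── Chart.yaml
│       └── crds/
│           ├── crd-prometheuses.yaml
│           └── ...
└── templates/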

Results:

Running helm install still installs all CRDs on a fresh installation. However, they are no longer included in the Helm release. This is fine, since Helm does not manage CRDs anyway.

The CRD files are still included in the chart package (helm package), which would not be the case if we used the .helmignore file.
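
One way to double-check that the CRDs still ship with the package (a sketch; the version in the archive name will differ):

# The CRD files should still be present inside the packaged .tgz
helm package .
tar -tzf kube-prometheus-stack-*.tgz | grep 'crds/'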

[screenshot: size breakdown of the release after moving the CRDs into a sub-chart]

As a result, the total release size after moving the CRDs is reduced: 1031276 -> 392092 bytes.

61% savings.


My proposal would be to continue with the POC from #3547. I would like to hear opinions from some maintainers. I learned that FluxCD has special handling for CRDs, and it would be interesting to know whether this approach is still compatible with FluxCD.

This would finally resolve the "Too long: must have at most 1048576 bytes" issue for good, and we would gain plenty of room for new manifests. If kube-prometheus-stack hits the release limit again at some point in the future, we could think about doing the same for the dashboard manifests. But for now, moving the CRDs into a sub-chart looks like the easiest option.

@GMartinez-Sisti
Member

Hi @jkroepke, thank you for this great explanation and discovery process.

I'm in favour of moving the CRDs to a sub-chart; it's easier to manage than having to update them manually as we do now. Regarding FluxCD, unfortunately I'm not a user, so I don't have any experience with it.

@SuperQ
Contributor

SuperQ commented Jul 4, 2023

I did some digging around; it doesn't seem like Helm has caught up with the Server-Side Apply features in Kubernetes.

@jkroepke
Member Author

jkroepke commented Jul 4, 2023

@GMartinez-Sisti are you talking about a dedicated Helm chart, like https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus-operator-crds, or an "unmanaged" sub-chart like #3547?

Note that we cannot use https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus-operator-crds in this context, since the CRDs must still be in the crds directory here.

@GMartinez-Sisti
Member

@GMartinez-Sisti are you talking about a dedicated Helm chart, like https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus-operator-crds, or an "unmanaged" sub-chart like #3547?

Note that we cannot use https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus-operator-crds in this context, since the CRDs must still be in the crds directory here.

I think having both a chart for CRDs and an unmanaged chart for CRDs can be kind of confusing, but I get the limitations we have. Looks like an unmanaged chart is currently our best option.

@rouke-broersma
Contributor

We need the managed chart for applying the CRDs using ArgoCD since we need to add annotations to tell it to use server side apply, and we cannot currently turn on server side apply globally due to some missing features.

@jkroepke
Member Author

jkroepke commented Jul 5, 2023

We need the managed chart for applying the CRDs using ArgoCD since we need to add annotations to tell it to use server side apply

But we could also add the annotations to kube-prometheus-stack as well?

metadata:
  annotations:
    argocd.argoproj.io/sync-options: ServerSideApply=true

@rouke-broersma
Contributor

rouke-broersma commented Jul 5, 2023

We need the managed chart for applying the CRDs using ArgoCD since we need to add annotations to tell it to use server side apply

But we could also add the annotations to kube-prometheus-stack as well?

metadata:
  annotations:
    argocd.argoproj.io/sync-options: ServerSideApply=true

This was discussed back when the large descriptions were added and the preferred install method was switched to server-side apply, and iirc the community decision at the time was that product-specific annotations would not be added by default. The new standalone chart was then created with the templated CRDs so we can add our own annotations as needed.

We are fine with deprecating the prometheus-operator-crds chart if the ArgoCD annotation is added to the CRDs of this new unmanaged chart, should the community opinion have changed.

@jkroepke
Member Author

jkroepke commented Jul 5, 2023

We could also leave everything as it is and "just" copy the CRDs from crds/ to the charts/prometheus-operator-crds/crds/ directory like in #3547, and we are done.

The "unmanaged" sub-chart prometheus-operator-crds would not be exposed anywhere and would only be part of kube-prometheus-stack.

@rouke-broersma
Contributor

So prometheus-operator-crds would continue to exist as it is for users who need to template the CRDs, and it would be used 'unmanaged' by kube-prometheus-stack to place the CRDs in the crds folder; is that what you mean?

@jkroepke
Member Author

jkroepke commented Jul 5, 2023

Correct, yes.

@jkroepke
Member Author

jkroepke commented Jul 8, 2023

#3547 is open for review. More reviewers would help ensure that the change itself does not break anything.

@rgarcia6520

I could be misunderstanding, but this seems to be a breaking change. After checking out the latest version of the chart with crds.enabled=true (not modifying anything) and running helm template, no CRD templates are present.

Are we expected to also install or track the CRDs separately as another chart?

@rouke-broersma
Contributor

I could be misunderstanding, but this seems to be a breaking change. After checking out the latest version of the chart with crds.enabled=true (not modifying anything) and running helm template, no CRD templates are present.

Are we expected to also install or track the CRDs separately as another chart?

This was a breaking change; that's why the major version number was increased. The upgrade doc specifies what you need to do for this upgrade.

@rgarcia6520

Thanks; re-reading, I had only seen "We do not expect any breaking changes in this version." under this specific version, hence the confusion.

@jkroepke
Member Author

running helm template no CRD templates are present.

On my machine, it works fine:

% helm template oci://ghcr.io/prometheus-community/charts/kube-prometheus-stack --include-crds | grep CustomResourceDefinition
Pulled: ghcr.io/prometheus-community/charts/kube-prometheus-stack:48.3.1
Digest: sha256:c7f48ccf4070e3b26e2e489af773397be2bd045f3405679b6be601b3b0fdd8c2
kind: CustomResourceDefinition
kind: CustomResourceDefinition
kind: CustomResourceDefinition
kind: CustomResourceDefinition
kind: CustomResourceDefinition
kind: CustomResourceDefinition
kind: CustomResourceDefinition
kind: CustomResourceDefinition
kind: CustomResourceDefinition
kind: CustomResourceDefinition

@rgarcia6520

We tried updating while deploying with FluxCD v2, and it complains that no matches for any of the CRDs are present in the latest chart.

Cloning it down and running helm template, we are seeing the same thing. I was finally able to get it to list them out with a few changes to the dependencies: part of the chart and by ensuring the CRDs were in crds/templates in the sub-chart.

❯ cd charts/kube-prometheus-stack
❯ ls
CONTRIBUTING.md  Chart.lock  Chart.yaml  README.md  charts  ci  hack  install  templates  unittests  values.yaml
❯ helm template monitoring ./ > install --debug
install.go:194: [debug] Original chart version: ""
install.go:211: [debug] CHART PATH: /home/ryan/kps-helm-charts/charts/kube-prometheus-stack

❯ helm template monitoring ./ | grep CustomResourceDefinition

@jkroepke
Member Author

jkroepke commented Aug 15, 2023

What about

helm template monitoring ./ --include-crds

Maybe the FluxCD integration is not compatible with the approach here. It feels like a FluxCD issue that CRDs from sub-charts are not supported, but I'm not sure.

But from the Helm and Helm chart point of view, there isn't a breaking change.

@rgarcia6520

FluxCD is compatible with that approach. It seems our issue was that the chart/crds folder was not fully removed on a branch, and the error was misleading: it was about duplicates but was outputting "no match or kind".

iamleot added a commit to iamleot/rpi-flux that referenced this issue Dec 29, 2023
Update to latest kube-prometheus-stack chart version.

Changes:
55.x
This version upgrades Prometheus-Operator to v0.70.0.

54.x
Grafana Helm Chart has bumped to version 7.

53.x
This version upgrades Prometheus-Operator to v0.69.1, Prometheus to 2.47.2.

52.x
This includes the ability to select between using existing secrets or
create new secret objects for various thanos config. The defaults have
not changed but if you were setting:

- `thanosRuler.thanosRulerSpec.alertmanagersConfig` or
- `thanosRuler.thanosRulerSpec.objectStorageConfig` or
- `thanosRuler.thanosRulerSpec.queryConfig` or
- `prometheus.prometheusSpec.thanos.objectStorageConfig`

you will need to set `existingSecret` or `secret` based on your
requirement.

51.x
This version upgrades Prometheus-Operator to v0.68.0, Prometheus to 2.47.0 and
Thanos to v0.32.2.

50.x
This version requires Kubernetes 1.19+.

We do not expect any breaking changes in this version.

49.x
This version upgrades Prometheus-Operator to v0.67.1, Alertmanager
to v0.26.0, Prometheus to 2.46.0 and Thanos to v0.32.0.

48.x
This version moved all CRDs into a dedicated sub-chart. No new CRDs are
introduced in this version. See
<prometheus-community/helm-charts#3548> for
more context.

We do not expect any breaking changes in this version.

47.x
This version upgrades Prometheus-Operator to v0.66.0 with new CRDs
(PrometheusAgent and ScrapeConfig).

46.x
This version upgrades Prometheus-Operator to v0.65.1 with new CRDs
(PrometheusAgent and ScrapeConfig), Prometheus to v2.44.0 and Thanos
to v0.31.0.

45.x
This version upgrades Prometheus-Operator to v0.63.0, Prometheus to
v2.42.0 and Thanos to v0.30.2.
@adityamandke23

Hi, I am getting this error while deploying with ArgoCD: "rpc error: code = Unknown desc = helm dependency build failed exit status 1: Error: unable to load chart 'charts/crds': Chart.yaml file is missing for chart". Chart used: kube-prometheus-stack-61.8.0. I have installed the chart out of the box, and there is a Chart.yaml at that path.

@rouke-broersma
Contributor

rouke-broersma commented Oct 24, 2024

Have you tried a newer version? You're many versions behind. The latest version, 65.4.0, was just released.

@adityamandke23

tried with the latest version as well, getting the same error.
