This repository has been archived by the owner on Nov 1, 2022. It is now read-only.

Helm 3 support #8

Closed
cdenneen opened this issue May 22, 2019 · 56 comments · Fixed by #209
Labels
enhancement New feature or request size/large Should take longer than a week to resolve

Comments

@cdenneen
Contributor

With the release of Helm 3, I would like to track progress on this integration.

@hiddeco
Member

hiddeco commented May 23, 2019

This is definitely something we should track (and work) on.

I am currently unaware of exactly what changes Helm 3 brings us, beyond having read the design proposal a couple of months ago and knowing that it is Tiller-less. I need to schedule some time to look into the alpha release and see how it fits into what we currently have (if it fits at all), so we can determine what the next steps would be.

If someone already has looked into it and has ideas, feel free to comment.

@timja
Contributor

timja commented May 24, 2019

I think the main implications are that there is no Tiller, and that release information is stored in Kubernetes objects rather than in etcd.

Note that in the current alpha, --namespace is broken and requires your active context to be set to the namespace you want to deploy into.
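
For anyone hitting this, a rough workaround sketch (assuming kubectl already points at the target cluster; namespace, release, and chart names are placeholders) is to switch the active context's namespace before invoking Helm:

# point the current context at the namespace you want to deploy into,
# since --namespace is ignored in this alpha
kubectl config set-context --current --namespace=my-namespace
helm install my-release ./my-chart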

@jpds

jpds commented May 24, 2019

Good session from this week's Kubecon about the changes: https://www.youtube.com/watch?v=lYzrhzLAxUI

@hiddeco
Member

hiddeco commented Jun 24, 2019

Finally had a chance to look into the first alpha release and see how this would fit into the Helm operator. Consider the following to be a list of observations and questions that came to mind; they may be short-sighted, incomplete, prone to change, or even incorrect.


A .kube/config is the only requirement (besides helm)

This means that, if a user has no Helm v2 releases, we no longer have to initialize and/or maintain a connection with Tiller, and only need to run helm commands with a service account that has sufficient permissions.
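
As an illustration, "sufficient permissions" could be granted along these lines; binding a dedicated service account to cluster-admin is the bluntest possible sketch, and a real setup would scope this down:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: helm-operator
  namespace: fluxcd
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: helm-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: helm-operator
    namespace: fluxcd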

Release names are now namespaced

I see room here for dropping the .spec.releaseName field and always using the name of a HelmRelease as a release name. This would also make some of the recently invented logic obsolete, and would ease parallelization.

OCI registries are going to be supported as chart repositories

Although still experimental, and a fair amount of work has to be done for alpha 2, I think this will be adopted pretty fast by some enterprise users. The Helm operator will need to understand this new chart source type, and be able to pull (helm chart pull) or save (helm chart save) such charts.
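
For reference, a rough sketch of the experimental OCI workflow as it looked in the early Helm 3 releases (the registry URL and chart are placeholders):

export HELM_EXPERIMENTAL_OCI=1
# store a local chart in the local registry cache under an OCI reference
helm chart save ./mychart registry.example.com/charts/mychart:0.1.0
# push to / pull from an OCI registry
helm chart push registry.example.com/charts/mychart:0.1.0
helm chart pull registry.example.com/charts/mychart:0.1.0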

Helm charts with dependencies are not interchangeable between Helm versions as Helm v3 depends on Chart.{yaml,lock} files

Not a direct issue for the operator, as the helm dep commands have not changed; it will, however, be of importance to users when they switch.
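
To illustrate: a Helm v3 (apiVersion v2) chart declares its dependencies in Chart.yaml and locks them in Chart.lock, where a Helm v2 chart would use requirements.yaml and requirements.lock. A minimal sketch, with a made-up dependency:

# Chart.yaml
apiVersion: v2
name: mychart
version: 0.1.0
dependencies:
  - name: redis
    version: ">= 9.0.0"
    repository: https://kubernetes-charts.storage.googleapis.com/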

Chart value validation is now supported with JSON Schemas

Not much work for us here (install and upgrade actions will just error faster than before), but I hope this will give us enough tools / context to provide users with better error and validation messages on installation and upgrade failures.
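
For context, Helm 3 validates the supplied values against an optional values.schema.json file shipped next to values.yaml; a minimal sketch of such a schema (the replicas property is just an example):

{
  "$schema": "https://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "replicas": {
      "type": "integer",
      "minimum": 1
    }
  },
  "required": ["replicas"]
}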


The biggest question here for us to answer is: are we able to create a Helm operator which supports both versions? And if we are able to, how is it going to work?

I had a quick discussion with @stefanprodan about this, and we both noticed that there are some major differences between the two Helm versions; by refactoring what we have to support both, there is a risk of introducing bugs into the currently working Helm v2 operator.

There is also the question of how the operator is going to know which helm version it needs to call. Introducing a version field to the HelmRelease spec is an option (which would default to v2 for backwards compatibility), and this would give you the option to migrate your releases one-by-one (by changing the version field to v3 one release at a time).

A second option would be to change the API version of the CRD, create new listeners and event handlers, and let those two work independently from each other. The plus side to this approach is that there is a very clear separation of the two, and incorporating version-specific logic would be easier, as we would have the ability to create specific update methods for e.g. the queue.

NB: both of the options named above could work within the same Helm operator or in two separate operators.
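
For the first option, a minimal sketch of what a HelmRelease with such a version field could look like, using the helmVersion field that later surfaces in this thread (chart name, repository, and namespace are placeholders):

apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: my-release
  namespace: demo
spec:
  releaseName: my-release
  # would default to v2 for backwards compatibility when omitted
  helmVersion: v3
  chart:
    repository: https://kubernetes-charts.storage.googleapis.com/
    name: some-chart
    version: 1.0.0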

Lastly, I want to make a note about the current slowness of the operator, which is not directly related to Helm v3, but is something to keep in mind while choosing between separation and incorporation. I think parallel operations will be a lot easier when the operator only has to deal with one Helm version.

@squaremo
Member

Release names are now namespaced

I see room here for dropping the .spec.releaseName field and always using the name of a HelmRelease as a release name.

When .spec.releaseName is not supplied, the generated release name includes the namespace, so it is effectively namespaced anyway.

The rationale for .spec.releaseName is to account for "taking over" a release -- that is, introducing a HelmRelease resource to manage an existing release, and thereby avoid disruption. That use case hasn't gone away, though we could in principle decide not to care about it.

There is also the question of how the operator is going to know which helm version it needs to call. Introducing a version field to the HelmRelease spec is an option (which would default to v2 for backwards compatibility), and this would give you the option to migrate your releases one-by-one (by changing the version field to v3 one release at a time).

I quite like this alternative. I think the deciding factor between this and introducing an entirely new CRD will be whether we need (substantially) different fields. A good exercise might be to speculate on what a HelmRelease for a Helm v3 release would have to look like -- are there fields that become irrelevant? Are there new, mandatory fields?

@stefanprodan stefanprodan transferred this issue from fluxcd/flux Aug 13, 2019
@hiddeco hiddeco added enhancement New feature or request size/large Should take longer than a week to resolve labels Aug 13, 2019
@stealthybox
Member

There is also the question of how the operator is going to know which helm version it needs to call. Introducing a version field to the HelmRelease spec is an option (which would default to v2 for backwards compatibility), and this would give you the option to migrate your releases one-by-one (by changing the version field to v3 one release at a time).

The Chart.yaml has an apiVersion field that indicates the use of Helm v3+.
We can use that to determine which version of helm the chart requires.
https://v3.helm.sh/docs/faq/#chart-yaml-apiversion-bump
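
In other words, a minimal illustration of the detection (purely the apiVersion line in Chart.yaml):

# Chart.yaml
apiVersion: v2   # chart requires Helm 3
# apiVersion: v1 would mean the chart can still be handled by Helm 2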

This requires less user action and removes edge cases where the wrong client version is set for a chart that does not support it.
For transparency, we should consider logging the Helm client version used and adding it to the Status object.

I see room here for dropping the .spec.releaseName field and always using the name of a HelmRelease as a release name.

+1, this should be the v3 default, even if we still support HR.Spec.ReleaseName.

The rationale for .spec.releaseName is to account for "taking over" a release -- that is, introducing a HelmRelease resource to manage an existing release, and thereby avoid disruption. That use case hasn't gone away, though we could in principle decide not to care about it.

Release names used to be cluster-global, but they are now fully namespaced in Helm v3.
This makes this use-case much less useful, because nothing stops you from taking over a release by just adding a HelmRelease of the same name in the same Namespace.

If we still support HR.Spec.TargetNamespace for helmv3, then I can see HR.Spec.ReleaseName being useful.
This would be a value-add of helm-operator since you could manage multiple releases of the same name in different Namespaces from HR's in the same management Namespace.
These would otherwise have a name collision unless you overrode HR.Spec.ReleaseName on at least n-1 of them.

@stealthybox
Member

Thinking about UX in the end: we may want a global fluxd config option to enable/disable the use of Helm v2/v3, so that users can enforce policies against using v2 Tiller or beta v3 features.

@gsf

gsf commented Oct 10, 2019

I was wondering whether there was progress going on outside of this issue. Looks like #42 and #55 have both been merged into the ongoing helm-v3-dev branch.

@hiddeco
Member

hiddeco commented Oct 17, 2019

@gsf helm-v3-dev is the branch you want to follow for updates on the Helm 3 support.

I have been refactoring the chartsync package into a package dedicated to syncing charts from chart sources to local storage, and a package syncing HelmRelease resources to Helm. This work should surface in the near future and will bring us a lot closer to something I feel comfortable with merging into master.

@dminca

dminca commented Nov 14, 2019

Helm 3.0.0 stable has been released 🎉

@stefanprodan
Member

stefanprodan commented Nov 14, 2019

Helm v3 support can be tested with builds of the helm-v3-dev branch.

Install Helm v3 CLI and add it to your path:

OS=darwin-amd64 && \
mkdir -p $HOME/.helm3/bin && \
curl -sSL "https://get.helm.sh/helm-v3.0.0-${OS}.tar.gz" | tar xvz && \
chmod +x ${OS}/helm && mv ${OS}/helm $HOME/.helm3/bin/helmv3

export PATH=$PATH:$HOME/.helm3/bin
export HELM_HOME=$HOME/.helm3

Install Flux:

helmv3 repo add fluxcd https://charts.fluxcd.io

kubectl create ns fluxcd

helmv3 upgrade -i flux fluxcd/flux --wait \
--namespace fluxcd \
--set git.url=git@github.com:${GHUSER}/${GHREPO}

Install the HelmRelease CRD that contains the helm version field:

kubectl apply -f https://raw.githubusercontent.com/fluxcd/helm-operator/helm-v3-dev/deploy/flux-helm-release-crd.yaml

Install Helm Operator with Helm v3 support using the latest build:

helmv3 upgrade -i helm-operator fluxcd/helm-operator --wait \
--namespace fluxcd \
--set git.ssh.secretName=flux-git-deploy \
--set configureRepositories.enable=true \
--set configureRepositories.repositories[0].name=stable \
--set configureRepositories.repositories[0].url=https://kubernetes-charts.storage.googleapis.com \
--set extraEnvs[0].name=HELM_VERSION \
--set extraEnvs[0].value=v3 \
--set image.repository=docker.io/fluxcd/helm-operator-prerelease \
--set image.tag=helm-v3-dev-fb98e2db

Keep an eye on https://hub.docker.com/repository/docker/fluxcd/helm-operator-prerelease/tags?page=1&ordering=last_updated for new builds.

@rowecharles

I've taken a look at the latest image (helm-v3-dev-7589ee47) and the operator is continually attempting to upgrade some releases despite there being no changes. An example is:

apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: elasticsearch
  namespace: monitoring
spec:
  releaseName: elasticsearch
  chart:
    repository: https://helm.elastic.co
    name: elasticsearch
    version: 7.4.1
  values:
    replicas: 1

The logs show that the operator thinks there is a difference between the applied and expected chart due to the type mismatch for the value of replicas:

ts=2019-11-24T11:40:22.774901798Z caller=release.go:340 component=release release=elasticsearch targetNamespace=monitoring resource=monitoring:helmrelease/elasticsearch helmVersion=v3 info="values have diverged" diff="  map[string]interface{}{\n  \t... // 39 identical entries\n  \t\"rbac\":           map[string]interface{}{\"create\": bool(false), \"serviceAccountName\": string(\"\")},\n  \t\"readinessProbe\": map[string]interface{}{\"failureThreshold\": float64(3), \"initialDelaySeconds\": float64(10), \"periodSeconds\": float64(10), \"successThreshold\": float64(3), \"timeoutSeconds\": float64(5)},\n- \t\"replicas\":       float64(1),\n+ \t\"replicas\":       s\"1\",\n  \t\"resources\":      map[string]interface{}{\"limits\": map[string]interface{}{\"cpu\": string(\"1000m\"), \"memory\": string(\"2Gi\")}, \"requests\": map[string]interface{}{\"cpu\": string(\"100m\"), \"memory\": string(\"2Gi\")}},\n  \t\"roles\":          map[string]interface{}{\"data\": string(\"true\"), \"ingest\": string(\"true\"), \"master\": string(\"true\")},\n  \t... // 12 identical entries\n  }\n"

The important part being:

- "replicas":       float64(1),
+ "replicas":       s"1",

This doesn't impact any of the running pods, but it does clog up the release history, meaning rollbacks would be impossible.

$helm history elasticsearch
REVISION	UPDATED                 	STATUS    	CHART              	APP VERSION	DESCRIPTION
10      	Sun Nov 24 11:42:23 2019	superseded	elasticsearch-7.4.1	7.4.1      	Upgrade complete
11      	Sun Nov 24 11:43:23 2019	superseded	elasticsearch-7.4.1	7.4.1      	Upgrade complete
12      	Sun Nov 24 11:44:22 2019	superseded	elasticsearch-7.4.1	7.4.1      	Upgrade complete
13      	Sun Nov 24 11:45:22 2019	superseded	elasticsearch-7.4.1	7.4.1      	Upgrade complete
14      	Sun Nov 24 11:46:22 2019	superseded	elasticsearch-7.4.1	7.4.1      	Upgrade complete
15      	Sun Nov 24 11:47:22 2019	superseded	elasticsearch-7.4.1	7.4.1      	Upgrade complete
16      	Sun Nov 24 11:48:22 2019	superseded	elasticsearch-7.4.1	7.4.1      	Upgrade complete
17      	Sun Nov 24 11:49:22 2019	superseded	elasticsearch-7.4.1	7.4.1      	Upgrade complete
18      	Sun Nov 24 11:50:22 2019	superseded	elasticsearch-7.4.1	7.4.1      	Upgrade complete
19      	Sun Nov 24 11:51:22 2019	deployed  	elasticsearch-7.4.1	7.4.1      	Upgrade complete

This has happened on a number of charts where I've used integers in the values.

@eschereisin

I am taking a look at helm-v3-dev-7589ee47, and the helm-operator creates secrets in the following format:

sealed-secrets.v1

whereas Helm 3 has a prefix:

sh.helm.release.v1.flux.v1
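
For anyone comparing the two, Helm 3 stores each release revision as a Secret of type helm.sh/release.v1, named sh.helm.release.v1.<release>.v<revision>, in the release namespace, so something like the following should list them (the namespace is just an example):

kubectl -n fluxcd get secrets --field-selector type=helm.sh/release.v1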

@hiddeco
Member

hiddeco commented Nov 24, 2019

@rowecharles thanks for reporting this. The reason this seems to happen is that the dry-run values generated by Helm (internally) are returned directly from memory and take a shorter route than the values returned from storage, bypassing a parser that casts the float values to strings.

I was able to get rid of the spurious 'upgrades' by always casting the values to a YAML string and then re-reading this string into a map[string]interface{}, ensuring that the values we compare always represent the values as they would be when returned from storage. Expect a PR for this tomorrow. Update: the PR is available as #117; image for testing: hiddeco/helm-operator:helm-v3-v3-value-types-334af866.
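
A minimal sketch of that round-trip in Go; this is not the actual operator code, and the helper name and the sigs.k8s.io/yaml import are assumptions:

package main

import (
	"fmt"

	"sigs.k8s.io/yaml"
)

// normalizeValues marshals the values to YAML and reads them back, so that
// both sides of the comparison have gone through the same parser and the
// value types line up with what is returned from storage.
func normalizeValues(vals map[string]interface{}) (map[string]interface{}, error) {
	b, err := yaml.Marshal(vals)
	if err != nil {
		return nil, err
	}
	out := map[string]interface{}{}
	if err := yaml.Unmarshal(b, &out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	v, _ := normalizeValues(map[string]interface{}{"replicas": 1})
	// sigs.k8s.io/yaml goes through JSON, so numbers come back as float64
	fmt.Printf("%T\n", v["replicas"])
}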


@eschereisin this is because the Helm operator is still running v3.0.0-beta.3, while this was added in v3.0.0-beta.4. The upgrade to v3.0.0 (stable) will arrive soon; I had to clear the path and get the master changes in before I could start working on this.

@stromvirvel

Is there an estimate for when the helm-operator's Helm v3 support will be production-ready? Can I offer help with testing/evaluation?

@timja
Contributor

timja commented Nov 25, 2019

Is there an estimate for when the helm-operator's Helm v3 support will be production-ready? Can I offer help with testing/evaluation?

@stromvirvel See #8 (comment)

Pre-release builds are published on every commit to the helm-v3-dev branch.
All testing and feedback would be welcomed I'm sure 😄

@stefanprodan
Member

Helm Operator using Helm v3 (stable) can now be tested.

Install Helm v3 CLI and add it to your path:

OS=darwin-amd64 && \
mkdir -p $HOME/.helm3/bin && \
curl -sSL "https://get.helm.sh/helm-v3.0.0-${OS}.tar.gz" | tar xvz && \
chmod +x ${OS}/helm && mv ${OS}/helm $HOME/.helm3/bin/helmv3

export PATH=$PATH:$HOME/.helm3/bin
export HELM_HOME=$HOME/.helm3

Install Flux:

helmv3 repo add fluxcd https://charts.fluxcd.io

kubectl create ns fluxcd

helmv3 upgrade -i flux fluxcd/flux --wait \
--namespace fluxcd \
--set git.url=git@github.com:${GHUSER}/${GHREPO}

Install the HelmRelease CRD that contains the helm version field:

kubectl apply -f https://raw.githubusercontent.com/fluxcd/helm-operator/helm-v3-dev/deploy/flux-helm-release-crd.yaml

Install Helm Operator with Helm v3 support using the latest build:

helmv3 upgrade -i helm-operator fluxcd/helm-operator --wait \
--namespace fluxcd \
--set git.ssh.secretName=flux-git-deploy \
--set configureRepositories.enable=true \
--set configureRepositories.repositories[0].name=stable \
--set configureRepositories.repositories[0].url=https://kubernetes-charts.storage.googleapis.com \
--set extraEnvs[0].name=HELM_VERSION \
--set extraEnvs[0].value=v3 \
--set image.repository=docker.io/fluxcd/helm-operator-prerelease \
--set image.tag=helm-v3-dev-0b11d9d0

We've created a GitHub issue template for Helm v3. Please take this for a spin and create issues if you find any problems with it. Thanks!

@dragonsmith

Hey!
I'm testing fluxcd/helm-operator-prerelease:helm-v3-dev-0b11d9d0 on my cluster right now and it mostly works, but I get some strange behaviour.

For example, here is what I get after a fresh install of sealed-secrets-controller:

helm-operator-85f4dc6fbc-h6xjt flux-helm-operator ts=2019-12-03T19:15:28.735809668Z caller=release.go:280 component=release release=sealed-secrets-controller targetNamespace=kube-system resource=kube-system:helmrelease/sealed-secrets-controller helmVersion=v3 info="no existing release; installing"
helm-operator-85f4dc6fbc-h6xjt flux-helm-operator I1203 19:15:29.755719       7 client.go:87] creating 10 resource(s)
helm-operator-85f4dc6fbc-h6xjt flux-helm-operator I1203 19:15:30.058021       7 wait.go:51] beginning wait for 10 resources with timeout of 5m0s
helm-operator-85f4dc6fbc-h6xjt flux-helm-operator I1203 19:15:32.237735       7 wait.go:199] Deployment is not ready: kube-system/sealed-secrets-controller. 0 out of 1 expected pods are ready
helm-operator-85f4dc6fbc-h6xjt flux-helm-operator I1203 19:15:34.085919       7 wait.go:199] Deployment is not ready: kube-system/sealed-secrets-controller. 0 out of 1 expected pods are ready
helm-operator-85f4dc6fbc-h6xjt flux-helm-operator I1203 19:15:36.081182       7 wait.go:199] Deployment is not ready: kube-system/sealed-secrets-controller. 0 out of 1 expected pods are ready
helm-operator-85f4dc6fbc-h6xjt flux-helm-operator I1203 19:15:38.079472       7 wait.go:199] Deployment is not ready: kube-system/sealed-secrets-controller. 0 out of 1 expected pods are ready
helm-operator-85f4dc6fbc-h6xjt flux-helm-operator I1203 19:15:40.088582       7 wait.go:199] Deployment is not ready: kube-system/sealed-secrets-controller. 0 out of 1 expected pods are ready
helm-operator-85f4dc6fbc-h6xjt flux-helm-operator I1203 19:15:42.084390       7 wait.go:199] Deployment is not ready: kube-system/sealed-secrets-controller. 0 out of 1 expected pods are ready
helm-operator-85f4dc6fbc-h6xjt flux-helm-operator ts=2019-12-03T19:15:44.125082595Z caller=release.go:224 component=release release=sealed-secrets-controller targetNamespace=kube-system resource=kube-system:helmrelease/sealed-secrets-controller helmVersion=v3 info="Helm release sync succeeded" revision=1.6.0
helm-operator-85f4dc6fbc-h6xjt flux-helm-operator ts=2019-12-03T19:15:59.17677442Z caller=operator.go:309 component=operator info="enqueuing release" resource=kube-system:helmrelease/sealed-secrets-controller
helm-operator-85f4dc6fbc-h6xjt flux-helm-operator I1203 19:15:59.596729       7 install.go:170] WARNING: This chart or one of its subcharts contains CRDs. Rendering may fail or contain inaccuracies.
helm-operator-85f4dc6fbc-h6xjt flux-helm-operator ts=2019-12-03T19:16:00.907662525Z caller=release.go:182 component=release release=sealed-secrets-controller targetNamespace=kube-system resource=kube-system:helmrelease/sealed-secrets-controller helmVersion=v3 error="failed to determine if the release should be synced" err="failed to upgrade chart for release [3fcfc29e-0eb4-4244-9a29-50544aba024b]: rendered manifests contain a resource that already exists. Unable to continue with install: existing resource conflict: kind: CustomResourceDefinition, namespace: , name: sealedsecrets.bitnami.com"

I also get validation errors from cert-manager release, which is also managed by helm-operator right now:

helm-operator-85f4dc6fbc-h6xjt flux-helm-operator ts=2019-12-03T19:25:00.496478628Z caller=release.go:182 component=release release=cert-manager targetNamespace=ingress resource=ingress:helmrelease/cert-manager helmVersion=v3 error="failed to determine if the release should be synced" err="failed to upgrade chart for release [333a1ef6-b72d-4122-857b-60852322ac72]: unable to build kubernetes objects from release manifest: error validating \"\": error validating data: ValidationError(ValidatingWebhookConfiguration.webhooks[0].namespaceSelector.matchExpressions[1].values): unknown object type \"nil\" in ValidatingWebhookConfiguration.webhooks[0].namespaceSelector.matchExpressions[1].values[0]"

Do you have any ideas how to address that?

Thank you very much!

@gsf

gsf commented Dec 3, 2019

@dragonsmith Are you able to install these charts with the helm v3 CLI? The way CRDs are handled has changed: https://helm.sh/docs/topics/chart_best_practices/custom_resource_definitions/
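
For context, Helm 3 installs CRDs from a top-level crds/ directory in the chart (applied on first install only, never upgraded or deleted by Helm), instead of relying on the old crd-install hook; roughly:

mychart/
  Chart.yaml
  crds/              # applied on first install only
    my-crd.yaml
  templates/
  values.yaml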

@dragonsmith

@gsf the interesting thing is that these charts install successfully either way (via the Helm 3 CLI or via the helm-operator); it's just that the helm-operator emits these warnings.

Also, look closer: the second warning is about: ValidationError(ValidatingWebhookConfiguration.webhooks[0].namespaceSelector.matchExpressions[1].values): unknown object type \"nil\"

I'm aware that there are changes to CRD handling, but I thought they only step in when using chart API v2, because we actually need backwards compatibility with current charts; I did not dive into the new code, though.

@gsf

gsf commented Dec 3, 2019

@dragonsmith That ValidationError looks like this issue: pegasystems/pega-helm-charts#34 (comment).

@onedr0p

onedr0p commented Dec 23, 2019

Glad to see this merged into master, can't wait for the next release 👍

@apenney

apenney commented Jan 10, 2020

Also confirming that we're looking good, really appreciate the quick fixes! :)

@runningman84
Contributor

The latest rc seems to just install the prometheus-operator during the first run and leave it untouched afterwards. But I still get this error message:

fluxcd/helm-operator-69bc6bd556-hdg7s[flux-helm-operator]: 2020/01/10 21:39:34 info: skipping unknown hook: "crd-install"
fluxcd/helm-operator-69bc6bd556-hdg7s[flux-helm-operator]: 2020/01/10 21:39:34 info: skipping unknown hook: "crd-install"
fluxcd/helm-operator-69bc6bd556-hdg7s[flux-helm-operator]: 2020/01/10 21:39:34 info: skipping unknown hook: "crd-install"
fluxcd/helm-operator-69bc6bd556-hdg7s[flux-helm-operator]: 2020/01/10 21:39:34 info: skipping unknown hook: "crd-install"
fluxcd/helm-operator-69bc6bd556-hdg7s[flux-helm-operator]: 2020/01/10 21:39:34 info: skipping unknown hook: "crd-install"

Can we just ignore this message?

@runningman84
Contributor

I have also found this problem in flux-helm-operator:

fluxcd/helm-operator-69bc6bd556-hdg7s[flux-helm-operator]: ts=2020-01-10T21:47:01.066165659Z caller=release.go:217 component=release release=blackbox-exporter targetNamespace=monitoring resource=monitoring:helmrelease/blackbox-exporter helmVersion=v3 error="Helm release failed" revision=1.6.0 err="failed to upgrade chart for release [blackbox-exporter]: unable to build kubernetes objects from release manifest: error validating \"\": error validating data: ValidationError(PodDisruptionBudget.spec): unknown field \"enabled\" in io.k8s.api.policy.v1beta1.PodDisruptionBudgetSpec"

I do not understand this error; my config looks just fine and used to work before with Helm v2:

---
apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: blackbox-exporter
  namespace: monitoring
  annotations:
    fluxcd.io/automated: 'true'
    filter.fluxcd.io/chart-image: semver:~0.15
spec:
  releaseName: blackbox-exporter
  helmVersion: v3
  chart:
    repository: https://kubernetes-charts.storage.googleapis.com/
    name: prometheus-blackbox-exporter
    version: 2.0.0
  values:
    name: blackbox-exporter
    image:
      tag: v0.15.1
    resources:
      limits:
        memory: 64Mi
      requests:
        cpu: 50m
        memory: 64Mi

I do not see the use of the keyword enabled anywhere in this helm release.

@gsf

gsf commented Jan 10, 2020

@runningman84 That "enabled" field in PodDisruptionBudget was a Helm V3 incompatibility in prometheus-blackbox-exporter prior to version 2.0.0. You may need to set the content of PodDisruptionBudget as described in the README: https://github.com/helm/charts/tree/master/stable/prometheus-blackbox-exporter#200

@rowecharles

I can also confirm that 1.0.0-rc7 has fixed my issues with spurious upgrades. Thanks for the great work!

@runningman84 the crd-install hooks are no longer supported in helm 3. I think it's safe to ignore the errors.

@smark88

smark88 commented Jan 14, 2020

I've taken a look at the latest image (helm-v3-dev-7589ee47) and the operator is continually attempting to upgrade some releases depite there being no changes. An example is:

apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: elasticsearch
  namespace: monitoring
spec:
  releaseName: elasticsearch
  chart:
    repository: https://helm.elastic.co
    name: elasticsearch
    version: 7.4.1
  values:
    replicas: 1

The logs show that the operator thinks there is a difference between the applied and expected chart due to the type mismatch for the value of replicas:

ts=2019-11-24T11:40:22.774901798Z caller=release.go:340 component=release release=elasticsearch targetNamespace=monitoring resource=monitoring:helmrelease/elasticsearch helmVersion=v3 info="values have diverged" diff="  map[string]interface{}{\n  \t... // 39 identical entries\n  \t\"rbac\":           map[string]interface{}{\"create\": bool(false), \"serviceAccountName\": string(\"\")},\n  \t\"readinessProbe\": map[string]interface{}{\"failureThreshold\": float64(3), \"initialDelaySeconds\": float64(10), \"periodSeconds\": float64(10), \"successThreshold\": float64(3), \"timeoutSeconds\": float64(5)},\n- \t\"replicas\":       float64(1),\n+ \t\"replicas\":       s\"1\",\n  \t\"resources\":      map[string]interface{}{\"limits\": map[string]interface{}{\"cpu\": string(\"1000m\"), \"memory\": string(\"2Gi\")}, \"requests\": map[string]interface{}{\"cpu\": string(\"100m\"), \"memory\": string(\"2Gi\")}},\n  \t\"roles\":          map[string]interface{}{\"data\": string(\"true\"), \"ingest\": string(\"true\"), \"master\": string(\"true\")},\n  \t... // 12 identical entries\n  }\n"

The important part being:

- "replicas":       float64(1),
+ "replicas":       s"1",

This doesn't impact any of the running pods but does clog up the release history meaning rollbacks would be impossible

$helm history elasticsearch
REVISION	UPDATED                 	STATUS    	CHART              	APP VERSION	DESCRIPTION
10      	Sun Nov 24 11:42:23 2019	superseded	elasticsearch-7.4.1	7.4.1      	Upgrade complete
11      	Sun Nov 24 11:43:23 2019	superseded	elasticsearch-7.4.1	7.4.1      	Upgrade complete
12      	Sun Nov 24 11:44:22 2019	superseded	elasticsearch-7.4.1	7.4.1      	Upgrade complete
13      	Sun Nov 24 11:45:22 2019	superseded	elasticsearch-7.4.1	7.4.1      	Upgrade complete
14      	Sun Nov 24 11:46:22 2019	superseded	elasticsearch-7.4.1	7.4.1      	Upgrade complete
15      	Sun Nov 24 11:47:22 2019	superseded	elasticsearch-7.4.1	7.4.1      	Upgrade complete
16      	Sun Nov 24 11:48:22 2019	superseded	elasticsearch-7.4.1	7.4.1      	Upgrade complete
17      	Sun Nov 24 11:49:22 2019	superseded	elasticsearch-7.4.1	7.4.1      	Upgrade complete
18      	Sun Nov 24 11:50:22 2019	superseded	elasticsearch-7.4.1	7.4.1      	Upgrade complete
19      	Sun Nov 24 11:51:22 2019	deployed  	elasticsearch-7.4.1	7.4.1      	Upgrade complete

This has happenned on a number of charts where I've used integers in the values

I have also run into this, and it's rather annoying: about every 5 minutes it does another revision.

REVISION        UPDATED                         STATUS          CHART           APP VERSION     DESCRIPTION     
471             Tue Jan 14 21:19:12 2020        superseded      datarepo-0.0.4                  Upgrade complete
472             Tue Jan 14 21:22:10 2020        superseded      datarepo-0.0.4                  Upgrade complete
473             Tue Jan 14 21:25:07 2020        superseded      datarepo-0.0.4                  Upgrade complete
474             Tue Jan 14 21:28:06 2020        superseded      datarepo-0.0.4                  Upgrade complete
475             Tue Jan 14 21:31:05 2020        superseded      datarepo-0.0.4                  Upgrade complete
476             Tue Jan 14 21:34:13 2020        superseded      datarepo-0.0.4                  Upgrade complete
477             Tue Jan 14 21:37:06 2020        superseded      datarepo-0.0.4                  Upgrade complete
478             Tue Jan 14 21:40:10 2020        superseded      datarepo-0.0.4                  Upgrade complete
479             Tue Jan 14 21:43:05 2020        superseded      datarepo-0.0.4                  Upgrade complete
480             Tue Jan 14 21:46:05 2020        deployed        datarepo-0.0.4                  Upgrade complete

@hiddeco
Member

hiddeco commented Jan 14, 2020

@smark88 you are running an old version, helm-v3-dev has been merged into master, and the latest version with Helm 3 support (and bug fixes) is 1.0.0-rc7.

@REBELinBLUE

REBELinBLUE commented Jan 14, 2020

Not sure if this has anything to do with 1.0.0-rc7, but I was running a build of helm-v3-dev for a few weeks; I have changed to 1.0.0-rc7 and now most of the Helm releases have a status of "failed to compose values for chart release", even though they are installed fine.

For example

❯ helm history velero -n velero
REVISION	UPDATED                 	STATUS  	CHART       	APP VERSION	DESCRIPTION
1       	Tue Dec 24 16:08:53 2019	deployed	velero-2.7.5	1.2.0      	Install complete
❯ kubectl get hr -n velero
NAME     RELEASE   STATUS     MESSAGE                                      AGE
velero   velero    deployed   failed to compose values for chart release   21d

@smark88

smark88 commented Jan 14, 2020

@smark88 you are running an old version, helm-v3-dev has been merged into master, and the latest version with Helm 3 support (and bug fixes) is 1.0.0-rc7.

helm-operator:1.0.0-rc5 is the container I have deployed. I will try upgrading to 1.0.0-rc7 and see if that resolves it.

@runningman84
Contributor

I also have two releases with the status "failed to compose values", even though they were installed using the rc7 version.

@hiddeco
Member

hiddeco commented Jan 15, 2020

@runningman84 @REBELinBLUE can you both share a copy of one of the HelmReleases causing trouble?

@runningman84
Contributor

I can share one of them which is using a public chart:

---
apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: external-dns
  namespace: kube-system
  annotations:
    fluxcd.io/automated: "true"
    filter.fluxcd.io/chart-image: semver:~0.5
spec:
  releaseName: external-dns
  helmVersion: v3
  chart:
    repository: https://kubernetes-charts.storage.googleapis.com/
    name: external-dns
    version: 2.5.3
  values:
    name: external-dns
    image:
      tag: 0.5.15-debian-9-r1
    rbac:
      create: true
    provider: aws
    #txtOwnerId: k8s-prod-home
    aws:
      accessKey: xxxxxxxxxxxxxxxxxxxxxxxx
      secretKey: yyyyyyyyyyyyyyyyyyyyyyyy
      region: us-east-1

this is the kubectl output:

root@cubi001:~# kubectl get hr --all-namespaces
NAMESPACE     NAME                    RELEASE                 STATUS     MESSAGE                                                                                                                                                                                              AGE
kube-system   external-auth-server    external-auth-server    deployed   Helm release sync succeeded                                                                                                                                                                          19d
kube-system   external-dns            external-dns            deployed   failed to compose values for chart release                                                                                                                                                           19d

@hiddeco
Member

hiddeco commented Jan 15, 2020

@runningman84 this works without any issues for me, can you share the Status output from kubectl describe hr?

@runningman84
Contributor

This is the output:

kubectl describe hr external-dns -n kube-system
Name:         external-dns
Namespace:    kube-system
Labels:       fluxcd.io/sync-gc-mark=sha256.d2hcYp2rAOHFEhWk8hRzjHP_lI9tPeoShKH4OThrveI
Annotations:  filter.fluxcd.io/chart-image: semver:~0.5
              fluxcd.io/automated: true
              fluxcd.io/sync-checksum: 44ada7e95a8eed235e4e0c2bea067e1a9e5fec95
              kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"helm.fluxcd.io/v1","kind":"HelmRelease","metadata":{"annotations":{"filter.fluxcd.io/chart-image":"semver:~0.5","fluxcd.io/...
API Version:  helm.fluxcd.io/v1
Kind:         HelmRelease
Metadata:
  Creation Timestamp:  2019-12-26T21:55:32Z
  Generation:          1
  Resource Version:    4630545
  Self Link:           /apis/helm.fluxcd.io/v1/namespaces/kube-system/helmreleases/external-dns
  UID:                 56d5f433-05f4-4e17-a422-c38f93d9e43b
Spec:
  Chart:
    Name:        external-dns
    Repository:  https://kubernetes-charts.storage.googleapis.com/
    Version:     2.5.3
  Helm Version:  v3
  Release Name:  external-dns
  Values:
    Aws:
      Access Key:  xxxxxxxxxxxxxxxxxxxxxxxx
      Region:      us-east-1
      Secret Key:  yyyyyyyyyyyyyyyyyyyyyyyy
    Image:
      Tag:     0.5.15-debian-9-r1
    Name:      external-dns
    Provider:  aws
    Rbac:
      Create:      true
    Txt Owner Id:  k8s-prod-home
Status:
  Conditions:
    Last Transition Time:  2019-12-30T06:49:28Z
    Last Update Time:      2019-12-30T06:49:28Z
    Message:               failed to compose values for chart release
    Reason:                HelmUpgradeFailed
    Status:                False
    Type:                  Released
    Last Transition Time:  2019-12-26T22:49:16Z
    Last Update Time:      2020-01-13T20:46:27Z
    Message:               chart fetched: external-dns-2.5.3.tgz
    Reason:                RepoChartInCache
    Status:                True
    Type:                  ChartFetched
  Observed Generation:     1
  Release Name:            external-dns
  Release Status:          deployed
  Revision:                2.5.3
  Values Checksum:         db74f0cd3f490aff01288488cbc3d9ee8484af151db23ae096a015c59ca20124
Events:
  Type    Reason       Age                 From           Message
  ----    ------       ----                ----           -------
  Normal  ChartSynced  13m (x75 over 37h)  helm-operator  Chart managed by HelmRelease processed

@smark88

smark88 commented Jan 15, 2020

@smark88 you are running an old version, helm-v3-dev has been merged into master, and the latest version with Helm 3 support (and bug fixes) is 1.0.0-rc7.

helm-operator:1.0.0-rc5 is the container I have deployed. I will try upgrading to 1.0.0-rc7 and see if that resolves it.

I redeployed last night, and the 1.0.0-rc5 --> 1.0.0-rc7 upgrade has resolved my issue. Typically, by this time in the morning I would have about 100+ revisions.

NAME            NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                                   APP VERSION
dev-jade        dev             1               2020-01-14 23:08:11.235655799 +0000 UTC deployed        datarepo-0.0.4                                     
secrets         dev             1               2020-01-14 23:08:13.113370804 +0000 UTC deployed        create-secret-manager-secret-0.0.4 

@REBELinBLUE

@hiddeco https://github.com/REBELinBLUE/k8s-on-hypriot/blob/master/deployments/velero/velero/velero.yaml, but most of the charts in the deployments folder are showing this status; the only ones which aren't are the ones I removed with kubectl delete hr <name> and then reapplied.

@hiddeco
Member

hiddeco commented Jan 15, 2020

@REBELinBLUE @runningman84 does one of you still have access to logs from around the time the status message was pushed for the release?

@runningman84
Contributor

I don't think so, but I will check. Shouldn't the helm-operator retry on each run, so that the error either goes away or comes back?

@REBELinBLUE

REBELinBLUE commented Jan 15, 2020

Sadly not, I'm afraid; as it's just a test server I don't have the logs stored anywhere, and I've killed the pod to see if that fixes it.

All I get now in the logs is things like

helm-operator-5cc864bbb7-hnm7n flux-helm-operator ts=2020-01-15T19:14:09.428375536Z caller=operator.go:309 component=operator info="enqueuing release" resource=velero:helmrelease/velero
....
helm-operator-5cc864bbb7-hnm7n flux-helm-operator I0115 19:14:24.792344       8 upgrade.go:79] preparing upgrade for velero
....
helm-operator-5cc864bbb7-hnm7n flux-helm-operator I0115 19:14:28.375813       8 upgrade.go:87] performing update for velero
helm-operator-5cc864bbb7-hnm7n flux-helm-operator I0115 19:14:28.431962       8 upgrade.go:220] dry run for velero

@hiddeco
Member

hiddeco commented Jan 15, 2020

@runningman84 the message is only updated if an actual release happens because some mutation is made (either to the HelmRelease, the chart, or the Helm release itself). So to somewhat get an idea of what happened (and in what order) we would need logs.

@REBELinBLUE

But that is the point: they were fine, nothing has been changed, yet the status has for some reason changed to that. That said, I will update the values now so an update actually runs, and see what happens.

@REBELinBLUE

Well, I made this change REBELinBLUE/k3s-rpi-cluster@eecdb49#diff-b5b0500f31d74e5d8f95df6d7237661f, and when it ran the status changed to "Helm release sync succeeded", so I guess that is a fix.

@hiddeco
Member

hiddeco commented Jan 16, 2020

Closing this as most bugs have been fixed by now, in case you encounter a new one do not hesitate to open a dedicated issue.

On a final note, I want to thank everyone who commented on this issue for their dedication and the quality of the reported issues; it made the bug fixing as fun as it can get. 🌷
