[flux v0.26.0-2] Kustomization tries to modify immutable fields #2386
Comments
I'm going to try to reproduce this, but first could you please add some more information about how you created the PVC? Is it just: create a PVC (without volumeName) through Flux 0.25.3, then upgrade to a different version? (I did not run into this issue in my testing, possibly because I am always creating PVCs with |
I updated the issue. I use a PVC with dynamic provisioning via storage classes on AWS. As a workaround I removed the PVC from the kustomization. |
Can you please post the output from:
|
Also, there are some serious formatting errors in the YAML that you posted, and it's missing a namespace. This would not work in Flux. Can you post the YAML exactly as it was added in the commit to Flux? |
I tried to replicate your scenario here: https://github.com/kingdonb/fleet-testing/blob/main/apps/keycloak/example-pvc.yaml I installed Flux 0.25.3, added a PVC and a pod bound to it so the PV would be created, and omitted volumeName from my spec, as it appears you have (and as most people would do). Then I upgraded to Flux 0.26.1. I am not seeing any errors across the Kubernetes versions in my test matrix. If the issue is resolved for you and I cannot reproduce it, we'll have to close this, unless you can provide more information (or somebody else hits this issue). Thanks again for your report. |
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
annotations:
pv.kubernetes.io/bind-completed: "yes"
pv.kubernetes.io/bound-by-controller: "yes"
volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/aws-ebs
volume.kubernetes.io/selected-node: ip-192-168-41-199.eu-central-1.compute.internal
volume.kubernetes.io/storage-resizer: kubernetes.io/aws-ebs
creationTimestamp: "2020-10-02T16:43:43Z"
finalizers:
- kubernetes.io/pvc-protection
labels:
kustomize.toolkit.fluxcd.io/name: infrastructure
kustomize.toolkit.fluxcd.io/namespace: flux-system
kustomize.toolkit.fluxcd.io/prune: disabled
managedFields:
- apiVersion: v1
fieldsType: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.: {}
f:pv.kubernetes.io/bind-completed: {}
f:pv.kubernetes.io/bound-by-controller: {}
f:volume.beta.kubernetes.io/storage-provisioner: {}
f:volume.kubernetes.io/selected-node: {}
f:volume.kubernetes.io/storage-resizer: {}
f:finalizers:
.: {}
v:"kubernetes.io/pvc-protection": {}
f:labels:
.: {}
f:kustomize.toolkit.fluxcd.io/namespace: {}
f:kustomize.toolkit.fluxcd.io/prune: {}
f:spec:
f:accessModes: {}
f:resources:
f:requests:
.: {}
f:storage: {}
f:storageClassName: {}
f:volumeMode: {}
f:volumeName: {}
f:status:
f:accessModes: {}
f:capacity:
.: {}
f:storage: {}
f:phase: {}
manager: kustomize-controller
operation: Apply
time: "2021-10-27T11:39:29Z"
name: influxdb-volume
namespace: monitoring
resourceVersion: "456920544"
uid: c7f9929e-2741-43ce-b690-ed00816092ad
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Gi
storageClassName: aws-gp2-dynamic
volumeMode: Filesystem
volumeName: pvc-c7f9929e-2741-43ce-b690-ed00816092ad
status:
accessModes:
- ReadWriteOnce
capacity:
storage: 10Gi
phase: Bound
I have a kustomization.yaml file which defines the namespace for all resources. |
This suggests that the Flux YAML in your git repo contains the |
no |
maybe does the kustomization add this field somehow? |
It is possible that the Flux manager took the field over when it shouldn't have; I'm not sure. In my cluster, I have this:
If flux finds fields managed by My cluster is kind + local-path-provisioner, I can try a different storage class provider as this might not be representative. (The best bet is probably for me to try replicating this on EKS next, since that's your environment...) |
I have Kubernetes 1.20 and the volume is from 2020; back then it did not use server-side apply. |
I have a test suite that runs on Flux 0.17.2 and upgrades it to the current version; I can use that to try to replicate the issue. If your volumes are that old, the behavior might be different – we resolved a number of issues like that to bring out 0.26.0, issues you only see if you started with Flux before server-side apply and upgraded through 0.18.x-0.25.x. Like this issue: I might actually need to start with a Kubernetes version from before server-side apply to reliably reproduce all of these kinds of issues. If kube-controller-manager uses server-side apply, or if the fields get captured in managedFields however that happens nowadays, it won't matter what version of Flux initially created the PVC in the cluster – I won't see the issue reproduced... |
I confirmed that Kubernetes 1.20 with Flux 0.17.2 together with local-path-provisioner produces a PVC that looks like this:
In other words, K8s 1.20.x is not old enough to satisfy the required test conditions. I'm looking into trying to create a K8s cluster at 1.15.x and install Flux 0.17.x on it... but that will not work, since Flux hasn't supported K8s versions < 1.16.0 since Flux2 v0.0.7. This is an extreme case :) I don't think we should balk at it though... I think we'll need to upgrade a cluster from K8s 1.15.x to know for sure what happens when the cluster has resources that were initially created before server-side apply had even reached beta. SSA was marked beta in 1.16.x. It might be easier and just as effective to test against a cluster with 1.16.x and just ensure that the beta SSA feature is turned off before the volume is created. (That way, I don't also need to start with Flux v1...) I've spent too much time on this today, but I think you have likely got something here. We may need to give some advisory if your cluster was in production before a certain threshold date. Hopefully there's a way we can reconcile this more broadly. |
@kingdonb this is the PV, not the PVC |
Whoops, you're right... I already tore down the test scaffold; I'll have to repeat the test later. Sorry for the noise. |
@Legion2 is it possible for you to paste the managedFields of the PVC before you updated flux, |
No, I don't have this data. |
Removing |
We used Flux from the beginning and used it for everything in the cluster, so I think it was created with Flux 1. However, since then we have migrated things multiple times and needed to manually fix things, so I'm not 100% sure that kubectl was not involved here. |
@stefanprodan I'm not familiar with the kubectl patch command, could you give an example of how to remove a managed field? |
Create a YAML with only the metadata and managed fields, remove the volumeName, then apply it with kubectl patch; docs here: https://kubernetes.io/docs/tasks/manage-kubernetes-objects/update-api-object-kubectl-patch/ |
We had the same problem with a lot of kustomizations and immutable clusterIP, volumeName and storageClassName fields when updating from kustomize-controller 0.14.1 to 0.20.2.
Removing the managed fields via patch helps though.
|
I was facing the same issue. Using the following commands was helpful for me:
kubectl -n NAMESPACE patch pvc PVC_NAME --type=json -p='[{"op": "remove", "path": "/metadata/managedFields/0/fieldsV1/f:spec/f:storageClassName"}]'
kubectl -n NAMESPACE patch pvc PVC_NAME --type=json -p='[{"op": "remove", "path": "/metadata/managedFields/0/fieldsV1/f:spec/f:volumeName"}]' |
I've got the same problem migrating from Flux v1 to Flux v2 (0.26.1), and Flux complains with:
Here's what's in k8s:
apiVersion: v1
kind: Service
metadata:
annotations:
external-dns.alpha.kubernetes.io/hostname: [snip].
creationTimestamp: "2020-10-22T16:20:39Z"
finalizers:
- service.kubernetes.io/load-balancer-cleanup
labels:
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/part-of: ingress-nginx
kustomize.toolkit.fluxcd.io/name: infra
kustomize.toolkit.fluxcd.io/namespace: flux-system
name: ingress-nginx
namespace: ingress-nginx
resourceVersion: "192626589"
uid: a79bc74c-aa57-4223-9f80-0b25249f13b9
spec:
clusterIP: 10.113.45.120
clusterIPs:
- 10.113.45.120
externalTrafficPolicy: Local
healthCheckNodePort: 32210
ports:
- name: http
nodePort: 32695
port: 80
protocol: TCP
targetPort: http
- name: https
nodePort: 30120
port: 443
protocol: TCP
targetPort: https
selector:
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/part-of: ingress-nginx
sessionAffinity: None
type: LoadBalancer
status:
loadBalancer:
ingress:
- ip: [snip]
And here's what we always had in git:
kind: Service
apiVersion: v1
metadata:
name: ingress-nginx
namespace: ingress-nginx
annotations:
external-dns.alpha.kubernetes.io/hostname: ingress.aks1.westeurope.azure.cratedb.net.
labels:
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/part-of: ingress-nginx
spec:
type: LoadBalancer
externalTrafficPolicy: Local
ports:
- name: http
port: 80
targetPort: http
- name: https
port: 443
targetPort: https
selector:
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/part-of: ingress-nginx
Banging my head on the table with this. Any help much appreciated. |
If you want to see the actual problematic fields, you have to add "--show-managed-fields" to your kubectl command. See my post above for a workaround: #2386 (comment)
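For example, a command like this (a sketch using the PVC posted earlier in this thread; substitute your own resource name and namespace) prints the object together with its managed fields:
kubectl get pvc influxdb-volume -n monitoring -o yaml --show-managed-fields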
|
Thank you! I did not know that managed fields are no longer shown by default; I assumed it was a different issue. Problem solved. |
If you have not tried upgrading to Flux The issue it fixes is here: This could easily be the same issue, and I think it most likely is. I see reports mentioning version v0.26.1, but I do not see anyone who has mentioned this issue on a version >= Flux takes over all the managed fields, so if you have edits which you expect to remain in the cluster but which are not in git, they will have to set a manager to avoid being overwritten by Flux. So I want to be careful about advising this upgrade, though it is important and it makes Flux work more as advertised (so I do not want to caution anyone away from it): https://fluxcd.io/docs/faq/#why-are-kubectl-edits-rolled-back-by-flux If we can confirm this issue is still present in later versions of Flux, I will be glad to investigate. The kubectl patch described above should no longer be necessary after the upgrade. If anyone is still struggling with this, please let us know. 🙏 |
I think those that upgraded to v0.26.0 or v0.26.1 or v0.26.2 will have this issue. In v0.26.3 we found a better way to take over the fields managed by kubectl. We'll need to point people that upgraded to v0.26.0-2 to this issue, as patching the |
But I am also wondering why there was no issue in our previous Flux version. Shouldn't we upgrade the Flux version because of this issue? |
To avoid deleting the Service, a patch can be used to remove the ClusterIP from the managed fields:
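A sketch of such a patch, modeled on the PVC commands earlier in this thread (the service name, namespace, and managedFields index 0 are taken from the YAML posted above and may differ in your cluster):
kubectl -n ingress-nginx patch service ingress-nginx --type=json -p='[{"op": "remove", "path": "/metadata/managedFields/0/fieldsV1/f:spec/f:clusterIP"}]'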
Note that the index must match the kustomize-controller manager; the example above is for index 0.
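To check which index belongs to kustomize-controller, you can list the managers in order (a sketch, assuming kubectl >= 1.21, which hides managed fields by default):
kubectl -n ingress-nginx get service ingress-nginx -o yaml --show-managed-fields | grep 'manager:'
# Entries are printed in list order; the zero-based position of "manager: kustomize-controller" is the index to use in the patch path.
|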
I believe that I've reproduced this issue when migrating from Flux v1 to Flux v2. A Knative Serving resource manages all annotations, and its validating webhook prevents modification of these annotations. The YAML in git does not contain any annotations. Everything applies fine with Flux v1. When applying with Flux v2 I get the following error:
Note that |
I'm still reproducing my issue with |
I've tested using |
If knative updates annotations in resources that Flux applies, then what do you see in
@tshak the problem is likely that Flux doesn't allow arbitrary writes from arbitrary writers to persist; that is considered drift and is reverted. This is new behavior: https://fluxcd.io/docs/faq/#why-are-kubectl-edits-rolled-back-by-flux It was necessary to change the way that Flux performs applies after server-side apply was implemented, because of some issues with Kubernetes. Many users reported that Flux was allowing drift to persist in their clusters even though they had made attempts to overwrite it in git, or the section of config had already been deleted, yet it still persisted in the cluster. Flux has adjusted its approach to honor the expectations of GitOps users everywhere who expect configuration to reflect what is in git, and not some drift introduced from an unknown source. But some resources are not consistently putting their (intentional, expected, according-to-specification) drift in places where it can work with GitOps that behaves this way. Instead of using |
Thank you for the detailed explanation. Here is an example Knative Service that is failing to apply:
Git:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
name: emojivoto
namespace: demo
spec:
template:
spec:
containers:
- name: emojivoto
image: buoyantio/emojivoto-web:v6
env:
- name: EMOJISVC_HOST
value: emojivoto-emoji-svc.demo.svc.cluster.local:80
- name: VOTINGSVC_HOST
value: emojivoto-voting-svc.demo.svc.cluster.local:80
- name: INDEX_BUNDLE
value: dist/index_bundle.js
- name: WEB_PORT
value: "80"
ports:
- containerPort: 80
protocol: TCP
Server (
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
annotations:
serving.knative.dev/creator: system:serviceaccount:flux:flux
serving.knative.dev/lastModifier: system:serviceaccount:flux:flux
creationTimestamp: "2021-12-09T12:21:32Z"
generation: 5
labels:
kustomize.toolkit.fluxcd.io/name: apps-kubernetes-state
kustomize.toolkit.fluxcd.io/namespace: flux-system
managedFields:
- apiVersion: serving.knative.dev/v1
fieldsType: FieldsV1
fieldsV1:
f:metadata:
f:annotations: {}
f:labels:
.: {}
f:kustomize.toolkit.fluxcd.io/name: {}
f:kustomize.toolkit.fluxcd.io/namespace: {}
f:spec:
.: {}
f:template:
.: {}
f:spec:
.: {}
f:containers: {}
manager: kustomize-controller
operation: Apply
time: "2021-12-09T15:49:33Z"
- apiVersion: serving.knative.dev/v1
fieldsType: FieldsV1
fieldsV1:
f:status:
.: {}
f:address:
.: {}
f:url: {}
f:conditions: {}
f:latestCreatedRevisionName: {}
f:latestReadyRevisionName: {}
f:observedGeneration: {}
f:traffic: {}
f:url: {}
manager: controller
operation: Update
time: "2021-12-09T15:40:00Z"
name: emojivoto
namespace: demo
resourceVersion: "533389058"
uid: d367349c-aa69-4e82-96f8-f88804108c27
spec:
template:
metadata:
creationTimestamp: null
spec:
containerConcurrency: 0
containers:
- env:
- name: EMOJISVC_HOST
value: emojivoto-emoji-svc.demo.svc.cluster.local:80
- name: VOTINGSVC_HOST
value: emojivoto-voting-svc.demo.svc.cluster.local:80
- name: INDEX_BUNDLE
value: dist/index_bundle.js
- name: WEB_PORT
value: "80"
image: buoyantio/emojivoto-web:v6
name: emojivoto
ports:
- containerPort: 80
protocol: TCP
readinessProbe:
successThreshold: 1
tcpSocket:
port: 0
resources:
limits:
cpu: "1"
memory: 2G
requests:
cpu: 50m
memory: 50M
enableServiceLinks: false
timeoutSeconds: 30
traffic:
- latestRevision: true
percent: 100 |
I was able to successfully perform a server-side apply but it required
Interestingly, a |
@tshak try to remove the annotations with |
After patching the resource I can still repro the issue. I don't believe that the root cause is due to managed fields or server-side apply behaviour. Here is a recap of what I've tried (all
In all ❌ cases, the error is the same |
Flux performs a server-side apply, and since it manages |
I have confirmed that |
So if you do |
In this case it fails but as expected:
It passes by adding |
I see in the annotations "serving.knative.dev/creator: system:serviceaccount:flux:flux" but there is no such service account in the cluster. Have you changed the SA? |
This resource was created with FluxV1. I'm testing a FluxV1->FluxV2 migration. |
@tshak there is nothing we can do in Flux about this; if Knative decided to make annotations immutable, then you can't reconcile this with anything other than the flux SA. I have no clue why Knative would do such a crazy thing... |
I think that it's just these specific Knative annotations that are immutable. And again, even a |
Using kubectl works because there is no SA, but when you apply things from inside the cluster, Knative wants the flux SA. I see no way around this but to delete the service, and then it will be re-created by the kustomize-controller SA. |
There was an error on my end. The managedFields patch did in fact work; the error I was getting was pointing to a similarly named emojivoto service. This means that we'll need to patch all Knative services prior to upgrading to Flux v2. Thank you for taking the time to help me debug this issue. 🙏 |
I'm also running into this issue. The system we're running doesn't really enable us to manually run patches. Is there any way we can apply these patches through Flux to roll them out to all the servers tracking our repository? |
I'm running into this problem after updating Flux from
I'm running on GKE |
Hello, the patch did the job 👍
Flux: 0.28.5 / K8s: 1.22. Thanks |
I've run into this issue too. I upgraded from Flux v1 to v2 (latest 0.29.3). Everything appeared to be working fine (after modifying all the YAMLs to use the default namespace, since v2 requires a namespace). A few days later, I noticed that I'm getting reconciliation errors about the immutable fields |
@uclaeaslam it seems that Flux v1 took ownership of fields it didn't manage (I suspect this is a kubectl bug). To fix this you need to remove those fields with a patch; please see the examples in this thread. If the problem persists after patching, please open an issue in the kustomize-controller repo. |
Patching did the trick. The reason I thought it didn't work was that Flux wasn't rerunning the reconciliation. I had to do |
@uclaeaslam the reconciliation interval is set to 10 minutes. You can trigger it with a git commit or with the CLI:
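For example (a sketch; flux-system here stands for the name of your Kustomization):
flux reconcile kustomization flux-system --with-source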
|
Unfortunately, this is still an issue with Flux and |
Describe the bug
I updated to Flux 0.26.1 and then observed a reconciliation error in a deployment. I deleted the deployment, and now there is a problem with the PVC of that deployment.
I tried to downgrade the kustomize-controller, but that did not resolve the issue.
Steps to reproduce
Expected behavior
Should work after the update
Screenshots and recordings
No response
OS / Distro
Ubuntu 21
Flux version
flux: v0.26.1
Flux check
► checking prerequisites
✔ Kubernetes 1.20.11-eks-f17b81 >=1.20.6-0
► checking controllers
✔ helm-controller: deployment ready
► ghcr.io/fluxcd/helm-controller:v0.16.0
✔ image-automation-controller: deployment ready
► ghcr.io/fluxcd/image-automation-controller:v0.20.0
✔ image-reflector-controller: deployment ready
► ghcr.io/fluxcd/image-reflector-controller:v0.16.0
✔ kustomize-controller: deployment ready
► ghcr.io/fluxcd/kustomize-controller:v0.19.1
✔ notification-controller: deployment ready
► ghcr.io/fluxcd/notification-controller:v0.21.0
✔ source-controller: deployment ready
► ghcr.io/fluxcd/source-controller:v0.21.1
✔ all checks passed
Git provider
Gitlab
Container Registry provider
Gitlab
Additional context
No response
Code of Conduct