external cluster TLS client cert has expired #19033

Open
DonOtuseGH opened this issue Jul 12, 2024 · 2 comments
Labels: bug, bug/in-triage, component:security, type:bug

@DonOtuseGH

Checklist:

  • I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • I've included steps to reproduce the bug.
  • I've pasted the output of argocd version.

Describe the bug

We have encountered a situation a few times where the connection from ArgoCD to an external cluster no longer works (the UI shows an unknown state for all applications of the corresponding cluster). In the past, we fixed the problem with the procedure described here. Today we took a closer look at this recurring problem, gathered some more detailed information about the situation, and we think we have found the "real" cause.

To Reproduce

Error messages like this can be found in the ArgoCD logs for all applications:

argocd-application-controller-0 argocd-application-controller time="2024-07-12T08:02:21Z" level=info msg="Normalized app spec: {\"status\":{\"conditions\":[{\"lastTransitionTime\":\"2024-07-11T20:17:21Z\",\"message\":\"Failed to load live state: failed to get cluster info for \\\"https://k8s-adm-222-0010:6443\\\": error synchronizing cache state : the server has asked for the client to provide credentials\",\"type\":\"ComparisonError\"},{\"lastTransitionTime\":\"2024-07-12T08:02:21Z\",\"message\":\"Failed to load target state: failed to get cluster version for cluster \\\"https://k8s-adm-222-0010:6443\\\": failed to get cluster info for \\\"https://k8s-adm-222-0010:6443\\\": error synchronizing cache state : the server has asked for the client to provide credentials\",\"type\":\"ComparisonError\"},{\"lastTransitionTime\":\"2024-07-12T08:02:21Z\",\"message\":\"error synchronizing cache state : the server has asked for the client to provide credentials\",\"type\":\"UnknownError\"}]}}" application=argocd/k8s-adm-222-0010--metrics-server
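
To enumerate the applications stuck in this state, a query along these lines can be used (a sketch; it assumes the Application resources live in the argocd namespace):

$ kubectl get applications -n argocd -o json \
    | jq -r '.items[]
             | select(any(.status.conditions[]?; .type == "UnknownError"))
             | .metadata.name'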

The kube-apiserver of the corresponding external cluster shows error messages like this for each ArgoCD connection attempt:

kube-apiserver-k8s-adm-222-0011 kube-apiserver E0711 20:05:26.136116       1 authentication.go:73] "Unable to authenticate the request" err="[x509: certificate has expired or is not yet valid: current time 2024-07-11T20:05:26Z is after 2024-07-11T14:30:51Z, verifying certificate SN=3514383209763152651, SKID=, AKID=67:85:CE:27:EA:FD:61:F8:89:53:EE:38:80:D0:D6:4B:41:4C:CA:43 failed: x509: certificate has expired or is not yet valid: current time 2024-07-11T20:05:26Z is after 2024-07-11T14:30:51Z]"

We thought we were using bearer token authentication between ArgoCD and the external clusters, but it seems we were wrong:

$ argocd login argocd
Username: admin
Password:
'admin:login' logged in successfully
Context 'argocd' updated

$ argocd cluster rotate-auth k8s-adm-222-0010
FATA[0000] rpc error: code = InvalidArgument desc = Cluster 'https://k8s-adm-222-0010:6443' does not use bearer token authentication
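
The connection state that ArgoCD records for the cluster (including the last error) can also be inspected directly; the server address is the one from the log messages above:

$ argocd cluster get https://k8s-adm-222-0010:6443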

The ServiceAccount bearer token should be long-lived (see the annotation explained in this reference), but this seems not to matter in this case. Just for your information:

$ kubectl describe secrets -n kube-system argocd-manager-token-n8qm2
Name:         argocd-manager-token-n8qm2
Namespace:    kube-system
Labels:       <none>
Annotations:  kubernetes.io/service-account.name: argocd-manager
              kubernetes.io/service-account.uid: 2ba34942-ca7d-49d4-92bf-e67e791c8955

Type:  kubernetes.io/service-account-token
...
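
To confirm the token itself is still accepted, it can be tested directly against the external kube-apiserver (a sketch; -k skips server certificate verification just to keep the example short):

$ TOKEN=$(kubectl get secret -n kube-system argocd-manager-token-n8qm2 \
    -o jsonpath='{.data.token}' | base64 -d)
$ curl -sk -H "Authorization: Bearer $TOKEN" https://k8s-adm-222-0010:6443/api

A valid token returns the API version list; an expired or unknown credential yields a 401 Unauthorized.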

While checking the ArgoCD secrets, we found that the cluster secret includes a TLS client certificate in its config blob, and that certificate has expired:

$ kubectl describe secrets -n argocd cluster-k8s-adm-222-0010-2645299244
Name:         cluster-k8s-adm-222-0010-2645299244
Namespace:    argocd
Labels:       argocd.argoproj.io/secret-type=cluster
Annotations:  managed-by: argocd.argoproj.io

Type:  Opaque

Data
====
server:  39 bytes
config:  5313 bytes
name:    16 bytes
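
A quick way to see which authentication method the secret actually carries is to list the top-level keys of its config blob (a sketch; needs jq 1.6+ for @base64d). Given the rotate-auth error above, this should show tlsClientConfig but no bearerToken key:

$ kubectl get secret -n argocd cluster-k8s-adm-222-0010-2645299244 -o json \
    | jq '.data.config | @base64d | fromjson | keys'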



$ kubectl get secrets -n argocd cluster-k8s-adm-222-0010-2645299244 -o json \
    | jq -r '.data | [.name, .config] | @tsv' \
    | while read -r name config; do
        echo -n '### '; base64 -d <<< "$name"; echo
        base64 -d <<< "$config" \
          | jq -r .tlsClientConfig.certData \
          | base64 -d \
          | openssl x509 -noout -issuer -subject -dates -serial
      done
### k8s-adm-222-0010
issuer=CN = kubernetes
subject=O = system:masters, CN = kubernetes-admin
notBefore=Jul 12 14:30:50 2023 GMT
notAfter=Jul 11 14:30:51 2024 GMT
serial=30C598E8C687A30B



$ hex2dec 30C598E8C687A30B
3514383209763152651
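
(hex2dec is a local helper script; plain printf does the same conversion: printf '%d\n' 0x30C598E8C687A30B prints 3514383209763152651.)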

===> the certificate serial number matches the one from the external cluster's kube-apiserver error message
===> it is the same kubernetes-admin certificate of the external cluster that was used during the argocd cluster add operation

Expected behavior

We would like either to use authentication based on the long-lived ServiceAccount bearer token, or to have an option (better yet, an automatic mechanism) that rotates the TLS client cert.
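
As a workaround until this is fixed, the cluster secret could be rewritten to use the ServiceAccount token instead of the embedded client certificate. A minimal sketch, assuming the secret names shown above and keeping the existing caData intact:

$ TOKEN=$(kubectl get secret -n kube-system argocd-manager-token-n8qm2 \
    -o jsonpath='{.data.token}' | base64 -d)
$ CONFIG=$(kubectl get secret -n argocd cluster-k8s-adm-222-0010-2645299244 \
    -o jsonpath='{.data.config}' | base64 -d)
$ NEWCONFIG=$(jq -c --arg t "$TOKEN" \
    'del(.tlsClientConfig.certData, .tlsClientConfig.keyData) | .bearerToken = $t' \
    <<< "$CONFIG")
$ kubectl patch secret -n argocd cluster-k8s-adm-222-0010-2645299244 \
    --type merge -p "$(jq -n --arg c "$NEWCONFIG" '{stringData: {config: $c}}')"

After this, the cluster should authenticate with the long-lived token, and argocd cluster rotate-auth should no longer be rejected.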

Version

$ argocd version
argocd: v2.11.0+d3f33c0
  BuildDate: 2024-05-07T16:21:23Z
  GitCommit: d3f33c00197e7f1d16f2a73ce1aeced464b07175
  GitTreeState: clean
  GoVersion: go1.21.9
  Compiler: gc
  Platform: linux/amd64
argocd-server: v2.10.7+b060053
  BuildDate: 2024-04-15T08:45:08Z
  GitCommit: b060053b099b4c81c1e635839a309c9c8c1863e9
  GitTreeState: clean
  GoVersion: go1.21.3
  Compiler: gc
  Platform: linux/amd64
  Kustomize Version: v5.2.1 2023-10-19T20:13:51Z
  Helm Version: v3.14.3+gf03cc04
  Kubectl Version: v0.26.11
  Jsonnet Version: v0.20.0

Logs

see above...

Thank you very much for taking care of this issue. We would be pleased if you could provide a permanent solution.

@DonOtuseGH added the bug label Jul 12, 2024
@alexmt added the bug/in-triage, component:security, and type:bug labels Jul 12, 2024
@DonOtuseGH
Author

Do you need further information to investigate this issue?

@DonOtuseGH
Author

Is there anything we can contribute toward further analysis, testing, or finding a solution?
