Cilium hubble-ui envoyproxy keeps crashing #9857

Closed
pmhahn opened this issue Mar 6, 2023 · 17 comments
Labels
kind/bug: Categorizes issue or PR as related to a bug.
lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@pmhahn

pmhahn commented Mar 6, 2023

Environment:

  • Cloud provider or hardware configuration:
    Single node bare-metal

  • OS

Linux 5.10.0-21-amd64 x86_64
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
  • Version of Ansible:
    ansible 2.10.8

  • Version of Python:
    Python 3.9.2

Kubespray version (commit):
release-2.21

Network plugin used:
Cilium

Full inventory with variables:

Command used to invoke ansible:

Output of ansible run:

Anything else we need to know:

pod/hubble-ui kept crashing, specifically its proxy container:

  proxy:
    Container ID:  containerd://f98d5f671c84c5e5e6fdb29388331dea2d4d766b80e6bc8e1de71dfa16a0996a
    Image:         docker.io/envoyproxy/envoy:v1.22.5
    Image ID:      sha256:e9c4ee2ce7207ce0f446892dda8f1bcc16cd6aec0c7c55d04bddca52f8af280d
    Port:          8081/TCP
    Host Port:     0/TCP
    Command:
      envoy
    Args:
      -c
      /etc/envoy.yaml
      -l
      info
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 06 Mar 2023 09:16:26 +0100
      Finished:     Mon, 06 Mar 2023 09:16:26 +0100
    Ready:          False
    Restart Count:  5
    Environment:    <none>
    Mounts:
      /etc/envoy.yaml from hubble-ui-envoy-yaml (rw,path="envoy.yaml")
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-p52rr (ro)
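
For completeness, the status and logs above were pulled roughly like this (the pod name is just an example, and hubble-ui is assumed to run in the kube-system namespace, which may differ in other setups):

# list the hubble-ui pod and inspect the crashing proxy container
kubectl -n kube-system get pods | grep hubble-ui
kubectl -n kube-system describe pod hubble-ui-6f48889749-abcde
# previous logs of the proxy container that is in CrashLoopBackOff
kubectl -n kube-system logs hubble-ui-6f48889749-abcde -c proxy --previous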

The log error was not helpful:

[2023-03-06 10:20:04.458][19][critical][main] [source/server/server.cc:117] error initializing configuration '/etc/envoy.yaml': Protobuf message (type envoy.config.bootstrap.v3.Bootstrap reason INVALID_ARGUMENT:(static_resources.clusters[1]) hosts: Cannot find field.) has unknown fields
[2023-03-06 10:20:04.458][19][info][main] [source/server/server.cc:939] exiting

Following envoyproxy/envoy#20919 (the deprecated hosts cluster field has been removed from recent Envoy versions in favor of load_assignment), I tried to update the configmap for hubble in roles/network_plugin/cilium/templates/hubble/config.yml.j2:

commit 423ee4a9dbd54bc700f96411cd164da2d9510f3d (HEAD -> release-2.21)
Author: Philipp Hahn <hahn@univention.de>
Date:   Mon Mar 6 12:52:01 2023 +0100

    fix(hubble-ui): Update envoyproxy v1.22.5

diff --git roles/network_plugin/cilium/templates/hubble/config.yml.j2 roles/network_plugin/cilium/templates/hubble/config.yml.j2
index 4f42abe85..3c1a68889 100644
--- roles/network_plugin/cilium/templates/hubble/config.yml.j2
+++ roles/network_plugin/cilium/templates/hubble/config.yml.j2
@@ -37,7 +37,8 @@ data:
           filter_chains:
             - filters:
                 - name: envoy.filters.network.http_connection_manager
-                  config:
+                  typed_config:
+                    "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                     codec_type: auto
                     stat_prefix: ingress_http
                     route_config:
@@ -50,7 +51,7 @@ data:
                                 prefix: '/api/'
                               route:
                                 cluster: backend
-                                max_grpc_timeout: 0s
+                                  # max_grpc_timeout: 0s
                                 prefix_rewrite: '/'
                             - match:
                                 prefix: '/'
@@ -65,23 +66,39 @@ data:
                             expose_headers: grpc-status,grpc-message
                     http_filters:
                       - name: envoy.filters.http.grpc_web
+                        typed_config:
+                          "@type": type.googleapis.com/envoy.extensions.filters.http.grpc_web.v3.GrpcWeb
                       - name: envoy.filters.http.cors
+                        typed_config:
+                          "@type": type.googleapis.com/envoy.extensions.filters.http.cors.v3.Cors
                       - name: envoy.filters.http.router
+                        typed_config:
+                          "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
       clusters:
         - name: frontend
           connect_timeout: 0.25s
           type: strict_dns
           lb_policy: round_robin
-          hosts:
-            - socket_address:
-                address: 127.0.0.1
-                port_value: 8080
+          load_assignment:
+            cluster_name: frontend_envoyproxy_io
+            endpoints:
+            - lb_endpoints:
+              - endpoint:
+                  address:
+                    socket_address:
+                      address: 127.0.0.1
+                      port_value: 8080
         - name: backend
           connect_timeout: 0.25s
           type: logical_dns
           lb_policy: round_robin
           http2_protocol_options: {}
-          hosts:
-            - socket_address:
-                address: 127.0.0.1
-                port_value: 8090
+          load_assignment:
+            cluster_name: backend_envoyproxy_io
+            endpoints:
+            - lb_endpoints:
+              - endpoint:
+                  address:
+                    socket_address:
+                      address: 127.0.0.1
+                      port_value: 8090
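
As a quick sanity check before rolling the changed configmap out, the edited bootstrap can be validated with Envoy's validate mode using the same image the pod runs (a sketch, assuming the rendered template is saved locally as envoy.yaml):

# validate the bootstrap config without starting the proxy
docker run --rm -v "$PWD/envoy.yaml:/etc/envoy.yaml:ro" \
  docker.io/envoyproxy/envoy:v1.22.5 --mode validate -c /etc/envoy.yaml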
@pmhahn added the kind/bug label Mar 6, 2023
@prashantchitta
Contributor

+1. Same issue happening to me.

[2023-03-06 20:44:49.901][1][info][main] [source/server/server.cc:394]   envoy.upstreams: envoy.filters.connection_pools.tcp.generic
[2023-03-06 20:44:49.904][1][critical][main] [source/server/server.cc:117] error initializing configuration '/etc/envoy.yaml': Protobuf message (type envoy.config.bootstrap.v3.Bootstrap reason INVALID_ARGUMENT:(static_resources.clusters[1]) hosts: Cannot find field.) has unknown fields
[2023-03-06 20:44:49.904][1][info][main] [source/server/server.cc:939] exiting
Protobuf message (type envoy.config.bootstrap.v3.Bootstrap reason INVALID_ARGUMENT:(static_resources.clusters[1]) hosts: Cannot find field.) has unknown fields

@oomichi
Contributor

oomichi commented Mar 7, 2023

@pmhahn Thank you for submitting this issue with the details.
According to the report, you already have a change that needs to be merged to solve this issue.
Could you submit it as a pull request?

@prashantchitta
Contributor

Looks like the envoy proxy has been removed from the upstream Cilium Helm chart; it has been replaced with nginx.
I see a merged PR which fixes this issue: #9735

The only thing missing is that these changes are not part of any release branch; v2.21.0 does not have this commit. Any idea when a new release branch with all these changes will be created?
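
For anyone checking whether the fix has reached a release branch yet, something like this works against a local clone (the commit SHA is a placeholder for the merge commit of #9735):

# show which remote release branches already contain the fix
git fetch origin
git branch -r --contains <merge-commit-sha> 'origin/release-*'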

@pmhahn
Author

pmhahn commented Mar 7, 2023

@pmhahn Thank you for submitting this issue with the details. According to the report, you already have a change that needs to be merged to solve this issue. Could you submit it as a pull request?

My changes only made envoyproxy run again, but in the end it still did not work and I was greeted by nginx instead.
@prashantchitta found #9735, which I had not found myself and which looks more correct than my change, but I have not yet had a chance to test it.

@oomichi
Contributor

oomichi commented Mar 8, 2023

I see, thank you for the explanation.
The next version of Kubespray should be released in April or May according to the existing release cycle.
I will also try backporting the pull request into the stable v2.21 branch so it can be released quickly.

@prashantchitta
Contributor

prashantchitta commented Mar 8, 2023

@oomichi If you can backport to v2.21, that would be awesome. Is it possible to backport this PR #9856 as well?

Both of these are related

Also hubble relay is not working. I am planning to raise a PR soon to fix it as well.

@oomichi
Contributor

oomichi commented Mar 9, 2023

@oomichi If you can backport to v2.21, that would be awesome. Is it possible to backport this PR #9856 as well?

Both of these are related

Thanks for pointing it out.
I already did it as #9871

Also hubble relay is not working. I am planning to raise a PR soon to fix it as well.

Cool, I am looking forward to seeing your pull request to fix the hubble-relay issue.

@pmhahn
Author

pmhahn commented Mar 9, 2023

FYI: After cherry-picking 36c6de9 into my local release-2.21 branch I'm again able to access Hubble-UI. Yay 😄
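
For reference, the local backport boils down to roughly this (the remote name upstream is an assumption; the commit must be reachable from the fetched branch):

# apply the upstream fix on top of the local release branch
git checkout release-2.21
git fetch upstream master
git cherry-pick 36c6de9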

@prashantchitta
Contributor

@oomichi Here is the PR to fix cilium-relay #9876. Can you review it?

@oomichi
Contributor

oomichi commented Mar 9, 2023

@oomichi Here is the PR to fix cilium-relay #9876. Can you review it?

@prashantchitta Thanks for trying to fix the cilium-relay issue.
Could you make CLA by clicking Details of EasyCLA job to move forward?

@prashantchitta
Contributor

@oomichi I did it multiple times. I signed the DocuSign stuff 3 times. I don't know why it's still showing up as failed. Is anything wrong with the bot? Can you check?

@floryut
Member

floryut commented Mar 10, 2023

@oomichi I did it multiple times. I signed the DocuSign stuff 3 times. I don't know why it's still showing up as failed. Is anything wrong with the bot? Can you check?

Did you sign it using your p*.c*@servicenow.com email?

@prashantchitta
Contributor

@oomichi @floryut I fixed the EasyCLA issue. Please review the PR now.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Jun 8, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Jul 8, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot closed this as not planned (won't fix, can't repro, duplicate, stale) Jan 19, 2024
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
