Cilium hubble-ui envoyproxy keeps crashing #9857

Closed
pmhahn opened this issue Mar 6, 2023 · 17 comments
Labels
kind/bug: Categorizes issue or PR as related to a bug.
lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@pmhahn

pmhahn commented Mar 6, 2023

Environment:

  • Cloud provider or hardware configuration:
    Single node bare-metal

  • OS

Linux 5.10.0-21-amd64 x86_64
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
  • Version of Ansible:
    ansible 2.10.8

  • Version of Python:
    Python 3.9.2

Kubespray version (commit):
release-2.21

Network plugin used:
Cilium

Full inventory with variables:

Command used to invoke ansible:

Output of ansible run:

Anything else we need to know:

pod/hubble-ui kept crashing, specifically its proxy container:

  proxy:
    Container ID:  containerd://f98d5f671c84c5e5e6fdb29388331dea2d4d766b80e6bc8e1de71dfa16a0996a
    Image:         docker.io/envoyproxy/envoy:v1.22.5
    Image ID:      sha256:e9c4ee2ce7207ce0f446892dda8f1bcc16cd6aec0c7c55d04bddca52f8af280d
    Port:          8081/TCP
    Host Port:     0/TCP
    Command:
      envoy
    Args:
      -c
      /etc/envoy.yaml
      -l
      info
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 06 Mar 2023 09:16:26 +0100
      Finished:     Mon, 06 Mar 2023 09:16:26 +0100
    Ready:          False
    Restart Count:  5
    Environment:    <none>
    Mounts:
      /etc/envoy.yaml from hubble-ui-envoy-yaml (rw,path="envoy.yaml")
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-p52rr (ro)
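
For completeness, the status and logs above were pulled roughly like this (the pod name is just an example, and hubble-ui is assumed to run in the kube-system namespace, which may differ in other setups):

# list the hubble-ui pod and inspect the crashing proxy container
kubectl -n kube-system get pods | grep hubble-ui
kubectl -n kube-system describe pod hubble-ui-6f48889749-abcde
# previous logs of the proxy container that is in CrashLoopBackOff
kubectl -n kube-system logs hubble-ui-6f48889749-abcde -c proxy --previous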

The log error was not helpful:

[2023-03-06 10:20:04.458][19][critical][main] [source/server/server.cc:117] error initializing configuration '/etc/envoy.yaml': Protobuf message (type envoy.config.bootstrap.v3.Bootstrap reason INVALID_ARGUMENT:(static_resources.clusters[1]) hosts: Cannot find field.) has unknown fields
[2023-03-06 10:20:04.458][19][info][main] [source/server/server.cc:939] exiting

Following envoyproxy/envoy#20919 (the deprecated hosts cluster field has been removed from recent Envoy versions in favor of load_assignment), I tried to update the configmap for hubble in roles/network_plugin/cilium/templates/hubble/config.yml.j2:

commit 423ee4a9dbd54bc700f96411cd164da2d9510f3d (HEAD -> release-2.21)
Author: Philipp Hahn <hahn@univention.de>
Date:   Mon Mar 6 12:52:01 2023 +0100

    fix(hubble-ui): Update envoyproxy v1.22.5

diff --git roles/network_plugin/cilium/templates/hubble/config.yml.j2 roles/network_plugin/cilium/templates/hubble/config.yml.j2
index 4f42abe85..3c1a68889 100644
--- roles/network_plugin/cilium/templates/hubble/config.yml.j2
+++ roles/network_plugin/cilium/templates/hubble/config.yml.j2
@@ -37,7 +37,8 @@ data:
           filter_chains:
             - filters:
                 - name: envoy.filters.network.http_connection_manager
-                  config:
+                  typed_config:
+                    "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                     codec_type: auto
                     stat_prefix: ingress_http
                     route_config:
@@ -50,7 +51,7 @@ data:
                                 prefix: '/api/'
                               route:
                                 cluster: backend
-                                max_grpc_timeout: 0s
+                                  # max_grpc_timeout: 0s
                                 prefix_rewrite: '/'
                             - match:
                                 prefix: '/'
@@ -65,23 +66,39 @@ data:
                             expose_headers: grpc-status,grpc-message
                     http_filters:
                       - name: envoy.filters.http.grpc_web
+                        typed_config:
+                          "@type": type.googleapis.com/envoy.extensions.filters.http.grpc_web.v3.GrpcWeb
                       - name: envoy.filters.http.cors
+                        typed_config:
+                          "@type": type.googleapis.com/envoy.extensions.filters.http.cors.v3.Cors
                       - name: envoy.filters.http.router
+                        typed_config:
+                          "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
       clusters:
         - name: frontend
           connect_timeout: 0.25s
           type: strict_dns
           lb_policy: round_robin
-          hosts:
-            - socket_address:
-                address: 127.0.0.1
-                port_value: 8080
+          load_assignment:
+            cluster_name: frontend_envoyproxy_io
+            endpoints:
+            - lb_endpoints:
+              - endpoint:
+                  address:
+                    socket_address:
+                      address: 127.0.0.1
+                      port_value: 8080
         - name: backend
           connect_timeout: 0.25s
           type: logical_dns
           lb_policy: round_robin
           http2_protocol_options: {}
-          hosts:
-            - socket_address:
-                address: 127.0.0.1
-                port_value: 8090
+          load_assignment:
+            cluster_name: backend_envoyproxy_io
+            endpoints:
+            - lb_endpoints:
+              - endpoint:
+                  address:
+                    socket_address:
+                      address: 127.0.0.1
+                      port_value: 8090
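
As a quick sanity check before rolling the changed configmap out, the edited bootstrap can be validated with Envoy's validate mode using the same image the pod runs (a sketch, assuming the rendered template is saved locally as envoy.yaml):

# validate the bootstrap config without starting the proxy
docker run --rm -v "$PWD/envoy.yaml:/etc/envoy.yaml:ro" \
  docker.io/envoyproxy/envoy:v1.22.5 --mode validate -c /etc/envoy.yaml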
@pmhahn added the kind/bug label Mar 6, 2023
@prashantchitta
Contributor

+1. Same issue happening to me.

[2023-03-06 20:44:49.901][1][info][main] [source/server/server.cc:394]   envoy.upstreams: envoy.filters.connection_pools.tcp.generic
[2023-03-06 20:44:49.904][1][critical][main] [source/server/server.cc:117] error initializing configuration '/etc/envoy.yaml': Protobuf message (type envoy.config.bootstrap.v3.Bootstrap reason INVALID_ARGUMENT:(static_resources.clusters[1]) hosts: Cannot find field.) has unknown fields
[2023-03-06 20:44:49.904][1][info][main] [source/server/server.cc:939] exiting
Protobuf message (type envoy.config.bootstrap.v3.Bootstrap reason INVALID_ARGUMENT:(static_resources.clusters[1]) hosts: Cannot find field.) has unknown fields

@oomichi
Contributor

oomichi commented Mar 7, 2023

@pmhahn Thank you for submitting this issue with the details.
According to the report, you already have a change that needs to be merged to solve this issue.
Could you submit it as a pull request?

@prashantchitta
Contributor

Looks like the envoy proxy has been removed from the upstream Cilium Helm chart; it has been replaced with nginx.
I see a merged PR which fixes this issue: #9735

The only thing missing is that these changes are not part of any release branch; v2.21.0 does not have this commit. Any idea when a new release branch with all these changes will be created?
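
For anyone checking whether the fix has reached a release branch yet, something like this works against a local clone (the commit SHA is a placeholder for the merge commit of #9735):

# show which remote release branches already contain the fix
git fetch origin
git branch -r --contains <merge-commit-sha> 'origin/release-*'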

@pmhahn
Author

pmhahn commented Mar 7, 2023

@pmhahn Thank you for submitting this issue with the details. According to the report, you already have a change that needs to be merged to solve this issue. Could you submit it as a pull request?

My changes only made envoyproxy run again, but in the end it still did not work and I was greeted by nginx instead.
@prashantchitta found #9735, which I had not found myself and which looks more correct than my change, but I have not yet had a chance to test it.

@oomichi
Contributor

oomichi commented Mar 8, 2023

I see, thank you for the explanation.
The next version of Kubespray should be released in April or May according to the existing release cycle.
I will also try backporting the pull request into the stable v2.21 branch so it can be released quickly.

@prashantchitta
Contributor

prashantchitta commented Mar 8, 2023

@oomichi If you can backport to v2.21, that would be awesome. Is it possible to backport this PR #9856 as well?

Both of these are related

Also hubble relay is not working. I am planning to raise a PR soon to fix it as well.

@oomichi
Contributor

oomichi commented Mar 9, 2023

@oomichi If you can backport to v2.21, that would be awesome. Is it possible to backport this PR #9856 as well?

Both of these are related

Thanks for pointing it out.
I already did it as #9871

Also hubble relay is not working. I am planning to raise a PR soon to fix it as well.

Cool, I am looking forward to seeing your pull request to fix the hubble-relay issue.

@pmhahn
Author

pmhahn commented Mar 9, 2023

FYI: After cherry-picking 36c6de9 into my local release-2.21 branch I'm again able to access Hubble-UI. Yay 😄
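
For reference, the local backport boils down to roughly this (the remote name upstream is an assumption; the commit must be reachable from the fetched branch):

# apply the upstream fix on top of the local release branch
git checkout release-2.21
git fetch upstream master
git cherry-pick 36c6de9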

@prashantchitta
Contributor

@oomichi Here is the PR to fix cilium-relay #9876. Can you review it?

@oomichi
Contributor

oomichi commented Mar 9, 2023

@oomichi Here is the PR to fix cilium-relay #9876. Can you review it?

@prashantchitta Thanks for trying to fix the cilium-relay issue.
Could you make CLA by clicking Details of EasyCLA job to move forward?

@prashantchitta
Contributor

@oomichi I did it multiple times. I signed the DocuSign stuff 3 times. I don't know why it's still showing up as failed. Is anything wrong with the bot? Can you check?

@floryut
Member

floryut commented Mar 10, 2023

@oomichi I did it multiple times. I signed the DocuSign stuff 3 times. I don't know why it's still showing up as failed. Is anything wrong with the bot? Can you check?

Did you sign it using your p*.c*@servicenow.com email?

@prashantchitta
Contributor

@oomichi @floryut I fixed the EasyCLA issue. Please review the PR now.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Jun 8, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Jul 8, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot closed this as not planned (won't fix, can't repro, duplicate, stale) Jan 19, 2024
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
