
ambassador crashing on node with wrong DNS resolver address due to misconfigured kubelet #1289

Closed
@372046933

Description

After running the following deployment script, Ambassador failed to start on one node:
curl https://raw.githubusercontent.com/kubeflow/kubeflow/v0.2.2/scripts/deploy.sh | bash

 kubectl logs --namespace kubeflow ambassador-849fb9c8c5-kgrkb ambassador
./entrypoint.sh: set: line 65: can't access tty; job control turned off
2018-07-31 05:46:50 kubewatch 0.30.1 INFO: generating config with gencount 1 (4 changes)
2018-07-31 05:46:56 kubewatch 0.30.1 WARNING: Scout: could not post report: HTTPSConnectionPool(host='kubernaut.io', port=443): Max retries exceeded with url: /scout (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f7383625940>: Failed to establish a new connection: [Errno -3] Try again',))
2018-07-31 05:46:56 kubewatch 0.30.1 INFO: Scout reports {"latest_version": "0.30.1", "exception": "could not post report: HTTPSConnectionPool(host='kubernaut.io', port=443): Max retries exceeded with url: /scout (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f7383625940>: Failed to establish a new connection: [Errno -3] Try again',))", "cached": false, "timestamp": 1533016011.063859}
[2018-07-31 05:46:56.133][10][info][upstream] source/common/upstream/cluster_manager_impl.cc:132] cm init: all clusters initialized
[2018-07-31 05:46:56.133][10][info][config] source/server/configuration_impl.cc:55] loading 1 listener(s)
[2018-07-31 05:46:56.150][10][info][config] source/server/configuration_impl.cc:95] loading tracing configuration
[2018-07-31 05:46:56.150][10][info][config] source/server/configuration_impl.cc:122] loading stats sink configuration
AMBASSADOR: starting diagd
AMBASSADOR: starting Envoy
AMBASSADOR: waiting
PIDS: 11:diagd 12:envoy 13:kubewatch
[2018-07-31 05:46:56.556][14][info][main] source/server/server.cc:184] initializing epoch 0 (hot restart version=9.200.16384.127.options=capacity=16384, num_slots=8209 hash=228984379728933363)
[2018-07-31 05:46:57.574][14][info][config] source/server/configuration_impl.cc:55] loading 1 listener(s)
[2018-07-31 05:46:57.767][14][info][config] source/server/configuration_impl.cc:95] loading tracing configuration
[2018-07-31 05:46:57.767][14][info][config] source/server/configuration_impl.cc:122] loading stats sink configuration
[2018-07-31 05:46:57.769][14][info][main] source/server/server.cc:359] starting main dispatch loop
2018-07-31 05:47:04 diagd 0.30.1 WARNING: Scout: could not post report: HTTPSConnectionPool(host='kubernaut.io', port=443): Max retries exceeded with url: /scout (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f0bee6d95f8>: Failed to establish a new connection: [Errno -3] Try again',))
2018-07-31 05:47:04 diagd 0.30.1 INFO: Scout reports {"latest_version": "0.30.1", "exception": "could not post report: HTTPSConnectionPool(host='kubernaut.io', port=443): Max retries exceeded with url: /scout (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f0bee6d95f8>: Failed to establish a new connection: [Errno -3] Try again',))", "cached": false, "timestamp": 1533016019.808133}
2018-07-31 05:47:14 kubewatch 0.30.1 INFO: generating config with gencount 2 (4 changes)
2018-07-31 05:47:19 kubewatch 0.30.1 WARNING: Scout: could not post report: HTTPSConnectionPool(host='kubernaut.io', port=443): Max retries exceeded with url: /scout (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f6fbb8468d0>: Failed to establish a new connection: [Errno -3] Try again',))
2018-07-31 05:47:19 kubewatch 0.30.1 INFO: Scout reports {"latest_version": "0.30.1", "exception": "could not post report: HTTPSConnectionPool(host='kubernaut.io', port=443): Max retries exceeded with url: /scout (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f6fbb8468d0>: Failed to establish a new connection: [Errno -3] Try again',))", "cached": false, "timestamp": 1533016034.702365}
[2018-07-31 05:47:19.770][26][info][upstream] source/common/upstream/cluster_manager_impl.cc:132] cm init: all clusters initialized
[2018-07-31 05:47:19.771][26][info][config] source/server/configuration_impl.cc:55] loading 1 listener(s)
[2018-07-31 05:47:19.788][26][info][config] source/server/configuration_impl.cc:95] loading tracing configuration
[2018-07-31 05:47:19.788][26][info][config] source/server/configuration_impl.cc:122] loading stats sink configuration
unable to initialize hot restart: previous envoy process is still initializing
starting hot-restarter with target: /application/start-envoy.sh
forking and execing new child process at epoch 0
forked new child process with PID=14
got SIGHUP
forking and execing new child process at epoch 1
forked new child process with PID=27
got SIGCHLD
PID=27 exited with code=1
Due to abnormal exit, force killing all child processes and exiting
force killing PID=14
exiting due to lack of child processes
AMBASSADOR: envoy exited with status 1
Here's the envoy.json we were trying to run with:
{
  "listeners": [

    {
      "address": "tcp://0.0.0.0:80",

      "filters": [
        {
          "type": "read",
          "name": "http_connection_manager",
          "config": {"codec_type": "auto",
            "stat_prefix": "ingress_http",
            "access_log": [
              {
                "format": "ACCESS [%START_TIME%] \"%REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)% %PROTOCOL%\" %RESPONSE_CODE% %RESPONSE_FLAGS% %BYTES_RECEIVED% %BYTES_SENT% %DURATION% %RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)% \"%REQ(X-FORWARDED-FOR)%\" \"%REQ(USER-AGENT)%\" \"%REQ(X-REQUEST-ID)%\" \"%REQ(:AUTHORITY)%\" \"%UPSTREAM_HOST%\"\n",
                "path": "/dev/fd/1"
              }
            ],
            "route_config": {
              "virtual_hosts": [
                {
                  "name": "backend",
                  "domains": ["*"],"routes": [

                    {
                      "timeout_ms": 3000,"prefix": "/ambassador/v0/check_ready","prefix_rewrite": "/ambassador/v0/check_ready",
                      "weighted_clusters": {
                          "clusters": [

                                 { "name": "cluster_127_0_0_1_8877", "weight": 100.0 }

                          ]
                      }

                    }
                    ,

                    {
                      "timeout_ms": 3000,"prefix": "/ambassador/v0/check_alive","prefix_rewrite": "/ambassador/v0/check_alive",
                      "weighted_clusters": {
                          "clusters": [

                                 { "name": "cluster_127_0_0_1_8877", "weight": 100.0 }

                          ]
                      }

                    }
                    ,

                    {
                      "timeout_ms": 3000,"prefix": "/ambassador/v0/","prefix_rewrite": "/ambassador/v0/",
                      "weighted_clusters": {
                          "clusters": [

                                 { "name": "cluster_127_0_0_1_8877", "weight": 100.0 }

                          ]
                      }

                    }
                    ,

                    {
                      "timeout_ms": 3000,"prefix": "/tfjobs/","prefix_rewrite": "/tfjobs/",
                      "weighted_clusters": {
                          "clusters": [

                                 { "name": "cluster_tf_job_dashboard_default", "weight": 100.0 }

                          ]
                      }

                    }
                    ,

                    {
                      "timeout_ms": 3000,"prefix": "/k8s/ui/","prefix_rewrite": "/",
                      "weighted_clusters": {
                          "clusters": [

                                 { "name": "cluster_kubernetes_dashboard_kube_system_otls", "weight": 100.0 }

                          ]
                      }

                    }
                    ,

                    {
                      "timeout_ms": 300000,"prefix": "/user/","prefix_rewrite": "/user/",
                      "weighted_clusters": {
                          "clusters": [

                                 { "name": "cluster_tf_hub_lb_default", "weight": 100.0 }

                          ]
                      }

                    }
                    ,

                    {
                      "timeout_ms": 300000,"prefix": "/hub/","prefix_rewrite": "/hub/",
                      "weighted_clusters": {
                          "clusters": [

                                 { "name": "cluster_tf_hub_lb_default", "weight": 100.0 }

                          ]
                      }

                    }
                    ,

                    {
                      "timeout_ms": 3000,"prefix": "/","prefix_rewrite": "/",
                      "weighted_clusters": {
                          "clusters": [

                                 { "name": "cluster_centraldashboard_default", "weight": 100.0 }

                          ]
                      }

                    }


                  ]
                }
              ]
            },
            "filters": [
              {
                "name": "cors",
                "config": {}
              },{"type": "decoder",
                "name": "router",
                "config": {}
              }
            ]
          }
        }
      ]
    }
  ],
  "admin": {
    "address": "tcp://127.0.0.1:8001",
    "access_log_path": "/tmp/admin_access_log"
  },
  "cluster_manager": {
    "clusters": [
      {
        "name": "cluster_127_0_0_1_8877",
        "connect_timeout_ms": 3000,
        "type": "strict_dns",
        "lb_type": "round_robin",
        "hosts": [
          {
            "url": "tcp://127.0.0.1:8877"
          }

        ]},
      {
        "name": "cluster_centraldashboard_default",
        "connect_timeout_ms": 3000,
        "type": "strict_dns",
        "lb_type": "round_robin",
        "hosts": [
          {
            "url": "tcp://centraldashboard.default:80"
          }

        ]},
      {
        "name": "cluster_kubernetes_dashboard_kube_system_otls",
        "connect_timeout_ms": 3000,
        "type": "strict_dns",
        "lb_type": "round_robin",
        "hosts": [
          {
            "url": "tcp://kubernetes-dashboard.kube-system:443"
          }

        ],
        "ssl_context": {

        }},
      {
        "name": "cluster_tf_hub_lb_default",
        "connect_timeout_ms": 3000,
        "type": "strict_dns",
        "lb_type": "round_robin",
        "hosts": [
          {
            "url": "tcp://tf-hub-lb.default:80"
          }

        ]},
      {
        "name": "cluster_tf_job_dashboard_default",
        "connect_timeout_ms": 3000,
        "type": "strict_dns",
        "lb_type": "round_robin",
        "hosts": [
          {
            "url": "tcp://tf-job-dashboard.default:80"
          }

        ]}

    ]
  },
  "statsd_udp_ip_address": "127.0.0.1:8125",
  "stats_flush_interval_ms": 1000
}
AMBASSADOR: shutting down
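
The repeated "[Errno -3] Try again" in the Scout warnings looks like a temporary name-resolution failure (EAI_AGAIN), and every cluster in the envoy.json above is of type strict_dns, so a quick check is whether the upstream hostnames resolve at all from inside the affected pod. Below is a minimal sketch of such a check, not part of Ambassador or Kubeflow: the function names extract_hosts and check_resolution are hypothetical, and it assumes the config has already been loaded into a dict (the on-disk location of envoy.json inside the container is not shown in the log).

```python
import socket
from urllib.parse import urlparse


def extract_hosts(envoy_config):
    """Collect unique (hostname, port) pairs from the cluster_manager section.

    Hypothetical helper: assumes the layout shown in the dumped envoy.json,
    where each cluster has a "hosts" list of {"url": "tcp://name:port"}.
    """
    pairs = set()
    clusters = envoy_config.get("cluster_manager", {}).get("clusters", [])
    for cluster in clusters:
        for host in cluster.get("hosts", []):
            parsed = urlparse(host["url"])  # e.g. tcp://tf-hub-lb.default:80
            pairs.add((parsed.hostname, parsed.port))
    return sorted(pairs)


def check_resolution(pairs, resolve=socket.getaddrinfo):
    """Try to resolve each host and report the outcome per hostname.

    The resolve parameter exists only so the lookup can be stubbed out;
    by default the system resolver configured for the pod is used.
    """
    results = {}
    for host, port in pairs:
        try:
            resolve(host, port)
            results[host] = "ok"
        except socket.gaierror as exc:
            if exc.errno == socket.EAI_AGAIN:
                # Same class of error as "[Errno -3] Try again" in the log.
                results[host] = "temporary failure (EAI_AGAIN)"
            else:
                results[host] = f"failed: {exc}"
    return results
```

Running check_resolution(extract_hosts(config)) in a pod scheduled on the failing node and again on a healthy node should show whether only that node's pods fail to resolve names such as centraldashboard.default. If they do, the nameserver in the pod's /etc/resolv.conf is worth comparing against the cluster DNS service address that the kubelet on that node is configured with.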
