
v4.11.1 unexpected error obtaining nginx status info #11689

Open
Kampe opened this issue Jul 27, 2024 · 3 comments
Labels
kind/support Categorizes issue or PR as a support question.
lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.
needs-priority
needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.
triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

Kampe commented Jul 27, 2024

Seeing issues during nginx startup; the logs don't say much about why the health check is failing.

I0727 00:08:28.342380       7 nginx.go:317] "Starting NGINX process"
I0727 00:08:28.342455       7 leaderelection.go:250] attempting to acquire leader lease ingress-nginx/ingress-nginx-internal-leader...
I0727 00:08:28.342749       7 nginx.go:337] "Starting validation webhook" address=":8443" certPath="/usr/local/certificates/cert" keyPath="/usr/local/certificates/key"
I0727 00:08:28.345201       7 controller.go:193] "Configuration changes detected, backend reload required"
I0727 00:08:28.358021       7 status.go:85] "New leader elected" identity="ingress-nginx-internal-controller-67bfb7fd4b-nzkdt"
2024/07/27 00:08:35 Get "http://127.0.0.1:10246/nginx_status": dial tcp 127.0.0.1:10246: connect: connection refused
W0727 00:08:35.677958       7 nginx_status.go:171] unexpected error obtaining nginx status info: Get "http://127.0.0.1:10246/nginx_status": dial tcp 127.0.0.1:10246: connect: connection refused
2024/07/27 00:09:05 Get "http://127.0.0.1:10246/nginx_status": dial tcp 127.0.0.1:10246: connect: connection refused
W0727 00:09:05.683341       7 nginx_status.go:171] unexpected error obtaining nginx status info: Get "http://127.0.0.1:10246/nginx_status": dial tcp 127.0.0.1:10246: connect: connection refused
I0727 00:09:07.380630       7 controller.go:213] "Backend successfully reloaded"
I0727 00:09:07.380716       7 controller.go:224] "Initial sync, sleeping for 1 second"
I0727 00:09:07.380802       7 event.go:377] Event(v1.ObjectReference{Kind:"Pod", Namespace:"ingress-nginx", Name:"ingress-nginx-internal-controller-dbcc4dc9c-29mpv", UID:"4ee6bf1d-df1f-4bb4-8e37-04d6978dfd6d", APIVersion:"v1", ResourceVersion:"214163955", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration
W0727 00:09:08.382382       7 controller.go:244] Dynamic reconfiguration failed (retrying; 15 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W0727 00:09:09.394353       7 controller.go:244] Dynamic reconfiguration failed (retrying; 14 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W0727 00:09:10.797697       7 controller.go:244] Dynamic reconfiguration failed (retrying; 13 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W0727 00:09:12.616922       7 controller.go:244] Dynamic reconfiguration failed (retrying; 12 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W0727 00:09:14.913299       7 controller.go:244] Dynamic reconfiguration failed (retrying; 11 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
I0727 00:09:16.276657       7 sigterm.go:36] "Received SIGTERM, shutting down"
I0727 00:09:16.276928       7 nginx.go:393] "Shutting down controller queues"
I0727 00:09:16.289355       7 nginx.go:401] "Stopping admission controller"
E0727 00:09:16.289652       7 nginx.go:340] "Error listening for TLS connections" err="http: Server closed"
I0727 00:09:16.289815       7 nginx.go:409] "Stopping NGINX process"
W0727 00:09:17.931239       7 controller.go:244] Dynamic reconfiguration failed (retrying; 10 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W0727 00:09:21.837363       7 controller.go:244] Dynamic reconfiguration failed (retrying; 9 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W0727 00:09:26.847362       7 controller.go:244] Dynamic reconfiguration failed (retrying; 8 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W0727 00:09:33.648965       7 controller.go:244] Dynamic reconfiguration failed (retrying; 7 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
2024/07/27 00:09:16 [notice] 2486#2486: ModSecurity-nginx v1.0.3 (rules loaded inline/local/remote: 0/14418/0)
2024/07/27 00:09:16 [notice] 2486#2486: signal process started
W0727 00:09:41.869474       7 controller.go:244] Dynamic reconfiguration failed (retrying; 6 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W0727 00:09:53.470106       7 controller.go:244] Dynamic reconfiguration failed (retrying; 5 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
I0727 00:09:59.244212       7 nginx.go:422] "NGINX process has stopped"
I0727 00:09:59.244234       7 sigterm.go:44] Handled quit, delaying controller exit for 10 seconds

What happened:

Upgraded my helm chart from v4.10.0 to v4.11.1

What you expected to happen:

All pods are replaced and working without issue.

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):

NGINX Ingress controller
  Release:       v1.11.1
  Build:         7c44f992012555ff7f4e47c08d7c542ca9b4b1f7
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.25.5

Kubernetes version (use kubectl version):

Client Version: v1.30.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.4-eks-036c24b

Environment:
AWS EKS

  • How was the ingress-nginx-controller installed:
values: |
        fullnameOverride: ingress-nginx-internal
        controller:
          replicaCount: 3
          autoscaling:
            enabled: true
            minReplicas: 3
            targetCPUUtilizationPercentage: 80
            targetMemoryUtilizationPercentage: 80
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
          ingressClassResource:
            name: "nginx-internal"
            controllerValue: "k8s.io/ingress-nginx-internal"
            enabled: true
            default: true
          opentelemetry:
            enabled: true
          admissionWebhooks:
            timeoutSeconds: 30

          config:
            allow-snippet-annotations: "true"
            otlp-collector-host: "opentelemetry-collector.monitoring.svc"
            otlp-collector-port: "4317"
            enable-opentelemetry: "true"
            otel-sampler: "AlwaysOn"
            otel-sampler-ratio: "1.0"
            enable-underscores-in-headers: "true"
            opentelemetry-config: "/etc/nginx/opentelemetry.toml"
            opentelemetry-operation-name: "HTTP $request_method $service_name $uri"
            opentelemetry-trust-incoming-span: "false"
            otel-sampler-parent-based: "false"
            otel-max-queuesize: "2048"
            otel-schedule-delay-millis: "5000"
            otel-max-export-batch-size: "512"
            server-snippet: |
              opentelemetry_attribute "ingress.namespace" "$namespace";
              opentelemetry_attribute "ingress.service_name" "$service_name";
              opentelemetry_attribute "ingress.name" "$ingress_name";
              opentelemetry_attribute "ingress.upstream" "$proxy_upstream_name";

          metrics:
            enabled: true
            serviceMonitor:
              enabled: true
          service:
            public: false
            subdomain: "ingress-internal"
            external:
              enabled: false
            internal:
              enabled: true
              annotations: 
                service.beta.kubernetes.io/aws-load-balancer-type: nlb-ip
                service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
                service.beta.kubernetes.io/aws-load-balancer-scheme: internal
                service.beta.kubernetes.io/aws-load-balancer-internal: "true"
                service.beta.kubernetes.io/aws-load-balancer-attributes: deletion_protection.enabled=true
  • Current State of the controller:
Name:         nginx-internal
Labels:       app.kubernetes.io/component=controller
              app.kubernetes.io/instance=ingress-nginx-internal
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=ingress-nginx
              app.kubernetes.io/part-of=ingress-nginx
              app.kubernetes.io/version=1.11.1
              argocd.argoproj.io/instance=ingress-nginx-internal
              helm.sh/chart=ingress-nginx-4.11.1
Annotations:  argocd.argoproj.io/tracking-id: ingress-nginx-internal:networking.k8s.io/IngressClass:ingress-nginx/nginx-internal
              ingressclass.kubernetes.io/is-default-class: true
Controller:   k8s.io/ingress-nginx-internal
Events:       <none>
@Kampe added the kind/bug label Jul 27, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot added the needs-triage and needs-priority labels Jul 27, 2024
@longwuyuan
Contributor

/remove-kind bug
/kind support

Please try adding the AWS-documented annotation related to security groups. It could be that required ports are being blocked, so check whether the required ports are open inside the cluster (look at the several port fields in the pod spec for the port numbers).
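
For illustration only, a minimal sketch of how such an annotation could be added under the existing controller.service.internal.annotations in the chart values; the security-group ID is a placeholder and the exact annotation depends on your AWS load balancer controller setup:

controller:
  service:
    internal:
      enabled: true
      annotations:
        # placeholder security-group ID; use a group that allows the NLB
        # health-check and traffic ports through to the worker nodes
        service.beta.kubernetes.io/aws-load-balancer-security-groups: "sg-0123456789abcdef0"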

You have not answered the questions asked in the new-issue template, so there is nothing to debug and analyze here. Please answer the questions from the new-issue template to help out.

/triage needs-information

@k8s-ci-robot added the kind/support and triage/needs-information labels and removed the kind/bug label Jul 27, 2024

This is stale, but we won't close it automatically; just bear in mind the maintainers may be busy with other tasks and will get to your issue as soon as possible. If you have any question or request to prioritize this, please reach out in #ingress-nginx-dev on Kubernetes Slack.

@github-actions bot added the lifecycle/frozen label Aug 26, 2024