
bug: APISIX 3.14.1 keeps returning 503 errors after a temporary DNS failure recovers; resolved only by restarting APISIX or changing the service IP #12973

@zbfzn

Description

Current Behavior

The APISIX upstream nodes are configured with a domain name. While the DNS service is down, requests to the route return a 503 error. Even after the DNS service is restored, requests still return 503, although the upstream service responds normally when tested directly. Only after restarting APISIX or changing the upstream service IP do requests stop returning 503.

We found that the cause is a failure to update upstream.nodes. The key code is as follows:
utils/upstream.lua

local function compare_upstream_node(up_conf, new_t)
     ......
    if up_conf.original_nodes then
        -- if original_nodes is set, it means that the upstream nodes
        -- are changed by `fill_node_info`, so we need to compare the new nodes with the
        -- original nodes.
        old_t = up_conf.original_nodes -- !!Line#57: if original_nodes and the new nodes are identical, this function returns true.
                                       -- However, up_conf.nodes may be empty; it is only rewritten when new_t differs.
    end

    if #new_t ~= #old_t then
        return false
    end

    core.table.sort(old_t, sort_by_key_host)
    core.table.sort(new_t, sort_by_key_host)

    for i = 1, #new_t do
        local new_node = new_t[i]
        local old_node = old_t[i]
        for _, name in ipairs({"host", "port", "weight", "priority", "metadata"}) do
            if new_node[name] ~= old_node[name] then
                return false
            end
        end
    end
     .....
end

.....

function _M.parse_domain_in_up(up)
    local nodes = up.value.dns_nodes
    local new_nodes, err = parse_domain_for_nodes(nodes)
    if not new_nodes then
        return nil, err
    end

    local ok = compare_upstream_node(up.value, new_nodes)
    if ok then
        return up
    end

    if not up.orig_modifiedIndex then
        up.orig_modifiedIndex = up.modifiedIndex
    end
    up.modifiedIndex = up.orig_modifiedIndex .. "#" .. ngx_now()
    up.value.nodes = new_nodes
    core.log.info("resolve upstream which contain domain: ",
                  core.json.delay_encode(up, true))
    return up
end

....
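
To make the failure mode concrete, here is a simplified, standalone Lua simulation of the comparison (plain Lua, not the actual apisix.utils.upstream module; sorting and the priority/metadata fields are omitted for brevity, and the empty nodes state matches the get_by_id() log below). During the outage, nodes ends up empty while original_nodes still holds the last successfully resolved address. Once DNS recovers and resolves to the same address, the comparison returns true, parse_domain_in_up takes the early return, and the empty nodes table is never rewritten:

-- Simplified standalone simulation, not the real APISIX module.
local up_conf = {
    nodes = {},  -- emptied during the DNS outage
    original_nodes = {
        {host = "10.233.31.19", port = 80, weight = 1},
    },
}

-- DNS has recovered and resolves to the same address as before
local new_t = {
    {host = "10.233.31.19", port = 80, weight = 1},
}

local function compare_upstream_node(conf, new_nodes)
    -- compare against original_nodes, as the real function does
    local old_t = conf.original_nodes or conf.nodes
    if #new_nodes ~= #old_t then
        return false
    end
    for i = 1, #new_nodes do
        for _, name in ipairs({"host", "port", "weight"}) do
            if new_nodes[i][name] ~= old_t[i][name] then
                return false
            end
        end
    end
    return true
end

print(compare_upstream_node(up_conf, new_t))  -- true: early return, nodes never updated
print(#up_conf.nodes)                         -- 0: every request fails with 503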

Expected Behavior

After the DNS service is restored, requests should no longer return a 503 error.
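
One possible direction to restore this behavior, shown only as an illustrative sketch (the nodes_usable helper below is hypothetical, not existing APISIX API, and this is not a tested patch): in parse_domain_in_up, take the early return only when the comparison succeeds and up.value.nodes is non-empty, so a nodes table emptied during the outage still gets rewritten:

-- Hypothetical helper; plain Lua, runnable standalone.
local function nodes_usable(nodes)
    return nodes ~= nil and next(nodes) ~= nil
end

-- In parse_domain_in_up the early return would then become:
--     if ok and nodes_usable(up.value.nodes) then
--         return up
--     end

print(nodes_usable({}))                                    -- false: fall through, rewrite nodes
print(nodes_usable({{host = "10.233.31.19", port = 80}}))  -- true: keep cached config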

Error Logs

Below is a portion of the error logs after DNS recovery. Note that the nodes table in the upstream returned by get_by_id() is empty ("nodes":{}).

 2026/02/05 03:42:27 [debug] 49#49: *3721 [lua] ctx.lua:280: __index(): serving ctx value from cache for key: uri

 2026/02/05 03:42:27 [info] 49#49: *3721 [lua] trusted-addresses.lua:46: is_trusted(): trusted_addresses_matcher is not initialized, client: 172.18.166.122, server: _, request: "HEAD /health HTTP/1.1", host: "172.18.166.122:30404"

 2026/02/05 03:42:27 [debug] 49#49: *3721 [lua] ctx.lua:280: __index(): serving ctx value from cache for key: scheme

 2026/02/05 03:42:27 [info] 49#49: *3721 [lua] radixtree_host_uri.lua:161: match(): route match mode: radixtree_host_uri, client: 172.18.166.122, server: _, request: "HEAD /health HTTP/1.1", host: "172.18.166.122:30404"

 2026/02/05 03:42:27 [debug] 49#49: *3721 [lua] ctx.lua:280: __index(): serving ctx value from cache for key: host

 2026/02/05 03:42:27 [debug] 49#49: *3721 [lua] ctx.lua:280: __index(): serving ctx value from cache for key: uri

 2026/02/05 03:42:27 [info] 49#49: *3721 [lua] init.lua:732: http_access_phase(): matched route: {"key":"/apisix/routes/122222","has_domain":false,"value":{"upstream_id":"122222","name":"health-check","update_time":1770262481,"uris":["/health"],"methods":["HEAD"],"id":"122222","status":1,"create_time":1770262326,"priority":10},"createdIndex":52653,"clean_handlers":{},"orig_modifiedIndex":52656,"modifiedIndex":52656}, client: 172.18.166.122, server: _, request: "HEAD /health HTTP/1.1", host: "172.18.166.122:30404"

 2026/02/05 03:42:27 [info] 49#49: *3721 [lua] client.lua:123: dns_parse(): dns resolve demo-service.middleware.svc.cluster.local, result: {"ttl":30,"address":"10.233.31.19","section":1,"name":"demo-service.middleware.svc.cluster.local","type":1,"class":1}, client: 172.18.166.122, server: _, request: "HEAD /health HTTP/1.1", host: "172.18.166.122:30404"

 2026/02/05 03:42:27 [info] 49#49: *3721 [lua] resolver.lua:84: parse_domain(): parse addr: {"ttl":30,"class":1,"section":1,"name":"demo-service.middleware.svc.cluster.local","type":1,"address":"10.233.31.19"}, client: 172.18.166.122, server: _, request: "HEAD /health HTTP/1.1", host: "172.18.166.122:30404"

 2026/02/05 03:42:27 [info] 49#49: *3721 [lua] resolver.lua:85: parse_domain(): resolver: ["169.254.25.10"], client: 172.18.166.122, server: _, request: "HEAD /health HTTP/1.1", host: "172.18.166.122:30404"

 2026/02/05 03:42:27 [info] 49#49: *3721 [lua] resolver.lua:86: parse_domain(): host: demo-service.middleware.svc.cluster.local, client: 172.18.166.122, server: _, request: "HEAD /health HTTP/1.1", host: "172.18.166.122:30404"

 2026/02/05 03:42:27 [info] 49#49: *3721 [lua] resolver.lua:88: parse_domain(): dns resolver domain: demo-service.middleware.svc.cluster.local to 10.233.31.19, client: 172.18.166.122, server: _, request: "HEAD /health HTTP/1.1", host: "172.18.166.122:30404"

 2026/02/05 03:42:27 [debug] 49#49: *3721 [lua] upstream.lua:49: compare_upstream_node(): compare upstream nodes by value, old: table: 0x7fb122ec0ca0 {}new: table: 0x7fb122eb44a0 [{"port":80,"host":"10.233.31.19","weight":1,"domain":"demo-service.middleware.svc.cluster.local"}]

 2026/02/05 03:42:27 [info] 49#49: *3721 [lua] upstream.lua:552: get_by_id(): parsed upstream: {"key":"/apisix/upstreams/122222","clean_handlers":{},"value":{"create_time":1770262326,"resource_id":"122222","nodes":{},"scheme":"http","dns_nodes":[{"host":"demo-service.middleware.svc.cluster.local","weight":1,"port":80}],"update_time":1770262481,"name":"health-check","pass_host":"pass","resource_version":52655,"id":"122222","hash_on":"vars","resource_key":"/apisix/upstreams/122222","nodes_ref":[{"upstream_host":"demo-service.middleware.svc.cluster.local","domain":"demo-service.middleware.svc.cluster.local","port":80,"host":"10.233.31.19","weight":1,"priority":0}],"type":"roundrobin","original_nodes":[{"port":80,"host":"10.233.31.19","weight":1,"domain":"demo-service.middleware.svc.cluster.local"}]},"createdIndex":52652,"orig_modifiedIndex":52655,"has_domain":true,"modifiedIndex":"52655#1770262936.541"}, client: 172.18.166.122, server: _, request: "HEAD /health HTTP/1.1", host: "172.18.166.122:30404"

 2026/02/05 03:42:27 [error] 49#49: *3721 [lua] init.lua:565: handle_upstream(): failed to set upstream: no valid upstream node, client: 172.18.166.122, server: _, request: "HEAD /health HTTP/1.1", host: "172.18.166.122:30404"

 2026/02/05 03:42:27 [debug] 49#49: *3721 [lua] ctx.lua:280: __index(): serving ctx value from cache for key: apisix_upstream_response_time

 2026/02/05 03:42:27 [info] 49#49: *3721 client 172.18.166.122 closed keepalive connection

Steps to Reproduce

1. Run APISIX in a Kubernetes cluster.
2. Create a test server:

kind: Deployment
apiVersion: apps/v1
metadata:
  name: httpbin
  namespace: <Your Namespace>
  labels:
    app: httpbin
spec:
  replicas: 1
  selector:
    matchLabels:
      app: httpbin
  template:
    metadata:
      labels:
        app: httpbin
    spec:
      containers:
        - name: container-vki028
          image: 'kennethreitz/httpbin'
          ports:
            - name: http-80
              containerPort: 80
              protocol: TCP
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      serviceAccountName: default
      serviceAccount: default
      securityContext: {}
      schedulerName: default-scheduler
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 25%
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 600

---
kind: Service
apiVersion: v1
metadata:
  name: httpbin
  namespace: <Your Namespace>
  labels:
    app: httpbin
spec:
  ports:
    - name: http-80
      protocol: TCP
      port: 80
      targetPort: 80
  selector:
    app: httpbin
  type: ClusterIP
  sessionAffinity: None

3. Create a test upstream and route:
/apisix/upstreams/22222

{
  "nodes": [
    {
      "host": "httpbin.<Your Namespace>.svc.cluster.local",
      "port": 80,
      "weight": 1
    }
  ],
  "type": "roundrobin",
  "name": "httpbin-anything"
}

/apisix/routes/22222

{
  "uris": [
    "/anything/*"
  ],
  "name": "httpbin-anything",
  "priority": 10,
  "methods": [
    "HEAD"
  ],
  "upstream_id": "22222",
  "status": 1
}

4. Send test requests:

while true;
do
    curl <Apisix Server Address>/anything/health -I -s;
    sleep 1;
done

5. Back up and delete the Service:

kubectl get svc httpbin -n <Your Namespace> -o yaml > svc.yaml
kubectl delete -f svc.yaml

6. Recreate the Service after all requests start returning 503 errors:

kubectl apply -f svc.yaml

7. APISIX keeps returning a 503 error.

Environment

  • APISIX version (run apisix version): 3.14.1
  • Operating system (run uname -a): Linux industry-apisix-fix-6f76685b76-4dbd5 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 GNU/Linux
  • OpenResty / Nginx version (run openresty -V or nginx -V):
    nginx version: openresty/1.27.1.2
    built by gcc 10.2.1 20210110 (Debian 10.2.1-6)
    built with OpenSSL 3.4.1 11 Feb 2025
    TLS SNI support enabled
    configure arguments: --prefix=/usr/local/openresty/nginx --with-cc-opt='-O2 -DAPISIX_RUNTIME_VER=1.3.2 -DNGX_LUA_ABORT_AT_PANIC -I/usr/local/openresty/zlib/include -I/usr/local/openresty/pcre/include -I/usr/local/openresty/openssl3/include' --add-module=../ngx_devel_kit-0.3.3 --add-module=../echo-nginx-module-0.63 --add-module=../xss-nginx-module-0.06 --add-module=../ngx_coolkit-0.2 --add-module=../set-misc-nginx-module-0.33 --add-module=../form-input-nginx-module-0.12 --add-module=../encrypted-session-nginx-module-0.09 --add-module=../srcache-nginx-module-0.33 --add-module=../ngx_lua-0.10.28 --add-module=../ngx_lua_upstream-0.07 --add-module=../headers-more-nginx-module-0.37 --add-module=../array-var-nginx-module-0.06 --add-module=../memc-nginx-module-0.20 --add-module=../redis2-nginx-module-0.15 --add-module=../redis-nginx-module-0.3.9 --add-module=../ngx_stream_lua-0.0.16 --with-ld-opt='-Wl,-rpath,/usr/local/openresty/luajit/lib -Wl,-rpath,/usr/local/openresty/wasmtime-c-api/lib -L/usr/local/openresty/zlib/lib -L/usr/local/openresty/pcre/lib -L/usr/local/openresty/openssl3/lib -Wl,-rpath,/usr/local/openresty/zlib/lib:/usr/local/openresty/pcre/lib:/usr/local/openresty/openssl3/lib' --add-module=/tmp/tmp.qoDwRwlWdX/openresty-1.27.1.2/../mod_dubbo-1.0.2 --add-module=/tmp/tmp.qoDwRwlWdX/openresty-1.27.1.2/../ngx_multi_upstream_module-1.3.2 --add-module=/tmp/tmp.qoDwRwlWdX/openresty-1.27.1.2/../apisix-nginx-module-1.19.2 --add-module=/tmp/tmp.qoDwRwlWdX/openresty-1.27.1.2/../apisix-nginx-module-1.19.2/src/stream --add-module=/tmp/tmp.qoDwRwlWdX/openresty-1.27.1.2/../apisix-nginx-module-1.19.2/src/meta --add-module=/tmp/tmp.qoDwRwlWdX/openresty-1.27.1.2/../wasm-nginx-module-0.7.0 --add-module=/tmp/tmp.qoDwRwlWdX/openresty-1.27.1.2/../lua-var-nginx-module-v0.5.3 --add-module=/tmp/tmp.qoDwRwlWdX/openresty-1.27.1.2/../lua-resty-events-0.2.0 --with-poll_module --with-pcre-jit --with-stream --with-stream_ssl_module --with-stream_ssl_preread_module --with-http_v2_module --with-http_v3_module --without-mail_pop3_module --without-mail_imap_module --without-mail_smtp_module --with-http_stub_status_module --with-http_realip_module --with-http_addition_module --with-http_auth_request_module --with-http_secure_link_module --with-http_random_index_module --with-http_gzip_static_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-threads --with-compat --with-stream --with-http_ssl_module
  • etcd version, if relevant (run curl http://127.0.0.1:9090/v1/server_info): 3.4.16
  • APISIX Dashboard version, if relevant:
  • Plugin runner version, for issues related to plugin runners:
  • LuaRocks version, for installation issues (run luarocks --version):
