Description
Current Behavior
The APISIX upstream nodes are configured with domain names. While the DNS service was malfunctioning, requests to the interface returned a 503 error. Even after the DNS service was restored, requests still returned 503, although the upstream service itself tested fine. Only after restarting APISIX, or changing the upstream service IP, did requests stop returning 503.
We found that this is caused by upstream.nodes failing to be updated. The key code is as follows:
utils/upstream.lua

```lua
local function compare_upstream_node(up_conf, new_t)
    -- ...
    if up_conf.original_nodes then
        -- if original_nodes is set, it means that the upstream nodes
        -- were changed by `fill_node_info`, so we need to compare the new
        -- nodes with the original nodes.
        old_t = up_conf.original_nodes
        -- !! Line#57: if original_nodes and new_t are identical, this function
        -- returns true. However, up_conf.nodes may be empty: it is only
        -- updated when new_t actually changes.
    end

    if #new_t ~= #old_t then
        return false
    end

    core.table.sort(old_t, sort_by_key_host)
    core.table.sort(new_t, sort_by_key_host)

    for i = 1, #new_t do
        local new_node = new_t[i]
        local old_node = old_t[i]
        for _, name in ipairs({"host", "port", "weight", "priority", "metadata"}) do
            if new_node[name] ~= old_node[name] then
                return false
            end
        end
    end
    -- ...
end

-- ...

function _M.parse_domain_in_up(up)
    local nodes = up.value.dns_nodes
    local new_nodes, err = parse_domain_for_nodes(nodes)
    if not new_nodes then
        return nil, err
    end

    local ok = compare_upstream_node(up.value, new_nodes)
    if ok then
        return up
    end

    if not up.orig_modifiedIndex then
        up.orig_modifiedIndex = up.modifiedIndex
    end
    up.modifiedIndex = up.orig_modifiedIndex .. "#" .. ngx_now()

    up.value.nodes = new_nodes
    core.log.info("resolve upstream which contain domain: ",
                  core.json.delay_encode(up, true))
    return up
end
```
Expected Behavior
After the DNS service is restored, requests should succeed again instead of continuing to return a 503 error.
Error Logs
Below is a portion of the error logs after DNS recovery. Note that the `nodes` table in the upstream returned by `get_by_id()` is empty.
```
2026/02/05 03:42:27 [debug] 49#49: *3721 [lua] ctx.lua:280: __index(): serving ctx value from cache for key: uri
2026/02/05 03:42:27 [info] 49#49: *3721 [lua] trusted-addresses.lua:46: is_trusted(): trusted_addresses_matcher is not initialized, client: 172.18.166.122, server: _, request: "HEAD /health HTTP/1.1", host: "172.18.166.122:30404"
2026/02/05 03:42:27 [debug] 49#49: *3721 [lua] ctx.lua:280: __index(): serving ctx value from cache for key: scheme
2026/02/05 03:42:27 [info] 49#49: *3721 [lua] radixtree_host_uri.lua:161: match(): route match mode: radixtree_host_uri, client: 172.18.166.122, server: _, request: "HEAD /health HTTP/1.1", host: "172.18.166.122:30404"
2026/02/05 03:42:27 [debug] 49#49: *3721 [lua] ctx.lua:280: __index(): serving ctx value from cache for key: host
2026/02/05 03:42:27 [debug] 49#49: *3721 [lua] ctx.lua:280: __index(): serving ctx value from cache for key: uri
2026/02/05 03:42:27 [info] 49#49: *3721 [lua] init.lua:732: http_access_phase(): matched route: {"key":"/apisix/routes/122222","has_domain":false,"value":{"upstream_id":"122222","name":"health-check","update_time":1770262481,"uris":["/health"],"methods":["HEAD"],"id":"122222","status":1,"create_time":1770262326,"priority":10},"createdIndex":52653,"clean_handlers":{},"orig_modifiedIndex":52656,"modifiedIndex":52656}, client: 172.18.166.122, server: _, request: "HEAD /health HTTP/1.1", host: "172.18.166.122:30404"
2026/02/05 03:42:27 [info] 49#49: *3721 [lua] client.lua:123: dns_parse(): dns resolve demo-service.middleware.svc.cluster.local, result: {"ttl":30,"address":"10.233.31.19","section":1,"name":"demo-service.middleware.svc.cluster.local","type":1,"class":1}, client: 172.18.166.122, server: _, request: "HEAD /health HTTP/1.1", host: "172.18.166.122:30404"
2026/02/05 03:42:27 [info] 49#49: *3721 [lua] resolver.lua:84: parse_domain(): parse addr: {"ttl":30,"class":1,"section":1,"name":"demo-service.middleware.svc.cluster.local","type":1,"address":"10.233.31.19"}, client: 172.18.166.122, server: _, request: "HEAD /health HTTP/1.1", host: "172.18.166.122:30404"
2026/02/05 03:42:27 [info] 49#49: *3721 [lua] resolver.lua:85: parse_domain(): resolver: ["169.254.25.10"], client: 172.18.166.122, server: _, request: "HEAD /health HTTP/1.1", host: "172.18.166.122:30404"
2026/02/05 03:42:27 [info] 49#49: *3721 [lua] resolver.lua:86: parse_domain(): host: demo-service.middleware.svc.cluster.local, client: 172.18.166.122, server: _, request: "HEAD /health HTTP/1.1", host: "172.18.166.122:30404"
2026/02/05 03:42:27 [info] 49#49: *3721 [lua] resolver.lua:88: parse_domain(): dns resolver domain: demo-service.middleware.svc.cluster.local to 10.233.31.19, client: 172.18.166.122, server: _, request: "HEAD /health HTTP/1.1", host: "172.18.166.122:30404"
2026/02/05 03:42:27 [debug] 49#49: *3721 [lua] upstream.lua:49: compare_upstream_node(): compare upstream nodes by value, old: table: 0x7fb122ec0ca0 {}new: table: 0x7fb122eb44a0 [{"port":80,"host":"10.233.31.19","weight":1,"domain":"demo-service.middleware.svc.cluster.local"}]
2026/02/05 03:42:27 [info] 49#49: *3721 [lua] upstream.lua:552: get_by_id(): parsed upstream: {"key":"/apisix/upstreams/122222","clean_handlers":{},"value":{"create_time":1770262326,"resource_id":"122222","nodes":{},"scheme":"http","dns_nodes":[{"host":"demo-service.middleware.svc.cluster.local","weight":1,"port":80}],"update_time":1770262481,"name":"health-check","pass_host":"pass","resource_version":52655,"id":"122222","hash_on":"vars","resource_key":"/apisix/upstreams/122222","nodes_ref":[{"upstream_host":"demo-service.middleware.svc.cluster.local","domain":"demo-service.middleware.svc.cluster.local","port":80,"host":"10.233.31.19","weight":1,"priority":0}],"type":"roundrobin","original_nodes":[{"port":80,"host":"10.233.31.19","weight":1,"domain":"demo-service.middleware.svc.cluster.local"}]},"createdIndex":52652,"orig_modifiedIndex":52655,"has_domain":true,"modifiedIndex":"52655#1770262936.541"}, client: 172.18.166.122, server: _, request: "HEAD /health HTTP/1.1", host: "172.18.166.122:30404"
2026/02/05 03:42:27 [error] 49#49: *3721 [lua] init.lua:565: handle_upstream(): failed to set upstream: no valid upstream node, client: 172.18.166.122, server: _, request: "HEAD /health HTTP/1.1", host: "172.18.166.122:30404"
2026/02/05 03:42:27 [debug] 49#49: *3721 [lua] ctx.lua:280: __index(): serving ctx value from cache for key: apisix_upstream_response_time
2026/02/05 03:42:27 [info] 49#49: *3721 client 172.18.166.122 closed keepalive connection
```
Steps to Reproduce
1. Run APISIX in a k8s cluster.
2. Create a test server:
```yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  name: httpbin
  namespace: <Your Namespace>
  labels:
    app: httpbin
spec:
  replicas: 1
  selector:
    matchLabels:
      app: httpbin
  template:
    metadata:
      labels:
        app: httpbin
    spec:
      containers:
        - name: container-vki028
          image: 'kennethreitz/httpbin'
          ports:
            - name: http-80
              containerPort: 80
              protocol: TCP
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      serviceAccountName: default
      serviceAccount: default
      securityContext: {}
      schedulerName: default-scheduler
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 25%
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 600
---
kind: Service
apiVersion: v1
metadata:
  name: httpbin
  namespace: <Your Namespace>
  labels:
    app: httpbin
spec:
  ports:
    - name: http-80
      protocol: TCP
      port: 80
      targetPort: 80
  selector:
    app: httpbin
  type: ClusterIP
  sessionAffinity: None
```

3. Create a test route and upstream:
/apisix/upstreams/22222

```json
{
    "nodes": [
        {
            "host": "httpbin.<Your Namespace>.svc.cluster.local",
            "port": 80,
            "weight": 1
        }
    ],
    "type": "roundrobin",
    "name": "httpbin-anything"
}
```

/apisix/routes/22222

```json
{
    "uris": [
        "/anything/*"
    ],
    "name": "httpbin-anything",
    "priority": 10,
    "methods": [
        "HEAD"
    ],
    "upstream_id": "22222",
    "status": 1
}
```

4. Send test requests:
```shell
while true;
do
    curl <Apisix Server Address>/anything/health -I -s;
    sleep 1;
done
```

5. Back up and delete the Service:
```shell
kubectl get svc httpbin -n <Your Namespace> -o yaml > svc.yaml
kubectl delete -f svc.yaml
```

6. After all requests return a 503 error, recreate the Service:
```shell
kubectl apply -f svc.yaml
```

7. APISIX keeps returning 503 errors.
Environment
- APISIX version (run `apisix version`): 3.14.1
- Operating system (run `uname -a`): Linux industry-apisix-fix-6f76685b76-4dbd5 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 GNU/Linux
- OpenResty / Nginx version (run `openresty -V` or `nginx -V`):

```
nginx version: openresty/1.27.1.2
built by gcc 10.2.1 20210110 (Debian 10.2.1-6)
built with OpenSSL 3.4.1 11 Feb 2025
TLS SNI support enabled
configure arguments: --prefix=/usr/local/openresty/nginx --with-cc-opt='-O2 -DAPISIX_RUNTIME_VER=1.3.2 -DNGX_LUA_ABORT_AT_PANIC -I/usr/local/openresty/zlib/include -I/usr/local/openresty/pcre/include -I/usr/local/openresty/openssl3/include' --add-module=../ngx_devel_kit-0.3.3 --add-module=../echo-nginx-module-0.63 --add-module=../xss-nginx-module-0.06 --add-module=../ngx_coolkit-0.2 --add-module=../set-misc-nginx-module-0.33 --add-module=../form-input-nginx-module-0.12 --add-module=../encrypted-session-nginx-module-0.09 --add-module=../srcache-nginx-module-0.33 --add-module=../ngx_lua-0.10.28 --add-module=../ngx_lua_upstream-0.07 --add-module=../headers-more-nginx-module-0.37 --add-module=../array-var-nginx-module-0.06 --add-module=../memc-nginx-module-0.20 --add-module=../redis2-nginx-module-0.15 --add-module=../redis-nginx-module-0.3.9 --add-module=../ngx_stream_lua-0.0.16 --with-ld-opt='-Wl,-rpath,/usr/local/openresty/luajit/lib -Wl,-rpath,/usr/local/openresty/wasmtime-c-api/lib -L/usr/local/openresty/zlib/lib -L/usr/local/openresty/pcre/lib -L/usr/local/openresty/openssl3/lib -Wl,-rpath,/usr/local/openresty/zlib/lib:/usr/local/openresty/pcre/lib:/usr/local/openresty/openssl3/lib' --add-module=/tmp/tmp.qoDwRwlWdX/openresty-1.27.1.2/../mod_dubbo-1.0.2 --add-module=/tmp/tmp.qoDwRwlWdX/openresty-1.27.1.2/../ngx_multi_upstream_module-1.3.2 --add-module=/tmp/tmp.qoDwRwlWdX/openresty-1.27.1.2/../apisix-nginx-module-1.19.2 --add-module=/tmp/tmp.qoDwRwlWdX/openresty-1.27.1.2/../apisix-nginx-module-1.19.2/src/stream --add-module=/tmp/tmp.qoDwRwlWdX/openresty-1.27.1.2/../apisix-nginx-module-1.19.2/src/meta --add-module=/tmp/tmp.qoDwRwlWdX/openresty-1.27.1.2/../wasm-nginx-module-0.7.0 --add-module=/tmp/tmp.qoDwRwlWdX/openresty-1.27.1.2/../lua-var-nginx-module-v0.5.3 --add-module=/tmp/tmp.qoDwRwlWdX/openresty-1.27.1.2/../lua-resty-events-0.2.0 --with-poll_module --with-pcre-jit --with-stream --with-stream_ssl_module --with-stream_ssl_preread_module --with-http_v2_module --with-http_v3_module --without-mail_pop3_module --without-mail_imap_module --without-mail_smtp_module --with-http_stub_status_module --with-http_realip_module --with-http_addition_module --with-http_auth_request_module --with-http_secure_link_module --with-http_random_index_module --with-http_gzip_static_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-threads --with-compat --with-stream --with-http_ssl_module
```

- etcd version, if relevant (run `curl http://127.0.0.1:9090/v1/server_info`): 3.4.16
- APISIX Dashboard version, if relevant:
- Plugin runner version, for issues related to plugin runners:
- LuaRocks version, for installation issues (run `luarocks --version`):