bug: request failed with 502 when upstream is rolling update #632
Comments
@suninuni You may try to:
Looking forward to knowing whether this can eliminate or mitigate the issue. We'll also try to optimize the process inside apisix and apisix-ingress-controller.
Can you paste your ApisixRoute / ApisixUpstream CRs?
Or you can add readiness configurations.
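For reference, a readiness probe tells Kubernetes to route traffic to a new Pod only after it reports healthy, which narrows the window in which APISIX can pick a not-yet-ready node. The following is a minimal sketch; the Deployment name, image, probe path, and timings are placeholders and should match the actual test service:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-service                       # hypothetical name for the service under test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: test-service
  template:
    metadata:
      labels:
        app: test-service
    spec:
      containers:
        - name: app
          image: example/test-service:latest   # placeholder image
          ports:
            - containerPort: 80
          readinessProbe:                      # Pod only receives traffic after this succeeds
            httpGet:
              path: /healthz                   # placeholder health endpoint
              port: 80
            initialDelaySeconds: 5
            periodSeconds: 5
```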
In APISIX, the retry mechanism is enabled by default, and the number of retries is set according to the number of available backend nodes.
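The retry count can also be pinned explicitly instead of relying on the default. This is only a sketch of an ApisixUpstream doing so; the apiVersion and exact field names depend on the apisix-ingress-controller version in use:

```yaml
apiVersion: apisix.apache.org/v1     # may differ by controller version
kind: ApisixUpstream
metadata:
  name: test-service                 # must match the backing Service name
spec:
  retries: 2                         # assumed field: retry other nodes up to 2 times on failure
```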
Let's wait for @suninuni's configuration snippets.
@tao12345666333 If you mean the readiness probe of the pod, yes, I have one. And @tokers @Donghui0, thanks for your replies. From the screenshot of APISIX's access log, I think the retry mechanism works: you can see that some requests return 200 after being retried once (because I only set 2 replicas for the test service). So I know that with enough pods in the upstream, no requests would fail after enough retries. As for active and passive health checks, they reduce the number of failures but do not make them disappear.
Yes, this is what I want. ingress-nginx-controller watches changes to the upstream nodes and updates them in memory. But for apisix-ingress-controller and APISIX, the update process is (in my opinion; please point out anything that is wrong): apisix-ingress-controller watches the changes -> calls the APISIX API to update -> APISIX saves to etcd -> the other APISIX instances get the new upstreams from etcd. Compared to ingress-nginx-controller, this process is indeed a lot longer.
That's right, we need further discussion about it.
Looking at the third access log entry in the picture, it seems that the retry mechanism did not take effect: "10.32.176.134:80" was requested only once, and "10.32.137.94:80" was not tried next. Passive-only health checks are not supported in the current APISIX version. The balancer.create_server_picker method uses an LRU cache; if the checker.status_ver field is not bumped by an active health check, the cached picker stays in effect for a long time (300s). A failed node therefore cannot be requested again for a long period, even if it has already been restored to a healthy state.
I guess @suninuni's suggestion is to allow the healthcheck plugin to support passive checks alone, like NGINX does.
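As the comments above note, a passive check currently has to be paired with an active one, since the active probe is what keeps checker.status_ver moving and the cached picker refreshed. A sketch of such a combined health check on an ApisixUpstream is shown below; the field names follow the health-check schema documented for the CRD, and the apiVersion and exact schema may vary between controller versions:

```yaml
apiVersion: apisix.apache.org/v1       # may differ by controller version
kind: ApisixUpstream
metadata:
  name: test-service                   # must match the backing Service name
spec:
  healthCheck:
    active:                            # active probing is still required today
      type: http
      httpPath: /healthz               # placeholder health endpoint
      healthy:
        interval: 2s
        successes: 2
      unhealthy:
        interval: 1s
        httpFailures: 2
    passive:                           # marks nodes unhealthy based on real traffic
      unhealthy:
        httpCodes:
          - 502
        httpFailures: 2
```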
Issue description
When the upstream service is being rolling-updated, APISIX cannot update the upstream nodes immediately, which causes requests to fail with 502. This does not happen with ingress-nginx-controller (version 0.46, which no longer reloads NGINX when upstream nodes change).
Environment
apisix-ingress-controller version --long: 0.6.0
kubectl version: 1.21.0
Minimal test code / Steps to reproduce the issue
What's the actual result? (including assertion message & call stack if applicable)
For APISIX, the number of non-2xx responses is around 1000, but for ingress-nginx-controller it is 0.
What's the expected result?
No failed requests when upstream nodes change.