When gateway discovery is enabled, the liveness probe should be 👍 even if the adminapi service has 0 ready endpoints #3592
Comments
I feel kinda iffy on this because this is a case where failing, even if failing in a rather non-specific way, probably makes sense--it's a bit of a "yes, technically code can try to handle it, but if you do manage to wind up in this situation it's more of a 'tell whoever did it not to do that and explain why' scenario". While the controller does enter crashloop backoff if you start it when no Kong instances will ever become ready, that's probably okay. If there are no Kong instances ready, we can make KIC report live despite that, but KIC won't be able to actually do anything in that state: it will happily go live and then do nothing forever, because until you fix the lack of ready Kong instances, there's nothing for KIC to push to. Documentation and examples should steer users away from this.

We're saying "if you're using discovery, deploy your Kong instance and point KIC to it" as the happy path for discovery mode, and we expect Kong instances to come online under normal circumstances. While you could deploy KIC and a Kong Service, but no Kong Deployment for that Service (or a broken Kong Deployment for it), that's a contrived situation where you know it'll break in a particular way. We, the application authors, know that you can create this situation, but it doesn't feel like something end users would naturally do on their own--absent evidence that users are taking that strange path, we can reasonably expect that most won't.

Crash loop backoff is a reasonable approach for handling odd situations. Kong may fail to come online quickly and send the controller into backoff, but backoff isn't dead, it's just increasingly delayed retries. Hypothetically, you may install KIC and a Kong Service with no live endpoints, take an hour lunch break, come back, create live Kong endpoints, and then puzzle over why KIC doesn't instantly start pushing configuration, but that seems somewhat unlikely. If you do wait an hour, backoff will eventually restart KIC and it will come online successfully on its own. In practice I'd expect users to install a broken state, recognize it's broken shortly after, and then either ask for assistance or redo the entire thing to get back into a happy state.

Do we actually have stories where we expect this scenario to happen in practice, and where we definitely need code logic to recover from it automatically? This feels like a situation that is within the realm of technical possibility, but where realistically your environment is so borked anyway that we don't necessarily need targeted automatic recovery. Hitting CrashLoopBackoff here is a reasonable "yes, your environment is broken, you need to fix several things to make it not broken, and having done so you've probably re-rolled your KIC Deployment anyway".
Considering the behavior when KIC gets 0 endpoints of the Kong admin service during initialization of Kong clients, we now have two options:
Actually, it is possible that the Kong version changes while KIC is running: customers may upgrade Kong Gateway. After Kong Gateway is upgraded, the Kong version will differ from the version observed at initialization. Maybe this should be considered together with #3590.
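For illustration, here is a minimal sketch of re-reading the Gateway version at runtime instead of trusting the version captured when the clients were initialized. The `fetchKongVersion` helper and its wiring are hypothetical, not KIC's actual code; the admin API root endpoint (`GET /`) reporting a `version` field is standard Kong behavior.

```go
package discovery

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
)

// fetchKongVersion is a hypothetical helper that reads the "version"
// field from the Kong admin API root endpoint (GET /). Calling it
// periodically would let the controller notice a Gateway upgrade
// rather than caching the version from initialization forever.
func fetchKongVersion(ctx context.Context, adminURL string) (string, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, adminURL+"/", nil)
	if err != nil {
		return "", err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("admin API returned %s", resp.Status)
	}

	var info struct {
		Version string `json:"version"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&info); err != nil {
		return "", err
	}
	return info.Version, nil
}
```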
Is there an existing issue for this?
Problem Statement
Today KIC (with GW discovery enabled) won't make the liveness probe "healthy" if the proxy service has 0 endpoints.
It does not make sense to restart the KIC pod if it's the Gateways that are down.
Proposed Solution
Currently the last bullet point does not hold.
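As a sketch of the proposed behavior, using controller-runtime's `healthz` checkers (the `readyAdminEndpoints` counter and the exported names are illustrative assumptions, not KIC's actual internals): liveness passes unconditionally once the process is up, while readiness keeps failing with zero discovered admin API endpoints, so the pod is taken out of rotation instead of being restarted.

```go
package discovery

import (
	"errors"
	"net/http"
	"sync/atomic"

	"sigs.k8s.io/controller-runtime/pkg/healthz"
)

// readyAdminEndpoints is an illustrative counter that the gateway
// discovery loop would update as admin API endpoints come and go.
var readyAdminEndpoints atomic.Int64

// Liveness always passes once the process is running: losing all
// Gateways should not get the KIC pod killed and restarted.
var Liveness healthz.Checker = func(_ *http.Request) error {
	return nil
}

// Readiness still reflects discovery, so a KIC with nothing to push
// to is merely taken out of service, not crash-looped.
var Readiness healthz.Checker = func(_ *http.Request) error {
	if readyAdminEndpoints.Load() == 0 {
		return errors.New("no ready Kong admin API endpoints discovered")
	}
	return nil
}
```

These checkers would then be registered through the manager's usual `AddHealthzCheck` and `AddReadyzCheck` calls.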
Additional information
No response
Acceptance Criteria