In the context of Envoy Gateway, a reconciliation crash would have several undesired side effects:
- Last-known-good xDS caches would be deleted and not recovered after a restart.
- The infra manager would be disrupted during infra reconciliation, possibly leaving the infrastructure in an inconsistent state where only some changes are applied.
If a crash occurs during an upgrade, there is a risk that Envoy proxies would be replaced (e.g. due to a new proxy version being used) while the control plane provides no configuration, leading to a complete outage for users.
Envoy Gateway should consider recovering from panics by default, or allowing users to opt in to panic recovery. If implemented, metrics should be provided so that operators are alerted when xDS translation is broken.
Description:
Currently, a panic in the reconciliation flow of Envoy Gateway (EG) causes EG to crash: #4291, #2661, #1830, #2882.
Controller frameworks such as controller-runtime and apimachinery provide the means to recover from panics.