-
Notifications
You must be signed in to change notification settings - Fork 679
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enable Contour to process RLS even if it's created after Contour startup #4155
Comments
Hi @skriss! I would love to work on this. |
Hey @vishal-chdhry - looking some more at this, I'm actually thinking we should leave this as-is / close this issue. While we could add logic to allow the rate limit service to be resolved as part of DAG builds, it's unclear what should be done if the service does not exist. Right now, Contour will crash if the configured rate limit service ref cannot be resolved at startup. If we moved to resolving that ref at DAG build time, then it would be harder to provide feedback to the user -- we wouldn't want to mark every HTTPProxy as "Invalid" since that would drop traffic, but just putting something in the log is not very visible either. Absent a bigger redesign here, I think the way we have it is probably the best approach for now. cc @sunjayBhatia, any thoughts on this? |
The big questions it seems are:
Some different scenarios: The ExtensionService resource becomes invalid (in the fields we validate at startup, which isn't everything) OR The ExtensionService is deleted The ExtensionService resource changes and is not invalid, just different OR The operator which ExtensionService resource is used for RLS This is a problem in general I'm realizing with named resources in a static config at startup (ConfigMap or CRD based) that aren't re-reconciled. The first case is especially problematic because there is even less intention here than the operator changing the ConfigMap or CRD to point to a different resource. The ExtensionService resource becomes invalid in another way From here if the RLS is unavailable Envoy will return a 500 response code, which is not bad, but does mean we effectively do drop traffic. This is actually sort of the same for ExtAuth config, we don't check the Status of the ExtensionService referenced in an HTTPProxy but do check that an ExtensionCluster is in the DAG (so effectively check if the referenced ExtensionService is valid). If not there, we mark the HTTPProxy as Invalid and continue processing which also drops traffic, but has an additional place Status is kept to notify users. |
So from the above: A static startup config that references a k8s resource (that could change), is something to think about in general, but again requires a bigger redesign most likely Given the RLS configuration is invalid, our current situation does have some observability for the operator and also does likely have a user impact. One could argue what we have is correct or in favor of instead just serving traffic as-is w/o trying to rate limit if config is invalid. Maybe that is a toggle option. Aside from that or a larger change, we could do some refinements during HTTPProxy processing to be more intentional about how things work when the RLS ExtensionService changes, but we haven't received any feedback (I can remember) from users that things are not working well so maybe no need to make changes. |
I have thoughts on how we might do this better in a more Gateway API-centric world, but that is a separate conversation topic. |
this is good to be closed in that case? 🤔 |
It would be neat to allow Contour to start and pick up on the extension service if its created later on.
Originally posted by @stevesloka in #4153 (comment)
The text was updated successfully, but these errors were encountered: