Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update HTTPRoute BackendRef docs to clarify error behavior #1243

Merged
merged 12 commits into from
Jul 12, 2022

Conversation

youngnick
Copy link
Contributor

What type of PR is this?
/kind cleanup
/kind documentation

What this PR does / why we need it:

  • Moves reasons for an individual BackendRef to be invalid so they are included in the HTTPBackendRef struct docs ( the BackendRef field).
  • Clarifies rules for handling invalid BackendRefs in the BackendRefs field of the HTTPRouteRule struct.

Which issue(s) this PR fixes:

Fixes #1211
Updates #1112

Does this PR introduce a user-facing change?:

The rules for BackendRefs have been updated to make clear that invalid BackendRefs must produce HTTP response codes of 500 in most cases.

This PR is still WIP, since I will need to change the conformance tests to check for the behavior. But I wanted to check that we really are in agreement on this behavior while I work on those.

@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/documentation Categorizes issue or PR as related to documentation. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 30, 2022
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 30, 2022
//
// A BackendRef is considered invalid when it refers to:
// If *all* entries in BackendRefs are invalid, and there are also no filters
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this saying something different than "When a HTTPBackendRef is invalid, 500 status codes MUST be returned for"? They seem the same to me, and I wasn't sure if there was some subtle difference I was missing or if we were just re-iterating? Especially the "...also no filters..." part?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really in this update, no, but there are a few places where "no valid backends" and "some valid backends" were treated differently, and I wanted to make doubly sure that there's no confusion.

Copy link
Contributor

@mikemorris mikemorris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will review the proposed docs changes more closely once this is out of draft, but suggested a few tweaks to the conformance tests for now.

Tagging @skriss for review as I'm suggesting restoring the route acceptance check that was removed in #1170 (review)

@youngnick
Copy link
Contributor Author

I updated the conformance tests quite a bit, because the changes to the docs imply some reasonably large changes:

  • Having all backends invalid no longer causes a Route not to be attached
  • In all the cases I've changed, having one invalid backend requires the presence of a ResolvedRefs Condition with status False and varying Reasons, so I've codified that into the conformance tests.
  • The existing "check parent for these conditions" helper didn't seem to quite fit, so I've moved to using the Attached Check, with a new HTTPRouteMustHaveCondition check.
  • I've also updated the findConditionInList helper to require and check for a Reason.

So this is a much bigger conformance change than I anticipated. I've left this as WIP for now, because I think there's a couple more cases to test to be sure I've got everything I've changed in the docs, and I want to make all this behavior super clear, since it will have knock-on effects to #1112 and Inclusion/Delegation.

@youngnick youngnick marked this pull request as ready for review July 5, 2022 01:27
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 5, 2022
@k8s-ci-robot k8s-ci-robot requested a review from hbagdi July 5, 2022 01:27
@youngnick
Copy link
Contributor Author

Okay, I think I've covered everything I can for now in here, so marking this ready for review.

The one case I haven't written a conformance test for is the "two backendRefs where one is invalid means that half of the traffic gets a 500", since I think we'll need some big helper changes for that, and I think we should talk about how we do that. I'll open an issue.

@youngnick youngnick marked this pull request as draft July 5, 2022 06:39
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 5, 2022
@youngnick
Copy link
Contributor Author

youngnick commented Jul 5, 2022

Nope, I missed a bunch of things, fixing them first. Pardon my dust.

I think that's everything.

@youngnick youngnick marked this pull request as ready for review July 5, 2022 07:42
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 5, 2022
Copy link
Member

@robscott robscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @youngnick! This mostly LGTM, but I need to spend a bit more time to run through conformance tests.

apis/v1alpha2/httproute_types.go Outdated Show resolved Hide resolved
Comment on lines 225 to 226
// When a HTTPBackendRef refers to a Service that has no ready endpoints,
// implementations MAY return a 503 for requests to that backend instead.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How should this be interpreted? Is it valid to return 500 or 503 in this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, it's not very clear. I guess our two options are either make it mandatory, or just mandate 500 for everything. I'm leaning towards the latter, so I'll just remove this.

apis/v1alpha2/shared_types.go Outdated Show resolved Hide resolved
apis/v1beta1/httproute_types.go Show resolved Hide resolved
Comment on lines 346 to 348
// HTTPRouteMustHaveConditions checks that the supplied HTTPRoute has the supplied Condition,
// halting after the specified timeout is exceeded.
func HTTPRouteMustHaveCondition(t *testing.T, client client.Client, routeNN types.NamespacedName, condition metav1.Condition, seconds int) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we probably want to specify which parent this condition should exist for. Otherwise we could get rather imprecise results if more than one parent exists. This is not that common in conformance tests now, but it probably will become more common.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I changed this to take the Gateway name it should be checking for.

if cond.Status == metav1.ConditionStatus(condValue) {
return true
if cond.Reason == condReason {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm seeing a couple of tests start failing with this PR that I think should still pass:

  • httproute-disallowed-kind
  • httproute-invalid-cross-namespace-parent-ref

These both use the HTTPRouteMustHaveNoAcceptedParents function, which asserts that a Route has either no Accepted condition, or an "Accepted: false" condition.

At least in Contour's implementation, we're setting an "Accepted: false" condition in both of these scenarios. The problem arises because HTTPRouteMustHaveNoAcceptedParents does not pass a reason when checking for an "Accepted: false" condition, so findConditionsInList expects the actual condition to have an empty reason.

Not sure what the best fix is here, but one option would be to consider an empty reason string to mean "match any reason" instead of "match an empty string"? Otherwise, we might have to plumb a reason through from the place where HTTPRouteMustHaveNoAcceptedParents is called.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree that having an empty string passed for Reason can mean "match any Reason", since Reason strings should not normally be empty.

@youngnick youngnick added this to the v0.5.0 milestone Jul 11, 2022
Copy link
Member

@robscott robscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @youngnick! This mostly LGTM, just a couple comments.

@@ -81,7 +81,7 @@ func GWCMustBeAccepted(t *testing.T, c client.Client, gwcName string, seconds in
}

controllerName = string(gwc.Spec.ControllerName)
return findConditionInList(t, gwc.Status.Conditions, "Accepted", "True"), nil
return findConditionInList(t, gwc.Status.Conditions, "Accepted", "True", "Accepted"), nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to require reason to exactly match the condition name when it is in a success state. For example, when Istio sets the Ready condition, it sets the reason to "ListenersValid". Adding this kind of change will cause any implementations like that to immediately fail all conformance tests since this is part of the fundamental check if the base resources are ready. I think we could consider adding this check in the future if interested, but would prefer to leave it open to any reason now to avoid significant disruption before a release.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this to the empty string, with a comment.

@@ -106,7 +106,7 @@ func NamespacesMustBeReady(t *testing.T, c client.Client, namespaces []string, s
t.Errorf("Error listing Gateways: %v", err)
}
for _, gw := range gwList.Items {
if !findConditionInList(t, gw.Status.Conditions, "Ready", "True") {
if !findConditionInList(t, gw.Status.Conditions, "Ready", "True", "Ready") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above, would prefer "" instead of "Ready" as the last arg. (If I understand that code, that will allow any reason).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this to the empty string, with a comment.

apis/v1beta1/httproute_types.go Show resolved Hide resolved
Copy link
Contributor

@skriss skriss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates @youngnick, this LGTM

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: skriss, youngnick

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 12, 2022
Nick Young added 11 commits July 12, 2022 05:20
Signed-off-by: Nick Young <ynick@vmware.com>
Signed-off-by: Nick Young <ynick@vmware.com>
Signed-off-by: Nick Young <ynick@vmware.com>
Signed-off-by: Nick Young <ynick@vmware.com>
Signed-off-by: Nick Young <ynick@vmware.com>
Signed-off-by: Nick Young <ynick@vmware.com>
Signed-off-by: Nick Young <ynick@vmware.com>
Signed-off-by: Nick Young <ynick@vmware.com>
Signed-off-by: Nick Young <ynick@vmware.com>
Signed-off-by: Nick Young <ynick@vmware.com>
Signed-off-by: Nick Young <ynick@vmware.com>
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 12, 2022
@robscott robscott added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Jul 12, 2022
@robscott
Copy link
Member

Thanks @youngnick!

/lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/documentation Categorizes issue or PR as related to documentation. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Conflicting guidance when a HTTPRoute refers to an invalid backend
7 participants