-
Notifications
You must be signed in to change notification settings - Fork 152
Closes #204 - Treat list volume tokens as opaque #205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Welcome @Akrog! |
|
Hi @Akrog. Thanks for your PR. I'm waiting for a kubernetes-csi or kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
e05b7b0 to
6b83ea8
Compare
6b83ea8 to
ecad573
Compare
|
This CI failure is caused by a deficient CSI pagination implementation, as it doesn't play well with volumes being created/deleted in between page requests. I'll relax the test, as there can be plugins out there with this kind of implementation and they don't go strictly speaking against the current CSI spec. Though, in my opinion doing a listing and not getting all existing volumes is a bad thing. |
When listing volumes using pagination we must treat the "next_token" field as opaque data and not make assumptions about what's in it. This patch removes tests that assume tokens are integers and that they represent the number of volumes that have been listed, replacing one of them with one to check that pagination works as expected when new volumes are created and old ones deleted between page requests. This closes issue 204
ecad573 to
9e975df
Compare
There is a newer version of the csi-sanity test suite that comes with additional tests. This patch updates the binary in the repository to the latest version. Upstream version of csi-sanity is making incorrect assumptions on how pagination should work [1], so we cannot use that version until the fix [2] has been merged. We use our own compiled version instead. [1]: kubernetes-csi/csi-test#204 [2]: kubernetes-csi/csi-test#205
The spec explicitly allows missing volumes: "If volumes are created and/or deleted while the CO is concurrently paging through ListVolumes results then it is possible that the CO MAY either witness duplicate volumes in the list, not witness existing volumes, or both." So yes, the test had to be relaxed. At first glance it looks like it would be nice if the result was consistent. However, that makes the pagination support more complex to implement, so perhaps that's why the spec authors decided to not require it. /ok-to-test |
My bad, I had completely forgotten about that part of the spec. In my opinion those behaviors effectively turn pagination into something useless. In a running environment it may happen that there are volumes you never get to see if you use listings with pagination. But that is a completely different conversation unrelated to this project and the testing of CSI plugins. ;-) |
|
Gorka Eguileor <notifications@github.com> writes:
In my opinion those behaviors effectively turn pagination into
something useless. In a running environment it may happen that there
are volumes you never get to see if you use listings with pagination.
Even without pagination a CO may miss volumes that get created shortly
after issuing the ListVolumes call. I suppose it just has to do that at
regular intervals, then eventually it'll see all volumes.
|
| Expect(serverError.Code()).To(Equal(codes.Aborted)) | ||
| }) | ||
|
|
||
| It("should fail when the starting_token is greater than total number of vols", func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was your driver failing this test? <totalVols + 5> isn't a valid starting token, or is it?
Would it be better to pass a random UUID?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is incorrect because the CSI spec defines tokens as strings, and these strings may not be representations of integers (spec doesn't say anything about the token format, so these are plugin specific).
And even if they were integers, they may not indicate the numbers of created volumes. For example they could be integers representing a timestamp.
So the only thing we can tell about tokens is that a weird string will probably fail (we test this in the previous test), and that a token returned by the plugin must be acceptable and not fail when passes as starting point for the next pagination request, even if the volumes have changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is incorrect because the CSI spec defines tokens as strings, and these strings may not be representations of integers
But the spec also doesn't say that the token must not be the representation of integers. The wording of the test isn't good, but the check itself makes sense.
The test should be called "should fail when the starting_token is invalid", and then we need a good way of creating an invalid token.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is correct, the spec doesn't say that the token cannot be an integer. In Ember-CSI our tokens are integers representing the volume creation time so we can use them to filter volumes and make sure we don't return duplicates and always return newer volumes.
The test you mention already exists, it's the one before this one, L185-L198. Although it is possible that "invalid-token" could be a valid token for some plugin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test you mention already exists, it's the one before this one
Oh, right. In that case I agree, "should fail when the starting_token is greater than total number of vols" can be removed. It might be a relevant test case for the CSI hostpath driver, but that's no reason to have it here.
The risk is too high that the integer actually is treated as a valid token by some driver, which then fails the test. Is that what happened with your driver?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly, any integer number is a valid token for Ember-CSI.
|
There's always going to be races between List and Create/Delete. A reconciling architecture helps address this and avoids complex serialization/coordination. |
Yes, but there are ways to minimize the impact of these, for example ordering your volumes by creation time and returning a token with the time of the last volume returned. That way you won't return duplicates on pagination and you will automatically return newer volumes. A driver that uses integers that represent the number of volumes returned as tokens is a pretty bad implementation, as it is subject to both returning duplicates (depending on the order used to list the volumes) and missing volumes. With a big enough number of volumes and enough create/delete operations you could make many consecutive complete paginated list requests and still not have seen some of the existing volumes in all the complete lists you have obtained. But this is a plugin problem, since the spec allows it, and the specific plugin will get the bug if that happens. |
There is a newer version of the csi-sanity test suite that comes with additional tests. This patch updates the binary in the repository to the latest version. Upstream version of csi-sanity is making incorrect assumptions on how pagination should work [1], so we cannot use that version until the fix [2] has been merged. We use our own compiled version instead. [1]: kubernetes-csi/csi-test#204 [2]: kubernetes-csi/csi-test#205
There is a newer version of the csi-sanity test suite that comes with additional tests. This patch updates the binary in the repository to the latest version. Upstream version of csi-sanity is making incorrect assumptions on how pagination should work [1], so we cannot use that version until the fix [2] has been merged. We use our own compiled version instead. [1]: kubernetes-csi/csi-test#204 [2]: kubernetes-csi/csi-test#205
There is a newer version of the csi-sanity test suite that comes with additional tests. This patch updates the binary in the repository to the latest version. Upstream version of csi-sanity is making incorrect assumptions on how pagination should work [1], so we cannot use that version until the fix [2] has been merged. We use our own compiled version instead. [1]: kubernetes-csi/csi-test#204 [2]: kubernetes-csi/csi-test#205
pohly
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Akrog, pohly The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There is a newer version of the csi-sanity test suite that comes with additional tests. This patch updates the binary in the repository to the latest version. Latest tagged release of csi-sanity is making incorrect assumptions on how pagination should work [1], so we cannot use that version until the fix [2] has been released. We use our own compiled version instead. [1]: kubernetes-csi/csi-test#204 [2]: kubernetes-csi/csi-test#205
Closes kubernetes-csi#204 - Treat list volume tokens as opaque
What type of PR is this?
/kind bug
What this PR does / why we need it:
When listing volumes using pagination we must treat the "next_token"
field as opaque data and not make assumptions about what's in it.
This patch removes tests that assume tokens are integers and that they
represent the number of volumes that have been listed, replacing one of
them with one to check that pagination works as expected when new
volumes are created and old ones deleted between page requests.
Which issue(s) this PR fixes:
Fixes #204
Special notes for your reviewer:
Does this PR introduce a user-facing change?: