Closes #204 - Treat list volume tokens as opaque #205

Akrog · 2019-06-28T09:03:23Z

What type of PR is this?
/kind bug

What this PR does / why we need it:
When listing volumes using pagination we must treat the "next_token"
field as opaque data and not make assumptions about what's in it.

This patch removes tests that assume tokens are integers and that they
represent the number of volumes that have been listed, replacing one of
them with one to check that pagination works as expected when new
volumes are created and old ones deleted between page requests.

Which issue(s) this PR fixes:
Fixes #204

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Remove volume listing tests that assumed tokens were integers and replace them with one that confirms pagination works when adding/deleting volumes between page requests.

k8s-ci-robot · 2019-06-28T09:03:24Z

Welcome @Akrog!

It looks like this is your first PR to kubernetes-csi/csi-test 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-csi/csi-test has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

k8s-ci-robot · 2019-06-28T09:03:37Z

Hi @Akrog. Thanks for your PR.

I'm waiting for a kubernetes-csi or kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Akrog · 2019-06-28T10:42:41Z

This CI failure is caused by a deficient CSI pagination implementation, as it doesn't play well with volumes being created/deleted in between page requests.

I'll relax the test, as there can be plugins out there with this kind of implementation and they don't go strictly speaking against the current CSI spec. Though, in my opinion doing a listing and not getting all existing volumes is a bad thing.

When listing volumes using pagination we must treat the "next_token" field as opaque data and not make assumptions about what's in it. This patch removes tests that assume tokens are integers and that they represent the number of volumes that have been listed, replacing one of them with one to check that pagination works as expected when new volumes are created and old ones deleted between page requests. This closes issue 204

There is a newer version of the csi-sanity test suite that comes with additional tests. This patch updates the binary in the repository to the latest version. Upstream version of csi-sanity is making incorrect assumptions on how pagination should work [1], so we cannot use that version until the fix [2] has been merged. We use our own compiled version instead. [1]: kubernetes-csi/csi-test#204 [2]: kubernetes-csi/csi-test#205

pohly · 2019-07-01T07:08:13Z

This CI failure is caused by a deficient CSI pagination implementation, as it doesn't play well with volumes being created/deleted in between page requests.

I'll relax the test, as there can be plugins out there with this kind of implementation and they don't go strictly speaking against the current CSI spec. Though, in my opinion doing a listing and not getting all existing volumes is a bad thing.

The spec explicitly allows missing volumes: "If volumes are created and/or deleted while the CO is concurrently paging through ListVolumes results then it is possible that the CO MAY either witness duplicate volumes in the list, not witness existing volumes, or both." So yes, the test had to be relaxed.

At first glance it looks like it would be nice if the result was consistent. However, that makes the pagination support more complex to implement, so perhaps that's why the spec authors decided to not require it.

/ok-to-test

Akrog · 2019-07-01T09:59:03Z

This CI failure is caused by a deficient CSI pagination implementation, as it doesn't play well with volumes being created/deleted in between page requests.
I'll relax the test, as there can be plugins out there with this kind of implementation and they don't go strictly speaking against the current CSI spec. Though, in my opinion doing a listing and not getting all existing volumes is a bad thing.

The spec explicitly allows missing volumes: "If volumes are created and/or deleted while the CO is concurrently paging through ListVolumes results then it is possible that the CO MAY either witness duplicate volumes in the list, not witness existing volumes, or both." So yes, the test had to be relaxed.

At first glance it looks like it would be nice if the result was consistent. However, that makes the pagination support more complex to implement, so perhaps that's why the spec authors decided to not require it.

/ok-to-test

My bad, I had completely forgotten about that part of the spec.

In my opinion those behaviors effectively turn pagination into something useless. In a running environment it may happen that there are volumes you never get to see if you use listings with pagination.

But that is a completely different conversation unrelated to this project and the testing of CSI plugins. ;-)

pohly · 2019-07-01T11:42:09Z

Gorka Eguileor <notifications@github.com> writes:

In my opinion those behaviors effectively turn pagination into something useless. In a running environment it may happen that there are volumes you never get to see if you use listings with pagination.

Even without pagination a CO may miss volumes that get created shortly after issuing the ListVolumes call. I suppose it just has to do that at regular intervals, then eventually it'll see all volumes.

pohly · 2019-07-02T15:00:27Z

pkg/sanity/controller.go

 			Expect(serverError.Code()).To(Equal(codes.Aborted))
 		})

-		It("should fail when the starting_token is greater than total number of vols", func() {


Why was your driver failing this test? <totalVols + 5> isn't a valid starting token, or is it?

Would it be better to pass a random UUID?

This test is incorrect because the CSI spec defines tokens as strings, and these strings may not be representations of integers (spec doesn't say anything about the token format, so these are plugin specific).

And even if they were integers, they may not indicate the numbers of created volumes. For example they could be integers representing a timestamp.

So the only thing we can tell about tokens is that a weird string will probably fail (we test this in the previous test), and that a token returned by the plugin must be acceptable and not fail when passes as starting point for the next pagination request, even if the volumes have changed.

This test is incorrect because the CSI spec defines tokens as strings, and these strings may not be representations of integers

But the spec also doesn't say that the token must not be the representation of integers. The wording of the test isn't good, but the check itself makes sense.

The test should be called "should fail when the starting_token is invalid", and then we need a good way of creating an invalid token.

That is correct, the spec doesn't say that the token cannot be an integer. In Ember-CSI our tokens are integers representing the volume creation time so we can use them to filter volumes and make sure we don't return duplicates and always return newer volumes.

The test you mention already exists, it's the one before this one, L185-L198. Although it is possible that "invalid-token" could be a valid token for some plugin.

The test you mention already exists, it's the one before this one

Oh, right. In that case I agree, "should fail when the starting_token is greater than total number of vols" can be removed. It might be a relevant test case for the CSI hostpath driver, but that's no reason to have it here.

The risk is too high that the integer actually is treated as a valid token by some driver, which then fails the test. Is that what happened with your driver?

Exactly, any integer number is a valid token for Ember-CSI.

msau42 · 2019-07-02T16:06:09Z

There's always going to be races between List and Create/Delete. A reconciling architecture helps address this and avoids complex serialization/coordination.

Akrog · 2019-07-03T09:03:57Z

There's always going to be races between List and Create/Delete. A reconciling architecture helps address this and avoids complex serialization/coordination.

Yes, but there are ways to minimize the impact of these, for example ordering your volumes by creation time and returning a token with the time of the last volume returned. That way you won't return duplicates on pagination and you will automatically return newer volumes.

A driver that uses integers that represent the number of volumes returned as tokens is a pretty bad implementation, as it is subject to both returning duplicates (depending on the order used to list the volumes) and missing volumes.

With a big enough number of volumes and enough create/delete operations you could make many consecutive complete paginated list requests and still not have seen some of the existing volumes in all the complete lists you have obtained. But this is a plugin problem, since the spec allows it, and the specific plugin will get the bug if that happens.

There is a newer version of the csi-sanity test suite that comes with additional tests. This patch updates the binary in the repository to the latest version. Upstream version of csi-sanity is making incorrect assumptions on how pagination should work [1], so we cannot use that version until the fix [2] has been merged. We use our own compiled version instead. [1]: kubernetes-csi/csi-test#204 [2]: kubernetes-csi/csi-test#205

pohly

/lgtm

pohly · 2019-07-16T15:13:49Z

/approve

k8s-ci-robot · 2019-07-16T15:13:54Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Akrog, pohly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [pohly]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

There is a newer version of the csi-sanity test suite that comes with additional tests. This patch updates the binary in the repository to the latest version. Latest tagged release of csi-sanity is making incorrect assumptions on how pagination should work [1], so we cannot use that version until the fix [2] has been released. We use our own compiled version instead. [1]: kubernetes-csi/csi-test#204 [2]: kubernetes-csi/csi-test#205

Closes kubernetes-csi#204 - Treat list volume tokens as opaque

k8s-ci-robot requested review from jsafrane and saad-ali June 28, 2019 09:03

k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 28, 2019

k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 28, 2019

Akrog force-pushed the fix-vol-list-token branch from e05b7b0 to 6b83ea8 Compare June 28, 2019 09:05

k8s-ci-robot removed the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label Jun 28, 2019

Akrog force-pushed the fix-vol-list-token branch from 6b83ea8 to ecad573 Compare June 28, 2019 10:14

Akrog force-pushed the fix-vol-list-token branch from ecad573 to 9e975df Compare June 28, 2019 10:52

k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 1, 2019

pohly reviewed Jul 2, 2019

View reviewed changes

pohly approved these changes Jul 16, 2019

View reviewed changes

k8s-ci-robot assigned pohly Jul 16, 2019

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 16, 2019

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 16, 2019

k8s-ci-robot merged commit 42a3511 into kubernetes-csi:master Jul 16, 2019

k8s-ci-robot mentioned this pull request Sep 11, 2019

Pass ControllerExpandVolumeSecret info to the ExpandVolume CSI tests #216

Closed

suneeth51 pushed a commit to suneeth51/csi-test that referenced this pull request Sep 11, 2019

Merge pull request kubernetes-csi#205 from Akrog/fix-vol-list-token

8388ccb

Closes kubernetes-csi#204 - Treat list volume tokens as opaque

pohly mentioned this pull request Oct 10, 2019

WIP: Kubernetes v1.16 update intel/pmem-csi#424

Closed

alakae mentioned this pull request Apr 15, 2020

Update Spec to 1.1 and Implement Resize cloudscale-ch/csi-cloudscale#15

Merged

Closes #204 - Treat list volume tokens as opaque #205

Closes #204 - Treat list volume tokens as opaque #205

Uh oh!

Conversation

Akrog commented Jun 28, 2019

Uh oh!

k8s-ci-robot commented Jun 28, 2019

Uh oh!

k8s-ci-robot commented Jun 28, 2019

Uh oh!

Akrog commented Jun 28, 2019

Uh oh!

pohly commented Jul 1, 2019

Uh oh!

Akrog commented Jul 1, 2019

Uh oh!

pohly commented Jul 1, 2019 via email

Uh oh!

pohly Jul 2, 2019

Choose a reason for hiding this comment

Uh oh!

Akrog Jul 2, 2019

Choose a reason for hiding this comment

Uh oh!

pohly Jul 3, 2019

Choose a reason for hiding this comment

Uh oh!

Akrog Jul 3, 2019

Choose a reason for hiding this comment

Uh oh!

pohly Jul 3, 2019

Choose a reason for hiding this comment

Uh oh!

Akrog Jul 3, 2019

Choose a reason for hiding this comment

Uh oh!

msau42 commented Jul 2, 2019

Uh oh!

Akrog commented Jul 3, 2019

Uh oh!

pohly left a comment

Choose a reason for hiding this comment

Uh oh!

pohly commented Jul 16, 2019

Uh oh!

k8s-ci-robot commented Jul 16, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants