Skip to content

Conversation

@Akrog
Copy link
Contributor

@Akrog Akrog commented Jun 28, 2019

What type of PR is this?
/kind bug

What this PR does / why we need it:
When listing volumes using pagination we must treat the "next_token"
field as opaque data and not make assumptions about what's in it.

This patch removes tests that assume tokens are integers and that they
represent the number of volumes that have been listed, replacing one of
them with one to check that pagination works as expected when new
volumes are created and old ones deleted between page requests.

Which issue(s) this PR fixes:
Fixes #204

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Remove volume listing tests that assumed tokens were integers and replace them with one that confirms pagination works when adding/deleting volumes between page requests.  

@k8s-ci-robot
Copy link
Contributor

Welcome @Akrog!

It looks like this is your first PR to kubernetes-csi/csi-test 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-csi/csi-test has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 28, 2019
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 28, 2019
@k8s-ci-robot
Copy link
Contributor

Hi @Akrog. Thanks for your PR.

I'm waiting for a kubernetes-csi or kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 28, 2019
@Akrog Akrog force-pushed the fix-vol-list-token branch from e05b7b0 to 6b83ea8 Compare June 28, 2019 09:05
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label Jun 28, 2019
@Akrog Akrog force-pushed the fix-vol-list-token branch from 6b83ea8 to ecad573 Compare June 28, 2019 10:14
@Akrog
Copy link
Contributor Author

Akrog commented Jun 28, 2019

This CI failure is caused by a deficient CSI pagination implementation, as it doesn't play well with volumes being created/deleted in between page requests.

I'll relax the test, as there can be plugins out there with this kind of implementation and they don't go strictly speaking against the current CSI spec. Though, in my opinion doing a listing and not getting all existing volumes is a bad thing.

When listing volumes using pagination we must treat the "next_token"
field as opaque data and not make assumptions about what's in it.

This patch removes tests that assume tokens are integers and that they
represent the number of volumes that have been listed, replacing one of
them with one to check that pagination works as expected when new
volumes are created and old ones deleted between page requests.

This closes issue 204
@Akrog Akrog force-pushed the fix-vol-list-token branch from ecad573 to 9e975df Compare June 28, 2019 10:52
Akrog added a commit to Akrog/ember-csi that referenced this pull request Jun 28, 2019
There is a newer version of the csi-sanity test suite that comes with
additional tests.  This patch updates the binary in the repository to
the latest version.

Upstream version of csi-sanity is making incorrect assumptions on how
pagination should work [1], so we cannot use that version until the fix
[2] has been merged.  We use our own compiled version instead.

[1]: kubernetes-csi/csi-test#204
[2]: kubernetes-csi/csi-test#205
@pohly
Copy link
Contributor

pohly commented Jul 1, 2019

This CI failure is caused by a deficient CSI pagination implementation, as it doesn't play well with volumes being created/deleted in between page requests.

I'll relax the test, as there can be plugins out there with this kind of implementation and they don't go strictly speaking against the current CSI spec. Though, in my opinion doing a listing and not getting all existing volumes is a bad thing.

The spec explicitly allows missing volumes: "If volumes are created and/or deleted while the CO is concurrently paging through ListVolumes results then it is possible that the CO MAY either witness duplicate volumes in the list, not witness existing volumes, or both." So yes, the test had to be relaxed.

At first glance it looks like it would be nice if the result was consistent. However, that makes the pagination support more complex to implement, so perhaps that's why the spec authors decided to not require it.

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 1, 2019
@Akrog
Copy link
Contributor Author

Akrog commented Jul 1, 2019

This CI failure is caused by a deficient CSI pagination implementation, as it doesn't play well with volumes being created/deleted in between page requests.
I'll relax the test, as there can be plugins out there with this kind of implementation and they don't go strictly speaking against the current CSI spec. Though, in my opinion doing a listing and not getting all existing volumes is a bad thing.

The spec explicitly allows missing volumes: "If volumes are created and/or deleted while the CO is concurrently paging through ListVolumes results then it is possible that the CO MAY either witness duplicate volumes in the list, not witness existing volumes, or both." So yes, the test had to be relaxed.

At first glance it looks like it would be nice if the result was consistent. However, that makes the pagination support more complex to implement, so perhaps that's why the spec authors decided to not require it.

/ok-to-test

My bad, I had completely forgotten about that part of the spec.

In my opinion those behaviors effectively turn pagination into something useless. In a running environment it may happen that there are volumes you never get to see if you use listings with pagination.

But that is a completely different conversation unrelated to this project and the testing of CSI plugins. ;-)

@pohly
Copy link
Contributor

pohly commented Jul 1, 2019 via email

Expect(serverError.Code()).To(Equal(codes.Aborted))
})

It("should fail when the starting_token is greater than total number of vols", func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was your driver failing this test? <totalVols + 5> isn't a valid starting token, or is it?

Would it be better to pass a random UUID?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is incorrect because the CSI spec defines tokens as strings, and these strings may not be representations of integers (spec doesn't say anything about the token format, so these are plugin specific).

And even if they were integers, they may not indicate the numbers of created volumes. For example they could be integers representing a timestamp.

So the only thing we can tell about tokens is that a weird string will probably fail (we test this in the previous test), and that a token returned by the plugin must be acceptable and not fail when passes as starting point for the next pagination request, even if the volumes have changed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is incorrect because the CSI spec defines tokens as strings, and these strings may not be representations of integers

But the spec also doesn't say that the token must not be the representation of integers. The wording of the test isn't good, but the check itself makes sense.

The test should be called "should fail when the starting_token is invalid", and then we need a good way of creating an invalid token.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is correct, the spec doesn't say that the token cannot be an integer. In Ember-CSI our tokens are integers representing the volume creation time so we can use them to filter volumes and make sure we don't return duplicates and always return newer volumes.

The test you mention already exists, it's the one before this one, L185-L198. Although it is possible that "invalid-token" could be a valid token for some plugin.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test you mention already exists, it's the one before this one

Oh, right. In that case I agree, "should fail when the starting_token is greater than total number of vols" can be removed. It might be a relevant test case for the CSI hostpath driver, but that's no reason to have it here.

The risk is too high that the integer actually is treated as a valid token by some driver, which then fails the test. Is that what happened with your driver?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, any integer number is a valid token for Ember-CSI.

@msau42
Copy link
Collaborator

msau42 commented Jul 2, 2019

There's always going to be races between List and Create/Delete. A reconciling architecture helps address this and avoids complex serialization/coordination.

@Akrog
Copy link
Contributor Author

Akrog commented Jul 3, 2019

There's always going to be races between List and Create/Delete. A reconciling architecture helps address this and avoids complex serialization/coordination.

Yes, but there are ways to minimize the impact of these, for example ordering your volumes by creation time and returning a token with the time of the last volume returned. That way you won't return duplicates on pagination and you will automatically return newer volumes.

A driver that uses integers that represent the number of volumes returned as tokens is a pretty bad implementation, as it is subject to both returning duplicates (depending on the order used to list the volumes) and missing volumes.

With a big enough number of volumes and enough create/delete operations you could make many consecutive complete paginated list requests and still not have seen some of the existing volumes in all the complete lists you have obtained. But this is a plugin problem, since the spec allows it, and the specific plugin will get the bug if that happens.

Akrog added a commit to Akrog/ember-csi that referenced this pull request Jul 16, 2019
There is a newer version of the csi-sanity test suite that comes with
additional tests.  This patch updates the binary in the repository to
the latest version.

Upstream version of csi-sanity is making incorrect assumptions on how
pagination should work [1], so we cannot use that version until the fix
[2] has been merged.  We use our own compiled version instead.

[1]: kubernetes-csi/csi-test#204
[2]: kubernetes-csi/csi-test#205
Akrog added a commit to Akrog/ember-csi that referenced this pull request Jul 16, 2019
There is a newer version of the csi-sanity test suite that comes with
additional tests.  This patch updates the binary in the repository to
the latest version.

Upstream version of csi-sanity is making incorrect assumptions on how
pagination should work [1], so we cannot use that version until the fix
[2] has been merged.  We use our own compiled version instead.

[1]: kubernetes-csi/csi-test#204
[2]: kubernetes-csi/csi-test#205
Akrog added a commit to Akrog/ember-csi that referenced this pull request Jul 16, 2019
There is a newer version of the csi-sanity test suite that comes with
additional tests.  This patch updates the binary in the repository to
the latest version.

Upstream version of csi-sanity is making incorrect assumptions on how
pagination should work [1], so we cannot use that version until the fix
[2] has been merged.  We use our own compiled version instead.

[1]: kubernetes-csi/csi-test#204
[2]: kubernetes-csi/csi-test#205
Copy link
Contributor

@pohly pohly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 16, 2019
@pohly
Copy link
Contributor

pohly commented Jul 16, 2019

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Akrog, pohly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 16, 2019
@k8s-ci-robot k8s-ci-robot merged commit 42a3511 into kubernetes-csi:master Jul 16, 2019
Akrog added a commit to Akrog/ember-csi that referenced this pull request Jul 17, 2019
There is a newer version of the csi-sanity test suite that comes with
additional tests.  This patch updates the binary in the repository to
the latest version.

Latest tagged release of csi-sanity is making incorrect assumptions on
how pagination should work [1], so we cannot use that version until the
fix [2] has been released.  We use our own compiled version instead.

[1]: kubernetes-csi/csi-test#204
[2]: kubernetes-csi/csi-test#205
suneeth51 pushed a commit to suneeth51/csi-test that referenced this pull request Sep 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

csi-sanity incorrectly testing vol list tokens

4 participants