Consolidate all the hostpath driver specs into one pod #192

Closed · msau42 opened this issue on Jul 30, 2020 · 20 comments · Fixed by #282
Labels: good first issue, help wanted, lifecycle/rotten

Comments

msau42 commented Jul 30, 2020

I think we kept attacher separate so that we can easily test with or without it, but I don't see a reason why provisioner, resizer, snapshotter can't all be in the same Pod as the driver. Actually I think our attach required tests are using mock driver, not hostpath driver, so I think it should be safe to bundle attacher in the same pod as well.
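
To make the proposal concrete, a consolidated controller pod could look roughly like the sketch below. This is only an illustration, not the manifests from this repo: image tags, names, and the ServiceAccount are placeholders, the resizer and snapshotter are elided, and the node-registration side (kubelet access to the socket) would still need a hostPath volume and is omitted here.

```yaml
kind: StatefulSet
apiVersion: apps/v1
metadata:
  name: csi-hostpathplugin
spec:
  serviceName: csi-hostpathplugin
  replicas: 1
  selector:
    matchLabels:
      app: csi-hostpathplugin
  template:
    metadata:
      labels:
        app: csi-hostpathplugin
    spec:
      # hypothetical ServiceAccount bound to the union of all sidecar RBAC rules
      serviceAccountName: csi-hostpathplugin-sa
      containers:
        - name: hostpath
          image: registry.k8s.io/sig-storage/hostpathplugin:canary  # placeholder tag
          args: ["--endpoint=unix:///csi/csi.sock", "--nodeid=$(NODE_NAME)"]
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          volumeMounts:
            - name: socket-dir
              mountPath: /csi
        - name: csi-provisioner
          image: registry.k8s.io/sig-storage/csi-provisioner:canary  # placeholder tag
          args: ["--csi-address=/csi/csi.sock"]
          volumeMounts:
            - name: socket-dir
              mountPath: /csi
        - name: csi-attacher
          image: registry.k8s.io/sig-storage/csi-attacher:canary  # placeholder tag
          args: ["--csi-address=/csi/csi.sock"]
          volumeMounts:
            - name: socket-dir
              mountPath: /csi
        # csi-resizer and csi-snapshotter would follow the same pattern
      volumes:
        # plugin and sidecars share the CSI socket inside the pod; no hostPath needed
        - name: socket-dir
          emptyDir: {}
```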

msau42 commented Jul 30, 2020

/help
/good-first-issue

@k8s-ci-robot

@msau42:
This request has been marked as suitable for new contributors.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-good-first-issue command.

In response to this:

/help
/good-first-issue

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot added the good first issue and help wanted labels on Jul 30, 2020
pohly commented Aug 2, 2020

I don't see a reason why provisioner, resizer, snapshotter can't all be in the same Pod as the driver

One reason for separate pods is that we test the RBAC rules for each sidecar separately. For example, if resizer and provisioner both need access to the same resource, but only the RBAC rules for provisioner list that, then resizer will work fine when deployed in the same pod as provisioner, but will fail when deployed separately.
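
To make that concrete, here is a simplified, purely illustrative pair of ClusterRoles (not the rules actually shipped with the sidecars). With one pod per sidecar and one ServiceAccount per sidecar, the missing verb in the resizer role would surface as a "forbidden" error; with all sidecars in one pod sharing a ServiceAccount bound to both roles, the union of the rules applies and the gap goes unnoticed.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: external-provisioner-runner
rules:
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: external-resizer-runner
rules:
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch"]   # suppose "update" is missing here by mistake
```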

msau42 commented Aug 3, 2020

Hm, that's a good point. I guess it depends on how we want to balance testing for rbac breakages against providing a best-practices example to the community. I think having all the individual pods:

  • is very confusing to someone new to CSI
  • isn't something that we recommend in a real production CSI driver
  • doesn't follow security best practices since all the controller pieces need to use hostpath to share the socket (see the snippet after this list)
  • requires each driver to consume a Pod resource per sidecar, which limits the amount of parallel testing we can do in k/k since there's a maximum of 110 pods per node, and also could land us in strange scenarios where the driver is only partially able to be deployed on a node.
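
For reference, the socket-sharing point refers to volume definitions along these lines (the path is illustrative): with one pod per sidecar, every pod has to mount the same node directory to reach the driver's socket, whereas a single consolidated pod could use an emptyDir.

```yaml
# split deployment: each sidecar pod mounts the driver's socket directory from the node
volumes:
  - name: socket-dir
    hostPath:
      path: /var/lib/kubelet/plugins/csi-hostpath   # illustrative path
      type: DirectoryOrCreate
# consolidated pod: the same volume could simply be
#   - name: socket-dir
#     emptyDir: {}
```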

pohly commented Aug 4, 2020

IMHO the main purpose of the csi-hostpath-driver is to facilitate testing. Providing a good example of how to write and deploy a CSI driver is secondary.

is very confusing to someone new to CSI
isn't something that we recommend in a real production CSI driver
doesn't follow security best practices since all the controller pieces need to use hostpath to share the socket

This could be addressed by adding comments that explain why it is done this way here and what should be done instead. Perhaps link to a "real" CSI driver that does it properly?

requires each driver to consume a Pod resource per sidecar, which limits the amount of parallel testing we can do in k/k since there's a maximum of 110 pods per node, and also could land us in strange scenarios where the driver is only partially able to be deployed on a node.

This is more relevant because it affects testing. But is that really an issue in practice? Have there been test failures because of a partially deployed driver?

pohly commented Aug 4, 2020

I was wondering how important RBAC testing really is: if every CSI driver uses all sidecars and thus all RBAC rules, then buggy individual RBAC rules don't matter as long as we also test with all of them combined. But there are valid reasons for CSI driver developers to not use certain sidecars (skip attach -> no external-attacher, no resize support -> no external-resizer), so I think it is important to test the RBAC rule set for each sidecar in isolation.
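
As an example of the "skip attach" case: a driver declares in its CSIDriver object that attach is not needed, and then has no reason to deploy the external-attacher or its RBAC rules at all (the driver name below is illustrative).

```yaml
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: example.csi.k8s.io   # illustrative driver name
spec:
  attachRequired: false   # attach/detach is skipped, so no external-attacher sidecar is needed
```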

msau42 commented Aug 4, 2020

Are there other ways we can check for rbac updates, like doing a file diff? Or test via mock driver, which is definitely a testing-only driver? We don't have good coverage with csi hostpath already. For example, the primary attacher path is completely untested and our tests did not catch any of the rbac changes needed with the latest attacher.

This is more relevant because it affects testing. But is that really an issue in practice? Have there been test failures because of a partially deployed driver?

I haven't specifically seen a test failure because of this; however, I have seen symptoms of e2e jobs hitting the max pods per node limit, such as kubernetes/kubernetes#87855 (comment). I think being able to reduce every hostpath and mock test case to deploying only 1 pod instead of 5 would help. We have 80+ test cases running in k/k, so that's a 300+ pod count reduction.

pohly commented Aug 5, 2020

Are there other ways we can check for rbac updates, like doing a file diff?

The problem is that a missing RBAC rule only shows up when the sidecar actually tries to do some operation that is forbidden. Code reviews may be able to find new operations, but I suspect that it's very easy to miss - it has slipped through in the past.

Or test via mock driver, which is definitely a testing-only driver? We don't have good coverage with csi hostpath already.

I suspect we cover even fewer code paths with the mock driver than with the hostpath driver.

For example, the primary attacher path is completely untested and our tests did not catch any of the rbac changes needed with the latest attacher.

But is that observation a reason to extend RBAC testing (i.e. implement attach in the hostpath driver) or to reduce it (i.e. merge into one pod and try to do something else for RBAC)? I really don't know.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Nov 3, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Dec 3, 2020
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

msau42 commented Apr 30, 2021

/reopen
/lifecycle-frozen

@k8s-ci-robot

@msau42: Reopened this issue.

In response to this:

/reopen
/lifecycle-frozen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot reopened this on Apr 30, 2021
msau42 commented Apr 30, 2021

I think we should seriously consider having an option to deploy all the sidecars into one Pod. k/k kind testing is very flaky, and I think being able to reduce the load from our tests by a few hundred pods would definitely improve the situation.

pohly commented Apr 30, 2021

Sounds reasonable. But then we should still have one alternative deployment with one pod per sidecar and some minimal testing to cover the correctness of our RBAC rules.

msau42 commented Apr 30, 2021

Agreed, unless we can think of another way to test the RBAC rules, we'll want both methods. For our kubernetes-csi testing, we can still use the current method, but for k/k, we can use the consolidated way.

pohly commented May 1, 2021

For our kubernetes-csi testing, we can still use the current method, but for k/k, we can use the consolidated way.

Agreed, that should give us good coverage of both approaches. I was struggling a bit with identifying "minimal testing" because it's not immediately obvious which tests go through all code paths that depend on RBAC - running all of them avoids having to make that choice.

pohly commented May 3, 2021

/assign

pohly commented May 3, 2021

I think we should have additional jobs for the csi-driver-host-path repo to cover all deployment flavors. If we only test the deployments with separate pods, then it could happen that we accidentally break the deployments meant for testing in Kubernetes and only notice when trying to use a new release in Kubernetes.

TerryHowe pushed a commit to TerryHowe/csi-driver-host-path that referenced this issue Oct 17, 2024