Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Object bucket provisioning #2100

Merged
merged 4 commits into from
Oct 20, 2020
Merged

Conversation

wlan0
Copy link
Member

@wlan0 wlan0 commented Oct 16, 2020

KEP for bucket provisioning API. Creating a new PR since #1383 hits the ratelimit too often

@jeffvance @brahmaroutu @rrati @krishchow @sajanjswl @xing-yang @saad-ali @guymguym @alarge

@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Oct 16, 2020
@wlan0
Copy link
Member Author

wlan0 commented Oct 16, 2020

/assign @xing-yang

@fejta-bot
Copy link

Unknown CLA label state. Rechecking for CLA labels.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/check-cla

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 16, 2020
Copy link

@alarge alarge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of my comments might be handled with some doc cleanup, but I'd really like to understand the BAR/BA "delete on pod termination" rationale.

spec:
protocol:
name: [3]
version: [4]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know there's been some discussion about this, but I'm generally uncomfortable about pulling version out separately. Among other things, it implies you are going to do something with this information (like assume backward compatibility or some other semantic-version rule). But we don't own the versioning scheme, so there really isn't anything you can do with it other than clearly declare the requestors expectation and the driver's support. For that, I think building the version into the protocol name makes more sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this sentiment in general. However, this field is filled by COSI in some cases and we think it is better to have COSI fill an unfilled field rather than update an existing one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To simplify the prototype and avoid the controversy of what binding logic will look like for now, I'm fine with removing version from prototype from all objects. For alpha/beta, we can discuss and reintroduce.

1. `finalizers`: added by COSI to defer `BucketRequest` deletion until backend deletion succeeds.
1. `protocol.name`: (required) specifies the desired protocol. One of {“s3”, “gs”, or “azureBlob”}.
1. `protocol.version`: (optional) specifies the desired version of the `protocol`. For "s3", a value of "v2" or "v4" could be used.
1. `bucketPrefix`: (optional) prefix prepended to a generated new bucket name, eg. “yosemite-photos-". If `bucketInstanceName` is supplied then `bucketPrefix` is ignored because the request is for access to an existing bucket. If `bucketPrexix` is specified then it is added to the `Bucket` instance as a label.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is bucketPrefix optional for greenfield? I think this section could be made a little easier to understand by clearly calling out when a parameter only applies to either greenfield or brownfield.

1. `protocol.version`: (optional) specifies the desired version of the `protocol`. For "s3", a value of "v2" or "v4" could be used.
1. `bucketPrefix`: (optional) prefix prepended to a generated new bucket name, eg. “yosemite-photos-". If `bucketInstanceName` is supplied then `bucketPrefix` is ignored because the request is for access to an existing bucket. If `bucketPrexix` is specified then it is added to the `Bucket` instance as a label.
1. `bucketClassName`: (optional for greenfield, omitted for brownfield) name of the `BucketClass` used to provision this request. If omitted for a greenfield bucket request, a default bucket class matching the protocol, if available, is used. If the greenfield bucket class is missing or does not support the requested protocol, an error is logged and the request is retried (with exponential backoff). A `BucketClass` is necessary for greenfield requests since BCs support a list of allowed namespaces. A `BucketClass` is not needed for brownfield requests since the `Bucket` instance, created by the admin, also contains `allowedNamespaces`.
1. `bucketInstanceName`: (optional) name of the cluster-wide `Bucket` instance. If blank, then COSI assumes this is a greenfield request and will fill in the name during the binding step. If set by the user, then this names the `Bucket` instance created by the admin.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is bucketInstanceName required for brownfield?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good catch. I'll fix the doc

spec:
provisioner: [4]
retentionPolicy: [5]
anonymousAccessMode: [6]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not clear on why this is relevant here. Who will be looking at this field?

But this opens up a broader question - I assume this field in the class is a direction to the driver/backend to add anonymous access to the bucket. But we're not providing a mechanism to access a bucket anonymously, right? We should be clear about that - that the anonymous access is not access from within k8s - it is "out of band" access. Would clear up the confusion around this for me.

Copy link
Member Author

@wlan0 wlan0 Oct 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we do NOT provide a mechanism to anonymously consume the bucket. We only provide a mechanism to provision with anonymous access. I will update the doc.

- "publicReadOnly": Read only, uncredentialed users can call ListBucket and GetObject.
- "publicWriteOnly": Write only, uncredentialed users can only call PutObject.
- "publicReadWrite": Read/Write, same as `ro` with the addition of PutObject being allowed.
1. `bucketClassName`: Name of the associated bucket class.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is either supplied by the controller (when greenfield) or ignored (brownfield), right? so shouldn't this be "optional"?


#### BucketClass

An immutable, cluster-scoped, custom resource to provide admins control over the handling of bucket provisioning. The `BucketClass` (BC) defines a retention policy, driver specific parameters, and the provisioner name. A list of allowed namespaces can be specified to restrict new bucket creation and access to existing buckets. A default bucket class can be defined for each supported protocol. This allows the bucket class to be omitted from a `BucketRequest`. Relevant `BucketClass` fields are copied to the `Bucket` instance to handle the case of the BC being deleted or re-created. If an object store supports more than one protocol then the admin should create a `BucketClass` per protocol.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the class really immutable? Specifically, are changes to "isDefaultBucketClass" disallowed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it is immutable except for "isDefaultBucketClass". I'll update doc

1. `bucketInstanceName`: name of the `Bucket` instance bound to this BA.
1. `bucketAccessRequest`: an `objectReference` containing the name, namespace and UID of the associated `BucketAccessRequest`.
1. `serviceAccount`: an `ObjectReference` containing the name, namespace and UID of the associated `BAR.serviceAccountName`. If empty then integrated Kubernetes -> cloud identity is not being used, in which case, `BucketAccess.principal` contains the user identity, which is minted by the provisioner.
1. `mintedSecretName`: name of the provisioner-generated Secret containing access credentials. This Secret exists in the provisioner’s namespace, is read by the cosi-node-adapter, and written to the secret mount defined in the app pod's csi-driver spec.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optional? only if serviceAccount is empty/null?

{“Effect”:“Allow”,“Action”:“s3:GetObject”,“Resource”:“arn:aws:s3:::profilepics/*“},
{“Effect”:“Deny”,“Action”:“s3:*“,”NotResource”:“arn:aws:s3:::profilepics/*“}
```
1. `principal`: username/access-key for the object storage provider to uniquely identify the user who has access to this bucket. This value is minted by the provisioner (and set in the BA) when the `BucketAccessRequest` omits a ServiceAccount. There several use cases involving `principal`: 1) static/driverless brownfield where the admin creates the BA and fills in principal, so here principal is minted by the admin. 2) brownfield where COSI creates the `BucketAccess` and fills in `principal` based on the driver's minted principal. 3) the `BucketAccessRequest` specifies a ServiceAccount which is linked by the backend to a user id, and `principal` is empty.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me many reads of this before I think I understand it. Let me give it a shot and see if I've got it right:

a. Not relevant if serviceAccount is supplied.

b. For brownfield cases when there is already a storage-side principal that you want to use for access, the admin can supply this, the driver will (possibly) enable access for that principal to the bucket in question and will return credentials.

c. For greenfield cases, this could be populated by the driver. Maybe so that the driver can request deletion of this principal during deprovisioning?

At any rate, some crisper explanation would be very useful here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your understanding is accurate. I agree, clearer explanation is required here.

csi: [3]
driver: cosi.sigs.k8s.io [4]
volumeAttributes: [5]
bucketAccessRequestName: <my-bar-name>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any information on what is actually written to the volume. For example:

  • I would expect that we always need endpoint information written somewhere

  • I would expect credential information to be optional, depending on whether or not identity mapping is being used

  • I would expect that there is a well-defined location and formation for both of the above information, but that it is probably protocol-specific. It would be worth being at least partially explicit about this. Also, I've heard some mumbling about mapping this stuff to environment variables (I can't see how that would work given the possibility of having multiple of these volumes in a Pod), but if that's still the plan, it should be clear on how it is accomplished.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Endpoint and credential information will be passed in via a file, whose format is determined by the protocol. The provisioner is expected to fill the contents of the file appropriately with both endpoint and credential information. This is then mounted as a file into the pod.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this is determined by the protocol, wouldn't it make sense to document file names/locations for each protocol?

Also, I assume that environment variable mapping is not in scope, correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 KEP should detail what files are surfaced where for which protocols.


When the app pod is started, the kubelet will gRPC call _NodeStageVolume_ and _NodePublishVolume_, which are received by the cosi-node-adapter. The pod waits until the adapter responds to the gRPC requests. The adapter ensures that the target bucket has been provisioned and is ready to be accessed.

When the app pod terminates, the kubelet gRPC calls _NodeUnstageVolume_ and _NodeUnpublishVolume_, which the cosi-node-adapter also receives. The adapter orchestrates tear-down of bucket access resources, but does not delete the `BucketRequest` nor `Bucket` instances.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels SUPER janky to me. Based on this line (and a later comment), it sounds like you are saying that when a pod terminates, you delete a Kubernetes resource that was separately created by the user. The equivalent would be to delete a PVC when a Deployment is deleted. Maybe that solves some nasty edge case and is warranted, but I think it warrants a pretty well articulated rationale.

Copy link
Member Author

@wlan0 wlan0 Oct 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an error in the doc. You're right, the termination of the pod does not delete BA* resources.

  • When the pod terminates on its own, we do not delete any BA* resources
  • If BAR is deleted, then we wait for pod to terminate before clearing BA* resources
  • If BA is deleted, then we forcefully pull the creds from underneath the pod

Copy link

@alarge alarge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addresses my issues, thanks!

Copy link
Member

@saad-ali saad-ali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a bunch of open questions, but those are mostly around the details. I believe we've achieved consensus on the overall direction. Approving to unblock prototype. The expectation is that this KEP will be revised based on the prototype, and API review prior to an alpha implementation.

/lgtm
/approve

Adding a hold to let the authors decide when they want to merge.
/hold

spec:
protocol:
name: [3]
version: [4]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To simplify the prototype and avoid the controversy of what binding logic will look like for now, I'm fine with removing version from prototype from all objects. For alpha/beta, we can discuss and reintroduce.

- "publicReadOnly": Read only, uncredentialed users can call ListBucket and GetObject.
- "publicWriteOnly": Write only, uncredentialed users can only call PutObject.
- "publicReadWrite": Read/Write, same as `ro` with the addition of PutObject being allowed.
> Note: `anonymousAccessMode` is intended for workloads to consume the bucket out of band, i.e. COSI does not provide a automated mechanism for a k8s workload to consume the bucket anonymously
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If COSI does not provide a way to consume the bucket anonymously, why does this even need to be a first class field? Why can't it be an opaque BucketClass parameter?

csi: [3]
driver: cosi.sigs.k8s.io [4]
volumeAttributes: [5]
bucketAccessRequestName: <my-bar-name>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 KEP should detail what files are surfaced where for which protocols.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 20, 2020
@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Oct 20, 2020
Copy link

@guymguym guymguym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - great job @wlan0 @jeffvance @krishchow

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: guymguym, saad-ali, wlan0

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@wlan0
Copy link
Member Author

wlan0 commented Oct 20, 2020

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 20, 2020
@k8s-ci-robot k8s-ci-robot merged commit 9d6eb1f into kubernetes:master Oct 20, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.20 milestone Oct 20, 2020
- `azureBlob`: data required to target a provisioned azure container.
- `gs`: data required to target a provisioned GS bucket.
- `s3`: data required to target a provisioned S3 bucket.
1. `parameters`: a copy of the BucketClass parameters.
Copy link
Member

@saad-ali saad-ali Oct 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wlan0 @msau42 reminded me that PV/PVC actually handles this in a better way. Instead of copying all parameters over from the Class object (which may include lots of unnecessary information), the CSI CreateVolumeResponse contains a set of volume_context -- basically letting the driver decide what additional important info should be maintained about the volume. We should do something similar here. See https://github.com/container-storage-interface/spec/blob/master/csi.proto#L488

@pacoxu pacoxu mentioned this pull request Jul 25, 2024
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants