-
Notifications
You must be signed in to change notification settings - Fork 3k
OpenAPI: Standardize credentials in loadTable/loadView responses #10722
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
c00f5ac to
8284ee3
Compare
Fokko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some comments, hope you don't mind since this is still a draft.
Also, the PR by #10576 contains valuable information around GCS/ADLS properties.
34d0b02 to
023e825
Compare
023e825 to
d8c748f
Compare
open-api/rest-catalog-open-api.yaml
Outdated
| type: object | ||
| allOf: | ||
| - $ref: '#/components/schemas/Credentials' | ||
| required: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add the region? Without it, clients might run into issues. Here’s why it would be helpful:
- Setting a default region on the client side doesn’t work well since clients could be accessing s3 files from different regions.
- Clients could try to figure out the region using
get-bucket-location, but this means an extra API call. Plus, if the vended credentials don’t allow this call, it could cause more problems.
Adding the region would simplify things and enhance overall client reliability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for including region.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Despite the other comment below - the region should be the same as the one for the S3FileIO configuration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@flyrain the region is already sent back in the config of LoadTableResult/ LoadViewResult itself, so I'm not sure why we would want to make it part of S3Credentials? The region is for configuring the S3 client but the credentials are for configuring S3FileIO`, so it's a different layer where the region is being used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the region is bound to the credential. I believe it's possible to build a policy that allows access to buckets in multiple regions, so region should be associated with the bucket, not the credentials. Therefore, we shouldn't include it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jackye1995 it's not published yet because I wanted to get this proposal in first and otherwise it would deviate the discussion.
Here's a quick summary of how refreshing vended credentials would look like:
Refreshing vended credentials would entail adding a new /v1/{prefix}/namespaces/{namespace}/tables/{table}/credentials endpoint that returns a LoadCredentialsResponse, which carries
storage-credentials:
type: array
items:
$ref: '#/components/schemas/Credential'
This new endpoint would be called by the mechanism that is offered by the respective storage provider. For example, GCS offers a OAuth2CredentialsWithRefresh which then effectively would call this new endpoint to refresh a vended credential.
Achieving the same thing for S3 would look different but effectively comes down to calling this credential endpoint too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, in that case I think this is a great step, it does make a lot of sense to have dedicated credentials refresh API, we were also thinking about proposing it later. We have implemented credentials vending in LakeFormation for a long time now, and last year S3 also offered access grant with similar feature. The user pattern has been proven that having a dedicated API for that makes a lot of sense since refreshing credentials is a much more frequent operation than loading table and can be done very quickly with a dedicated API, and from security perspective it is also a better posture. In that case I think what is proposed here makes sense, and I am more in favor of the current design than what Yufei suggests, because it makes it clear that these fields are related to that new API, and existing config map continues to work for configuring the FileIO, with the additional note that the new credentials field would override the credentials config in the config map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nastra, thanks for the context. The suggest I provide is a minor structure change, which isn't needed in the new endpoint. I am OK with the current one if that's more align with the new endpoint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@flyrain yes the structure that is being proposed here is fully aligned with how credentials will be returned by the new credentials endpoint. I understand your reasoning for adding one additional structure on top, but I think we should only do that if we have a concrete use case that this would address (which I don't think we currently have).
Thanks again for your review @flyrain
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
based on recent discussions the feedback was that we don't want to have anything storage-specific in the OpenAPI spec (other than documenting the different storage configurations, which is handled by #10576). Therefore I've updated the PR and made it flexible enough so that we could still pass different credentials based on a given prefix. That should hopefully work for everyone.
d8c748f to
4bc01bb
Compare
4bc01bb to
eff9e54
Compare
|
NVM - just found it |
f1005bc to
785f6c1
Compare
snazy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM - just left a comment to add format: password
a7048de to
6e5079b
Compare
jackye1995
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mostly looks good to me! Just a nit comment
6e5079b to
8e743a5
Compare
| specific prefix if several credentials of the same type are available. | ||
| config: | ||
| type: object | ||
| additionalProperties: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand there was a general agreement to treat vendor-specific credentials as simple property bags. However, would it make sense to define some common attributes (like expiry time) in this spec? I suppose those would be relevant to the follow-up effort of refreshing credentials on the fly. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that there could be some common attributes that we could extract, but that would lead to a mixture of having to look up things in a config map vs other properties.
Also what if storage providers have different semantics for how expiration is defined (expires-at-ms vs expires-in-ms vs ....). This would require additional calculations to convert from one expiration format to the other. Therefore I think it's better to keep everything in the config for now until we have a clear use case where it makes sense to extract this into a common attribute.
This isn't set in stone and we can always revisit this once we implement refreshing vended credentials for the different providers and see how helpful having a separate field for expiration would be. I'd like to unblock this proposal so that we can make progress on refreshing vended credentials
| ## Storage Credentials | ||
| Credentials for ADLS / GCS / S3 / ... are provided through the `storage-credentials` field. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Servers have to provide concrete properties for vendor-specific credentials. Since this spec does not define them, how will servers know what properties to provide?
If servesr adapt to the java REST client impl. then the java client will become the "spec"... I'd prefer for Iceberg to provide additional guidelines (which do not have to be part of the REST Catalog spec) to clarify credential properties for known use cases. It might be necessary for this spec to define a sub-type property since some vendors support multiple auth methods for the same URI scheme.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thx!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dimas-b does the documentation of the storage related properties address your comment about "additional guidelines"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#10576 appears to define credential properties in the REST spec, which is fine from my POV :) I assume after rebasing it will be talking about credential properties under the new config section introduced in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes exactly, we're trying to move the docs to this new credential section
8e743a5 to
1ca96e5
Compare
1ca96e5 to
2d5820a
Compare
| type: object | ||
| additionalProperties: | ||
| type: string | ||
| storage-credentials: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious why should return multiple credentials for one table? Any use cases? The current TableOps just support one fileIO and also noticed the AWS and GCS credential refresh handler just pick the first credential from the server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use case for this is outlined in #10722 (comment).
No description provided.