Skip to content

Conversation

@nastra
Copy link
Contributor

@nastra nastra commented Jul 19, 2024

No description provided.

@nastra nastra marked this pull request as draft July 19, 2024 11:05
@nastra nastra force-pushed the openapi-credentials branch from c00f5ac to 8284ee3 Compare July 19, 2024 11:07
Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments, hope you don't mind since this is still a draft.

Also, the PR by #10576 contains valuable information around GCS/ADLS properties.

@nastra nastra force-pushed the openapi-credentials branch 3 times, most recently from 34d0b02 to 023e825 Compare July 22, 2024 09:50
@nastra nastra force-pushed the openapi-credentials branch from 023e825 to d8c748f Compare September 11, 2024 14:28
type: object
allOf:
- $ref: '#/components/schemas/Credentials'
required:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add the region? Without it, clients might run into issues. Here’s why it would be helpful:

  1. Setting a default region on the client side doesn’t work well since clients could be accessing s3 files from different regions.
  2. Clients could try to figure out the region using get-bucket-location, but this means an extra API call. Plus, if the vended credentials don’t allow this call, it could cause more problems.

Adding the region would simplify things and enhance overall client reliability.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for including region.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Despite the other comment below - the region should be the same as the one for the S3FileIO configuration?

Copy link
Contributor Author

@nastra nastra Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@flyrain the region is already sent back in the config of LoadTableResult/ LoadViewResult itself, so I'm not sure why we would want to make it part of S3Credentials? The region is for configuring the S3 client but the credentials are for configuring S3FileIO`, so it's a different layer where the region is being used

Copy link
Contributor

@danielcweeks danielcweeks Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the region is bound to the credential. I believe it's possible to build a policy that allows access to buckets in multiple regions, so region should be associated with the bucket, not the credentials. Therefore, we shouldn't include it here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jackye1995 it's not published yet because I wanted to get this proposal in first and otherwise it would deviate the discussion.
Here's a quick summary of how refreshing vended credentials would look like:

Refreshing vended credentials would entail adding a new /v1/{prefix}/namespaces/{namespace}/tables/{table}/credentials endpoint that returns a LoadCredentialsResponse, which carries

storage-credentials:
  type: array
  items:
    $ref: '#/components/schemas/Credential'

This new endpoint would be called by the mechanism that is offered by the respective storage provider. For example, GCS offers a OAuth2CredentialsWithRefresh which then effectively would call this new endpoint to refresh a vended credential.
Achieving the same thing for S3 would look different but effectively comes down to calling this credential endpoint too.

Copy link
Contributor

@jackye1995 jackye1995 Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, in that case I think this is a great step, it does make a lot of sense to have dedicated credentials refresh API, we were also thinking about proposing it later. We have implemented credentials vending in LakeFormation for a long time now, and last year S3 also offered access grant with similar feature. The user pattern has been proven that having a dedicated API for that makes a lot of sense since refreshing credentials is a much more frequent operation than loading table and can be done very quickly with a dedicated API, and from security perspective it is also a better posture. In that case I think what is proposed here makes sense, and I am more in favor of the current design than what Yufei suggests, because it makes it clear that these fields are related to that new API, and existing config map continues to work for configuring the FileIO, with the additional note that the new credentials field would override the credentials config in the config map.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nastra, thanks for the context. The suggest I provide is a minor structure change, which isn't needed in the new endpoint. I am OK with the current one if that's more align with the new endpoint.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@flyrain yes the structure that is being proposed here is fully aligned with how credentials will be returned by the new credentials endpoint. I understand your reasoning for adding one additional structure on top, but I think we should only do that if we have a concrete use case that this would address (which I don't think we currently have).
Thanks again for your review @flyrain

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

based on recent discussions the feedback was that we don't want to have anything storage-specific in the OpenAPI spec (other than documenting the different storage configurations, which is handled by #10576). Therefore I've updated the PR and made it flexible enough so that we could still pass different credentials based on a given prefix. That should hopefully work for everyone.

@nastra nastra marked this pull request as ready for review September 12, 2024 06:54
@nastra nastra force-pushed the openapi-credentials branch from d8c748f to 4bc01bb Compare September 12, 2024 09:14
@nastra nastra force-pushed the openapi-credentials branch from 4bc01bb to eff9e54 Compare September 12, 2024 13:13
@snazy
Copy link
Member

snazy commented Sep 16, 2024

@nastra can you post the proposal document on the dev-ML?

NVM - just found it

@nastra nastra linked an issue Sep 25, 2024 that may be closed by this pull request
6 tasks
Copy link
Member

@snazy snazy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM - just left a comment to add format: password

Copy link
Contributor

@jackye1995 jackye1995 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mostly looks good to me! Just a nit comment

@nastra nastra force-pushed the openapi-credentials branch from 6e5079b to 8e743a5 Compare October 11, 2024 06:39
specific prefix if several credentials of the same type are available.
config:
type: object
additionalProperties:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand there was a general agreement to treat vendor-specific credentials as simple property bags. However, would it make sense to define some common attributes (like expiry time) in this spec? I suppose those would be relevant to the follow-up effort of refreshing credentials on the fly. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that there could be some common attributes that we could extract, but that would lead to a mixture of having to look up things in a config map vs other properties.
Also what if storage providers have different semantics for how expiration is defined (expires-at-ms vs expires-in-ms vs ....). This would require additional calculations to convert from one expiration format to the other. Therefore I think it's better to keep everything in the config for now until we have a clear use case where it makes sense to extract this into a common attribute.

This isn't set in stone and we can always revisit this once we implement refreshing vended credentials for the different providers and see how helpful having a separate field for expiration would be. I'd like to unblock this proposal so that we can make progress on refreshing vended credentials

## Storage Credentials
Credentials for ADLS / GCS / S3 / ... are provided through the `storage-credentials` field.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Servers have to provide concrete properties for vendor-specific credentials. Since this spec does not define them, how will servers know what properties to provide?

If servesr adapt to the java REST client impl. then the java client will become the "spec"... I'd prefer for Iceberg to provide additional guidelines (which do not have to be part of the REST Catalog spec) to clarify credential properties for known use cases. It might be necessary for this spec to define a sub-type property since some vendors support multiple auth methods for the same URI scheme.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the documentation of the respective FileIO properties is handled outside this PR and is in the scope of #10576. We'll be updating/rebasing #10576 once this PR here is in and that should make it clearer for servers which properties are available for FileIO implementations

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dimas-b does the documentation of the storage related properties address your comment about "additional guidelines"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#10576 appears to define credential properties in the REST spec, which is fine from my POV :) I assume after rebasing it will be talking about credential properties under the new config section introduced in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes exactly, we're trying to move the docs to this new credential section

@nastra nastra force-pushed the openapi-credentials branch from 1ca96e5 to 2d5820a Compare October 17, 2024 15:22
@nastra nastra merged commit 44233fa into apache:main Oct 18, 2024
@nastra nastra deleted the openapi-credentials branch October 18, 2024 17:24
type: object
additionalProperties:
type: string
storage-credentials:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious why should return multiple credentials for one table? Any use cases? The current TableOps just support one fileIO and also noticed the AWS and GCS credential refresh handler just pick the first credential from the server.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use case for this is outlined in #10722 (comment).

zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

REST: Standardize vended credentials used in loadTable / loadView responses