Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AWS,Core: Add S3 REST Signer client + REST Spec #6169

Merged
merged 4 commits into from
Feb 3, 2023

Conversation

nastra
Copy link
Contributor

@nastra nastra commented Nov 10, 2022

This introduces an S3 REST signer client and defines a REST spec (s3-signer-open-api.yml) for a server implementation. Below is a high-level overview of the introduced changes:

  • the main logic and functionality resides in the S3V4RestSignerClient class
    • it uses the same credential/token exchange flow as we have in RESTSessionCatalog and also uses the same token refresh mechanism. In order to achieve that, a few refactorings have been done in RESTSessionCatalog / OAuth2Util.
    • the default endpoint the signer connects to is v1/aws/s3/sign but can be customized.
    • The server decides which headers to sign and can indicate to the S3V4RestSignerClient whether a response with signed headers can be cached by sending a Cache-Control: private header
  • AwsProperties introduce s3.signer.class that allows to dynamically load an S3 Signer implementation and apply it when creating an S3 client. This can be any Signer class that implements software.amazon.awssdk.core.signer.Signer.
  • S3SignRequest and S3SignResponse classes define how the request and response looks like
  • an S3ObjectMapper class has been introduced that is similar to RESTObjectMapper but only contains what's necessary for the S3 REST signer, which are the request/response classes with OAuth-related classes and error handling.
  • Testing is done by using MinioContainer (TestContainers + MinIO) in TestS3RestSigner
    • The S3SignerServlet defines the minimum amount of work that a server-side implementation might have. It is by no means complete and only serves the purpose of testing

@nastra nastra changed the title AWS,Core: Add S3 Signer client + REST Spec AWS,Core: Add S3 REST Signer client + REST Spec Nov 10, 2022
@nastra
Copy link
Contributor Author

nastra commented Nov 10, 2022

@nastra nastra force-pushed the s3-signer branch 2 times, most recently from 07dc9f2 to b2f7900 Compare November 11, 2022 07:46
@jackye1995
Copy link
Contributor

1 general question regarding this PR, @nastra @rdblue @danielcweeks this is a feature very specific to AWS S3. What is the general guideline in the community for adding this as a part of the OpenAPI spec? In my mind the spec should be something generic for Iceberg, and not related to any specific cloud provider. Although many cloud storage providers have built S3 compatible API, but mostly just do it to some extent to support basic operations like get and put object. I would imaging signing with sigv4 is still a very AWS-specific thing, but maybe I am wrong about that.

Overall, have we thought about making this spec more generic, or is that not possible to achieve without being cloud provider specific?

@nastra nastra closed this Nov 15, 2022
@nastra nastra reopened this Nov 15, 2022
@nastra nastra closed this Nov 15, 2022
@nastra nastra reopened this Nov 15, 2022
@danielcweeks
Copy link
Contributor

1 general question regarding this PR, @nastra @rdblue @danielcweeks this is a feature very specific to AWS S3. What is the general guideline in the community for adding this as a part of the OpenAPI spec? In my mind the spec should be something generic for Iceberg, and not related to any specific cloud provider. Although many cloud storage providers have built S3 compatible API, but mostly just do it to some extent to support basic operations like get and put object. I would imaging signing with sigv4 is still a very AWS-specific thing, but maybe I am wrong about that.

Overall, have we thought about making this spec more generic, or is that not possible to achieve without being cloud provider specific?

@jackye1995 I think the purpose behind the open-api doc is just to be clear about what the expectations are from a call/response/error perspective. While it has some overlap with the other open-api doc, they are separate entities.

This signer is specific to AWS and I think that's fine since the signer implementation is in the aws module. It may make more sense to put the open-api doc in the aws module since it really only applies there (@nastra was going to look into that option).

As a general pattern, all cloud providers sign requests in some form and we may see similar implementations using different protocols in the future, but it may also be that this can be generalized in the future.

@nastra
Copy link
Contributor Author

nastra commented Nov 22, 2022

I have moved the S3 Signer REST Spec to the iceberg-aws module.

}
}

@SuppressWarnings("checkstyle:NestedTryDepth")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using more standard loading would avoid this problem so we don't need to suppress it.

Copy link
Contributor

@danielcweeks danielcweeks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. There are some very minor things (like how we set kebab case is deprecated), but other than that, this looks good.

Comment on lines +69 to +70
private static final Cache<Key, SignedComponent> SIGNED_COMPONENT_CACHE =
Caffeine.newBuilder().expireAfterWrite(30, TimeUnit.SECONDS).maximumSize(100).build();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, is it worth making the cache expiration and size configurable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for now we'd probably go as is and follow up and make it configurable if we see that it helps with certain use cases.

This introduces an S3 REST signer client and defines a REST spec (`s3-signer-open-api.yml`) for a server implementation. Below is a high-level overview of the introduced changes:

* the main logic and functionality resides in the `S3V4RestSignerClient` class
  * it uses the same **credential/token** exchange flow as we have in `RESTSessionCatalog` and also uses the same token refresh mechanism. In order to achieve that, a few refactorings have been done in `RESTSessionCatalog` / `OAuth2Util`.
  * the default endpoint the signer connects to is `v1/aws/s3/sign` but can be customized.
  * The server decides which headers to sign and can indicate to the `S3V4RestSignerClient` whether a response with signed headers can be cached by sending a `Cache-Control: private` header
* `AwsProperties` introduce `s3.signer.class` that allows to dynamically load an S3 Signer implementation and apply it when creating an S3 client. This can be any Signer class that implements `software.amazon.awssdk.core.signer.Signer`.
* `S3SignRequest` and `S3SignResponse` classes define how the request and response looks like
* an `S3ObjectMapper` class has been introduced that is similar to `RESTObjectMapper` but only contains what's necessary for the S3 REST signer, which are the request/response classes with OAuth-related classes and error handling.
* Testing is done by using `MinioContainer` (`TestContainers` + `MinIO`) in `TestS3RestSigner`
  * The `S3SignerServlet` defines the minimum amount of work that a server-side implementation might have. It is by no means complete and only serves the purpose of testing
Copy link
Contributor

@jackye1995 jackye1995 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fixes, looks good to me!

@nastra nastra closed this Feb 1, 2023
@nastra nastra reopened this Feb 1, 2023
Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me @nastra!

@danielcweeks danielcweeks merged commit 8076672 into apache:master Feb 3, 2023
@danielcweeks
Copy link
Contributor

Thanks @nastra !!

krvikash pushed a commit to krvikash/iceberg that referenced this pull request Mar 16, 2023
* AWS,Core: Add S3 Signer client + REST Spec

This introduces an S3 REST signer client and defines a REST spec (`s3-signer-open-api.yml`) for a server implementation. Below is a high-level overview of the introduced changes:

* the main logic and functionality resides in the `S3V4RestSignerClient` class
  * it uses the same **credential/token** exchange flow as we have in `RESTSessionCatalog` and also uses the same token refresh mechanism. In order to achieve that, a few refactorings have been done in `RESTSessionCatalog` / `OAuth2Util`.
  * the default endpoint the signer connects to is `v1/aws/s3/sign` but can be customized.
  * The server decides which headers to sign and can indicate to the `S3V4RestSignerClient` whether a response with signed headers can be cached by sending a `Cache-Control: private` header
* `AwsProperties` introduce `s3.signer.class` that allows to dynamically load an S3 Signer implementation and apply it when creating an S3 client. This can be any Signer class that implements `software.amazon.awssdk.core.signer.Signer`.
* `S3SignRequest` and `S3SignResponse` classes define how the request and response looks like
* an `S3ObjectMapper` class has been introduced that is similar to `RESTObjectMapper` but only contains what's necessary for the S3 REST signer, which are the request/response classes with OAuth-related classes and error handling.
* Testing is done by using `MinioContainer` (`TestContainers` + `MinIO`) in `TestS3RestSigner`
  * The `S3SignerServlet` defines the minimum amount of work that a server-side implementation might have. It is by no means complete and only serves the purpose of testing

* make http client static

* review feedback

* change dynamic loading of signer
@nastra nastra deleted the s3-signer branch June 1, 2023 13:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants