Skip to content

GCP: Add Google Authentication Support #13212

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jun 19, 2025

Conversation

talatuyarer
Copy link
Contributor

@talatuyarer talatuyarer commented Jun 2, 2025

This PR introduces a dedicated GoogleAuthManager for the Iceberg REST client, enabling robust and streamlined authentication with Google Cloud Platform (GCP) services by delegating authentication to GCP's auth library.

Why add a specific GoogleAuthManager?

Currently, Iceberg's AuthManager and the generic OAuth2Manager provide a good baseline for authentication. However, connecting to services on Google Cloud often uses specific patterns that a generic manager doesn't fully simplify. This new GoogleAuthManager makes life easier for folks using Iceberg REST catalogs behind GCP Auth Service

The existing OAuth2Manager is great for standard OAuth2 flows (like client credentials) but it does not support authorization code workflow.

The GoogleAuthManager introduces authentication methods that are more convenient for GCP users, differing from the generic OAuth2 flow:

  • Application Default Credentials (ADC) Support: Provides out-of-the-box, zero-configuration authentication when the Iceberg client runs within GCP environments (e.g., Compute Engine, GKE, Cloud Functions). This is a significant usability improvement over manually setting up OAuth2 parameters.
  • Service Account Key File Support: Allows users to authenticate by directly providing a path to a GCP service account JSON key file. This is a common and secure way to authenticate applications with GCP services, and is more specific than the generic token or client secret mechanisms.
  • Configurable OAuth Scopes: While the generic OAuth2Manager also supports scopes, GoogleAuthManager defaults to scopes commonly used for GCP services and allows easy customization via the gcp.auth.scopes property.

While our current OAuth2Manager can be used with tokens from service accounts, it doesn't natively understand refresh token workflow and the ADC discovery process or how to directly consume a service account JSON file for its credentials.

This dedicated manager ensures that Iceberg can authenticate with GCP-backed REST catalog services in the most efficient, secure, and user-friendly manner, aligning with common GCP practices. The GoogleAuthManager bridges this gap for a more seamless Iceberg Rest experience on GCP.

@talatuyarer talatuyarer changed the title Initial Version of GCPAuthManager [Auth] Add Google Authentication Support for REST Client Jun 2, 2025
@talatuyarer
Copy link
Contributor Author

talatuyarer commented Jun 2, 2025

Hi @nastra, when you have time Could you review this ?

Copy link
Contributor Author

@talatuyarer talatuyarer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adutra Thank you for reviewing

@stevenzwu stevenzwu added this to the Iceberg 1.10.0 milestone Jun 2, 2025
@talatuyarer talatuyarer requested a review from nastra June 3, 2025 15:25
@@ -62,6 +62,9 @@ public class GCPProperties implements Serializable {
*/
public static final int GCS_DELETE_BATCH_SIZE_DEFAULT = 50;

public static final String GCP_CREDENTIALS_PATH_PROPERTY = "gcp.auth.credentials-path";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm currently not sure whether these properties should be here vs in GoogleAuthManager. The properties in this class are typically related to the FileIO implementation of GCS, whereas these new properties aren't

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved these configurations to GoogleAuthManager. May be we should rename GCPProperties.java as GCSProperties.java ?

@talatuyarer talatuyarer requested a review from nastra June 4, 2025 15:10
@talatuyarer talatuyarer requested a review from nastra June 4, 2025 16:55
@talatuyarer talatuyarer changed the title [Auth] Add Google Authentication Support for REST Client [Auth] Add Google Authentication Support for REST Catalog Jun 4, 2025
@nastra nastra changed the title [Auth] Add Google Authentication Support for REST Catalog GCP: Add Google Authentication Support Jun 5, 2025
Copy link
Contributor

@sungwy sungwy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adutra @nastra - we are helping review a similar PR in PyIceberg, and I wanted to bring this discussion here first.

With the AuthManager, we now have a pluggable way of introducing authentication mechanisms to Iceberg clients. Given that's the case, do we want to encourage vendors to implement and manager their AuthManager separately from this project, or encourage them to contribute and have those code implementations maintained in the parent repositry?

Dremio seems to have taken the former approach: https://github.com/dremio/iceberg-auth-manager

Whereas there's precedent for us supporting AWS SigV4 authentication on the project itself: https://github.com/apache/iceberg/blob/main/aws/src/main/java/org/apache/iceberg/aws/RESTSigV4AuthManager.java

Would this warrant a mailing list discussion?

@adutra
Copy link
Contributor

adutra commented Jun 16, 2025

@sungwy thanks for raising the question!

do we want to encourage vendors to implement and manage their AuthManager separately from this project, or encourage them to contribute and have those code implementations maintained in the parent repository?

Once you have a pluggable API, you cannot prevent users from implementing it :-) IOW, there will necessarily be vendor-specific implementations living outside of Iceberg repo eventually.

But any implementation that is generic enough to be useful to most Iceberg users deserves being integrated into Iceberg. I think we need to strike a balance between the two possibilities and decide on a case-by-case basis.

For example, this PR does introduce a vendor-specific Auth Manager, since it can only work when running on Google's infra. But since it's fairly generic, I don't see any problem in having this Auth Manager in Iceberg.

Dremio seems to have taken the former approach: https://github.com/dremio/iceberg-auth-manager

That is not true: Dremio open-sourced this project recently, but it does not contain anything Dremio-specific. It is a greenfield project that aims at providing an alternate implementation of the Auth Manager API for OAuth, offering support for many more features than the ones supported by the built-in OAuth manager.

It is actually Dremio's intention to donate this project, but indeed that conversation didn't officially start yet.

Would this warrant a mailing list discussion?

Yes, definitely! I will start this discussion shortly.

@adutra
Copy link
Contributor

adutra commented Jun 16, 2025

@sungwy FYI:

https://lists.apache.org/thread/37w7v7gxfd0klb2lp6082wqto5krh9r8

Comment on lines +76 to +78
return newHeaders.equals(request.headers())
? request
: ImmutableHTTPRequest.builder().from(request).headers(newHeaders).build();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we need this? I don't think we reuse requests and if we did, the putIfAbsent would resolve it. Seems like unnecessary complexity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to keep this because we re-use headers. https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/rest/HTTPClient.java#L256
If someone add Authorizationheader it can broke workflow. I dont want to allow that behaviour.

@talatuyarer talatuyarer requested a review from danielcweeks June 16, 2025 23:30
Copy link
Contributor

@danielcweeks danielcweeks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @talatuyarer

@danielcweeks danielcweeks merged commit 40c0a73 into apache:main Jun 19, 2025
43 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants