Skip to content

HADOOP-19050. SDK Add Support for AWS S3 Access Grants #6507

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 13 commits into from
Closed
24 changes: 24 additions & 0 deletions hadoop-project/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -1976,6 +1976,30 @@
<artifactId>log4j-web</artifactId>
<version>${log4j2.version}</version>
</dependency>
<dependency>
<groupId>software.amazon.s3.accessgrants</groupId>
<artifactId>aws-s3-accessgrants-java-plugin</artifactId>
<version>2.0.0</version>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move this version flag into a property next to the other aws s3 options, at least until this module is merged into bundle.jar. this makes it overrideable

<scope>provided</scope>
<exclusions>
<exclusion>
<groupId>software.amazon.awssdk</groupId>
<artifactId>*</artifactId>
</exclusion>
<exclusion>
<groupId>com.github.ben-manes.caffeine</groupId>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you confident this isn't used?

<artifactId>*</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>*</artifactId>
</exclusion>
<exclusion>
<groupId>org.assertj</groupId>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should really be scoped as testing in the module itself...

<artifactId>*</artifactId>
</exclusion>
</exclusions>
</dependency>
</dependencies>
</dependencyManagement>

Expand Down
5 changes: 5 additions & 0 deletions hadoop-tools/hadoop-aws/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -508,6 +508,11 @@
<artifactId>bundle</artifactId>
<scope>compile</scope>
</dependency>
<dependency>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should go in hadoop-project, and then set the dependency here. We should probably also use provided scope, so the jar is optional.

See this PR, which added the client side encryption for example #6164 and this class which checks if the class exists

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@steveloughran do you have any advice here? I think we should do what we did for Client Side Encryption, have this S3 access grants jar as optional, and create a new client factory which will add the S3 access grants plugin.

If there are other plugins that we want to add in the future, this new client factory can be generalised for that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the plugin should go in bundle.jar. it's meant to be a bundle.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these should all be moved into the pom.xml in hadoop-project as that where we define dependencies. look at how we import the sdk dependency in this pom.xml for example.

<groupId>software.amazon.s3.accessgrants</groupId>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why isn't this in bundle.jar?

  • if it is: it's not needed.
  • If it isn't, why not?

is this new jar going to be mandatory, or optional?

Copy link
Contributor

@adnanhemani adnanhemani Jan 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new JAR should be optional - and as per the original instructions the AWS S3 Access Grants team was given by the AWS Java SDK team, this plugin should not be part of the AWS Java SDK nor the SDK bundle. The reasoning behind this was that Java SDKv2 Plugins should be considered as "open source" for the most part as they are only interfaces that anyone can implement and then use wherever they'd like. In other words, the S3 Access Grants plugin should be, in theory, treated as any other open source dependency that we would be utilizing if a customer explicitly enables this in S3A.

So, to further answer the question, we need to find a way to optionally load these classes if a user specifies that they'd like to use the plugin AND provides the JAR on the classpath. That is missing from this PR as of now and @jxhan3 and I will work on it. I think @ahmarsuhail's link above has a good call pattern for doing this - we'll follow this unless you have any other suggestion.

One interesting thing to note - I've seen the S3ExpressPlugin being merged into the AWS Java SDK (which was explicitly not the recommended option by the AWS Java SDK team, per my understanding). I've started further inquiries to find why that's the case - and how this is different than S3 Access Grants. Will report my findings here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ahmarsuhail put in the work to get s3 express in

<artifactId>aws-s3-accessgrants-java-plugin</artifactId>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.assertj</groupId>
<artifactId>assertj-core</artifactId>
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1624,4 +1624,21 @@ private Constants() {
* Value: {@value}.
*/
public static final boolean DEFAULT_AWS_S3_CLASSLOADER_ISOLATION = true;

/**
* Flag {@value}
* to enable S3 Access Grants to control authorization to S3 data. More information:
* https://aws.amazon.com/s3/features/access-grants/
* and
* https://github.com/aws/aws-s3-accessgrants-plugin-java-v2/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use {@value} in the comments so ides will show the value

*/
public static final String AWS_S3_ACCESS_GRANTS_ENABLED = "fs.s3a.accessgrants.enabled";

/**
* Flag {@value} to enable jobs fall back to the Job Execution IAM role in
* case they get Access Denied from the S3 Access Grants call. More information:
* https://github.com/aws/aws-s3-accessgrants-plugin-java-v2/
*/
public static final String AWS_S3_ACCESS_GRANTS_FALLBACK_TO_IAM_ENABLED =
"fs.s3a.accessgrants.fallbacktoiam";
}
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@
package org.apache.hadoop.fs.s3a;

import java.io.IOException;
import java.lang.reflect.Method;
import java.net.URI;
import java.net.URISyntaxException;

Expand Down Expand Up @@ -53,6 +54,7 @@
import org.apache.hadoop.fs.store.LogExactlyOnce;

import static org.apache.hadoop.fs.s3a.Constants.AWS_REGION;
import static org.apache.hadoop.fs.s3a.Constants.AWS_S3_ACCESS_GRANTS_ENABLED;
import static org.apache.hadoop.fs.s3a.Constants.AWS_S3_DEFAULT_REGION;
import static org.apache.hadoop.fs.s3a.Constants.CENTRAL_ENDPOINT;
import static org.apache.hadoop.fs.s3a.Constants.FIPS_ENDPOINT;
Expand Down Expand Up @@ -88,6 +90,8 @@ public class DefaultS3ClientFactory extends Configured
protected static final Logger LOG =
LoggerFactory.getLogger(DefaultS3ClientFactory.class);

private static final LogExactlyOnce LOG_S3AG_ENABLED = new LogExactlyOnce(LOG);

/**
* A one-off warning of default region chains in use.
*/
Expand All @@ -112,6 +116,8 @@ public class DefaultS3ClientFactory extends Configured
public static final String ERROR_ENDPOINT_WITH_FIPS =
"An endpoint cannot set when " + FIPS_ENDPOINT + " is true";

private static final String S3AG_UTIL_CLASSNAME =
"org.apache.hadoop.fs.s3a.impl.S3AccessGrantsUtil";
@Override
public S3Client createS3Client(
final URI uri,
Expand Down Expand Up @@ -178,6 +184,8 @@ private <BuilderT extends S3BaseClientBuilder<BuilderT, ClientT>, ClientT> Build

configureEndpointAndRegion(builder, parameters, conf);

applyS3AccessGrantsConfigurations(builder, conf);

S3Configuration serviceConfiguration = S3Configuration.builder()
.pathStyleAccessEnabled(parameters.isPathStyleAccess())
.checksumValidationEnabled(parameters.isChecksumValidationEnabled())
Expand Down Expand Up @@ -401,4 +409,33 @@ private static Region getS3RegionFromEndpoint(final String endpoint,
return Region.of(AWS_S3_DEFAULT_REGION);
}

public static <BuilderT extends S3BaseClientBuilder<BuilderT, ClientT>, ClientT> void
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

method can be private?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for testing purpose, otherwise we may need to use reflection to test private method. Please share your thoughts on this. Thanks.

applyS3AccessGrantsConfigurations(BuilderT builder, Configuration conf) {
boolean s3agEnabled = conf.getBoolean(AWS_S3_ACCESS_GRANTS_ENABLED, false);
if (!s3agEnabled){
LOG.debug("S3 Access Grants plugin is not enabled.");
return;
}
try {
LOG_S3AG_ENABLED.info("S3 Access Grants plugin is enabled.");
Class s3agUtil = Class.forName(S3AG_UTIL_CLASSNAME);
Method applyS3agConfig =
s3agUtil.getMethod("applyS3AccessGrantsConfigurations", S3BaseClientBuilder.class, Configuration.class);
applyS3agConfig.invoke(null, builder, conf);
} catch (ClassNotFoundException e) {
LOG.debug(
"Class {} is not found exception: {}.",
S3AG_UTIL_CLASSNAME,
e.getStackTrace()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. pass in e and let the logger handle the rest.
  2. use multiple classes in the catch statement to avoid duplication/maintenance costs

);
} catch (Exception e) {
LOG.debug("{} exception: {})", e.getClass(), e.getStackTrace());
} catch (NoClassDefFoundError e) {
LOG.debug(
"Class {} is not found error: ",
S3AG_UTIL_CLASSNAME,
e.getStackTrace()
);
}
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,77 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.hadoop.fs.s3a.impl;

import org.apache.hadoop.conf.Configuration;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: import ordering. I will keep complaining about this on new classes, so learn them and we will both save time. There's a style file for intellij somewhere to help

import org.apache.hadoop.fs.s3a.DefaultS3ClientFactory;
import org.apache.hadoop.fs.store.LogExactlyOnce;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import software.amazon.awssdk.s3accessgrants.plugin.S3AccessGrantsPlugin;
import software.amazon.awssdk.services.s3.S3BaseClientBuilder;

import static org.apache.hadoop.fs.s3a.Constants.AWS_S3_ACCESS_GRANTS_FALLBACK_TO_IAM_ENABLED;

public class S3AccessGrantsUtil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. javadocs here and on methods below.
  2. mark as final and give a private constructor


protected static final Logger LOG =
LoggerFactory.getLogger(S3AccessGrantsUtil.class);

private static final LogExactlyOnce LOG_S3AG_PLUGIN_INFO = new LogExactlyOnce(LOG);
private static final String S3AG_PLUGIN_CLASSNAME =
"software.amazon.awssdk.s3accessgrants.plugin.S3AccessGrantsPlugin";

/**
* S3 Access Grants plugin availability.
*/
private static final boolean S3AG_PLUGIN_FOUND = checkForS3AGPlugin();

private static boolean checkForS3AGPlugin() {
try {
ClassLoader cl = DefaultS3ClientFactory.class.getClassLoader();
cl.loadClass(S3AG_PLUGIN_CLASSNAME);
LOG.debug("S3 Access Grants plugin class {} found", S3AG_PLUGIN_CLASSNAME);
return true;
} catch (Exception e) {
LOG.debug("S3 Access Grants plugin class {} not found", S3AG_PLUGIN_CLASSNAME, e);
return false;
}
}

/**
* Is the S3AG plugin available?
* @return true if it was found in the classloader
*/
private static synchronized boolean isS3AGPluginAvailable() {
return S3AG_PLUGIN_FOUND;
}

public static <BuilderT extends S3BaseClientBuilder<BuilderT, ClientT>, ClientT> void
applyS3AccessGrantsConfigurations(BuilderT builder, Configuration conf) {
if (isS3AGPluginAvailable()) {
boolean s3agFallbackEnabled = conf.getBoolean(
AWS_S3_ACCESS_GRANTS_FALLBACK_TO_IAM_ENABLED, false);
S3AccessGrantsPlugin accessGrantsPlugin =
S3AccessGrantsPlugin.builder().enableFallback(s3agFallbackEnabled).build();
builder.addPlugin(accessGrantsPlugin);
LOG_S3AG_PLUGIN_INFO.info("S3 Access Grants plugin is added to s3 client with fallback: {}", s3agFallbackEnabled);
}
}

}
Original file line number Diff line number Diff line change
@@ -0,0 +1,61 @@
<!---
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

# S3 Access Grants

<!-- MACRO{toc|fromDepth=0|toDepth=5} -->

S3 Access Grants is a credential vending service for S3 data. More information:
* https://aws.amazon.com/s3/features/access-grants/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use bracketed links with [title] as well as (url)


In S3A, S3 Access Grants Plugin is used to support S3 Access Grants. More information:
* https://github.com/aws/aws-s3-accessgrants-plugin-java-v2/



## How to enable S3 Access Grants in S3A

1. Add the `hadoop-aws` JAR on your classpath.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just say "hadoop-aws JAR of the hadoop release and the normal dependencies, including bundle.jar

(gives us scope to add more mandatory stuff; i think wildfly is there already)


1. Add the `aws-java-sdk-bundle.jar` JAR to your classpath, the minimum version is v2.23.7.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cut this. name is wrong and version is already out of date


2. Add the `aws-s3-accessgrants-java-plugin-2.0.0.jar` JAR to your classpath.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't state the version; adds more work and will go out of date fast.

3. Add the `caffeine.jar` JAR to your classpath.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, if it is compulsory then leave it as an access grant dependency.


1. Add configurations to enable S3 Access Grants in `core-site.xml`



Example:

```xml
<configuration>
...
<property>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use two chars for indentation

<name>fs.s3a.accessgrants.enabled</name>
<value>true</value>
<description>Enable S3 Access Grants or not</description>
</property>
<property>
<name>fs.s3a.accessgrants.fallbacktoiam</name>
<value>false</value>
<description>Enable IAM Policy as fallback or not</description>
</property>
...
</configuration>
```




Original file line number Diff line number Diff line change
@@ -0,0 +1,90 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.hadoop.fs.s3a;

import org.apache.hadoop.conf.Configuration;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: review import ordering

import org.apache.hadoop.test.AbstractHadoopTestBase;
import org.junit.Test;

import software.amazon.awssdk.services.s3.S3AsyncClient;
import software.amazon.awssdk.services.s3.S3BaseClientBuilder;
import software.amazon.awssdk.services.s3.S3Client;

import static org.apache.hadoop.fs.s3a.Constants.AWS_S3_ACCESS_GRANTS_ENABLED;
import static org.junit.Assert.assertEquals;


/**
* Test S3 Access Grants configurations.
*/
public class TestS3AccessGrantConfiguration extends AbstractHadoopTestBase {

@Test
public void testS3AccessGrantsEnabled() {
applyVerifyS3AGPlugin(S3Client.builder(), false, true);
}

@Test
public void testS3AccessGrantsEnabledAsync() {
applyVerifyS3AGPlugin(S3AsyncClient.builder(), false, true);
}

@Test
public void testS3AccessGrantsDisabled() {
applyVerifyS3AGPlugin(S3Client.builder(), false, false);
}

@Test
public void testS3AccessGrantsDisabledByDefault() {
applyVerifyS3AGPlugin(S3Client.builder(), true, false);
}

@Test
public void testS3AccessGrantsDisabledAsync() {
applyVerifyS3AGPlugin(S3AsyncClient.builder(), false, false);
}

@Test
public void testS3AccessGrantsDisabledByDefaultAsync() {
applyVerifyS3AGPlugin(S3AsyncClient.builder(), true, false);
}

private Configuration createConfig(boolean isDefault, boolean s3agEnabled) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add javadoc

Configuration conf = new Configuration();
if (!isDefault){
conf.setBoolean(AWS_S3_ACCESS_GRANTS_ENABLED, s3agEnabled);
}
return conf;
}

private <BuilderT extends S3BaseClientBuilder<BuilderT, ClientT>, ClientT> void
applyVerifyS3AGPlugin(BuilderT builder, boolean isDefault, boolean enabled) {
DefaultS3ClientFactory.applyS3AccessGrantsConfigurations(builder, createConfig(isDefault, enabled));
if (enabled){
assertEquals(1, builder.plugins().size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert j assert on .hasSize() for list

assertEquals("software.amazon.awssdk.s3accessgrants.plugin.S3AccessGrantsPlugin",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use assertj asserts here

builder.plugins().get(0).getClass().getName()
);
}
else {
assertEquals(builder.plugins().size(), 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assertj

}
}

}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add new line after class