Skip to content

HADOOP-16524. Reloading SSL keystore for both DataNode and NameNode #2470

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jan 8, 2021

Conversation

bolerio
Copy link
Contributor

@bolerio bolerio commented Nov 18, 2020

Addresses https://issues.apache.org/jira/browse/HADOOP-16524, in addition covering the DataNode use case.

  • Following the existing ReloadingX509TrustManager, a new ReloadingX509KeystoreManager was created.
  • Existing code slightly refactored so both trust manager and keystore reloading managers share the monitoring logic within a single java.util.Timer (and therefore a single thread).
  • In HttpServer2, the same strategy as a previously proposed patch (see above) is used with SSLContextFactory.reload, but with the addition of cleanup upon stopping the server.
  • A new config parameter which applies to all of the above FileBasedKeyStoresFactory.SSL_STORES_RELOAD_INTERVAL_TPL_KEY supersedes the existing FileBasedKeyStoresFactory.SSL_TRUSTSTORE_RELOAD_INTERVAL_TPL_KEY which only applies to trust store reloading. Setting the value to 0 (default is 10s) to this parameter disables the reloading.

If this PR is accepted, I would need guidance where/how to update the docs with that new configuration parameter.

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 25s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 1s The patch does not contain any @author tags.
+1 💚 0m 0s test4tests The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 36m 56s trunk passed
+1 💚 compile 21m 45s trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04
+1 💚 compile 18m 27s trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01
+1 💚 checkstyle 0m 49s trunk passed
+1 💚 mvnsite 1m 29s trunk passed
+1 💚 shadedclient 19m 56s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 1m 28s trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04
+1 💚 javadoc 1m 33s trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01
+0 🆗 spotbugs 2m 22s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 2m 20s trunk passed
_ Patch Compile Tests _
+1 💚 mvninstall 0m 56s the patch passed
+1 💚 compile 21m 14s the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04
+1 💚 javac 21m 14s the patch passed
+1 💚 compile 18m 25s the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01
+1 💚 javac 18m 25s the patch passed
-0 ⚠️ checkstyle 0m 49s /diff-checkstyle-hadoop-common-project_hadoop-common.txt hadoop-common-project/hadoop-common: The patch generated 270 new + 68 unchanged - 11 fixed = 338 total (was 79)
+1 💚 mvnsite 1m 26s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 shadedclient 17m 15s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 0m 57s the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04
+1 💚 javadoc 1m 28s the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01
+1 💚 findbugs 2m 29s the patch passed
_ Other Tests _
+1 💚 unit 10m 7s hadoop-common in the patch passed.
+1 💚 asflicense 0m 47s The patch does not generate ASF License warnings.
185m 34s
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2470/1/artifact/out/Dockerfile
GITHUB PR #2470
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname Linux c8a7c4d33c05 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / a7b923c
Default Java Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2470/1/testReport/
Max. process+thread count 1879 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2470/1/console
versions git=2.17.1 maven=3.6.0 findbugs=4.0.6
Powered by Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@sunchao
Copy link
Member

sunchao commented Nov 30, 2020

cc @kihwal @saintstack

conf.getLong(FileBasedKeyStoresFactory.SSL_STORES_RELOAD_INTERVAL_TPL_KEY,
FileBasedKeyStoresFactory.DEFAULT_SSL_STORES_RELOAD_INTERVAL);

this.configurationChangeMonitor = storesReloadInterval > 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this.configurationChangeMonitor is already Optional.empty? So, change this to be if (storesReloadInterval > 0) {....}?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

conn.addFirstConnectionFactory(new SslConnectionFactory(sslContextFactory,
HttpVersion.HTTP_1_1.asString()));

return conn;
}

private java.util.Timer makeConfigurationChangeMonitor(long reloadInterval,
SslContextFactory.Server sslContextFactory) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep the style of the backing class... It doesn't do these big right-shifts on parameters that go to the second line. Be careful w/ line lengths too. The backing file seems to do 80 chars.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

try {
sslContextFactory.reload(factory -> { });
}
catch (Exception ex) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: See the formatting for catch in the rest of the file. It does not do this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

conn.addFirstConnectionFactory(new SslConnectionFactory(sslContextFactory,
HttpVersion.HTTP_1_1.asString()));

return conn;
}

private java.util.Timer makeConfigurationChangeMonitor(long reloadInterval,
SslContextFactory.Server sslContextFactory) {
java.util.Timer timer = new java.util.Timer("SSL Certificates Store Monitor", true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the full qualification of Timer here and as function return? You have imported java.util.Timer so no need of these 'java.util' prefixes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

* if the trust certificates keystore file changes, the {@link TrustManager}
* is refreshed with the new trust certificate entries (using a
* {@link ReloadingX509TrustManager} trustmanager).
* If either the truststore or the keystore certificates file changes, it would be refreshed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: line lengths

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

private void createTrustManagersFromConfiguration(SSLFactory.Mode mode,
String truststoreType,
String truststoreLocation,
long storesReloadInterval)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the rest of this class do this extreme rightward shifting of parameters?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... maybe it does. Ignore the above comment then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but it's true the most of the code base doesn't have it. Should I change? I don't mind...just inadvertently sneaking in my old habits I suppose....

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like your code habits align w/ the backing file here. I'd leave it as is.

Copy link
Contributor

@saintstack saintstack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. A few nits in below.

storesReloadInterval);

if (LOG.isDebugEnabled()) {
LOG.debug(mode.toString() + " TrustStore: " + truststoreLocation);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Log the interval found in config here too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

private void createTrustManagersFromConfiguration(SSLFactory.Mode mode,
String truststoreType,
String truststoreLocation,
long storesReloadInterval)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... maybe it does. Ignore the above comment then.

import java.util.function.Consumer;

/**
* <p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need of the

wrappers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.


private String type;
private String storePassword;
private String keyPassword;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be final?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

}

@Override
public String chooseEngineClientAlias(String[] strings, Principal[] principals, SSLEngine sslEngine) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line lengths?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

this.keyManagerRef.set(loadKeyManager(path));
} catch (Exception ex) {
// The Consumer.accept interface forces us to convert to unchecked
throw new RuntimeException(ex);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will happen when this comes out? Where will it be caught? Will it cause damage causing process exit or thread exit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It gets caught by the FileMonitoringTimerTask and from there it's logged. It won't cause any damage, but it will keep logging the error on every reload interval if it's not resolved, potentially filling the logs and disk space.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 0s Docker mode activated.
-1 ❌ docker 5m 36s Docker failed to build yetus/hadoop:4b312810ae0.
Subsystem Report/Notes
GITHUB PR #2470
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2470/2/console
versions git=2.17.1
Powered by Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 30m 48s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 0m 0s test4tests The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 35m 34s trunk passed
+1 💚 compile 21m 27s trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04
+1 💚 compile 18m 4s trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01
+1 💚 checkstyle 0m 49s trunk passed
+1 💚 mvnsite 1m 27s trunk passed
+1 💚 shadedclient 19m 40s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 0m 57s trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04
+1 💚 javadoc 1m 29s trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01
+0 🆗 spotbugs 2m 21s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 2m 19s trunk passed
_ Patch Compile Tests _
+1 💚 mvninstall 0m 54s the patch passed
+1 💚 compile 21m 55s the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04
+1 💚 javac 21m 55s the patch passed
+1 💚 compile 20m 20s the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01
+1 💚 javac 20m 20s the patch passed
-0 ⚠️ checkstyle 1m 0s /diff-checkstyle-hadoop-common-project_hadoop-common.txt hadoop-common-project/hadoop-common: The patch generated 168 new + 68 unchanged - 11 fixed = 236 total (was 79)
+1 💚 mvnsite 1m 42s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 shadedclient 19m 27s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 1m 11s the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04
+1 💚 javadoc 1m 39s the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01
+1 💚 findbugs 2m 50s the patch passed
_ Other Tests _
-1 ❌ unit 10m 40s /patch-unit-hadoop-common-project_hadoop-common.txt hadoop-common in the patch passed.
+1 💚 asflicense 0m 57s The patch does not generate ASF License warnings.
217m 30s
Reason Tests
Failed junit tests hadoop.security.TestFixKerberosTicketOrder
hadoop.metrics2.source.TestJvmMetrics
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2470/3/artifact/out/Dockerfile
GITHUB PR #2470
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname Linux 055faef77fb3 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / aaf9e3d
Default Java Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2470/3/testReport/
Max. process+thread count 1345 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2470/3/console
versions git=2.17.1 maven=3.6.0 findbugs=4.0.6
Powered by Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

Copy link
Contributor

@saintstack saintstack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Have you tried it? If so, want to dump in here what you've done?

The original patch posted did not cover DN (See JIRA). This works for DN and NN?

Thanks.

private void createTrustManagersFromConfiguration(SSLFactory.Mode mode,
String truststoreType,
String truststoreLocation,
long storesReloadInterval)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like your code habits align w/ the backing file here. I'd leave it as is.

@bolerio
Copy link
Contributor Author

bolerio commented Dec 11, 2020

LGTM.

Have you tried it? If so, want to dump in here what you've done?

The original patch posted did not cover DN (See JIRA). This works for DN and NN?

Thanks.

Yes, both DN and NN are covered and this was integration tested (manually) in a pre-prod environment on all nodes. DevOps team happy with it.

Thanks so much for the review!

@saintstack
Copy link
Contributor

Rerunning tests to see if failures related (Getting ready to merge)

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 23s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 0m 0s test4tests The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 36m 23s trunk passed
+1 💚 compile 21m 56s trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04
+1 💚 compile 18m 21s trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01
+1 💚 checkstyle 0m 48s trunk passed
+1 💚 mvnsite 1m 30s trunk passed
+1 💚 shadedclient 19m 30s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 0m 59s trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04
+1 💚 javadoc 1m 33s trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01
+0 🆗 spotbugs 2m 20s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 2m 19s trunk passed
_ Patch Compile Tests _
+1 💚 mvninstall 0m 56s the patch passed
+1 💚 compile 25m 46s the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04
+1 💚 javac 25m 46s the patch passed
+1 💚 compile 22m 7s the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01
+1 💚 javac 22m 7s the patch passed
-0 ⚠️ checkstyle 0m 49s /diff-checkstyle-hadoop-common-project_hadoop-common.txt hadoop-common-project/hadoop-common: The patch generated 167 new + 68 unchanged - 11 fixed = 235 total (was 79)
+1 💚 mvnsite 1m 32s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 shadedclient 17m 54s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 1m 0s the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04
+1 💚 javadoc 1m 32s the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01
+1 💚 findbugs 2m 30s the patch passed
_ Other Tests _
+1 💚 unit 10m 19s hadoop-common in the patch passed.
+1 💚 asflicense 0m 50s The patch does not generate ASF License warnings.
191m 58s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2470/4/artifact/out/Dockerfile
GITHUB PR #2470
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname Linux 75ee856a1074 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 1b17910
Default Java Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2470/4/testReport/
Max. process+thread count 2328 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2470/4/console
versions git=2.17.1 maven=3.6.0 findbugs=4.0.6
Powered by Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@saintstack saintstack merged commit e306f59 into apache:trunk Jan 8, 2021
saintstack pushed a commit to saintstack/hadoop that referenced this pull request Jan 8, 2021
…pache#2470)

Co-authored-by: Borislav Iordanov <biordanov@apple.com>
Signed-off-by: stack <stack@apache.org>
saintstack added a commit that referenced this pull request Jan 8, 2021
…2470) (#2609)

Co-authored-by: Borislav Iordanov <biordanov@apple.com>
Signed-off-by: stack <stack@apache.org>

Co-authored-by: Borislav Iordanov <borislav.iordanov@gmail.com>
Co-authored-by: Borislav Iordanov <biordanov@apple.com>
@amahussein
Copy link
Contributor

@saintstack Thanks for your contribution.
This PR caused failures to other unit tests. Can you please take a look to the list of failures.
https://issues.apache.org/jira/browse/HADOOP-16524?focusedCommentId=17262765&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17262765

asfgit pushed a commit that referenced this pull request Jan 11, 2021
asfgit pushed a commit that referenced this pull request Jan 11, 2021
bolerio pushed a commit to bolerio/hadoop that referenced this pull request Feb 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants