Skip to content

HDFS-16852. Skip KeyProviderCache shutdown hook registration if already shutting down #5160

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Dec 16, 2022

Conversation

xinglin
Copy link
Contributor

@xinglin xinglin commented Nov 22, 2022

Description of PR

Check whether shutdown is in-progress before registering a new shutdown. Registering a shutdown hook during shutdown is not supported. For more details, please check HDFS-16852.

How was this patch tested?

mvn test -Dtest="TestKeyProviderCache"

[INFO] Running org.apache.hadoop.hdfs.TestKeyProviderCache
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.475 s - in org.apache.hadoop.hdfs.TestKeyProviderCache
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0

For code changes:

  • Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@xinglin
Copy link
Contributor Author

xinglin commented Nov 22, 2022

@xkrogen Can you give a review? 😄

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 57s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 42m 9s trunk passed
+1 💚 compile 1m 9s trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 compile 1m 0s trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 0m 39s trunk passed
+1 💚 mvnsite 1m 4s trunk passed
+1 💚 javadoc 0m 57s trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 43s trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 2m 58s trunk passed
+1 💚 shadedclient 25m 31s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 52s the patch passed
+1 💚 compile 0m 58s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 58s the patch passed
+1 💚 compile 0m 50s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 javac 0m 50s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 21s /results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-client.txt hadoop-hdfs-project/hadoop-hdfs-client: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
+1 💚 mvnsite 0m 53s the patch passed
+1 💚 javadoc 0m 37s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 33s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 2m 45s the patch passed
+1 💚 shadedclient 25m 4s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 24s hadoop-hdfs-client in the patch passed.
+1 💚 asflicense 0m 39s The patch does not generate ASF License warnings.
112m 2s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5160/1/artifact/out/Dockerfile
GITHUB PR #5160
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 34d69e3a6755 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 61a10bb
Default Java Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5160/1/testReport/
Max. process+thread count 601 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs-client U: hadoop-hdfs-project/hadoop-hdfs-client
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5160/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@brahmareddybattula
Copy link
Contributor

@xinglin curious to know, did you observe any connection leak b/w KMS and NN here when prelaunch is failed..? how does this will help closing the connection.?

Copy link
Contributor

@xkrogen xkrogen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Back in Hadoop 2.1.0, HDFS-4841 was added which made it possible to instantiate a FileSystem object inside of a shutdown hook. There are valid situations where this happens, as evidenced by both HDFS-4841 and SPARK-3900.

The logic being changed in this PR was added only in March 2022 as part of HDFS-16518. The change has been released ONLY in 2.10.2, since neither 3.3.5 nor 3.4.0 have been released at this time. This is likely why we are not (yet) seeing other complaints about this issue, since usage of 2.10.2 may be limited.

The biggest problem here is that this blocks creation of any HDFS client in a shutdown hook, regardless of whether or not you're using the KeyProviderCache, since the shutdown hook is added in the constructor and a KeyProviderCache is always initialized as part of ClientContext initialization:

Regarding concern about leaks, if desired, we can add a flag in the KeyProviderCache like finalizerInitialized and refuse to serve KeyProvider instances if initialization was unsuccessful. This at least makes it possible to initialize/use FS objects that don't make use of the cache, limiting the problem scope. I don't have enough context on HDFS-16518 to say if this is necessary or not.

cc @li-leyang @omalley @ibuenros since you were all involved in HDFS-16518.

@xinglin xinglin changed the title HDFS-16852 Swallow IllegalStateException in KeyProviderCache constructor HDFS-16852 Register the shutdown hook only when not in shutdown for KeyProviderCache constructor Dec 3, 2022
@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 52s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 41m 45s trunk passed
+1 💚 compile 1m 12s trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 compile 1m 4s trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 0m 44s trunk passed
+1 💚 mvnsite 1m 9s trunk passed
+1 💚 javadoc 1m 0s trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 48s trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 2m 56s trunk passed
+1 💚 shadedclient 26m 7s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 54s the patch passed
+1 💚 compile 1m 1s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 javac 1m 1s the patch passed
+1 💚 compile 0m 50s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 javac 0m 50s the patch passed
+1 💚 blanks 0m 1s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 23s /results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-client.txt hadoop-hdfs-project/hadoop-hdfs-client: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
+1 💚 mvnsite 0m 54s the patch passed
+1 💚 javadoc 0m 39s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 36s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 2m 45s the patch passed
+1 💚 shadedclient 25m 32s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 29s hadoop-hdfs-client in the patch passed.
+1 💚 asflicense 0m 42s The patch does not generate ASF License warnings.
113m 17s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5160/2/artifact/out/Dockerfile
GITHUB PR #5160
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux c98922e24817 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / b8eb012
Default Java Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5160/2/testReport/
Max. process+thread count 582 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs-client U: hadoop-hdfs-project/hadoop-hdfs-client
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5160/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 58s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 42m 20s trunk passed
+1 💚 compile 1m 1s trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 compile 0m 50s trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 checkstyle 0m 31s trunk passed
+1 💚 mvnsite 0m 57s trunk passed
+1 💚 javadoc 0m 46s trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javadoc 0m 35s trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 spotbugs 2m 49s trunk passed
+1 💚 shadedclient 24m 40s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 48s the patch passed
+1 💚 compile 0m 54s the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javac 0m 54s the patch passed
+1 💚 compile 0m 45s the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 javac 0m 45s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 17s the patch passed
+1 💚 mvnsite 0m 49s the patch passed
+1 💚 javadoc 0m 34s the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javadoc 0m 31s the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 spotbugs 2m 45s the patch passed
+1 💚 shadedclient 25m 1s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 18s hadoop-hdfs-client in the patch passed.
+1 💚 asflicense 0m 33s The patch does not generate ASF License warnings.
109m 7s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5160/3/artifact/out/Dockerfile
GITHUB PR #5160
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 22a616d2418f 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 3c8fd54
Default Java Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5160/3/testReport/
Max. process+thread count 534 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs-client U: hadoop-hdfs-project/hadoop-hdfs-client
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5160/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@li-leyang
Copy link
Contributor

li-leyang commented Dec 13, 2022

I think the original intention was to explicitly invalidate KeyProviderCache at each DFSClient close(For KMS use case). We need to explicit close this singleton cache at DFSClient close hence this was added in JVM ShutdownHook.
For some cases that KeyProviderCache is not used, I agree that we can swallow the exception and not add shutdownhook.

@li-leyang
Copy link
Contributor

LGTM

@xkrogen
Copy link
Contributor

xkrogen commented Dec 13, 2022

For some cases that KeyProviderCache is not used, I agree that we can swallow the exception and not add shutdownhook.

Right now we skip adding the shutdown hook regardless of whether or not KeyProviderCache is actually used. Will this be an issue? There is a chance that, during the execution of shutdown hooks, we add something to the cache which will never be closed. Is this an acceptable scenario?

If it's not, we can make it so that KeyProviderCache is non-functional if the shutdown hook couldn't be added (set some flag inside of the cache which rejects future calls to get(), or something along those lines). But obviously this is added complexity. Do you think it is necessary @li-leyang ?

@xinglin
Copy link
Contributor Author

xinglin commented Dec 15, 2022

@li-leyang you gave a LGTM. Do you think the current change is good enough?

@li-leyang
Copy link
Contributor

For some cases that KeyProviderCache is not used, I agree that we can swallow the exception and not add shutdownhook.

Right now we skip adding the shutdown hook regardless of whether or not KeyProviderCache is actually used. Will this be an issue? There is a chance that, during the execution of shutdown hooks, we add something to the cache which will never be closed. Is this an acceptable scenario?

If it's not, we can make it so that KeyProviderCache is non-functional if the shutdown hook couldn't be added (set some flag inside of the cache which rejects future calls to get(), or something along those lines). But obviously this is added complexity. Do you think it is necessary @li-leyang ?

I think this should be fine. The KeyProviderCache also has expiration mechanism so it will get cleared eventually. This was introduced to tackle a bug where each DFSClient instance closes a global KeyProviderCache.

@xkrogen
Copy link
Contributor

xkrogen commented Dec 16, 2022

Yetus is failing the tests4tests check since no tests are added/modified, but given this is just a subtle change to shutdown hook behavior, I don't think we can realistically unit test it.

I am merging to trunk. Thanks for the contribution @xinglin and the review @li-leyang !

@xkrogen xkrogen changed the title HDFS-16852 Register the shutdown hook only when not in shutdown for KeyProviderCache constructor HDFS-16852. Skip KeyProviderCache shutdown hook registration if already shutting down Dec 16, 2022
@xkrogen xkrogen merged commit f7bdf6c into apache:trunk Dec 16, 2022
xkrogen pushed a commit that referenced this pull request Dec 16, 2022
…dy shutting down (#5160)

Signed-off-by: Erik Krogen <xkrogen@apache.org>

(cherry picked from commit f7bdf6c)
xkrogen pushed a commit that referenced this pull request Dec 16, 2022
…dy shutting down (#5160)

Signed-off-by: Erik Krogen <xkrogen@apache.org>

(cherry picked from commit f7bdf6c)
(cherry picked from commit d43fa95)
@xkrogen
Copy link
Contributor

xkrogen commented Dec 16, 2022

Backported to branch-3.3 and branch-2.10 as well; all of the places that HDFS-16518 landed.

@xinglin xinglin deleted the HDFS-16852 branch December 17, 2022 00:46
slfan1989 pushed a commit to slfan1989/hadoop that referenced this pull request Dec 20, 2022
…dy shutting down (apache#5160)

Signed-off-by: Erik Krogen <xkrogen@apache.org>
NyteKnight pushed a commit to NyteKnight/hadoop that referenced this pull request Jun 25, 2024
…dy shutting down (apache#5160)

Signed-off-by: Erik Krogen <xkrogen@apache.org>

(cherry picked from commit f7bdf6c)
(cherry picked from commit d43fa95)

ACLOVERRIDE
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants