-
Notifications
You must be signed in to change notification settings - Fork 9.1k
HADOOP-18568. magic committer optional cleanup #7693
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
💔 -1 overall
This message was automatically generated. |
1d69cce
to
5c1a9a6
Compare
🎊 +1 overall
This message was automatically generated. |
@cnauroth Kindly request the PR review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@blcksrx , thank you for the patch. This mostly looks good. I requested a few minor changes.
@steveloughran , cc'ing you since you had commented on the JIRA earlier.
.../hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicCommitTrackerUtils.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committers.md
Show resolved
Hide resolved
5c1a9a6
to
b7b20ba
Compare
Signed-off-by: Hossein Torabi <blcksrx@pm.me>
b7b20ba
to
53b5339
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. Thank you. I'll wait to see if Steve has any other feedback.
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
Thank you very much, let's wait then |
@cnauroth Hi Chris, just checking is this PR going to merge? |
This happens because deleting many small files in S3 is a slow and expensive operation, especially at scale. | ||
In some cases, the cleanup phase alone can take several minutes or more — even after all data has already been written. | ||
|
||
To reduce this overhead, Hadoop 3.4.1+ introduced a configuration option in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to commit this with a small change here, "3.4.2+", as we're currently making release candidates for 3.4.2.
I committed this to trunk, branch-3.4 and branch-3.4.2. @blcksrx , thank you for the contribution. |
On trunk, we ran into a conflict with the recent JUnit 5 migration for the test file. I pushed up this change to add a required import: |
Description of PR
magic committer optional cleanup, make it up to the user to decide the magic committer cleanup committer
_magic
path or not to reduce the cleanup overhead.How was this patch tested?
It tested by checking the existing of the job commit path against this option:
the cleanup is enabled -> committer path does not exists after JobCommit
the cleanup is disabled -> committer path exists after JobCommit
For code changes:
LICENSE
,LICENSE-binary
,NOTICE-binary
files?