HADOOP-17386. Change default fs.s3a.buffer.dir to be under Yarn container path on yarn applications #3908
Rebased and tested in
Test result
💔 -1 overall
This message was automatically generated.
LGTM. Thank you @monthonk
Hi @steveloughran, would you review this PR?
@aajisaka Thank you for your time reviewing my PR!
💔 -1 overall
This message was automatically generated.
It is important that HADOOP-17631 is in first; otherwise AM applications running in secure mode won't get a valid path here. Given that is the case on this branch, all is good here.
…iner path on yarn applications (apache#3908)
Co-authored-by: Monthon Klongklaew <monthonk@amazon.com>
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
Description of PR
fs.s3a.buffer.dir defaults to ${hadoop.tmp.dir}/s3a, and hadoop.tmp.dir is under /tmp or similar. Many systems don't clean up /tmp until reboot, and if they stay up for a long time, they accrue files written through the S3A staging committer by Spark containers that fail.
Fix: use ${env.LOCAL_DIRS:-${hadoop.tmp.dir}}/s3a as the default value, so that if the LOCAL_DIRS environment variable is set, it is used instead of hadoop.tmp.dir. YARN-deployed apps will therefore use the container's local directories for the buffer dir, and when the app container is destroyed, so is the directory.
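As a rough illustration of the fallback in the new default, the sketch below resolves a value of the form `${env.LOCAL_DIRS:-${hadoop.tmp.dir}}/s3a` against an explicit environment map. It is a stdlib-only approximation and is not Hadoop's `Configuration` substitution code. The class and method names are hypothetical. It also makes three assumptions: shell-like `:-` semantics (the fallback is used when the variable is unset or empty), that `hadoop.tmp.dir` has already been resolved to a plain path, and that the directory values are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified resolver for the default value
 * ${env.LOCAL_DIRS:-${hadoop.tmp.dir}}/s3a.
 * NOT Hadoop's Configuration implementation; it only mirrors the fallback rule.
 */
public class BufferDirDefault {

    /**
     * Returns the buffer dir under the assumed rule: LOCAL_DIRS from the
     * environment if set and non-empty, otherwise hadoop.tmp.dir, with
     * "/s3a" appended in both cases.
     */
    static String resolveBufferDir(Map<String, String> env, String hadoopTmpDir) {
        String localDirs = env.get("LOCAL_DIRS");
        // Assumed ":-" semantics: an unset OR empty variable falls back.
        String base = (localDirs == null || localDirs.isEmpty()) ? hadoopTmpDir : localDirs;
        return base + "/s3a";
    }

    public static void main(String[] args) {
        String tmp = "/tmp/hadoop-alice"; // illustrative hadoop.tmp.dir value

        // Case 1: inside a YARN container, LOCAL_DIRS is injected (illustrative path).
        Map<String, String> yarnEnv = new HashMap<>();
        yarnEnv.put("LOCAL_DIRS", "/data/yarn/local/usercache/alice/appcache/application_1");
        System.out.println(resolveBufferDir(yarnEnv, tmp));

        // Case 2: outside YARN, LOCAL_DIRS is not set, so hadoop.tmp.dir is used.
        System.out.println(resolveBufferDir(new HashMap<>(), tmp));
    }
}
```

Running `main` prints `/data/yarn/local/usercache/alice/appcache/application_1/s3a` followed by `/tmp/hadoop-alice/s3a`. The environment is passed in as a parameter rather than read with `System.getenv`, so both the "set" and "not set" cases can be exercised in one JVM.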
How was this patch tested?
Injected the LOCAL_DIRS environment variable and verified that S3A picked it up. Also verified that, when it is not set, hadoop.tmp.dir is used as the fallback.
For code changes:
LICENSE, LICENSE-binary, NOTICE-binary files?