[SPARK-28294][CORE] Support spark.history.fs.cleaner.maxNum configuration #25072
Conversation
Test build #107329 has finished for PR 25072 at commit
Thank you for the review, @srowen! I'll update soon.
Could you review once more? I addressed some of the comments. For the following points, please let me know your opinion if I missed something. I can update more.
Seems reasonable to me. I usually dislike yet another config, but I can see the need for this. This default should virtually never affect any cases that work today, right? Like, if someone had more than a million files, it's already probably causing problems? Just wanting to make sure this doesn't surprise anyone as a behavior change, if possible.
For the default value, I agree with your concerns. This default value, 1M, is big enough not to surprise general HDFS-default users, but there might be two exceptions.
So, to avoid surprising all users, including S3 users, do you want me to use
The default value is increased to
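For context, here is a minimal sketch (assuming Spark's internal ConfigBuilder API, usable only inside Spark core) of how such an entry could be declared; the doc string and the Int.MaxValue default are placeholder assumptions for illustration, not necessarily the values merged in this PR:

```scala
// Illustrative sketch only: declaring a count-based cleaner limit with Spark's
// internal ConfigBuilder. The default shown, Int.MaxValue, is an assumption
// chosen so that existing deployments would be unaffected unless they opt in.
import org.apache.spark.internal.config.ConfigBuilder

val MAX_LOG_NUM = ConfigBuilder("spark.history.fs.cleaner.maxNum")
  .doc("Maximum number of files kept in the event log directory by the cleaner.")
  .intConf
  .createWithDefault(Int.MaxValue)
```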
Test build #107369 has finished for PR 25072 at commit
Test build #107368 has finished for PR 25072 at commit
Hi, @srowen.
LGTM. I wonder if @vanzin has any thoughts as he wrote the maxAge cleanup bit.
(Meant to approve)
Retest this please.
Test build #107417 has finished for PR 25072 at commit
Merged to master. Thank you, @srowen.
What changes were proposed in this pull request?
Up to now, Apache Spark maintains the given event log directory by a time policy, spark.history.fs.cleaner.maxAge. However, there are two issues. One of them is a limit on the number of items per directory: HDFS's dfs.namenode.fs-limits.max-directory-items is 1024 * 1024 by default, so an event log directory on HDFS cannot grow past roughly one million entries (see https://hadoop.apache.org/docs/r3.2.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml).
To handle such cases, this PR aims to support an additional policy configuration for the maximum number of files in the event log directory, spark.history.fs.cleaner.maxNum. Spark will try to keep the number of files in the event log directory according to this policy.

How was this patch tested?
Pass the Jenkins with a newly added test case.
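As a rough illustration of the count-based policy described above, here is a minimal sketch of how such a cleanup could work. This is not the actual FsHistoryProvider code: it uses local java.io.File listings and a hypothetical cleanByMaxNum helper for simplicity, whereas the real provider works against Hadoop's FileSystem API and its own listing data, and only removes logs of completed applications.

```scala
import java.io.File

// Hypothetical helper, for illustration only: keep at most `maxNum` files in
// the event log directory by deleting the oldest ones (by modification time).
def cleanByMaxNum(logDir: File, maxNum: Int): Unit = {
  // listFiles() returns null if the directory does not exist or is unreadable.
  val logs = Option(logDir.listFiles()).getOrElse(Array.empty[File]).toSeq
  if (logs.size > maxNum) {
    logs
      .sortBy(_.lastModified())   // oldest first
      .take(logs.size - maxNum)   // how many files exceed the limit
      .foreach(_.delete())
  }
}

// Example usage with an assumed local directory and limit:
cleanByMaxNum(new File("/tmp/spark-events"), maxNum = 100000)
```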