
[SPARK-28294][CORE] Support spark.history.fs.cleaner.maxNum configuration #25072

Closed
wants to merge 6 commits into from

Conversation

dongjoon-hyun
Member

What changes were proposed in this pull request?

Up to now, Apache Spark maintains the given event log directory with a time-based policy, spark.history.fs.cleaner.maxAge. However, there are two issues.

  1. Some file systems limit the maximum number of files in a single directory. For example, HDFS dfs.namenode.fs-limits.max-directory-items is 1024 * 1024 by default.
    https://hadoop.apache.org/docs/r3.2.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
  2. Spark is sometimes unable to clean up some old log files due to permission issues (mainly, security policy).

To handle both (1) and (2), this PR aims to support an additional policy configuration for the maximum number of files in the event log directory, spark.history.fs.cleaner.maxNum. Spark will try to keep the number of files in the event log directory according to this policy.
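Assuming the configuration lands as described, enabling both retention policies together in spark-defaults.conf might look like the following; the values shown are purely illustrative:

```properties
# Illustrative spark-defaults.conf fragment; values are examples only.
spark.history.fs.cleaner.enabled   true
# Existing time-based policy: delete event logs older than 7 days.
spark.history.fs.cleaner.maxAge    7d
# Count-based policy proposed in this PR: keep at most 100000 files.
spark.history.fs.cleaner.maxNum    100000
```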

How was this patch tested?

Passes the Jenkins tests with a newly added test case.

@SparkQA

SparkQA commented Jul 8, 2019

Test build #107329 has finished for PR 25072 at commit 59277d6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member Author

Thank you for the review, @srowen ! I'll update soon.

@dongjoon-hyun
Member Author

Could you review once more? I addressed some of the comments. For the following points, please let me know your opinion if I missed anything. I can update more.

Member

@srowen left a comment


Seems reasonable to me. I usually dislike yet another config, but I can see the need for this. This default should virtually never affect any cases that work today, right? If someone had more than a million files, it's probably already causing problems. I just want to make sure this doesn't surprise anyone as a behavior change, if possible.

@dongjoon-hyun
Member Author

dongjoon-hyun commented Jul 8, 2019

For the default value, I agree with your concerns.

This default value, 1M, is big enough not to surprise general HDFS-default users, but there might be two exceptions.

  1. If they increase dfs.namenode.fs-limits.max-directory-items configuration already, the maximum is 6 * 1024 * 1024.
  2. If they are using non-HDFS storage, maybe S3?

So, not to surprise all users, including S3 users, do you want me to use Int.MaxValue instead? I can change it like that. Technically, that will disable this feature, but spark.history.fs.cleaner.enabled itself is disabled by default, too.
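For intuition, the count-based policy discussed here can be sketched roughly as follows, keeping the newest maxNum logs and returning the rest for deletion. The names LogEntry, CleanerSketch, and overflowing are illustrative, not the actual Spark History Server code:

```scala
// Illustrative sketch of a count-based retention policy; NOT the actual
// FsHistoryProvider implementation. Given log entries and a maxNum limit,
// return the oldest entries beyond the limit, i.e. candidates for deletion.
object CleanerSketch {
  case class LogEntry(path: String, lastModified: Long)

  def overflowing(entries: Seq[LogEntry], maxNum: Int): Seq[LogEntry] = {
    if (entries.size <= maxNum) {
      Seq.empty // within the limit: nothing to delete
    } else {
      // Sort oldest first; everything beyond the newest maxNum is returned.
      entries.sortBy(_.lastModified).take(entries.size - maxNum)
    }
  }
}
```

With a default of Int.MaxValue, overflowing would effectively never return anything, which matches the "disabled by default" behavior discussed above.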

@dongjoon-hyun
Member Author

The default value is increased to Int.MaxValue, too.


@SparkQA

SparkQA commented Jul 8, 2019

Test build #107369 has finished for PR 25072 at commit f2c93a0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 8, 2019

Test build #107368 has finished for PR 25072 at commit 3444b73.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member Author

Hi, @srowen .
Could you review this once more, please?

Member

@srowen left a comment


LGTM. I wonder if @vanzin has any thoughts as he wrote the maxAge cleanup bit.

Member

@srowen left a comment


(Meant to approve)

@dongjoon-hyun
Member Author

Thank you so much for the review and approval, @srowen !
Yes, I'll wait for @vanzin 's comment for one day.

Hi, @vanzin . Could you take a look at this if you have some time?

@dongjoon-hyun
Member Author

Retest this please.

@SparkQA

SparkQA commented Jul 10, 2019

Test build #107417 has finished for PR 25072 at commit f2c93a0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member Author

Merged to master. Thank you, @srowen .
