Skip to content

[SPARK-23550][core] Cleanup Utils. #20706

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from
Closed

Conversation

vanzin
Copy link
Contributor

@vanzin vanzin commented Mar 1, 2018

A few different things going on:

  • Remove unused methods.
  • Move JSON methods to the only class that uses them.
  • Move test-only methods to TestUtils.
  • Make getMaxResultSize() a config constant.
  • Reuse functionality from existing libraries (JRE or JavaUtils) where possible.

The change also includes changes to a few tests to call Utils.createTempFile correctly,
so that temp dirs are created under the designated top-level temp dir instead of
potentially polluting git index.

A few different things going on:
- Remove unused methods.
- Move JSON methods to the only class that uses them.
- Move test-only methods to TestUtils.
- Make getMaxResultSize() a config constant.
- Reuse functionality from existing libraries (JRE or JavaUtils) where possible.

There is a slight change in the behavior of "createDirectory()": it does not create
a full directory tree anymore, just the directory being requested. That's better since
the code would only set up proper permissions for the leaf directory, and this change
requires its parent to already exist.
dir = null
}
} catch { case e: SecurityException => dir = null; }
val prefix = namePrefix + "-"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was there a reason you rewriting this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code was written when Spark still supported Java 1.6, and java.nio.file did not exist.

val prefix = namePrefix + "-"
val dir = if (root != null) {
val rootDir = new File(root)
require(rootDir.isDirectory(), s"Root path $root must be an existing directory.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use the nio api's? e.g.:

val rootDir = Paths.get(root)
require(Files.isDirectory(rootDir), s"Root path $root must be an existing directory.")
Files.createTempDirectory(rootDir, prefix)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing in the end, no? This method needs to return File regardless of what it does internally.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is also an argument for not changing this method at all. Just use File.createTempFile(...) and be done with it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for a directory, not file. File.createTempFile does not work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I missed that sorry. However my point still stands.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what that point really is, but if you feel so strongly about using the nio APIs, that's easy enough.

@gatorsmile
Copy link
Member

The behavior changes have to be separated from the code refactoring.

@vanzin
Copy link
Contributor Author

vanzin commented Mar 1, 2018

The behavior changes have to be separated from the code refactoring.

The behavior change only affects tests, which have been fixed in this PR.

@gatorsmile
Copy link
Member

Unfortunately, I do not know whether this change only affects the test cases. Any behavior change require a conf. We can get rid of it later if nobody hit the issue.

@vanzin
Copy link
Contributor Author

vanzin commented Mar 1, 2018

I do not know whether this change only affects the test cases.

I do, because I checked. The tests were using the API incorrectly, actually, and could leave garbage outside the temp dirs, polluting the source tree. That wrong usage pattern was actually the thing that made tests be affected by the change. No code outside of tests uses the API in that way.

And adding a conf would kinda defeat the purpose of using the existing JRE API, since it would require maintaining the old code.

In summary, not all behavior changes are equal, especially if you're talking about an internal API.

That being said, I'm going to run some tests with YARN+Kerneros+ShuffleService to make sure things are the same - I think I may need to adjust something in DiskBlockManager.

@vanzin
Copy link
Contributor Author

vanzin commented Mar 1, 2018

Ok, I'm going to revert the createDirectory changes because the Files.createTempDirectory api forces permissions to 700, and secure YARN requires that the block manager directories respect the umask set for the container or things break. I could fix that in DiskBlockManager but reverting sounds safer.

I'll keep the test changes since they're correct and fix a potential issue with garbage being left around the working dir.

@SparkQA
Copy link

SparkQA commented Mar 1, 2018

Test build #87850 has finished for PR 20706 at commit 4b8b8e2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 2, 2018

Test build #87853 has finished for PR 20706 at commit c239d96.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 2, 2018

Test build #87855 has finished for PR 20706 at commit d026fff.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Copy link
Contributor Author

vanzin commented Mar 2, 2018

I'm almost sure there's a bug somewhere for that flaky test, but if there isn't I'll file one.

@vanzin
Copy link
Contributor Author

vanzin commented Mar 2, 2018

retest this please

@SparkQA
Copy link

SparkQA commented Mar 2, 2018

Test build #87863 has finished for PR 20706 at commit d026fff.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

@jiangxb1987 jiangxb1987 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at some of the changes that move methods out of Utils, they seems valid till now, but I'm not sure whether these are good move because I'm afraid we may have to use them somewhere else in the future and then we have to move them back to Utils. I really feel we shall be conservative to make these changes.

/**
* Lists files recursively.
*/
def recursiveList(f: File): Array[File] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is only used by test suites for now, but can you still keep this in Utils so we don't have to move it back when we want to use it in the production code (I assume this may happen someday).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree that we should keep things in production code "just in case".

This is also an example of a function that if used without care is dangerous. Files.walk is a much better way of achieving this.

@@ -1876,17 +1822,6 @@ private[spark] object Utils extends Logging {
obj.getClass.getSimpleName.replace("$", "")
}

/** Return an option that translates JNothing to None */
def jsonOption(json: JValue): Option[JValue] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only used in JSONProtocol for now, but this is also a useful JSON method that could apply to some other JSON operations in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not be using json4s for new things, so it's better to have these hidden away.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason we should not use json4s for new things?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It encourages writing manual serialization and deserialization code, which is error prone.

@hvanhovell
Copy link
Contributor

@jiangxb1987 keeping dead code around will inevitably lead to code rot. If there is not use for something then we should remove it. Adding things back is not a problem.

@SparkQA
Copy link

SparkQA commented Mar 6, 2018

Test build #87984 has finished for PR 20706 at commit 427a977.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Copy link
Contributor Author

vanzin commented Mar 7, 2018

Any more comments?

@gatorsmile
Copy link
Member

LGTM

Thanks! Merged to master.

@asfgit asfgit closed this in c99fc9a Mar 7, 2018
@vanzin vanzin deleted the SPARK-23550 branch March 12, 2018 20:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants