-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-23550][core] Cleanup Utils
.
#20706
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
A few different things going on: - Remove unused methods. - Move JSON methods to the only class that uses them. - Move test-only methods to TestUtils. - Make getMaxResultSize() a config constant. - Reuse functionality from existing libraries (JRE or JavaUtils) where possible. There is a slight change in the behavior of "createDirectory()": it does not create a full directory tree anymore, just the directory being requested. That's better since the code would only set up proper permissions for the leaf directory, and this change requires its parent to already exist.
dir = null | ||
} | ||
} catch { case e: SecurityException => dir = null; } | ||
val prefix = namePrefix + "-" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was there a reason you rewriting this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code was written when Spark still supported Java 1.6, and java.nio.file
did not exist.
val prefix = namePrefix + "-" | ||
val dir = if (root != null) { | ||
val rootDir = new File(root) | ||
require(rootDir.isDirectory(), s"Root path $root must be an existing directory.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use the nio api's? e.g.:
val rootDir = Paths.get(root)
require(Files.isDirectory(rootDir), s"Root path $root must be an existing directory.")
Files.createTempDirectory(rootDir, prefix)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing in the end, no? This method needs to return File
regardless of what it does internally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is also an argument for not changing this method at all. Just use File.createTempFile(...)
and be done with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for a directory, not file. File.createTempFile
does not work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I missed that sorry. However my point still stands.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what that point really is, but if you feel so strongly about using the nio APIs, that's easy enough.
The behavior changes have to be separated from the code refactoring. |
The behavior change only affects tests, which have been fixed in this PR. |
Unfortunately, I do not know whether this change only affects the test cases. Any behavior change require a conf. We can get rid of it later if nobody hit the issue. |
I do, because I checked. The tests were using the API incorrectly, actually, and could leave garbage outside the temp dirs, polluting the source tree. That wrong usage pattern was actually the thing that made tests be affected by the change. No code outside of tests uses the API in that way. And adding a conf would kinda defeat the purpose of using the existing JRE API, since it would require maintaining the old code. In summary, not all behavior changes are equal, especially if you're talking about an internal API. That being said, I'm going to run some tests with YARN+Kerneros+ShuffleService to make sure things are the same - I think I may need to adjust something in |
Ok, I'm going to revert the I'll keep the test changes since they're correct and fix a potential issue with garbage being left around the working dir. |
Test build #87850 has finished for PR 20706 at commit
|
Test build #87853 has finished for PR 20706 at commit
|
Test build #87855 has finished for PR 20706 at commit
|
I'm almost sure there's a bug somewhere for that flaky test, but if there isn't I'll file one. |
retest this please |
Test build #87863 has finished for PR 20706 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at some of the changes that move methods out of Utils
, they seems valid till now, but I'm not sure whether these are good move because I'm afraid we may have to use them somewhere else in the future and then we have to move them back to Utils
. I really feel we shall be conservative to make these changes.
/** | ||
* Lists files recursively. | ||
*/ | ||
def recursiveList(f: File): Array[File] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this is only used by test suites for now, but can you still keep this in Utils
so we don't have to move it back when we want to use it in the production code (I assume this may happen someday).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree that we should keep things in production code "just in case".
This is also an example of a function that if used without care is dangerous. Files.walk
is a much better way of achieving this.
@@ -1876,17 +1822,6 @@ private[spark] object Utils extends Logging { | |||
obj.getClass.getSimpleName.replace("$", "") | |||
} | |||
|
|||
/** Return an option that translates JNothing to None */ | |||
def jsonOption(json: JValue): Option[JValue] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only used in JSONProtocol
for now, but this is also a useful JSON method that could apply to some other JSON operations in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not be using json4s for new things, so it's better to have these hidden away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason we should not use json4s for new things?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It encourages writing manual serialization and deserialization code, which is error prone.
@jiangxb1987 keeping dead code around will inevitably lead to code rot. If there is not use for something then we should remove it. Adding things back is not a problem. |
Test build #87984 has finished for PR 20706 at commit
|
Any more comments? |
LGTM Thanks! Merged to master. |
A few different things going on:
The change also includes changes to a few tests to call
Utils.createTempFile
correctly,so that temp dirs are created under the designated top-level temp dir instead of
potentially polluting git index.