-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-6144]When in cluster mode using ADD JAR with a hdfs:// sourced ja... #4881
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Can one of the admins verify this patch? |
Jenkins, this is ok to test. By the way in the future it would be good to open this against the master branch. In this case it's fine because the 1.3 branch hasn't diverged that much yet. |
Test build #625 has started for PR 4881 at commit
|
if (!targetDir.mkdir()) { | ||
fileOverwrite: Boolean, | ||
filename: Option[String] = None): Unit = { | ||
if (!targetDir.exists() && !targetDir.mkdir()) { | ||
throw new IOException(s"Failed to create directory ${targetDir.getPath}") | ||
} | ||
fs.listStatus(path).foreach { fileStatus => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To revive the discussion of the previows PR...
Both the problems @andrewor14 and I mentioned exist because of this line. listStatus
behaves differently for files and directories, so when path
is a directory here you'll get both unwanted behaviors:
- children will be copied to
targetDir
instead of$targetDir/${filename.getOrElse(path.getName())}
- the code will try to copy all children to the target directory with the same
filename
in the first call fromdoFetchFile
So this code needs to behave differently depending on whether path
is a directory or not. The code Andrew posted here is pretty close to what needs to be done.
Test build #625 has finished for PR 4881 at commit
|
So to confirm, i think this function needs to be able to handle 5 states: Path is a dir which has subdirs Path is a file you want to copy with the same name Path is a file which you want to copy with a different name Path is a dir which contains multiple files that you want to copy with the same names Path is a dir which contains multiple files and subdirs you want to copy Anything I have missed? I have some code in testing now that achieves this ill post it when i've run it on my cluster. |
I tested with this version of
|
@trystanleftwich I believe that's correct. To summarize:
The code snippet @vanzin posted should fix both cases. |
Ok, i've pushed my changes, I've added tests that should cover all the states, I was getting errors with @vanzin code snippet, if you pass in dir i.e path = hdfs://foo and it contains a foo.jar i.e hdfs://foo/foo.jar, The code will create a directory local_dir/foo/foo.jar and not a file local_dir/foo/foo.jar unless im mistaken. |
Hmm. Let me check that. |
Hi @trystanleftwich , just tested my code again with more strict checks, and files show up as files, directories show up as directories. |
Leaving a link to an alternate fix in #4894 |
Utils.fetchHcfsFile(path, targetDir, fs, new SparkConf(), conf, false) | ||
assert(targetDir.exists()) | ||
assert(targetDir.isDirectory()) | ||
// Testing to make sure it doesn't error if the dir already exists | ||
Utils.fetchHcfsFile(path, targetDir, fs, new SparkConf(), conf, false) | ||
val newTempFile = new File(targetDir, tempFile3.getName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is where our patches diverge a bit.
The behavior I expect when I upload a directory "foo", is that there will be a directory called "foo" inside the target directory, so that when I do SparkFiles.get("foo")
I get the location of that directory.
Your patch seems to place the contents of "foo" under the target dir, which is not what I'd expect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, fair enough misunderstanding on my behalf, I've tested your patch and it works in my case, which I expected. I'm happy to close this as likewise I just want this solved for the next release.
Let's close this PR in favor of #4894, which I just merged. Thanks for reporting this blocker. |
...r will fail