-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-4764] Ensure that files are fetched atomically #2855
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
tempFile is created in the same directory than targetFile, so that the move from tempFile to targetFile is always atomic
Can one of the admins verify this patch? |
1 similar comment
Can one of the admins verify this patch? |
add to whitelist |
QA tests have started for PR 2855 at commit
|
QA tests have finished for PR 2855 at commit
|
Test PASSed. |
Test build #22815 has started for PR 2855 at commit
|
tempFile is created in the same directory than targetFile, so that the move from tempFile to targetFile is always atomic
Test build #22816 has started for PR 2855 at commit
|
I've just resync my local fork so that the patch merges cleanly (i.e. there are no other modifications compared to the initial patch) |
Test build #22815 has finished for PR 2855 at commit
|
Test PASSed. |
Test build #22816 has finished for PR 2855 at commit
|
Test PASSed. |
Hey @preaudc can you explain how this fixes SPARK-3967? |
Also, how does this relate to #2848? Can you and @ryan-williams sort out which PR makes more sense? |
#2484 seems beneficial no matter what; why would we ever copy a JAR we've just confirmed is identical to the thing already at the destination? |
I don't think these two PRs are necessarily that related though, other than each having been borne as a workaround to the same stack-trace. There may be a common root cause but I don't know, so no need to consider them as coupled or mutually exclusive. |
As @ryan-williams pointed out, this is initially only a workaround to SPARK-3967. |
Test build #23617 has started for PR 2855 at commit
|
Test build #23617 has finished for PR 2855 at commit
|
Test FAILed. |
I have no clue what could bring this MiMa test failure, and how to fix it. Can anybody give me a hand? |
Jenkins, retest this please. |
Test build #23968 has started for PR 2855 at commit
|
Test build #23968 has finished for PR 2855 at commit
|
Test PASSed. |
This looks good to me, since I think that there's no reason to download the temporary file to a different directory than its intended destination. I'd like to be able to merge this independently of #2848, so do you mind creating a new JIRA that describes only this change rather than SPARK-3967 and updating this PR's title to reference that JIRA? |
Thanks for the review, @JoshRosen, I've created a new JIRA as requested. |
Thanks for creating the new JIRA. This looks good to me, so I'm going to merge it into |
tempFile is created in the same directory than targetFile, so that the move from tempFile to targetFile is always atomic Author: Christophe Préaud <christophe.preaud@kelkoo.com> Closes #2855 from preaudc/master and squashes the following commits: 9ba89ca [Christophe Préaud] Ensure that files are fetched atomically 54419ae [Christophe Préaud] Merge remote-tracking branch 'upstream/master' c6a5590 [Christophe Préaud] Revert commit 8ea871f 7456a33 [Christophe Préaud] Merge remote-tracking branch 'upstream/master' 8ea871f [Christophe Préaud] Ensure that files are fetched atomically (cherry picked from commit ab2abcb) Signed-off-by: Josh Rosen <rosenville@gmail.com> Conflicts: core/src/main/scala/org/apache/spark/util/Utils.scala
tempFile is created in the same directory than targetFile, so that the move from tempFile to targetFile is always atomic Author: Christophe Préaud <christophe.preaud@kelkoo.com> Closes #2855 from preaudc/master and squashes the following commits: 9ba89ca [Christophe Préaud] Ensure that files are fetched atomically 54419ae [Christophe Préaud] Merge remote-tracking branch 'upstream/master' c6a5590 [Christophe Préaud] Revert commit 8ea871f 7456a33 [Christophe Préaud] Merge remote-tracking branch 'upstream/master' 8ea871f [Christophe Préaud] Ensure that files are fetched atomically (cherry picked from commit ab2abcb) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
I've finished backporting this to |
@@ -425,8 +425,7 @@ private[spark] object Utils extends Logging { | |||
conf: SparkConf, | |||
securityMgr: SecurityManager, | |||
hadoopConf: Configuration) { | |||
val tempDir = getLocalDir(conf) | |||
val tempFile = File.createTempFile("fetchFileTemp", null, new File(tempDir)) | |||
val tempFile = File.createTempFile("fetchFileTemp", null, new File(targetDir.getAbsolutePath)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why need to create a new File here: new File(targetDir.getAbsolutePath)
? Can't we use targetDir
directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we probably can just use targetDir
directly; I guess I overlooked this when reviewing the patch.
Yes, my bad, |
@preaudc We can't re-open this PR, so you'd have to create a new one. This is a really minor issue, though, so I'm not in a huge rush to fix it (it's more of a code-cleanliness issue than a correctness issue). |
tempFile is created in the same directory than targetFile, so that the
move from tempFile to targetFile is always atomic