
Commit 4db1e19

gengliangwang authored and mccheah committed
[SPARK-26673][FOLLOWUP][SQL] File Source V2: check existence of output path before delete it
## What changes were proposed in this pull request?

This is a followup PR to resolve a review comment on apache#23601 (review). When Spark writes a DataFrame with "overwrite" mode, it deletes the output path before the actual write. To safely handle the case where the output path does not exist, it is suggested to follow the V1 code and check for the path's existence first.

## How was this patch tested?

Apply apache#23836 and run unit tests.

Closes apache#23889 from gengliangwang/checkFileBeforeOverwrite.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
1 parent 85d0f08 commit 4db1e19

File tree

1 file changed: +4 −1 lines changed


sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileWriteBuilder.scala

Lines changed: 4 additions & 1 deletion
@@ -16,6 +16,7 @@
  */
 package org.apache.spark.sql.execution.datasources.v2
 
+import java.io.IOException
 import java.util.UUID
 
 import scala.collection.JavaConverters._
@@ -83,7 +84,9 @@ abstract class FileWriteBuilder(options: DataSourceOptions)
         null
 
     case SaveMode.Overwrite =>
-      committer.deleteWithJob(fs, path, true)
+      if (fs.exists(path) && !committer.deleteWithJob(fs, path, true)) {
+        throw new IOException(s"Unable to clear directory $path prior to writing to it")
+      }
       committer.setupJob(job)
       new FileBatchWrite(job, description, committer)
 
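The guarded delete in this change can be sketched in isolation. This is a minimal, hypothetical stand-in: the real code goes through Hadoop's `FileSystem` and the job's `FileCommitProtocol.deleteWithJob`, while here `java.nio.file` plays both roles so the pattern is runnable on a local filesystem. The names `clearOutputPath` and `deleteRecursively` are illustrative, not Spark API.

```scala
import java.io.IOException
import java.nio.file.{Files, Path, Paths}
import java.util.Comparator

object OverwriteCheck {
  // Hypothetical stand-in for committer.deleteWithJob: recursively delete a
  // local path, returning false instead of throwing when the delete fails.
  def deleteRecursively(path: Path): Boolean =
    try {
      // sorted(reverseOrder) yields children before their parent directories,
      // so each directory is empty by the time it is deleted.
      Files.walk(path)
        .sorted(Comparator.reverseOrder[Path]())
        .forEach(p => Files.delete(p))
      true
    } catch {
      case _: IOException => false
    }

  // Mirrors the patched logic: only attempt the delete when the path exists,
  // and fail loudly if the delete itself does not succeed.
  def clearOutputPath(path: Path): Unit =
    if (Files.exists(path) && !deleteRecursively(path)) {
      throw new IOException(s"Unable to clear directory $path prior to writing to it")
    }

  def main(args: Array[String]): Unit = {
    // A missing output path is now a no-op rather than a delete call on
    // nothing (the case this followup handles).
    clearOutputPath(Paths.get("/tmp/spark-overwrite-check-does-not-exist"))

    // An existing output directory is cleared before the write.
    val dir = Files.createTempDirectory("spark-overwrite-check")
    Files.createFile(dir.resolve("part-00000"))
    clearOutputPath(dir)
    println(Files.exists(dir)) // prints "false"
  }
}
```

The existence check matters because some `FileCommitProtocol` implementations report a failed delete with a `false` return value; without the check, a perfectly valid "overwrite into a fresh path" write would be indistinguishable from a genuine failure to clear stale output.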
