
Commit d19efed

vlyubin authored and aarondav committed
[SPARK-6330] Fix filesystem bug in newParquet relation
If I'm running this locally and my path points to S3, this currently errors out because the wrong FS is used. I tested this in a scenario that previously didn't work, and this change seemed to fix the issue.

Author: Volodymyr Lyubinets <vlyubin@gmail.com>

Closes #5020 from vlyubin/parquertbug and squashes the following commits:

a645ad5 [Volodymyr Lyubinets] Fix filesystem bug in newParquet relation
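The core of the bug: `FileSystem.get(conf)` returns the filesystem for `fs.defaultFS`, not the filesystem the path actually lives on. Below is a minimal sketch of the difference between the two lookups, not code from the patch; the bucket and path are placeholders, and actually resolving an `s3n://` URI requires an S3 filesystem implementation and credentials to be configured.

```scala
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

object FsLookupSketch {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()

    // Pre-fix behavior: always resolves against fs.defaultFS (e.g. file:///
    // when running locally), no matter what scheme the table path uses.
    val defaultFs = FileSystem.get(conf)
    println(s"default FS: ${defaultFs.getUri}")

    // Post-fix behavior: the filesystem is chosen from the path's own URI,
    // so an s3n:// table path yields an S3 filesystem even on a local run.
    // Placeholder path; needs S3 credentials/implementation to resolve.
    val p = "s3n://some-bucket/warehouse/my_table"
    val pathFs = FileSystem.get(URI.create(p), conf)
    println(s"path FS:    ${pathFs.getUri}")
  }
}
```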
1 parent: 12a345a

1 file changed: +3 −2 lines


sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala

Lines changed: 3 additions & 2 deletions
@@ -19,6 +19,7 @@ package org.apache.spark.sql.parquet
 import java.io.IOException
 import java.lang.{Double => JDouble, Float => JFloat, Long => JLong}
 import java.math.{BigDecimal => JBigDecimal}
+import java.net.URI
 import java.text.SimpleDateFormat
 import java.util.{Date, List => JList}


@@ -244,11 +245,10 @@ private[sql] case class ParquetRelation2(
    * Refreshes `FileStatus`es, footers, partition spec, and table schema.
    */
   def refresh(): Unit = {
-    val fs = FileSystem.get(sparkContext.hadoopConfiguration)
-
     // Support either reading a collection of raw Parquet part-files, or a collection of folders
     // containing Parquet files (e.g. partitioned Parquet table).
     val baseStatuses = paths.distinct.map { p =>
+      val fs = FileSystem.get(URI.create(p), sparkContext.hadoopConfiguration)
       val qualified = fs.makeQualified(new Path(p))

       if (!fs.exists(qualified) && maybeSchema.isDefined) {
@@ -262,6 +262,7 @@ private[sql] case class ParquetRelation2(

     // Lists `FileStatus`es of all leaf nodes (files) under all base directories.
     val leaves = baseStatuses.flatMap { f =>
+      val fs = FileSystem.get(f.getPath.toUri, sparkContext.hadoopConfiguration)
       SparkHadoopUtil.get.listLeafStatuses(fs, f.getPath).filter { f =>
         isSummaryFile(f.getPath) ||
           !(f.getPath.getName.startsWith("_") || f.getPath.getName.startsWith("."))
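Taken together, the patched refresh() resolves a FileSystem per path rather than once per table. A hedged sketch of the resulting pattern follows; the helper name `baseStatuses` and the `paths`/`conf` parameters are illustrative, not the patch's exact code.

```scala
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}

// Each base path gets the FileSystem matching its own scheme, so local,
// HDFS, and S3 paths can coexist in one table definition.
def baseStatuses(paths: Seq[String], conf: Configuration): Seq[FileStatus] =
  paths.distinct.map { p =>
    val fs = FileSystem.get(URI.create(p), conf) // FS picked by p's scheme
    fs.getFileStatus(fs.makeQualified(new Path(p)))
  }
```

The repeated lookups are cheap: Hadoop's FileSystem.get caches instances keyed by scheme and authority, so every path on the same filesystem reuses one instance.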

0 commit comments
