Commit b268455

srowen authored and Andrew Or committed
[SPARK-8437] [DOCS] Using directory path without wildcard for filename slow for large number of files with wholeTextFiles and binaryFiles
Note that 'dir/*' can be more efficient in some Hadoop FS implementations than 'dir/'

Author: Sean Owen <sowen@cloudera.com>

Closes apache#7036 from srowen/SPARK-8437 and squashes the following commits:

0e813ae [Sean Owen] Note that 'dir/*' can be more efficient in some Hadoop FS implementations than 'dir/'

(cherry picked from commit 5d30eae)
Signed-off-by: Andrew Or <andrew@databricks.com>
1 parent cdfa388 commit b268455

1 file changed, +6 -2 lines changed

core/src/main/scala/org/apache/spark/SparkContext.scala

Lines changed: 6 additions & 2 deletions
@@ -824,6 +824,8 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
  * }}}
  *
  * @note Small files are preferred, large file is also allowable, but may cause bad performance.
+ * @note On some filesystems, `.../path/*` can be a more efficient way to read all files in a directory
+ * rather than `.../path/` or `.../path`
  *
  * @param minPartitions A suggestion value of the minimal splitting number for input data.
  */
@@ -871,9 +873,11 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
  * (a-hdfs-path/part-nnnnn, its content)
  * }}}
  *
- * @param minPartitions A suggestion value of the minimal splitting number for input data.
- *
  * @note Small files are preferred; very large files may cause bad performance.
+ * @note On some filesystems, `.../path/*` can be a more efficient way to read all files in a directory
+ * rather than `.../path/` or `.../path`
+ *
+ * @param minPartitions A suggestion value of the minimal splitting number for input data.
  */
 @Experimental
 def binaryFiles(
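
For illustration only, the new @note could be applied on the caller side roughly as follows. This is a minimal sketch and not part of the commit: `DirGlob` and `asGlob` are hypothetical names, the HDFS paths in the usage comments are made up, and the helper assumes its input is a plain directory path with no other wildcards. Whether the glob form is actually faster depends on the Hadoop FS implementation in use.

```scala
object DirGlob {
  /** Hypothetical helper (not in Spark): rewrite `dir` or `dir/` as `dir/*`.
    * Assumes `dir` names a directory and contains no other wildcards.
    */
  def asGlob(dir: String): String =
    if (dir.endsWith("/*")) dir           // already in glob form
    else if (dir.endsWith("/")) dir + "*" // `dir/` becomes `dir/*`
    else dir + "/*"                       // `dir` becomes `dir/*`
}

// Usage with an existing SparkContext `sc` (not constructed here; paths are illustrative):
//   val texts = sc.wholeTextFiles(DirGlob.asGlob("hdfs://namenode/data/logs"))
//   val blobs = sc.binaryFiles(DirGlob.asGlob("hdfs://namenode/data/images/"))
```

The helper only normalizes the path string; `wholeTextFiles` and `binaryFiles` are called as before, with the optional `minPartitions` argument unchanged.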
