Commit c64bd2e

committed
Add read local file option
1 parent 6fb3145 commit c64bd2e

File tree

1 file changed

+19
-1
lines changed

L06-Data-Processing-with-Spark/L06.1-Structured-Data-Processing-with-Spark.ipynb

Lines changed: 19 additions & 1 deletion
@@ -390,7 +390,25 @@
 }
 ],
 "source": [
-"sdf = spark.read.csv(\"data/people.txt\") \n",
+"sdf = spark.read.csv(\"data/people.txt\") # read hdfs file \n",
+"sdf.show() # Displays the content of the DataFrame to stdout"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {
+"slideshow": {
+"slide_type": "fragment"
+}
+},
+"outputs": [],
+"source": [
+"# If your data are available locally, you can say so explicitly with the \"file://\" scheme.\n",
+"# For this to work, a copy of the file must exist at the same path on every worker, or\n",
+"# every worker needs access to a common shared drive, such as an NFS mount.\n",
+"\n",
+"sdf = spark.read.csv(\"file:///home/fli/data/people.txt\") # read local file \n",
 "sdf.show() # Displays the content of the DataFrame to stdout"
 ]
 },
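
The new cell hard-codes an absolute `file://` URI. A minimal sketch of how such a URI could be built from an arbitrary local path is shown below. The helper names `to_file_uri` and `read_local_csv` are hypothetical and not part of the commit or of Spark. The sketch assumes a POSIX-style file system and an already created `SparkSession` passed in as `spark`.

```python
import os


def to_file_uri(path):
    """Return a file:// URI for a local path (hypothetical helper, POSIX only).

    The path is made absolute first, so the result has the form
    file:///abs/path, matching the URI used in the notebook cell.
    """
    absolute = os.path.abspath(os.path.expanduser(path))
    return "file://" + absolute


def read_local_csv(spark, path):
    """Read a CSV through an explicit file:// URI.

    `spark` is assumed to be an existing SparkSession. The file still has to
    be reachable at the same path on every worker (or via a shared mount).
    """
    return spark.read.csv(to_file_uri(path))
```

In the notebook this would be used as `sdf = read_local_csv(spark, "/home/fli/data/people.txt")` followed by `sdf.show()`. Building the URI in one place keeps the scheme explicit, so the read does not depend on which default file system the cluster is configured with.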
