Commit e4855b8 (parent fcd6f4e) by pyspark-in-action, Apr 13, 2016: usage of split() function
File changed: tutorial/split-function/README.md (33 additions, 0 deletions)
How To Use the split() Function
===============================

The PySpark shell session below builds an RDD of comma-separated strings, then uses `flatMap()` with Python's `str.split(",")` to break every string into individual tokens and flatten the results into a single RDD.

````
# ./bin/pyspark
Python 2.7.10 (default, Oct 23 2015, 19:19:21)
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 1.6.1
/_/
Using Python version 2.7.10 (default, Oct 23 2015 19:19:21)
SparkContext available as sc, HiveContext available as sqlContext.
>>> data = ["abc,de", "abc,de,ze", "abc,de,ze,pe"]
>>> data
['abc,de', 'abc,de,ze', 'abc,de,ze,pe']
>>> rdd = sc.parallelize(data)
>>> rdd.collect()
['abc,de', 'abc,de,ze', 'abc,de,ze,pe']
>>> rdd.count()
3
>>> rdd2 = rdd.flatMap(lambda x : x.split(","))
>>> rdd2.collect()
['abc', 'de', 'abc', 'de', 'ze', 'abc', 'de', 'ze', 'pe']
>>> rdd2.count()
9
````
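As a quick sanity check outside Spark, the flattening behaviour of `flatMap()` can be mimicked in plain Python with list comprehensions (a sketch for illustration; the names `nested` and `flat` are not from the transcript above):

````python
# The same input data as in the PySpark session above.
data = ["abc,de", "abc,de,ze", "abc,de,ze,pe"]

# map-like: split each string, keeping one sub-list per input element.
nested = [s.split(",") for s in data]
# -> [['abc', 'de'], ['abc', 'de', 'ze'], ['abc', 'de', 'ze', 'pe']]

# flatMap-like: split each string, then flatten all tokens into one list.
flat = [token for s in data for token in s.split(",")]
# -> ['abc', 'de', 'abc', 'de', 'ze', 'abc', 'de', 'ze', 'pe']

print(len(flat))  # 9, matching rdd2.count() in the session above
````

This is why `rdd2.count()` returns 9 rather than 3: `flatMap()` applies the function and then merges the per-element lists, whereas `map()` would have produced an RDD of three lists.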
