Commit: usage of split() function
pyspark-in-action committed Apr 13, 2016
1 parent e4855b8 commit 75f1313
Showing 1 changed file with 18 additions and 0 deletions: tutorial/split-function/README.md
How To Use Split Function
=========================

* Example-1: Split ````RDD<String>```` into Tokens

````
# ./bin/pyspark
Python 2.7.10 (default, Oct 23 2015, 19:19:21)
SparkContext available as sc, HiveContext available as sqlContext.
>>> rdd2.count()
9
````
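The middle of the session above is truncated, but it evidently tokenizes each input line with ````split()```` and ends with 9 tokens. As a plain-Python sketch of that flatMap-style tokenization (the sample strings below are assumptions for illustration, not the data from the truncated session):

````python
# flatMap-style tokenization in plain Python: split each line on a space,
# then flatten all the resulting tokens into a single list.
data = ["red fox jumped", "over the lazy", "dog and cat"]  # assumed sample input

tokens = [token for line in data for token in line.split(" ")]
print(tokens)
print(len(tokens))  # 9 tokens, matching rdd2.count() above
````

In Spark the equivalent would be ````sc.parallelize(data).flatMap(lambda line: line.split(" "))````, which flattens the per-line token lists the same way.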

* Example-2: Create Key-Value Pairs

````
>>> data2 = ["abc,de", "xyz,deeee,ze", "abc,de,ze,pe", "xyz,bababa"]
>>> data2
['abc,de', 'xyz,deeee,ze', 'abc,de,ze,pe', 'xyz,bababa']
>>> rdd4 = sc.parallelize(data2)
>>> rdd4.collect()
['abc,de', 'xyz,deeee,ze', 'abc,de,ze,pe', 'xyz,bababa']
>>> rdd5 = rdd4.map(lambda x : (x.split(",")[0], x.split(",")[1]))
>>> rdd5.collect()
[('abc', 'de'), ('xyz', 'deeee'), ('abc', 'de'), ('xyz', 'bababa')]
````
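Note that ````x.split(",")[1]```` keeps only the second token as the value, silently dropping anything after the second comma (e.g. ````"xyz,deeee,ze"```` becomes ````('xyz', 'deeee')````), and it also calls ````split()```` twice per element. A plain-Python sketch (no Spark needed) showing both the behavior above and a variant using ````split(",", 1)```` that keeps the full remainder as the value:

````python
data2 = ["abc,de", "xyz,deeee,ze", "abc,de,ze,pe", "xyz,bababa"]

# Same logic as rdd5 above: key = first token, value = second token only.
pairs = [(x.split(",")[0], x.split(",")[1]) for x in data2]
print(pairs)       # [('abc', 'de'), ('xyz', 'deeee'), ('abc', 'de'), ('xyz', 'bababa')]

# split(",", 1) splits at the first comma only, so the value keeps the rest.
pairs_full = [tuple(x.split(",", 1)) for x in data2]
print(pairs_full)  # [('abc', 'de'), ('xyz', 'deeee,ze'), ('abc', 'de,ze,pe'), ('xyz', 'bababa')]
````

The Spark form of the second variant would be ````rdd4.map(lambda x: tuple(x.split(",", 1)))````, which also splits each element only once.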
