Commit
added cartesian example...
pyspark-in-action committed May 8, 2015
1 parent 8145070 commit 0b8f614
Showing 1 changed file with 35 additions and 0 deletions.
35 changes: 35 additions & 0 deletions cartesian/cartesian.txt
@@ -0,0 +1,35 @@
# ./pyspark
Python 2.6.9 (unknown, Sep 9 2014, 15:05:12)
...
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.3.0
      /_/

Using Python version 2.6.9 (unknown, Sep 9 2014 15:05:12)
SparkContext available as sc, SQLContext available as sqlCtx.
>>> a = [('k1', 'v1'), ('k2', 'v2')]
>>> a
[('k1', 'v1'), ('k2', 'v2')]
>>> b = [('k3', 'v3'), ('k4', 'v4'), ('k5', 'v5')]
>>> b
[('k3', 'v3'), ('k4', 'v4'), ('k5', 'v5')]
>>> rdd1 = sc.parallelize(a)
>>> rdd1.collect()
[('k1', 'v1'), ('k2', 'v2')]
>>> rdd2 = sc.parallelize(b)
>>> rdd2.collect()
[('k3', 'v3'), ('k4', 'v4'), ('k5', 'v5')]
>>> rdd3 = rdd1.cartesian(rdd2)
>>> rdd3.collect()
[
(('k1', 'v1'), ('k3', 'v3')),
(('k1', 'v1'), ('k4', 'v4')),
(('k1', 'v1'), ('k5', 'v5')),
(('k2', 'v2'), ('k3', 'v3')),
(('k2', 'v2'), ('k4', 'v4')),
(('k2', 'v2'), ('k5', 'v5'))
]
>>>
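
The transcript above pairs every element of rdd1 with every element of rdd2, giving 2 x 3 = 6 results. As a rough local sketch of the same pairing without a Spark cluster, the snippet below uses itertools.product from the Python standard library. The helper name local_cartesian is hypothetical and not part of PySpark; on a real cluster the order of collected results can depend on partitioning, so only the set of pairs should be relied on.

```python
from itertools import product

# Same inputs as in the transcript.
a = [('k1', 'v1'), ('k2', 'v2')]
b = [('k3', 'v3'), ('k4', 'v4'), ('k5', 'v5')]


def local_cartesian(left, right):
    """Hypothetical helper: return every (x, y) pair with x from left and y from right.

    This only mimics what rdd1.cartesian(rdd2).collect() produced above;
    it runs in a single process and does not use Spark.
    """
    return list(product(left, right))


pairs = local_cartesian(a, b)

# The result size is the product of the input sizes.
assert len(pairs) == len(a) * len(b)

for pair in pairs:
    print(pair)
```

Because the output grows as len(a) * len(b), a cartesian product of two large RDDs can become very large; it is usually worth filtering the inputs first.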
