SPARK-1509: add zipWithIndex zipWithUniqueId methods to java api #423

witgo · 2014-04-16T05:00:54Z

No description provided.

AmplabJenkins · 2014-04-16T05:03:11Z

Can one of the admins verify this patch?

mengxr · 2014-04-22T19:12:45Z

core/src/test/java/org/apache/spark/JavaAPISuite.java

+  @Test
+  public void zipWithUniqueId() {
+    List<Integer> correct = Arrays.asList(1, 2, 3, 4);
+    JavaPairRDD<Integer, Long> zip = sc.parallelize(correct).zipWithUniqueId();


Should test with more than one partition.
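With a single partition, zipWithUniqueId assigns 0, 1, 2, ..., exactly like zipWithIndex, so a one-partition test cannot tell the two apart. The plain-Java sketch below models the id scheme from the method's doc comment (items in the k-th partition get ids k, n+k, 2*n+k, ..., where n is the number of partitions). It does not use Spark: partitions are modeled as a `List<List<Integer>>`, and `UniqueIdSketch`/`uniqueIds` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class UniqueIdSketch {
    // Hypothetical model of the documented id scheme: the i-th item (0-based)
    // of partition k gets id i * n + k, where n is the number of partitions.
    static List<Long> uniqueIds(List<List<Integer>> partitions) {
        int n = partitions.size();
        List<Long> ids = new ArrayList<>();
        for (int k = 0; k < n; k++) {
            List<Integer> part = partitions.get(k);
            for (int i = 0; i < part.size(); i++) {
                ids.add((long) i * n + k);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // One partition: ids are 0, 1, 2, 3, indistinguishable from zipWithIndex.
        System.out.println(uniqueIds(Arrays.asList(Arrays.asList(1, 2, 3, 4))));
        // Two uneven partitions [1] and [2, 3, 4]: ids 0 | 1, 3, 5, leaving gaps at 2 and 4.
        System.out.println(uniqueIds(Arrays.asList(Arrays.asList(1), Arrays.asList(2, 3, 4))));
    }
}
```

Uneven partitions are what expose the gaps, so a multi-partition test is the one that actually exercises the difference.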


mengxr · 2014-04-23T20:02:48Z

core/src/test/java/org/apache/spark/JavaAPISuite.java

+    List<Integer> dataArray = Arrays.asList(1, 2, 3, 4);
+    JavaPairRDD<Integer, Long> zip = sc.parallelize(dataArray).zipWithIndex();
+    JavaRDD<Long> indexes = zip.values();
+    HashSet<Long> correctIndexes = new HashSet<Long>(Arrays.asList(0l, 1l, 2l, 3l));


You should use a list or an Array instead of a set here, because you want to assert on the exact order.

Also, use L instead of l for long literals (0L rather than 0l); a lowercase l is easily mistaken for the digit 1.
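A minimal sketch of why the exact order matters here, assuming zipWithIndex numbers items contiguously from 0 in partition order (partition k starts at the total size of partitions 0..k-1). Partitions are modeled as plain lists rather than a Spark RDD, and `ZipWithIndexSketch`/`indices` are hypothetical names. A set comparison accepts any permutation, so only a list comparison catches an ordering bug.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

class ZipWithIndexSketch {
    // Hypothetical model of zipWithIndex: each partition's items start at the
    // running total of earlier partition sizes, so indices are contiguous 0..N-1.
    static List<Long> indices(List<List<Integer>> partitions) {
        List<Long> result = new ArrayList<>();
        long offset = 0;
        for (List<Integer> part : partitions) {
            for (int i = 0; i < part.size(); i++) {
                result.add(offset + i);
            }
            offset += part.size();
        }
        return result;
    }

    public static void main(String[] args) {
        List<Long> actual = indices(Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3, 4)));
        List<Long> reversed = Arrays.asList(3L, 2L, 1L, 0L);
        // A set comparison accepts any permutation, so it cannot catch an ordering bug.
        System.out.println(new HashSet<>(actual).equals(new HashSet<>(reversed))); // true
        // A list comparison checks positions, which is what the test should assert.
        System.out.println(actual.equals(Arrays.asList(0L, 1L, 2L, 3L))); // true
        System.out.println(actual.equals(reversed)); // false
    }
}
```

Because each partition's starting offset depends on the sizes of the partitions before it, the counts must be known first; this is consistent with the doc comment in this PR noting that zipWithUniqueId, unlike zipWithIndex, won't trigger a Spark job.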


mengxr · 2014-04-24T05:48:03Z

Jenkins, test this please.

AmplabJenkins · 2014-04-24T05:52:56Z

Merged build triggered.

AmplabJenkins · 2014-04-24T05:53:06Z

Merged build started.

AmplabJenkins · 2014-04-24T06:31:44Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-04-24T06:31:44Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14428/

mengxr · 2014-04-29T05:28:36Z

core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala

+   * 2*n+k, ..., where n is the number of partitions. So there may exist gaps, but this method
+   * won't trigger a spark job, which is different from [[org.apache.spark.rdd.RDD#zipWithIndex]].
+   */
+  def zipWithUniqueId[Long](): JavaPairRDD[T, Long] = {


Just saw this. Why do you need [Long] here?

When I remove the [Long], the type of the return value becomes JavaPairRDD<T, Object>.

def zipWithUniqueId(): JavaPairRDD[T, Long]

would return JavaPairRDD<T, Object>?

Yes, in my test.

@mengxr already found this out, but the reason is that you'd want to declare the type as java.lang.Long instead of Long.

Basically, what you created here is a type parameter named "Long" (surprisingly, not a keyword in Scala), and the compiler inferred the type argument when you called the method from Java.
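Java has the same pitfall, which makes for a compact illustration. In the hypothetical method `first` below, `<Long>` declares a type variable that shadows `java.lang.Long` inside that method, much as `[Long]` did in the Scala signature, so a caller can instantiate it with `String`. This is a Java analog with made-up names, not the Spark code.

```java
import java.util.Arrays;
import java.util.List;

class ShadowingSketch {
    // <Long> declares a type variable that merely happens to be named Long;
    // within this method it shadows java.lang.Long entirely.
    static <Long> Long first(List<Long> xs) {
        return xs.get(0);
    }

    // The intended signature: name the boxed class explicitly.
    static java.lang.Long firstId(List<java.lang.Long> xs) {
        return xs.get(0);
    }

    public static void main(String[] args) {
        // Compiles: the type variable "Long" is inferred as String here.
        String s = first(Arrays.asList("a", "b"));
        System.out.println(s); // a
        System.out.println(firstId(Arrays.asList(7L, 8L))); // 7
    }
}
```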

Try:

def zipWithUniqueId(): JavaPairRDD[T, java.lang.Long] = { JavaPairRDD.fromRDD(rdd.zipWithUniqueId().map(x => (x._1, new java.lang.Long(x._2)))) }

def zipWithUniqueId(): JavaPairRDD[T, JLong] = { JavaPairRDD.fromRDD(rdd.zipWithUniqueId()).asInstanceOf[JavaPairRDD[T, JLong]] }

Would this be better?

Let's just write java.lang.Long. It's not that "long" anyway.

@rxin You're right; I've modified it.

@mengxr

def zipWithUniqueId(): JavaPairRDD[T, java.lang.Long] = { JavaPairRDD.fromRDD(rdd.zipWithUniqueId().map(x => (x._1, new java.lang.Long(x._2)))) }

creates too many objects: every element gets a new tuple and a new java.lang.Long.
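A plain-Java analog of this trade-off, assuming (as the asInstanceOf suggestion does) that the values are already boxed `java.lang.Long` objects at runtime. Copying allocates a fresh box per element, like the map-based version; an unchecked cast reuses the existing collection and its elements. The names are hypothetical, and `new Long(...)` is used only to make the allocation explicit (it is deprecated in recent JDKs).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CastVsCopySketch {
    // Mirrors map(x => (x._1, new java.lang.Long(x._2))): one fresh box per element.
    @SuppressWarnings({"deprecation", "removal"})
    static List<Long> copyWithNewBoxes(List<Object> values) {
        List<Long> out = new ArrayList<>(values.size());
        for (Object v : values) {
            out.add(new Long((Long) v)); // `new` always yields a distinct object
        }
        return out;
    }

    // Mirrors asInstanceOf[...]: an unchecked cast that reuses the existing objects.
    // Only sound because every element already is a java.lang.Long at runtime.
    @SuppressWarnings("unchecked")
    static List<Long> castView(List<Object> values) {
        return (List<Long>) (List<?>) values;
    }

    public static void main(String[] args) {
        List<Object> ids = Arrays.<Object>asList(0L, 2L, 1L, 3L);
        List<Long> copied = copyWithNewBoxes(ids);
        List<Long> viewed = castView(ids);
        System.out.println(copied.equals(viewed));       // true: same values
        System.out.println(copied.get(0) == ids.get(0)); // false: a new box
        System.out.println(viewed.get(0) == ids.get(0)); // true: the same object
    }
}
```

The cast is only as safe as that runtime assumption: if an element were not a java.lang.Long, the failure would surface later as a ClassCastException at the point of use, not at the cast itself.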


mengxr · 2014-04-29T07:38:51Z

LGTM if Jenkins is happy.

mengxr · 2014-04-29T07:41:21Z

Jenkins, test this please.

AmplabJenkins · 2014-04-29T07:42:57Z

Merged build triggered.

AmplabJenkins · 2014-04-29T07:43:07Z

Merged build started.

AmplabJenkins · 2014-04-29T08:17:42Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-04-29T08:17:42Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14561/

rxin · 2014-04-29T18:30:52Z

Thanks. I've merged this.

Author: witgo <witgo@qq.com>

Closes #423 from witgo/zipWithIndex and squashes the following commits:

039ec04 [witgo] Merge branch 'master' of https://github.com/apache/spark into zipWithIndex
24d74c9 [witgo] review commit
763a5e4 [witgo] Merge branch 'master' of https://github.com/apache/spark into zipWithIndex
59747d1 [witgo] review commit
7bf4d06 [witgo] Merge branch 'master' of https://github.com/apache/spark into zipWithIndex
daa8f84 [witgo] review commit
4070613 [witgo] Merge branch 'master' of https://github.com/apache/spark into zipWithIndex
18e6c97 [witgo] java api zipWithIndex test
11e2e7f [witgo] add zipWithIndex zipWithUniqueId methods to java api

(cherry picked from commit 7d15058)
Signed-off-by: Reynold Xin <rxin@apache.org>

Improving the graphx-programming-guide This PR will track a few minor improvements to the content and formatting of the graphx-programming-guide.

Author: witgo <witgo@qq.com> Closes apache#423 from witgo/zipWithIndex and squashes the following commits: 039ec04 [witgo] Merge branch 'master' of https://github.com/apache/spark into zipWithIndex 24d74c9 [witgo] review commit 763a5e4 [witgo] Merge branch 'master' of https://github.com/apache/spark into zipWithIndex 59747d1 [witgo] review commit 7bf4d06 [witgo] Merge branch 'master' of https://github.com/apache/spark into zipWithIndex daa8f84 [witgo] review commit 4070613 [witgo] Merge branch 'master' of https://github.com/apache/spark into zipWithIndex 18e6c97 [witgo] java api zipWithIndex test 11e2e7f [witgo] add zipWithIndex zipWithUniqueId methods to java api

Improving the graphx-programming-guide This PR will track a few minor improvements to the content and formatting of the graphx-programming-guide. (cherry picked from commit 3fcc68b) Signed-off-by: Reynold Xin <rxin@apache.org>


add zipWithIndex zipWithUniqueId methods to java api

11e2e7f

witgo changed the title ~~SPARK-1509: add zipWithIndex zipWithUniqueId methods to java api~~ [WIP]SPARK-1509: add zipWithIndex zipWithUniqueId methods to java api Apr 16, 2014

java api zipWithIndex test

18e6c97

witgo changed the title ~~[WIP]SPARK-1509: add zipWithIndex zipWithUniqueId methods to java api~~ SPARK-1509: add zipWithIndex zipWithUniqueId methods to java api Apr 17, 2014

mengxr reviewed Apr 22, 2014

witgo added 3 commits April 23, 2014 09:48

Merge branch 'master' of https://github.com/apache/spark into zipWithIndex

4070613

review commit

daa8f84

Merge branch 'master' of https://github.com/apache/spark into zipWithIndex

7bf4d06

mengxr reviewed Apr 23, 2014

witgo added 2 commits April 24, 2014 10:16

review commit

59747d1

Merge branch 'master' of https://github.com/apache/spark into zipWithIndex

763a5e4

mengxr reviewed Apr 29, 2014

witgo added 2 commits April 29, 2014 15:26

review commit

24d74c9

Merge branch 'master' of https://github.com/apache/spark into zipWithIndex

039ec04

asfgit closed this in 7d15058 Apr 29, 2014

witgo deleted the zipWithIndex branch April 30, 2014 01:37

markhamstra pushed a commit to markhamstra/spark that referenced this pull request Nov 7, 2017

Updated devloper doc to include a install step for first time compilation (apache#423)

58cebd1

mccheah added a commit to mccheah/spark that referenced this pull request Nov 28, 2018

Merge pull request apache#423 from palantir/bintray-docker-plugin-publish-fix

Ensure bintray upload happens before repository is no clean.

141b82e

bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019

Update the huaweicloud account password (apache#423)

fdcb120

maropu mentioned this pull request Sep 8, 2020

[SPARK-32741][SQL] Check if the same ExprId refers to the unique attribute in logical plans #29585

Closed
