[SPARK-3519] add distinct(n) to PySpark #2383

mattf · 2014-09-13T21:24:53Z

Added missing rdd.distinct(numPartitions) and associated tests
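
For readers unfamiliar with the parameter, here is a rough pure-Python model of what a partition-count argument to a distinct operation means. It is only a sketch and not Spark's implementation: `distinct_partitions` is a hypothetical helper, and it assumes items are hash-partitioned so that equal items always land in the same partition.

```python
from collections import defaultdict


def distinct_partitions(items, num_partitions):
    """Hypothetical model: hash-partition items, then dedupe inside each partition."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be >= 1")
    buckets = defaultdict(set)
    for x in items:
        # Equal items hash to the same bucket, so per-bucket dedup is globally correct.
        buckets[hash(x) % num_partitions].add(x)
    # One list per partition, including partitions that ended up empty.
    return [sorted(buckets[i]) for i in range(num_partitions)]


parts = distinct_partitions((1, 2, 3) * 10, 2)
print(len(parts))                            # number of output partitions
print(sorted(x for p in parts for x in p))   # distinct values across all partitions
```

The point of the sketch is that the caller chooses how many output partitions the deduplicated data is spread over, while the set of distinct values stays the same.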

SparkQA · 2014-09-13T21:29:29Z

QA tests have started for PR 2383 at commit fcfc05e.

This patch merges cleanly.

SparkQA · 2014-09-13T21:30:33Z

QA tests have finished for PR 2383 at commit fcfc05e.

This patch fails unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class JavaSparkContext(val sc: SparkContext)
- throw new IllegalStateException("The main method in the given main class must be static")
- class TaskCompletionListenerException(errorMessages: Seq[String]) extends Exception
- class Dummy(object):
- class JavaStreamingContext(val ssc: StreamingContext) extends Closeable

JoshRosen · 2014-09-13T22:44:13Z

python/pyspark/tests.py

@@ -586,6 +586,17 @@ def test_repartitionAndSortWithinPartitions(self):
        self.assertEquals(partitions[0], [(0, 5), (0, 8), (2, 6)])
        self.assertEquals(partitions[1], [(1, 3), (3, 8), (3, 8)])

+    def test_distinct(self):
+        rdd = self.sc.parallelize((1,2,3)*10).distinct()


Jenkins failed because the Python style checks didn't pass; the problem is that PEP8 requires whitespace after commas and space around operators like *.
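
As a small illustration of the spacing the comment above asks for, the test input could be written as below. This is a sketch of the style only; it does not claim which of these rules the lint script actually enforces.

```python
# Flagged form from the diff:  (1,2,3)*10
# Spaced form: whitespace after each comma and around the * operator.
data = (1, 2, 3) * 10

print(len(data))       # total number of elements
print(len(set(data)))  # number of distinct values
```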

mattf · 2014-09-14

thanks, i forgot to run lint. i've updated the patch...

$ ./dev/lint-python
PEP 8 checks passed.

SparkQA · 2014-09-14T00:09:28Z

QA tests have started for PR 2383 at commit 7a17f2b.

This patch merges cleanly.

SparkQA · 2014-09-14T01:14:20Z

QA tests have finished for PR 2383 at commit 7a17f2b.

This patch passes unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class RatingDeserializer(FramedSerializer):
- class Encoder[T <: NativeType](columnType: NativeColumnType[T]) extends compression.Encoder[T]
- class Encoder[T <: NativeType](columnType: NativeColumnType[T]) extends compression.Encoder[T]
- class Encoder[T <: NativeType](columnType: NativeColumnType[T]) extends compression.Encoder[T]
- class Encoder extends compression.Encoder[IntegerType.type]
- class Decoder(buffer: ByteBuffer, columnType: NativeColumnType[IntegerType.type])
- class Encoder extends compression.Encoder[LongType.type]
- class Decoder(buffer: ByteBuffer, columnType: NativeColumnType[LongType.type])

davies · 2014-09-15T03:59:41Z

python/pyspark/tests.py

+        rdd = self.sc.parallelize((1, 2, 3)*10).distinct()
+        self.assertEquals(rdd.count(), 3)
+
+    def test_distinct_numPartitions(self):


It's better to put them into a single test case, because each test case creates a new JVM, which has some overhead. We should keep the number of test cases from growing too much.

mattf · 2014-09-15

can i have a pass? it looks like the python tests could use some attention during the test speed increase effort, but i'd rather wait for a big speedup recommendation before altering these cases.

though, if this is important to you, i'll do it

davies · 2014-09-15

BTW, these two cases are both about distinct(), just for different scenarios, so it's also better to put them together. I would really appreciate it if you could do it.

mattf · 2014-09-15

i'll do that now.
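
A combined test case along the lines discussed might look roughly like this. It is a sketch under stated assumptions, not the code from the patch: `FakeRDD` is a hypothetical stand-in so the example runs without Spark, and only its method names (`distinct`, `count`, `getNumPartitions`) mirror the PySpark RDD API.

```python
import unittest


class FakeRDD(object):
    """Hypothetical stand-in for an RDD so this sketch runs without Spark."""

    def __init__(self, items, num_partitions=1):
        self.items = list(items)
        self.num_partitions = num_partitions

    def distinct(self, numPartitions=None):
        # Keep the current partition count unless the caller overrides it.
        n = self.num_partitions if numPartitions is None else numPartitions
        return FakeRDD(set(self.items), n)

    def count(self):
        return len(self.items)

    def getNumPartitions(self):
        return self.num_partitions


class DistinctTest(unittest.TestCase):
    def test_distinct(self):
        rdd = FakeRDD((1, 2, 3) * 10, num_partitions=10)
        # Default call and explicit partition count are checked in one test case,
        # so the per-test setup cost is paid only once.
        self.assertEqual(rdd.distinct().count(), 3)
        result = rdd.distinct(5)
        self.assertEqual(result.getNumPartitions(), 5)
        self.assertEqual(result.count(), 3)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(DistinctTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

Folding both checks into one method trades slightly coarser failure reporting for fewer test-case setups.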

SparkQA · 2014-09-15T14:04:19Z

QA tests have started for PR 2383 at commit 6bc4a2c.

This patch merges cleanly.

SparkQA · 2014-09-15T15:10:14Z

QA tests have finished for PR 2383 at commit 6bc4a2c.

This patch passes unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2014-09-15T17:49:11Z

QA tests have started for PR 2383 at commit 30b837a.

This patch merges cleanly.

SparkQA · 2014-09-15T18:55:41Z

QA tests have finished for PR 2383 at commit 30b837a.

This patch passes unit tests.
This patch merges cleanly.
This patch adds no public classes.

JoshRosen · 2014-09-16T18:39:22Z

This looks good to me, so I'm going to merge it. Thanks!

JoshRosen reviewed Sep 13, 2014

[SPARK-3519] add distinct(n) to PySpark

7a17f2b

Added missing rdd.distinct(numPartitions) and associated tests

mattf force-pushed the SPARK-3519 branch from fcfc05e to 7a17f2b Compare September 14, 2014 00:05

davies reviewed Sep 15, 2014

[SPARK-3519] add distinct(n) to SchemaRDD in PySpark

6bc4a2c

Combine test cases to save on JVM startups

30b837a

asfgit closed this in 9d5fa76 Sep 16, 2014

mattf deleted the SPARK-3519 branch September 16, 2014 18:50
