[SPARK-17641][SQL] Collect_list/Collect_set should not collect null values. #15208

hvanhovell · 2016-09-23T01:48:59Z

What changes were proposed in this pull request?

We added native versions of `collect_set` and `collect_list` in Spark 2.0. These currently also (try to) collect null values, which differs from the original Hive implementation. This PR fixes this by adding a null check to the `Collect.update` method.
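To illustrate the intended semantics, here is a minimal sketch of a buffer-based collect aggregate whose `update` skips nulls before they reach the buffer. It is a simplified model, not Spark's code. Spark's `Collect.update(b: MutableRow, input: InternalRow)` evaluates `child` against the input row, whereas this sketch receives the already-evaluated value. `CollectSketch`, `CollectListSketch`, `CollectSetSketch`, `add` and `updateAll` are hypothetical names introduced here for illustration.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for a buffer-based collect aggregate.
// NOT Spark's Collect class: it only models the null check added by this PR.
abstract class CollectSketch {
  // Subclasses decide how a non-null value is stored.
  protected def add(value: Any): Unit
  // Aggregate result for one group.
  def result: Seq[Any]

  // Null-skipping update: a null input is dropped instead of being buffered,
  // matching Hive's collect_list/collect_set behaviour described above.
  final def update(value: Any): Unit = {
    if (value != null) add(value)
  }

  // Convenience helper (hypothetical): feed every input value of a group.
  final def updateAll(values: Iterable[Any]): this.type = {
    values.foreach(update)
    this
  }
}

// collect_list model: keeps every non-null value in input order.
final class CollectListSketch extends CollectSketch {
  private val buffer = mutable.ArrayBuffer.empty[Any]
  protected def add(value: Any): Unit = buffer += value
  def result: Seq[Any] = buffer.toList
}

// collect_set model: keeps distinct non-null values. LinkedHashSet is used
// only so this sketch's output order is deterministic.
final class CollectSetSketch extends CollectSketch {
  private val buffer = mutable.LinkedHashSet.empty[Any]
  protected def add(value: Any): Unit = buffer += value
  def result: Seq[Any] = buffer.toList
}
```

Feeding `Seq(1, null, 2, 1, null)` into `CollectListSketch` yields `List(1, 2, 1)`, and into `CollectSetSketch` yields `List(1, 2)`. In this model, a group containing only nulls produces an empty result. The insertion-ordered set is a choice of the sketch, not a claim about the buffer type Spark uses. `collect_set` itself makes no ordering promise.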

How was this patch tested?

Added a regression test to DataFrameAggregateSuite.

hvanhovell · 2016-09-23T01:49:05Z

cc @mengxr

SparkQA · 2016-09-23T03:53:42Z

Test build #65806 has finished for PR 15208 at commit 37c4539.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

sameeragarwal

LGTM, pending jenkins

sameeragarwal · 2016-09-27T18:09:01Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala

@@ -65,7 +65,10 @@ abstract class Collect extends ImperativeAggregate {
  }

  override def update(b: MutableRow, input: InternalRow): Unit = {
-    buffer += child.eval(input)
+    val value = child.eval(input)
+    if (value != null) {


It'd be great to add a comment here noting that this mimics the Hive semantics.

SparkQA · 2016-09-27T20:31:50Z

Test build #65992 has finished for PR 15208 at commit 80b2166.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

hvanhovell · 2016-09-27T20:35:00Z

retest this please

SparkQA · 2016-09-27T23:02:36Z

Test build #66000 has finished for PR 15208 at commit 80b2166.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

rxin · 2016-09-28T23:24:31Z

Merging in master/2.0.

…alues.

## What changes were proposed in this pull request?

We added native versions of `collect_set` and `collect_list` in Spark 2.0. These currently also (try to) collect null values, this is different from the original Hive implementation. This PR fixes this by adding a null check to the `Collect.update` method.

## How was this patch tested?

Added a regression test to `DataFrameAggregateSuite`.

Author: Herman van Hovell <hvanhovell@databricks.com>

Closes #15208 from hvanhovell/SPARK-17641.

(cherry picked from commit 7d09232)
Signed-off-by: Reynold Xin <rxin@databricks.com>

Do not collect null values.

37c4539

sameeragarwal approved these changes Sep 27, 2016


Add comment on filtering null semantic

80b2166

asfgit closed this in 7d09232 Sep 28, 2016

peter-toth mentioned this pull request Jun 21, 2020

[SPARK-29375][SPARK-28940][SPARK-32041][SQL] Whole plan exchange and subquery reuse #28885

Closed
