[SPARK-13410][SQL] Support unionAll for DataFrames with UDT columns. #11333

damnMeddlingKid · 2016-02-23T21:52:28Z

What changes were proposed in this pull request?

This PR adds equality operators to UDT classes so that they can be correctly tested for dataType equality during union operations.

Previously, calling unionAll on two DataFrames with UDT columns failed with AnalysisException: u"unresolved operator 'Union;", as in the example below.

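```
from pyspark.sql.tests import PythonOnlyPoint, PythonOnlyUDT
from pyspark.sql import types

# sqlCtx is assumed to be an existing SQLContext (e.g. the one provided by the pyspark shell).
schema = types.StructType([types.StructField("point", PythonOnlyUDT(), True)])

a = sqlCtx.createDataFrame([[PythonOnlyPoint(1.0, 2.0)]], schema)
b = sqlCtx.createDataFrame([[PythonOnlyPoint(3.0, 4.0)]], schema)

# Before this patch, this line raised the AnalysisException.
c = a.unionAll(b)
```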
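To illustrate why the equality operators matter, here is a minimal, Spark-free sketch. It is not the code from this patch: `NaiveUDT`, `ComparableUDT`, and `schemas_match` are hypothetical names made up for the illustration, and the type-based `__eq__` is only an assumption about what a reasonable equality operator for a UDT class might look like. The point is that without `__eq__`, two separately constructed instances of the same type compare by identity, so a column-by-column dataType comparison of the kind a union needs reports a mismatch.

```python
class NaiveUDT:
    """Hypothetical stand-in for a UDT class that defines no equality operator."""


class ComparableUDT:
    """Hypothetical stand-in for a UDT class with type-based equality."""

    def __eq__(self, other):
        # Two instances are equal when they are the same UDT class.
        return type(self) == type(other)

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        # Keep hashing consistent with __eq__.
        return hash(type(self))


def schemas_match(left, right):
    """Compare two lists of column data types position by position."""
    return len(left) == len(right) and all(l == r for l, r in zip(left, right))


# Each "DataFrame" gets its own dataType instance, as in the example above.
naive_ok = schemas_match([NaiveUDT()], [NaiveUDT()])
comparable_ok = schemas_match([ComparableUDT()], [ComparableUDT()])

print(naive_ok, comparable_ok)  # prints: False True
```

With identity-based comparison the two schemas look different even though they were built from the same type, which matches the symptom described above; with the equality operator defined they compare as equal.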

How was this patch tested?

Tested using two unit tests, one in sql/tests.py and one in the DataFrameSuite.

Additional information: https://issues.apache.org/jira/browse/SPARK-13410

@rxin

davies · 2016-02-23T22:00:43Z

LGTM

damnMeddlingKid · 2016-02-23T23:29:27Z

Hoping to get this into 1.6.1.

SparkQA · 2016-02-23T23:31:58Z

Test build #2570 has finished for PR 11333 at commit c14d1ba.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

rxin · 2016-02-23T23:33:53Z

Thanks - merging in branch-1.6.

## What changes were proposed in this pull request?

This PR adds equality operators to UDT classes so that they can be correctly tested for dataType equality during union operations.

Previously, calling unionAll on two DataFrames with UDT columns failed with `AnalysisException: u"unresolved operator 'Union;"`, as in the example below.

```
from pyspark.sql.tests import PythonOnlyPoint, PythonOnlyUDT
from pyspark.sql import types

schema = types.StructType([types.StructField("point", PythonOnlyUDT(), True)])

a = sqlCtx.createDataFrame([[PythonOnlyPoint(1.0, 2.0)]], schema)
b = sqlCtx.createDataFrame([[PythonOnlyPoint(3.0, 4.0)]], schema)

c = a.unionAll(b)
```

## How was this patch tested?

Tested using two unit tests in sql/test.py and the DataFrameSuite.

Additional information: https://issues.apache.org/jira/browse/SPARK-13410

rxin

Author: Franklyn D'souza <franklynd@gmail.com>

Closes #11333 from damnMeddlingKid/udt-union-patch.

SparkQA · 2016-02-23T23:44:40Z

Test build #2571 has finished for PR 11333 at commit c14d1ba.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

rxin · 2016-02-24T00:16:05Z

@damnMeddlingKid can you close the pull request? Since it was not merged into master, GitHub won't auto-close it.

support unionAll for dataframes with UDT columns

c14d1ba

damnMeddlingKid closed this Feb 24, 2016
