[SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base #25090

Closed
wants to merge 6 commits into from

@imback82
Contributor

imback82 commented Jul 10, 2019

What changes were proposed in this pull request?

This PR adds tests converted from except-all.sql to exercise UDFs. Please see the contribution guide in the umbrella ticket, SPARK-27921.
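For readers unfamiliar with the operator under test: EXCEPT ALL is a multiset difference, so each row on the right cancels at most one matching row on the left and the remaining duplicates survive. The sketch below models that with `collections.Counter`. It is only an illustration, not Spark's implementation, and the rows in `left` and `right` are hypothetical (they are not the contents of tab1 or tab2).

```python
from collections import Counter


def except_all(left_rows, right_rows):
    """Multiset difference: result count = max(0, left count - right count)."""
    # Counter subtraction keeps only rows whose remaining count is positive.
    return Counter(left_rows) - Counter(right_rows)


# Hypothetical single-column rows; None stands in for SQL NULL.
# Set operators treat NULLs as equal to each other, which dict keys mimic here.
left = [0, 1, 2, 2, 2, 3, None, None]
right = [1, 2, 2, 5, None]

result = except_all(left, right)
for row, count in result.items():
    print(row, count)
```

With this input, one `2` and one `None` survive because the left side holds more copies than the right, while `5` has no effect because it never appears on the left.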

Diff compared to 'except-all.sql'

diff --git a/sql/core/src/test/resources/sql-tests/results/except-all.sql.out b/sql/core/src/test/resources/sql-tests/results/udf/udf-except-all.sql.out
index 01091a2f75..b7bfad0e53 100644
--- a/sql/core/src/test/resources/sql-tests/results/except-all.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/udf/udf-except-all.sql.out
@@ -49,11 +49,11 @@ struct<>
 
 
 -- !query 4
-SELECT * FROM tab1
+SELECT udf(c1) FROM tab1
 EXCEPT ALL
-SELECT * FROM tab2
+SELECT udf(c1) FROM tab2
 -- !query 4 schema
-struct<c1:int>
+struct<CAST(udf(cast(c1 as string)) AS INT):int>
 -- !query 4 output
 0
 2
@@ -62,11 +62,11 @@ NULL
 
 
 -- !query 5
-SELECT * FROM tab1
+SELECT udf(c1) FROM tab1
 MINUS ALL
-SELECT * FROM tab2
+SELECT udf(c1) FROM tab2
 -- !query 5 schema
-struct<c1:int>
+struct<CAST(udf(cast(c1 as string)) AS INT):int>
 -- !query 5 output
 0
 2
@@ -75,11 +75,11 @@ NULL
 
 
 -- !query 6
-SELECT * FROM tab1
+SELECT udf(c1) FROM tab1
 EXCEPT ALL
-SELECT * FROM tab2 WHERE c1 IS NOT NULL
+SELECT udf(c1) FROM tab2 WHERE udf(c1) IS NOT NULL
 -- !query 6 schema
-struct<c1:int>
+struct<CAST(udf(cast(c1 as string)) AS INT):int>
 -- !query 6 output
 0
 2
@@ -89,21 +89,21 @@ NULL
 
 
 -- !query 7
-SELECT * FROM tab1 WHERE c1 > 5
+SELECT udf(c1) FROM tab1 WHERE udf(c1) > 5
 EXCEPT ALL
-SELECT * FROM tab2
+SELECT udf(c1) FROM tab2
 -- !query 7 schema
-struct<c1:int>
+struct<CAST(udf(cast(c1 as string)) AS INT):int>
 -- !query 7 output
 
 
 
 -- !query 8
-SELECT * FROM tab1
+SELECT udf(c1) FROM tab1
 EXCEPT ALL
-SELECT * FROM tab2 WHERE c1 > 6
+SELECT udf(c1) FROM tab2 WHERE udf(c1 > udf(6))
 -- !query 8 schema
-struct<c1:int>
+struct<CAST(udf(cast(c1 as string)) AS INT):int>
 -- !query 8 output
 0
 1
@@ -117,11 +117,11 @@ NULL
 
 
 -- !query 9
-SELECT * FROM tab1
+SELECT udf(c1) FROM tab1
 EXCEPT ALL
-SELECT CAST(1 AS BIGINT)
+SELECT CAST(udf(1) AS BIGINT)
 -- !query 9 schema
-struct<c1:bigint>
+struct<CAST(udf(cast(c1 as string)) AS INT):bigint>
 -- !query 9 output
 0
 2
@@ -134,7 +134,7 @@ NULL
 
 
 -- !query 10
-SELECT * FROM tab1
+SELECT udf(c1) FROM tab1
 EXCEPT ALL
 SELECT array(1)
 -- !query 10 schema
@@ -145,62 +145,62 @@ ExceptAll can only be performed on tables with the compatible column types. arra
 
 
 -- !query 11
-SELECT * FROM tab3
+SELECT udf(k), v FROM tab3
 EXCEPT ALL
-SELECT * FROM tab4
+SELECT k, udf(v) FROM tab4
 -- !query 11 schema
-struct<k:int,v:int>
+struct<CAST(udf(cast(k as string)) AS INT):int,v:int>
 -- !query 11 output
 1	2
 1	3
 
 
 -- !query 12
-SELECT * FROM tab4
+SELECT k, udf(v) FROM tab4
 EXCEPT ALL
-SELECT * FROM tab3
+SELECT udf(k), v FROM tab3
 -- !query 12 schema
-struct<k:int,v:int>
+struct<k:int,CAST(udf(cast(v as string)) AS INT):int>
 -- !query 12 output
 2	2
 2	20
 
 
 -- !query 13
-SELECT * FROM tab4
+SELECT udf(k), udf(v) FROM tab4
 EXCEPT ALL
-SELECT * FROM tab3
+SELECT udf(k), udf(v) FROM tab3
 INTERSECT DISTINCT
-SELECT * FROM tab4
+SELECT udf(k), udf(v) FROM tab4
 -- !query 13 schema
-struct<k:int,v:int>
+struct<CAST(udf(cast(k as string)) AS INT):int,CAST(udf(cast(v as string)) AS INT):int>
 -- !query 13 output
 2	2
 2	20
 
 
 -- !query 14
-SELECT * FROM tab4
+SELECT udf(k), v FROM tab4
 EXCEPT ALL
-SELECT * FROM tab3
+SELECT k, udf(v) FROM tab3
 EXCEPT DISTINCT
-SELECT * FROM tab4
+SELECT udf(k), udf(v) FROM tab4
 -- !query 14 schema
-struct<k:int,v:int>
+struct<CAST(udf(cast(k as string)) AS INT):int,v:int>
 -- !query 14 output
 
 
 
 -- !query 15
-SELECT * FROM tab3
+SELECT k, udf(v) FROM tab3
 EXCEPT ALL
-SELECT * FROM tab4
+SELECT udf(k), udf(v) FROM tab4
 UNION ALL
-SELECT * FROM tab3
+SELECT udf(k), v FROM tab3
 EXCEPT DISTINCT
-SELECT * FROM tab4
+SELECT k, udf(v) FROM tab4
 -- !query 15 schema
-struct<k:int,v:int>
+struct<k:int,CAST(udf(cast(v as string)) AS INT):int>
 -- !query 15 output
 1	3
 
@@ -217,83 +217,83 @@ ExceptAll can only be performed on tables with the same number of columns, but t
 
 
 -- !query 17
-SELECT * FROM tab3
+SELECT udf(k), udf(v) FROM tab3
 EXCEPT ALL
-SELECT * FROM tab4
+SELECT udf(k), udf(v) FROM tab4
 UNION
-SELECT * FROM tab3
+SELECT udf(k), udf(v) FROM tab3
 EXCEPT DISTINCT
-SELECT * FROM tab4
+SELECT udf(k), udf(v) FROM tab4
 -- !query 17 schema
-struct<k:int,v:int>
+struct<CAST(udf(cast(k as string)) AS INT):int,CAST(udf(cast(v as string)) AS INT):int>
 -- !query 17 output
 1	3
 
 
 -- !query 18
-SELECT * FROM tab3
+SELECT udf(k), udf(v) FROM tab3
 MINUS ALL
-SELECT * FROM tab4
+SELECT k, udf(v) FROM tab4
 UNION
-SELECT * FROM tab3
+SELECT udf(k), udf(v) FROM tab3
 MINUS DISTINCT
-SELECT * FROM tab4
+SELECT k, udf(v) FROM tab4
 -- !query 18 schema
-struct<k:int,v:int>
+struct<CAST(udf(cast(k as string)) AS INT):int,CAST(udf(cast(v as string)) AS INT):int>
 -- !query 18 output
 1	3
 
 
 -- !query 19
-SELECT * FROM tab3
+SELECT k, udf(v) FROM tab3
 EXCEPT ALL
-SELECT * FROM tab4
+SELECT udf(k), v FROM tab4
 EXCEPT DISTINCT
-SELECT * FROM tab3
+SELECT k, udf(v) FROM tab3
 EXCEPT DISTINCT
-SELECT * FROM tab4
+SELECT udf(k), v FROM tab4
 -- !query 19 schema
-struct<k:int,v:int>
+struct<k:int,CAST(udf(cast(v as string)) AS INT):int>
 -- !query 19 output
 
 
 
 -- !query 20
 SELECT * 
-FROM   (SELECT tab3.k, 
-               tab4.v 
+FROM   (SELECT tab3.k,
+               udf(tab4.v)
         FROM   tab3 
                JOIN tab4 
-                 ON tab3.k = tab4.k)
+                 ON udf(tab3.k) = tab4.k)
 EXCEPT ALL 
 SELECT * 
-FROM   (SELECT tab3.k, 
-               tab4.v 
+FROM   (SELECT udf(tab3.k),
+               tab4.v
         FROM   tab3 
                JOIN tab4 
-                 ON tab3.k = tab4.k)
+                 ON tab3.k = udf(tab4.k))
 -- !query 20 schema
-struct<k:int,v:int>
+struct<k:int,CAST(udf(cast(v as string)) AS INT):int>
 -- !query 20 output
 
 
 
 -- !query 21
 SELECT * 
-FROM   (SELECT tab3.k, 
-               tab4.v 
+FROM   (SELECT udf(udf(tab3.k)),
+               udf(tab4.v)
         FROM   tab3 
                JOIN tab4 
-                 ON tab3.k = tab4.k) 
+                 ON udf(udf(tab3.k)) = udf(tab4.k))
 EXCEPT ALL 
 SELECT * 
-FROM   (SELECT tab4.v AS k, 
-               tab3.k AS v 
+FROM   (SELECT udf(tab4.v) AS k,
+               udf(udf(tab3.k)) AS v
         FROM   tab3 
                JOIN tab4 
-                 ON tab3.k = tab4.k)
+                 ON udf(tab3.k) = udf(tab4.k))
 -- !query 21 schema
-struct<k:int,v:int>
+struct<CAST(udf(cast(cast(udf(cast(k as string)) as int) as string)) AS INT):int,CAST(udf(cast(v as string)) AS INT):int>
 -- !query 21 output
 1	2
 1	2
@@ -305,11 +305,11 @@ struct<k:int,v:int>
 
 
 -- !query 22
-SELECT v FROM tab3 GROUP BY v
+SELECT udf(v) FROM tab3 GROUP BY v
 EXCEPT ALL
-SELECT k FROM tab4 GROUP BY k
+SELECT udf(k) FROM tab4 GROUP BY k
 -- !query 22 schema
-struct<v:int>
+struct<CAST(udf(cast(v as string)) AS INT):int>
 -- !query 22 output
 3
 

How was this patch tested?

Tested as guided in SPARK-27921.

@HyukjinKwon
Member

add to whitelist

@SparkQA

SparkQA commented Jul 10, 2019

Test build #107434 has finished for PR 25090 at commit a512ef8.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Member

wangyum commented Jul 10, 2019

retest this please

@SparkQA

SparkQA commented Jul 10, 2019

Test build #107451 has finished for PR 25090 at commit a512ef8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 11, 2019

Test build #107501 has finished for PR 25090 at commit a09df4b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

@imback82, #25130 is merged. Can you sync this PR to master and rebase please?

@HyukjinKwon
Member

retest this please

@HyukjinKwon
Member

Looks fine otherwise.

@SparkQA

SparkQA commented Jul 18, 2019

Test build #107819 has finished for PR 25090 at commit a09df4b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imback82
Contributor Author

@HyukjinKwon, I think I addressed all your comments. Please re-review this. Thanks!

-- Empty right relation
SELECT udf(c1) FROM tab1
EXCEPT ALL
SELECT udf(c1) FROM tab2 WHERE udf(c1 > udf(6));
Member

Is it intentional to do udf(6)? Not udf(c1) > 6?

Contributor Author

Yes, I am trying a different combination of udfs.

Member

Yeah, it's a bit random, but let's test different cases while we're here.
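To make the two predicate shapes in this thread concrete, here is a rough sketch comparing `udf(c1 > udf(6))` with `udf(c1) > 6`. It rests on an assumption: the schema lines in the diff, such as `CAST(udf(cast(c1 as string)) AS INT)`, suggest the test UDF takes and returns a string and the result is cast back to the original type, so the `udf` below is a hypothetical identity stand-in that round-trips values through `str`. The helper `sql_gt` and the `rows` values are likewise made up for illustration and are not the contents of tab2.

```python
def udf(value):
    """Hypothetical stand-in for the test UDF: string round trip, then cast back."""
    if value is None:
        return None  # NULL passes through unchanged
    text = str(value)
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return text == "True"
    return type(value)(text)


def sql_gt(a, b):
    """Greater-than with SQL-style NULL handling: any NULL operand yields NULL."""
    if a is None or b is None:
        return None
    return a > b


rows = [1, 5, 6, 7, 9, None]  # hypothetical c1 values

# Shape used in the PR: the UDF wraps the whole comparison and the literal.
outer_form = [c1 for c1 in rows if udf(sql_gt(c1, udf(6))) is True]
# Shape the reviewer asked about: the UDF wraps only the column.
inner_form = [c1 for c1 in rows if sql_gt(udf(c1), 6) is True]

print(outer_form, inner_form)
```

Under the identity assumption both shapes keep the same rows, which fits the point made above: the variation exercises different UDF placements rather than different results.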

@SparkQA

SparkQA commented Jul 18, 2019

Test build #107825 has finished for PR 25090 at commit 2c8cc19.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member

viirya commented Jul 18, 2019

retest this please

@SparkQA

SparkQA commented Jul 18, 2019

Test build #107832 has finished for PR 25090 at commit 2c8cc19.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

Looks like there is no notable diff.

Merged to master.
