-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-32106][SQL] Implement script transform in sql/core #29414
Conversation
FYI @maropu @cloud-fan |
Test build #127370 has finished for PR 29414 at commit
|
Test build #127373 has finished for PR 29414 at commit
|
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala
Show resolved
Hide resolved
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkScriptTransformationExec.scala
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkScriptTransformationExec.scala
Outdated
Show resolved
Hide resolved
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveScriptTransformationExec.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkScriptTransformationExec.scala
Outdated
Show resolved
Hide resolved
FIELDS TERMINATED BY '|' | ||
LINES TERMINATED BY '\n' | ||
NULL DEFINED AS 'NULL' | ||
USING 'cat' AS (a, b, c, d) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can ROW FORMAT DELIMETED
with output schema work correctly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, could you add test cases for the parser, too? https://github.com/apache/spark/pull/29414/files#diff-36e2b29ae675caaa1fce16e74fbd8710R1135
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see two more bug founded
https://issues.apache.org/jira/browse/SPARK-32607
https://issues.apache.org/jira/browse/SPARK-32608
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, could you add test cases for the parser, too? https://github.com/apache/spark/pull/29414/files#diff-36e2b29ae675caaa1fce16e74fbd8710R1135
Add some UT in transform.sql and fix origin wrong part
Test build #127402 has finished for PR 29414 at commit
|
Test build #127406 has finished for PR 29414 at commit
|
Test build #127829 has finished for PR 29414 at commit
|
retest this please |
Test build #127834 has finished for PR 29414 at commit
|
retest this please |
Test build #127841 has finished for PR 29414 at commit
|
retest this please |
Test build #128204 has finished for PR 29414 at commit
|
retest this please |
Test build #128205 has finished for PR 29414 at commit
|
retest this please |
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #132769 has finished for PR 29414 at commit
|
I am good with this change. I don't mind if somebody merges. |
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #132794 has finished for PR 29414 at commit
|
Hope merge this and start next work |
gentle ping @maropu @cloud-fan |
gentle ping @tejasapatil @sameeragarwal |
okay, I will merge this into master in a few days if nobody has more comments. |
retest this please |
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #132973 has finished for PR 29414 at commit
|
retest this please |
Test build #132972 has finished for PR 29414 at commit
|
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #132980 has finished for PR 29414 at commit
|
okay, I believe we have much time to revisit this until the Spark v3.2.0 released, so I'll merge this into master for now so as to encourage @AngersZhuuuu 's following work. FYI: @HyukjinKwon @cloud-fan @dongjoon-hyun @gatorsmile |
Merged to master. |
Nice, I support your decision @maropu. |
Thank you for informing, @maropu ! |
struct<> | ||
-- !query output | ||
org.apache.spark.SparkException | ||
Subprocess exited with status 127. Error: /bin/bash: some_non_existent_command: command not found |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to cause a test flakiness in GitHub Action. Could you take a look, @AngersZhuuuu and @maropu .
SQLQueryTestSuite.transform.sql
org.scalatest.exceptions.TestFailedException: transform.sql
Expected "...istent_command: comm[and not found]", but got "...istent_command: comm[]" Result did not match for query #2
SELECT TRANSFORM(a)
USING 'some_non_existent_command' AS (a)
FROM t
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very flaky. Almost 50% failure probability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that's too bad. Thanks for letting me know. I'll open a followup PR to fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for taking a look at that, @maropu .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please review this: #30896
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What changes were proposed in this pull request?
SparkScriptTransformationExec
based onBaseScriptTransformationExec
SparkScriptTransformationWriterThread
based onBaseScriptTransformationWriterThread
of writing dataSparkScripts
to support convert script LogicalPlan to SparkPlan in Spark SQL (without hive mode)SparkScriptTransformationSuite
test spark spec caseSQLQueryTestSuite
And we will close #29085 .
Why are the changes needed?
Support user use Script Transform without Hive
Does this PR introduce any user-facing change?
User can use Script Transformation without hive in no serde mode.
Such as :
**default no serde **
no serde with spec ROW FORMAT DELIMITED
How was this patch tested?
Added UT