[SPARK-31025][SQL] Support foldable CSV strings by schema_of_csv #27777

Closed
wants to merge 3 commits into from

MaxGekk
Member

@MaxGekk MaxGekk commented Mar 3, 2020

What changes were proposed in this pull request?

In this PR, I propose to change how the SchemaOfCsv expression checks its input parameter, so that any foldable child expression is allowed instead of only a string literal.

Why are the changes needed?

To improve the user experience with Spark SQL. Currently, schema_of_csv accepts only string literals as CSV examples:

spark-sql> select schema_of_csv(concat_ws(',', 0.1, 1));
Error in query: cannot resolve 'schema_of_csv(concat_ws(',', CAST(0.1BD AS STRING), CAST(1 AS STRING)))' due to data type mismatch: The input csv should be a string literal and not null; however, got concat_ws(',', CAST(0.1BD AS STRING), CAST(1 AS STRING)).; line 1 pos 7;
'Project [unresolvedalias(schema_of_csv(concat_ws(,, cast(0.1 as string), cast(1 as string))), None)]
+- OneRowRelation

Does this PR introduce any user-facing change?

Yes. After the change, schema_of_csv accepts foldable expressions, for example:

spark-sql> select schema_of_csv(concat_ws(',', 0.1, 1));
struct<_c0:double,_c1:int>
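To illustrate the difference between a literal-only check and a foldable check, here is a minimal, self-contained Python sketch. It is a toy model, not Spark's implementation: the classes `Literal`, `ColumnRef`, and `ConcatWs`, the `literal_only` flag, the error messages, and the simplified type inference in `infer_type` are all hypothetical names and behavior introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """A constant string value (hypothetical stand-in for a string literal)."""
    value: str

    @property
    def foldable(self) -> bool:
        return True  # a constant can always be evaluated ahead of time

    def eval(self) -> str:
        return self.value


@dataclass(frozen=True)
class ColumnRef:
    """A per-row column reference; it has no value until rows are read."""
    name: str

    @property
    def foldable(self) -> bool:
        return False

    def eval(self) -> str:
        raise ValueError(f"column {self.name!r} has no value at analysis time")


@dataclass(frozen=True)
class ConcatWs:
    """Joins child values with a separator; foldable only if every child is."""
    sep: str
    children: tuple

    @property
    def foldable(self) -> bool:
        return all(child.foldable for child in self.children)

    def eval(self) -> str:
        return self.sep.join(child.eval() for child in self.children)


def infer_type(field: str) -> str:
    """Simplified inference (an assumption): try int, then double, else string."""
    for caster, type_name in ((int, "int"), (float, "double")):
        try:
            caster(field)
            return type_name
        except ValueError:
            pass
    return "string"


def schema_of_csv(expr, literal_only: bool = False) -> str:
    """Toy schema_of_csv: validate the input, evaluate it once, infer a schema."""
    if literal_only and not isinstance(expr, Literal):
        # Models the stricter check: only a plain literal is accepted.
        raise TypeError("the input csv should be a string literal")
    if not expr.foldable:
        # Models the relaxed check: anything constant-evaluable is accepted.
        raise TypeError("the input csv should be a foldable string expression")
    fields = expr.eval().split(",")
    columns = ",".join(f"_c{i}:{infer_type(f)}" for i, f in enumerate(fields))
    return f"struct<{columns}>"


csv_example = ConcatWs(",", (Literal("0.1"), Literal("1")))
print(schema_of_csv(csv_example))  # struct<_c0:double,_c1:int>
```

In this sketch, calling `schema_of_csv(csv_example, literal_only=True)` raises `TypeError`, which mirrors the rejection shown in the error above, while an expression containing a `ColumnRef` is rejected under both checks because it cannot be evaluated before any rows are read.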

How was this patch tested?

  • By the existing test suites CsvFunctionsSuite and CsvExpressionsSuite.
  • By a new test added to CsvFunctionsSuite that checks foldable input.

@HyukjinKwon
Member

HyukjinKwon commented Mar 3, 2020

Max, let's fold the from_csv and from_json PRs and this one into a single PR. I think that would be easier to discuss, review, and merge.

@SparkQA

SparkQA commented Mar 3, 2020

Test build #119250 has finished for PR 27777 at commit d4da235.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 4, 2020

Test build #119288 has finished for PR 27777 at commit 222a31c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
