[SPARK-19637][SQL] Add to_json in FunctionRegistry #16981

maropu · 2017-02-18T04:46:42Z

What changes were proposed in this pull request?

This PR adds entries to FunctionRegistry so that to_json can be used in SQL.
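As a rough mental model of what "adding an entry" means, here is a toy registry in plain Scala that maps a SQL function name to a builder, which turns the call's arguments into an expression. Every type and name below (`Expr`, `Lit`, `ToJson`, `ToyRegistry`) is hypothetical; Spark's actual FunctionRegistry and expression classes differ.

```scala
// Hypothetical, minimal expression types (not Spark's Expression classes).
sealed trait Expr
final case class Lit(value: Any) extends Expr
final case class ToJson(child: Expr, options: Map[String, String]) extends Expr

object ToyRegistry {
  // A builder turns the argument list of a SQL call into an expression.
  type Builder = Seq[Expr] => Expr

  private val toJsonBuilder: Builder = {
    case Seq(child) => ToJson(child, Map.empty)
    case Seq(child, Lit(opts: Map[_, _])) =>
      // Optional second argument: a map of options, stringified.
      ToJson(child, opts.iterator.map { case (k, v) => k.toString -> v.toString }.toMap)
    case _ =>
      throw new IllegalArgumentException("Invalid arguments for function to_json")
  }

  // "Registering" a function means adding a name -> builder entry.
  private val builders: Map[String, Builder] = Map("to_json" -> toJsonBuilder)

  // This toy resolves names case-insensitively.
  def lookup(name: String, args: Seq[Expr]): Expr =
    builders.get(name.toLowerCase) match {
      case Some(build) => build(args)
      case None        => throw new NoSuchElementException(s"Undefined function: $name")
    }
}
```

A name that has no entry (here, from_json) simply fails to resolve.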

How was this patch tested?

Added tests in JsonFunctionsSuite.

SparkQA · 2017-02-18T04:52:27Z

Test build #73095 has finished for PR 16981 at commit 8df67ec.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-02-18T05:22:49Z

Test build #73096 has finished for PR 16981 at commit ed87d9a.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-02-18T06:27:39Z

Test build #73099 has started for PR 16981 at commit 0488507.

maropu · 2017-02-18T08:10:16Z

Jenkins, retest this please.

SparkQA · 2017-02-18T10:10:50Z

Test build #73102 has finished for PR 16981 at commit 0488507.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon

@maropu, I just left a few comments that might help.

HyukjinKwon · 2017-02-18T12:06:42Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonUtils.scala

This is just my personal opinion, but should we consider narrowing this import? For example, import org.json4s.jackson.JsonMethods.parse.

HyukjinKwon · 2017-02-18T12:13:44Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonUtils.scala

We could try parse(json).extract[Map[String, String]]. From what I observed, it produces an empty Map when there are no such values or the JSON is empty, and it throws an exception when the JSON is invalid.

HyukjinKwon · 2017-02-18T12:14:44Z

sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala

Maybe use """ ... """ here, if more commits need to be pushed anyway.
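For context, Scala's triple-quoted strings let JSON literals in tests be written without escaping the inner double quotes. A minimal illustration; the JSON values are made up, not taken from JsonFunctionsSuite:

```scala
object JsonLiterals {
  // Escaped form: every inner double quote needs a backslash.
  val escaped: String = "{\"a\":1,\"b\":\"x\"}"

  // Triple-quoted form: the same JSON, written verbatim.
  val raw: String = """{"a":1,"b":"x"}"""
}
```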

maropu · 2017-02-18T12:59:46Z

@HyukjinKwon Thanks! I'll check.

SparkQA · 2017-02-18T16:37:03Z

Test build #73107 has finished for PR 16981 at commit e12937d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon

Sorry, I ended up reviewing twice; these are all my comments. I hope they sound reasonable. Other than those, FWIW, it looks good to me.

HyukjinKwon · 2017-02-19T06:04:42Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonUtils.scala

Could we do this as below:

Try(parse(json).extract[Map[String, String]]).getOrElse { throw new AnalysisException(...) }

or maybe just a try-catch block?
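A minimal, dependency-free sketch of the two variants suggested above. `parseOptions` is a hypothetical stand-in for parse(json).extract[Map[String, String]] that accepts a simple k=v,k=v string (so the example runs without json4s), and IllegalArgumentException stands in for AnalysisException. Mirroring the behavior observed earlier, empty input yields an empty Map.

```scala
import scala.util.Try

object OptionParsing {
  // Hypothetical stand-in for parse(json).extract[Map[String, String]]:
  // accepts "k1=v1,k2=v2" and throws on malformed input.
  private def parseOptions(s: String): Map[String, String] =
    if (s.trim.isEmpty) Map.empty
    else s.split(",").map { pair =>
      pair.split("=", -1) match {
        case Array(k, v) if k.nonEmpty => k -> v
        case _ => throw new IllegalStateException(s"malformed pair: $pair")
      }
    }.toMap

  // Variant 1: Try(...).getOrElse { throw ... }, as suggested in the review.
  def withTry(s: String): Map[String, String] =
    Try(parseOptions(s)).getOrElse {
      throw new IllegalArgumentException(s"Invalid options string: $s")
    }

  // Variant 2: a plain try-catch that keeps the original exception as the cause.
  def withTryCatch(s: String): Map[String, String] =
    try parseOptions(s) catch {
      case e: IllegalStateException =>
        throw new IllegalArgumentException(s"Invalid options string: $s", e)
    }
}
```

One design difference: Try(...).getOrElse discards the original exception, while the try-catch variant can keep it as the cause, which may help when debugging malformed options.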

HyukjinKwon · 2017-02-19T06:07:26Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonUtils.scala

Trivial:

s"Must be a string literal, but: $e"

HyukjinKwon · 2017-02-19T06:23:19Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonUtils.scala

I think we should check that it is a StructType and throw a proper exception: JsonToStruct does not seem to check whether exp is a StructType, so it would probably throw a ClassCastException (unless I have missed something here).

I wrote it this way following https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L3010. Either way is okay with me, though; if we change the code as you suggested, do we need to modify the from_json code, too?

Ah, thanks. Yes, if it throws a ClassCastException, I think we should produce a better exception and message rather than one that just says A cannot be cast to B. Maybe add a util shared by both places?

okay, I'll do that ;)
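A sketch of the shared-util idea: pattern-match on the data type and fail with a descriptive message instead of an opaque ClassCastException. The tiny DataType hierarchy, the `requireStructType` name, and the message text are hypothetical stand-ins used to keep the example self-contained; they are not Spark's types or the helper this PR added.

```scala
// Hypothetical, minimal stand-ins for Spark's DataType hierarchy.
sealed trait DataType { def simpleString: String }
case object IntType extends DataType { val simpleString = "int" }
final case class StructType(fields: Seq[(String, DataType)]) extends DataType {
  def simpleString: String =
    fields.map { case (n, t) => s"$n:${t.simpleString}" }.mkString("struct<", ",", ">")
}

object SchemaUtils {
  // One check usable from both call sites: return the StructType, or fail
  // with a message naming the offending type rather than "A cannot be cast to B".
  def requireStructType(dt: DataType): StructType = dt match {
    case st: StructType => st
    case other =>
      throw new IllegalArgumentException(
        s"Schema should be specified as a struct, but got: ${other.simpleString}")
  }
}
```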

HyukjinKwon · 2017-02-19T06:28:18Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala

Probably just struct instead of `StructType`, as in this example from spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala (line 340 in e7f982b):

usage = "_FUNC_(expr) - Explodes an array of structs into a table.",

maropu · 2017-02-19T14:51:02Z

I'll update in a day, thanks!

gatorsmile · 2017-02-20T04:51:17Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala

Return -> Returns

gatorsmile · 2017-02-20T04:53:12Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonUtils.scala

Why do we use a JSON string for the options?

Aha, you mean we should use a map literal directly? Sorry, I missed that idea. So the JSON option string is unnecessary? If so, I'll switch to a map literal here.

gatorsmile · 2017-02-20T04:57:16Z

Could you add SQL test cases to SQLQueryTestSuite?

maropu · 2017-02-20T04:57:45Z

@gatorsmile okay, I'll do so soon.

gatorsmile · 2017-02-20T05:00:13Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala

More examples are needed to show users how to use options.

gatorsmile · 2017-02-20T05:07:26Z

sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala

Regarding the format of options, another option is to use a MapType.

For example,

from_json(value, '${schema2.json}', map("timestampFormat", "dd/MM/yyyy HH:mm"))

I am not sure whether using JSON to represent options is a good way.

Okay, I'll fix it that way.

gatorsmile · 2017-02-20T05:10:43Z

sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala

collect is not needed

gatorsmile · 2017-02-20T05:16:14Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala

Can we let users call named_struct function to specify the schema?

@gatorsmile, do you mind elaborating on what you have in mind with named_struct? I am just curious.

I checked the related code; if we use named_struct here, we would need to add substantial code to convert a named_struct into a StructType...

SparkQA · 2017-02-20T06:26:09Z

Test build #73141 has finished for PR 16981 at commit 31ca0ff.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-02-20T06:51:55Z

Test build #73143 has finished for PR 16981 at commit 9b1c015.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-02-20T08:44:28Z

Test build #73156 has finished for PR 16981 at commit f15d0d9.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-02-20T10:43:50Z

Test build #73157 has finished for PR 16981 at commit e97bcb8.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2017-02-22T09:34:59Z

ping

maropu · 2017-03-01T00:12:30Z

@gatorsmile ping

maropu · 2017-03-02T12:20:10Z

@gatorsmile ping

gatorsmile · 2017-03-02T18:29:10Z

from_json is harder because the second argument is a StructType. We could consider accepting a string in the DDL format used for declaring a table's schema (e.g. a: Int, b: struct<a:Int, c: String>...).

To parse a schema written in the DDL format instead of the JSON format, we need to call the parser. If you are not familiar with the parser, maybe implement only to_json in this PR?

maropu · 2017-03-03T02:04:37Z

So this PR reused DataType.fromJson here: https://github.com/apache/spark/pull/16981/files#diff-113a2b8242f0ee6ec3914f539f119619R65. But I know this is somewhat arguable. I also think it'd be better to drop from_json from this PR and open a new JIRA to discuss it.

gatorsmile · 2017-03-03T16:55:09Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonUtils.scala

Is .toMap needed?

gatorsmile · 2017-03-03T16:59:14Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonUtils.scala

Why do we need to keep this here? It is not related to JacksonUtils.scala. The function name also needs to change.

gatorsmile · 2017-03-03T17:23:32Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonUtils.scala

case m: CreateMap if m.dataType.acceptsType(MapType(StringType, StringType, false)) =>

Nit: valueContainsNull = false

SparkQA · 2017-03-03T23:47:21Z

Test build #73874 has finished for PR 16981 at commit 0f7b167.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-03-03T23:52:24Z

Test build #73875 has finished for PR 16981 at commit 4a49d64.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-03-04T02:00:58Z

Test build #73876 has finished for PR 16981 at commit 44797f7.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-03-04T03:28:13Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala

This message is misleading in the following case:

df2.selectExpr("to_json(a, map('a', 1))")

Please also include the test case for this. Thanks!

How about this?
https://github.com/apache/spark/pull/16981/files#diff-6626026091295ad8c0dfb66ecbcd04b1R601
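A dependency-free sketch of the validation discussed in this thread: accept the options argument only when it is a map literal with string keys and string values (in the spirit of the acceptsType(MapType(StringType, StringType, false)) check), and otherwise report the actual type, so that a call like to_json(a, map('a', 1)) gets an accurate message. `Arg`, `MapLiteral`, `OtherExpr`, `checkOptions`, and the message wording are all hypothetical; this is not the code that was merged.

```scala
// Hypothetical model of an options argument: a map literal built from
// (key, value) pairs, or some other expression described by its type name.
sealed trait Arg
final case class MapLiteral(entries: Seq[(Any, Any)]) extends Arg
final case class OtherExpr(typeName: String) extends Arg

object OptionsCheck {
  private def typeName(v: Any): String = v match {
    case _: String => "string"
    case _: Int    => "int"
    case other     => other.getClass.getSimpleName
  }

  // Accept only map<string,string>; otherwise say what was actually passed.
  def checkOptions(arg: Arg): Map[String, String] = arg match {
    case MapLiteral(entries) =>
      val strPairs = entries.collect { case (k: String, v: String) => k -> v }
      if (strPairs.size == entries.size) strPairs.toMap
      else {
        val found = entries
          .map { case (k, v) => s"map<${typeName(k)},${typeName(v)}>" }
          .distinct
        throw new IllegalArgumentException(
          s"Options must be map<string,string>, but got: ${found.mkString(", ")}")
      }
    case OtherExpr(t) =>
      throw new IllegalArgumentException(s"Options must be given as a map literal, but got: $t")
  }
}
```

Reporting the concrete key/value types is what makes the map('a', 1) case readable: the message points at the int value rather than suggesting that no map was passed at all.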

SparkQA · 2017-03-04T06:01:39Z

Test build #73892 has finished for PR 16981 at commit 0468280.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-03-04T15:52:26Z

LGTM

cc @marmbrus @brkyvz Is this impl what you expect?

marmbrus · 2017-03-06T21:14:23Z

yeah, LGTM

SparkQA · 2017-03-07T14:12:50Z

Test build #74104 has finished for PR 16981 at commit 098c61d.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-03-07T14:44:59Z

Test build #74105 has finished for PR 16981 at commit 6d16474.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-03-07T16:59:26Z

Thanks! Merging to master.

gatorsmile · 2017-03-16T04:13:41Z

cc @maropu #17171 is merged. Are you interested in working on from_json?

JIRA: https://issues.apache.org/jira/browse/SPARK-19967

maropu force-pushed the SPARK-19637 branch from 8df67ec to ed87d9a (February 18, 2017 05:15)

maropu force-pushed the SPARK-19637 branch from ed87d9a to 0488507 (February 18, 2017 06:23)


maropu force-pushed the SPARK-19637 branch from f15d0d9 to e97bcb8 (February 20, 2017 08:43)


maropu force-pushed the SPARK-19637 branch 2 times, most recently from 8093498 to 4a49d64 (March 3, 2017 23:45)

maropu force-pushed the SPARK-19637 branch from 4a49d64 to 44797f7 (March 3, 2017 23:57)


maropu added 9 commits March 7, 2017 21:32

ffa92ba Add from_json/to_json in FunctionRegistry
7f12e94 Apply review comments
a960dfd Apply review comments
1d494b2 Add strToStructType in JacksonUtils
94bfe2d Apply review comments
53b758b Drop from_json support
bc8b07e Apply some comments
228ff37 Remove unnecessary import
6daea90 Apply review comments

maropu force-pushed the SPARK-19637 branch from 0468280 to 098c61d (March 7, 2017 12:36)

6d16474 Add error messages

maropu force-pushed the SPARK-19637 branch from 098c61d to 6d16474 (March 7, 2017 12:37)

asfgit closed this in 030acdd Mar 7, 2017
