Skip to content

Conversation

@maropu
Copy link
Member

@maropu maropu commented Feb 18, 2017

What changes were proposed in this pull request?

This pr added entries in FunctionRegistry and supported to_json in SQL.

How was this patch tested?

Added tests in JsonFunctionsSuite.

@SparkQA
Copy link

SparkQA commented Feb 18, 2017

Test build #73095 has finished for PR 16981 at commit 8df67ec.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 18, 2017

Test build #73096 has finished for PR 16981 at commit ed87d9a.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 18, 2017

Test build #73099 has started for PR 16981 at commit 0488507.

@maropu
Copy link
Member Author

maropu commented Feb 18, 2017

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Feb 18, 2017

Test build #73102 has finished for PR 16981 at commit 0488507.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maropu, I just left few opinions that might help.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just my personal opinion but should we maybe consider minimising this import? For example, import org.json4s.jackson.JsonMethods.parse.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could try to utilize val parse(json).extract[Map[String, String]]. Given my observation, it produces empty Map if there are no such values or empty json and it throws an exception if it is an invalid json.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, """ ... """ if more commits should be pushed.

@maropu
Copy link
Member Author

maropu commented Feb 18, 2017

@HyukjinKwon Thanks! I'll check.

@SparkQA
Copy link

SparkQA commented Feb 18, 2017

Test build #73107 has finished for PR 16981 at commit e12937d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I happened to review twice. These are all from me. I hope my comments sound reasonable. Other than them, (FWIW) it looks good to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we do this as below

    Try(parse(json).extract[Map[String, String]]).getOrElse {
      throw new AnalysisException(...)
    }

or maybe just a try-catch block?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trivial..

s"Must be a string literal, but: $e"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we should check if it is StructType and throw a proper exception because it seems JsonToStruct does not check if exp is StructType and it probably throws a cast exception (if I haven't missed something here).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just wrote this way along with here https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L3010. Both is okay to me though, if we modify the code in a way you suggested, we need to modify from_json code, too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks. Yes, if it throws a class cast exception, I think we should produce a better exception and message rather than just one saying A cannot be cast to B. Maybe, add a util for both places?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, I'll do that ;)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably just struct instead of `StructType` (as I found a example in

usage = "_FUNC_(expr) - Explodes an array of structs into a table.",
as a reference).

@maropu
Copy link
Member Author

maropu commented Feb 19, 2017

I'll update in a day, thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return -> Returns

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason we use the Json option string?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, you mean we use a map literal, directly? Sorry, but I missed that idea. This json option is totally meaningless? If yes, I'll fix to use a map literal here.

@gatorsmile
Copy link
Member

Could you add SQL test cases to SQLQueryTestSuite?

@maropu
Copy link
Member Author

maropu commented Feb 20, 2017

@gatorsmile okay, I'll do soon

Copy link
Member

@gatorsmile gatorsmile Feb 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More examples are needed to show users how to use options.

Copy link
Member

@gatorsmile gatorsmile Feb 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the format of options, another way is to use the MapType.

For example,

from_json(value, '${schema2.json}', map("timestampFormat", "dd/MM/yyyy HH:mm"))

I am not sure whether using JSON to represent options is a good way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, I'll fix in that way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

collect is not needed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we let users call named_struct function to specify the schema?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll check

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gatorsmile, do you mind if I ask to elaborate what you think wtih named_struct? I am just curious.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked related code though, if we use named_struc here, we need to add substantial code to convert named_struct to StructType...

@SparkQA
Copy link

SparkQA commented Feb 20, 2017

Test build #73141 has finished for PR 16981 at commit 31ca0ff.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 20, 2017

Test build #73143 has finished for PR 16981 at commit 9b1c015.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 20, 2017

Test build #73156 has finished for PR 16981 at commit f15d0d9.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 20, 2017

Test build #73157 has finished for PR 16981 at commit e97bcb8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member Author

maropu commented Feb 22, 2017

ping

@maropu
Copy link
Member Author

maropu commented Mar 1, 2017

@gatorsmile ping

1 similar comment
@maropu
Copy link
Member Author

maropu commented Mar 2, 2017

@gatorsmile ping

@gatorsmile
Copy link
Member

From JSON is harder because the second argument is a StructType. We could consider accepting a string in the DDL format for declaring a tables schema (i.e. a: Int, b: struct<a:Int, c: String>....

To parse the schema represented in DDL format, instead of the json format, we need to call the parser to do it. If you are not familar with the parser, maybe you only implement to_json in this PR?

@maropu
Copy link
Member Author

maropu commented Mar 3, 2017

So, this pr re-used DataType.fromJson here: https://github.com/apache/spark/pull/16981/files#diff-113a2b8242f0ee6ec3914f539f119619R65. But, I know this is some arguable. I also think it'd be better to drop off from_json from this pr and make a new JIRA to discuss from_json.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.toMap is needed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need to keep here? This is not related to JacksonUtils.scala. The function name also needs a change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

case m: CreateMap if m.dataType.acceptsType(MapType(StringType, StringType, false)) =>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: valueContainsNull = false

@maropu maropu force-pushed the SPARK-19637 branch 2 times, most recently from 8093498 to 4a49d64 Compare March 3, 2017 23:45
@SparkQA
Copy link

SparkQA commented Mar 3, 2017

Test build #73874 has finished for PR 16981 at commit 0f7b167.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 3, 2017

Test build #73875 has finished for PR 16981 at commit 4a49d64.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 4, 2017

Test build #73876 has finished for PR 16981 at commit 44797f7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This message is misleading in the following case:

df2.selectExpr("to_json(a, map('a', 1))")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also include the test case for this. Thanks!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SparkQA
Copy link

SparkQA commented Mar 4, 2017

Test build #73892 has finished for PR 16981 at commit 0468280.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

LGTM

cc @marmbrus @brkyvz Is this impl what you expect?

@marmbrus
Copy link
Contributor

marmbrus commented Mar 6, 2017

yeah, LGTM

@SparkQA
Copy link

SparkQA commented Mar 7, 2017

Test build #74104 has finished for PR 16981 at commit 098c61d.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 7, 2017

Test build #74105 has finished for PR 16981 at commit 6d16474.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

Thanks! Merging to master.

@asfgit asfgit closed this in 030acdd Mar 7, 2017
@gatorsmile
Copy link
Member

cc @maropu #17171 is merged. Are you interested in working on from_json?

JIRA: https://issues.apache.org/jira/browse/SPARK-19967

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants