[SPARK-19637][SQL] Add to_json in FunctionRegistry #16981
Conversation
Test build #73095 has finished for PR 16981 at commit

Test build #73096 has finished for PR 16981 at commit

Test build #73099 has started for PR 16981 at commit

Jenkins, retest this please.

Test build #73102 has finished for PR 16981 at commit
HyukjinKwon left a comment:
@maropu, I just left a few opinions that might help.
This is just my personal opinion, but should we maybe consider minimising this import? For example, `import org.json4s.jackson.JsonMethods.parse`.
We could try to utilize `parse(json).extract[Map[String, String]]`. From my observation, it produces an empty `Map` if there are no such values or the JSON is empty, and it throws an exception if the JSON is invalid.
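A small sketch of the behavior described above, assuming json4s (which Spark already depends on) is available on the classpath; `ExtractDemo` is just an illustrative name:

```scala
import org.json4s.{DefaultFormats, Formats}
import org.json4s.jackson.JsonMethods.parse

object ExtractDemo {
  implicit val formats: Formats = DefaultFormats

  def main(args: Array[String]): Unit = {
    // A well-formed JSON object extracts to a Map of its entries.
    val opts = parse("""{"timestampFormat": "dd/MM/yyyy"}""").extract[Map[String, String]]
    assert(opts == Map("timestampFormat" -> "dd/MM/yyyy"))

    // An empty JSON object extracts to an empty Map.
    assert(parse("{}").extract[Map[String, String]].isEmpty)

    // Invalid JSON throws a parse exception.
    assert(scala.util.Try(parse("not-json")).isFailure)
  }
}
```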
Maybe use `""" ... """` if more commits should be pushed.
@HyukjinKwon Thanks! I'll check.

Test build #73107 has finished for PR 16981 at commit
Sorry, I happened to review twice. These are all from me. I hope my comments sound reasonable. Other than those, (FWIW) it looks good to me.
Could we do this as below:

```scala
Try(parse(json).extract[Map[String, String]]).getOrElse {
  throw new AnalysisException(...)
}
```

or maybe just a try-catch block?
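The suggested pattern, sketched with a hypothetical `extractMap` standing in for the json4s call (Spark would throw `AnalysisException` with a proper message; plain `IllegalArgumentException` is used here to keep the sketch self-contained):

```scala
import scala.util.Try

object OptionParsing {
  // Hypothetical stand-in for parse(json).extract[Map[String, String]]:
  // returns an empty map for "{}" and throws on anything else.
  private def extractMap(json: String): Map[String, String] =
    if (json.trim == "{}") Map.empty
    else throw new RuntimeException(s"cannot parse: $json")

  // Wrap the parse in Try so that any parse failure is converted into a
  // single user-facing exception with a clear message.
  def parseOptions(json: String): Map[String, String] =
    Try(extractMap(json)).getOrElse {
      throw new IllegalArgumentException(s"Invalid options: $json")
    }
}
```

`Try(...).getOrElse { throw ... }` keeps the happy path untouched while collapsing every parse failure into one analysis-time error, which is why it reads a bit tighter than an explicit try-catch here.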
Trivial: `s"Must be a string literal, but: $e"`
I guess we should check whether it is a `StructType` and throw a proper exception, because it seems `JsonToStruct` does not check whether `exp` is a `StructType`, and it probably throws a cast exception (if I haven't missed something here).
I just wrote it this way to match https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L3010. Either is okay to me, though if we modify the code in the way you suggested, we need to modify the `from_json` code too, right?
Ah, thanks. Yes, if it throws a class cast exception, I think we should produce a better exception and message rather than just one saying `A cannot be cast to B`. Maybe add a util for both places?
okay, I'll do that ;)
Probably just `struct` instead of `StructType` (as I found an example in spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala, line 340 in e7f982b):

```scala
usage = "_FUNC_(expr) - Explodes an array of structs into a table.",
```
I'll update in a day, thanks!
Return -> Returns
What is the reason we use the JSON option string?
Aha, you mean we should use a map literal directly? Sorry, I missed that idea. Is this JSON option totally unnecessary? If so, I'll fix this to use a map literal here.
Could you add SQL test cases to SQLQueryTestSuite?

@gatorsmile okay, I'll do that soon
More examples are needed to show users how to use options.
Regarding the format of options, another way is to use the MapType. For example:

```sql
from_json(value, '${schema2.json}', map("timestampFormat", "dd/MM/yyyy HH:mm"))
```

I am not sure whether using JSON to represent options is a good way.
okay, I'll fix it that way.
`collect` is not needed
Can we let users call the `named_struct` function to specify the schema?
I'll check
@gatorsmile, do you mind if I ask you to elaborate on what you have in mind with `named_struct`? I am just curious.
I checked the related code; if we use `named_struct` here, we need to add substantial code to convert `named_struct` to `StructType`...
Test build #73141 has finished for PR 16981 at commit

Test build #73143 has finished for PR 16981 at commit

Test build #73156 has finished for PR 16981 at commit

Test build #73157 has finished for PR 16981 at commit
ping

@gatorsmile ping

@gatorsmile ping
To parse the schema represented in DDL format, instead of the JSON format, we need to call the parser to do it. If you are not familiar with the parser, maybe you only implement

So, this pr re-used
`.toMap` is needed?
Why do we need to keep this here? This is not related to JacksonUtils.scala. The function name also needs a change.
```scala
case m: CreateMap if m.dataType.acceptsType(MapType(StringType, StringType, false)) =>
```
Nit: `valueContainsNull = false`
Force-pushed from 8093498 to 4a49d64.
Test build #73874 has finished for PR 16981 at commit

Test build #73875 has finished for PR 16981 at commit

Test build #73876 has finished for PR 16981 at commit
This message is misleading in the following case:

```scala
df2.selectExpr("to_json(a, map('a', 1))")
```
Please also include the test case for this. Thanks!
okay!
Test build #73892 has finished for PR 16981 at commit
yeah, LGTM

Test build #74104 has finished for PR 16981 at commit

Test build #74105 has finished for PR 16981 at commit

Thanks! Merging to master.
What changes were proposed in this pull request?

This pr added entries in `FunctionRegistry` and supported `to_json` in SQL.

How was this patch tested?

Added tests in `JsonFunctionsSuite`.
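For illustration, once `to_json` is registered it can be invoked from SQL; a sketch of the expected usage (run inside a Spark session; the exact output strings follow Spark's JSON writer, so treat them as approximate):

```sql
-- Serialize a struct to a JSON string.
SELECT to_json(named_struct('a', 1, 'b', 2));
-- {"a":1,"b":2}

-- Options (e.g. timestampFormat, as discussed above) are passed as a map literal.
SELECT to_json(named_struct('time', to_timestamp('2015-08-26', 'yyyy-MM-dd')),
               map('timestampFormat', 'dd/MM/yyyy'));
-- {"time":"26/08/2015"}
```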