[SPARK-17699] Support for parsing JSON string columns #15274
Test build #66016 has finished for PR 15274 at commit
/**
 * Converts an json input string to a [[StructType]] with the specified schema.
 */
case class JsonToStruct(schema: StructType, options: Map[String, String], child: Expression)
Should this implement ExpectsInputTypes?
Ah, yes, it definitely should. Let me update.
Might want to send a dev list email to solicit feedback on the API?
Emailed the list. Seems like a popular feature so far :)
Test build #66048 has finished for PR 15274 at commit
Test build #66052 has finished for PR 15274 at commit
@marmbrus I just wonder if adding
LGTM. Merging to master.
@HyukjinKwon absolutely. I actually changed the name from
@marmbrus: Is there any workaround I can use to achieve a similar effect in 1.6?
@DanielMe The best options for 1.6 are
@yhuai thanks! My impression was that
@DanielMe oh, I see.
Actually, to specify the schema in SQL, maybe we could use a JSON string, though that is a little odd. So far nobody seems to be asking for it. Let us see whether users need it in SQL.
@gatorsmile Alternatively, one can do what brickhouse's (For the record, I actually need this in SQL)
Based on @marmbrus's comment in a JIRA, we prefer to use our DDL format. For example, as we did for CREATE TABLE, we can specify the schema using
Spark SQL has great support for reading text files that contain JSON data. However, in many cases the JSON data is just one column amongst others. This is particularly true when reading from sources such as Kafka. This PR adds a new function, from_json, that converts a string column into a nested StructType with a user-specified schema.
Example usage:
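A minimal Scala sketch of calling from_json on a string column, assuming Spark 2.1 or later is on the classpath. The object name FromJsonSketch, the column name "value", and the single-field schema are illustrative assumptions, not taken from the PR:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{IntegerType, StructType}

// Hypothetical helper object used only for this illustration.
object FromJsonSketch {
  // Parses a JSON string column into a struct and returns the values of field "a".
  def parse(spark: SparkSession): Seq[Int] = {
    import spark.implicits._

    // A DataFrame with one string column ("value") holding JSON documents.
    val df = Seq("""{"a": 1}""", """{"a": 2}""").toDF("value")

    // User-specified schema describing the expected shape of each document.
    val schema = new StructType().add("a", IntegerType)

    // from_json turns the string column into a nested struct column named "json".
    val parsed = df.select(from_json($"value", schema).as("json"))

    // Fields of the struct can then be addressed like any nested column.
    parsed.select($"json.a").as[Int].collect().toSeq
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("from_json-sketch")
      .getOrCreate()
    try println(parse(spark)) finally spark.stop()
  }
}
```

The schema is passed as a StructType because, as the description notes, the schema is supplied by the user rather than inferred from the data.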
This PR adds support for Java, Scala and Python. I leveraged our existing JSON parsing support by moving it into Catalyst (so that we could define expressions using it). I left SQL out for now, because I'm not sure how users would specify a schema.