[SPARK-25446][R] Add schema_of_json() and schema_of_csv() to R #22939
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -205,11 +205,18 @@ NULL | |
#' also supported for the schema. | ||
#' \item \code{from_csv}: a DDL-formatted string | ||
#' } | ||
- #' @param ... additional argument(s). In \code{to_json}, \code{to_csv} and \code{from_json}, | ||
- #' this contains additional named properties to control how it is converted, accepts | ||
- #' the same options as the JSON/CSV data source. Additionally \code{to_json} supports | ||
- #' the "pretty" option which enables pretty JSON generation. In \code{arrays_zip}, | ||
- #' this contains additional Columns of arrays to be merged. | ||
+ #' @param ... additional argument(s). | ||
+ #' \itemize{ | ||
+ #' \item \code{to_json}, \code{from_json} and \code{schema_of_json}: this contains | ||
+ #' additional named properties to control how it is converted and accepts the | ||
+ #' same options as the JSON data source. | ||
+ #' \item \code{to_json}: it supports the "pretty" option which enables pretty | ||
+ #' JSON generation. | ||
+ #' \item \code{to_csv}, \code{from_csv} and \code{schema_of_csv}: this contains | ||
+ #' additional named properties to control how it is converted and accepts the | ||
+ #' same options as the CSV data source. | ||
+ #' \item \code{arrays_zip}, this contains additional Columns of arrays to be merged. | ||
+ #' } | ||
#' @name column_collection_functions | ||
#' @rdname column_collection_functions | ||
#' @family collection functions | ||
|
@@ -1727,12 +1734,16 @@ setMethod("to_date", | |
#' df2 <- mutate(df2, people_json = to_json(df2$people)) | ||
#' | ||
#' # Converts a map into a JSON object | ||
- #' df2 <- sql("SELECT map('name', 'Bob')) as people") | ||
+ #' df2 <- sql("SELECT map('name', 'Bob') as people") | ||
#' df2 <- mutate(df2, people_json = to_json(df2$people)) | ||
#' | ||
#' # Converts an array of maps into a JSON array | ||
#' df2 <- sql("SELECT array(map('name', 'Bob'), map('name', 'Alice')) as people") | ||
- #' df2 <- mutate(df2, people_json = to_json(df2$people))} | ||
+ #' df2 <- mutate(df2, people_json = to_json(df2$people)) | ||
+ #' | ||
+ #' # Converts a map into a pretty JSON object | ||
+ #' df2 <- sql("SELECT map('name', 'Bob') as people") | ||
+ #' df2 <- mutate(df2, people_json = to_json(df2$people, pretty = TRUE))} | ||
#' @note to_json since 2.2.0 | ||
setMethod("to_json", signature(x = "Column"), | ||
function(x, ...) { | ||
|
@@ -2230,6 +2241,32 @@ setMethod("from_json", signature(x = "Column", schema = "characterOrstructType") | |
column(jc) | ||
}) | ||
|
||
+ #' @details | ||
+ #' \code{schema_of_json}: Parses a JSON string and infers its schema in DDL format. | ||
+ #' | ||
+ #' @rdname column_collection_functions | ||
+ #' @aliases schema_of_json schema_of_json,characterOrColumn-method | ||
+ #' @examples | ||
+ #' | ||
+ #' \dontrun{ | ||
+ #' json <- "{\"name\":\"Bob\"}" | ||
+ #' df <- sql("SELECT * FROM range(1)") | ||
+ #' head(select(df, schema_of_json(json)))} | ||
+ #' @note schema_of_json since 3.0.0 | ||
+ setMethod("schema_of_json", signature(x = "characterOrColumn"), | ||
+ function(x, ...) { | ||
+ if (class(x) == "character") { | ||
+ col <- callJStatic("org.apache.spark.sql.functions", "lit", x) | ||
+ } else { | ||
+ col <- x@jc | ||

what's the use when x is a Column?

That's actually related with Scala API. There are too many overridden versions of functions in

ok but one use could be like an actual col not a literal string?

Yea .. that was discussed at #22775. The usecase of

you are saying this

BTW,

just that I thought the shortcut syntax in scala is nicer looking then

Hmm .. do you mind if we go ahead for this one and talk later within 3.0? I think we're going to deal with this (general) problem within 3.0 if I am not mistaken. I need to make one followup after this anyway.

maybe to think about the design of API in R and Scala and else where - what does it look like when the user passes in a column that is not a literal string? probably worthwhile to follow up separately.

Yea, I agree. It will throw an analysis exception in that case. I also sympathize the concerns here and somewhat we're unclear about this - so I just wanted to make it restricted in general for now. I'm going to open another PR related with this as a followup (for #22939 (comment)). While I am there, I will test when the user passes in a column that is not a literal string.

+ } | ||
+ options <- varargsToStrEnv(...) | ||
+ jc <- callJStatic("org.apache.spark.sql.functions", | ||
+ "schema_of_json", | ||
+ col, options) | ||
+ column(jc) | ||
+ }) | ||
|
||
#' @details | ||
#' \code{from_csv}: Parses a column containing a CSV string into a Column of \code{structType} | ||
#' with the specified \code{schema}. | ||
|
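For orientation, here is a minimal usage sketch of the `schema_of_json` method added in this hunk. It assumes a running SparkR session on a build that includes this change (the `@note` says 3.0.0). The input strings are illustrative, and `allowUnquotedFieldNames` is one of the JSON data source options that the updated `...` documentation says are accepted. Per the review discussion, a Column that is not a literal string is expected to raise an analysis exception, so only literals are shown.

```r
library(SparkR)
sparkR.session()  # assumes a local Spark installation is available

json <- "{\"name\":\"Bob\"}"
df <- sql("SELECT * FROM range(1)")

# A character argument is wrapped into a literal Column by the method itself.
head(select(df, schema_of_json(json)))

# An explicit literal Column goes through the `else` branch (x@jc) instead.
head(select(df, schema_of_json(lit(json))))

# Named arguments in `...` are forwarded as JSON data source options.
head(select(df, schema_of_json("{name: 'Bob'}", allowUnquotedFieldNames = TRUE)))
```

Both the character and the `lit()` form should describe the same literal, so they are expected to infer the same schema.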
@@ -2260,6 +2297,32 @@ setMethod("from_csv", signature(x = "Column", schema = "characterOrColumn"), | |
column(jc) | ||
}) | ||
|
||
+ #' @details | ||
+ #' \code{schema_of_csv}: Parses a CSV string and infers its schema in DDL format. | ||
+ #' | ||
+ #' @rdname column_collection_functions | ||
+ #' @aliases schema_of_csv schema_of_csv,characterOrColumn-method | ||
+ #' @examples | ||
+ #' | ||
+ #' \dontrun{ | ||
+ #' csv <- "Amsterdam,2018" | ||
+ #' df <- sql("SELECT * FROM range(1)") | ||
+ #' head(select(df, schema_of_csv(csv)))} | ||
+ #' @note schema_of_csv since 3.0.0 | ||
+ setMethod("schema_of_csv", signature(x = "characterOrColumn"), | ||
+ function(x, ...) { | ||
+ if (class(x) == "character") { | ||
+ col <- callJStatic("org.apache.spark.sql.functions", "lit", x) | ||
+ } else { | ||
+ col <- x@jc | ||
+ } | ||
+ options <- varargsToStrEnv(...) | ||
+ jc <- callJStatic("org.apache.spark.sql.functions", | ||
+ "schema_of_csv", | ||
+ col, options) | ||
+ column(jc) | ||
+ }) | ||
|
||
#' @details | ||
#' \code{from_utc_timestamp}: This is a common function for databases supporting TIMESTAMP WITHOUT | ||
#' TIMEZONE. This function takes a timestamp which is timezone-agnostic, and interprets it as a | ||
|
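A sketch of how `schema_of_csv` might be combined with `from_csv`, whose signature in the hunk header above accepts a Column for `schema`. It assumes a running SparkR session on a build with this change. The column name `value`, the alias `parsed`, and the sample data are illustrative choices; `sep` is a standard CSV data source option.

```r
library(SparkR)
sparkR.session()  # assumes a local Spark installation is available

# One-row DataFrame holding a semicolon-separated CSV string (illustrative data).
df <- sql("SELECT 'Amsterdam;2018' AS value")

# Infer the schema from a sample literal, then parse the column with it.
# The same `sep` option is passed to both calls so they agree on the delimiter.
schema_col <- schema_of_csv("Amsterdam;2018", sep = ";")
parsed <- select(df, alias(from_csv(df$value, schema_col, sep = ";"), "parsed"))

printSchema(parsed)
head(parsed)
```

Inferring the schema from a literal sample keeps the call within the restricted, literal-only use discussed in the review.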
actually, how does pretty work? is it pretty = TRUE?

Yup.

I know it's there before but I'd like to suggest to give an example - doc or code example below.
it's a bit different from python/scala I think

OK. I added an example
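
As a rough illustration of why `pretty = TRUE` can be written as a plain R logical, the sketch below mimics, in base R only, the kind of conversion the methods in this PR delegate to `varargsToStrEnv(...)`: named arguments are collected into an environment of string values. The function name `options_to_str_env` and its exact behavior are assumptions made for illustration; this is a simplified stand-in, not SparkR's implementation.

```r
# Hypothetical, simplified stand-in for an options helper: collects named
# arguments into an environment whose values are all strings.
options_to_str_env <- function(...) {
  pairs <- list(...)
  env <- new.env()
  for (name in names(pairs)) {
    if (identical(name, "")) next  # skip unnamed arguments
    value <- pairs[[name]]
    if (is.logical(value)) {
      # TRUE/FALSE become the lowercase strings "true"/"false"
      env[[name]] <- tolower(as.character(value))
    } else {
      env[[name]] <- as.character(value)
    }
  }
  env
}

opts <- options_to_str_env(pretty = TRUE, dateFormat = "yyyy-MM-dd")
print(mget(ls(opts), envir = opts))
```

Under this assumption, `to_json(df2$people, pretty = TRUE)` would hand the JVM side the option `pretty` with the string value `"true"`.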