-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-17608][SPARKR]:Long type has incorrect serialization/deserialization #17640
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -28,6 +28,10 @@ test_that("SerDe of primitive types", { | |
expect_equal(x, 1) | ||
expect_equal(class(x), "numeric") | ||
|
||
x <- callJStatic("SparkRHandler", "echo", 1380742793415240) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't know how to specify in R console to enforce bigint type. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I did some google search. R can't specify We can remove the tests above, as we added |
||
expect_equal(x, 1380742793415240) | ||
expect_equal(class(x), "numeric") | ||
|
||
x <- callJStatic("SparkRHandler", "echo", TRUE) | ||
expect_true(x) | ||
expect_equal(class(x), "logical") | ||
|
@@ -43,6 +47,11 @@ test_that("SerDe of list of primitive types", { | |
expect_equal(x, y) | ||
expect_equal(class(y[[1]]), "integer") | ||
|
||
x <- list(1380742793415240, 13807427934152401, 13807427934152402) | ||
y <- callJStatic("SparkRHandler", "echo", x) | ||
expect_equal(x, y) | ||
expect_equal(class(y[[1]]), "numeric") | ||
|
||
x <- list(1, 2, 3) | ||
y <- callJStatic("SparkRHandler", "echo", x) | ||
expect_equal(x, y) | ||
|
@@ -66,7 +75,8 @@ test_that("SerDe of list of primitive types", { | |
|
||
test_that("SerDe of list of lists", { | ||
x <- list(list(1L, 2L, 3L), list(1, 2, 3), | ||
list(TRUE, FALSE), list("a", "b", "c")) | ||
list(TRUE, FALSE), list("a", "b", "c"), | ||
list(1380742793415240, 1380742793415240)) | ||
y <- callJStatic("SparkRHandler", "echo", x) | ||
expect_equal(x, y) | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3188,6 +3188,23 @@ test_that("catalog APIs, currentDatabase, setCurrentDatabase, listDatabases", { | |
expect_equal(dbs[[1]], "default") | ||
}) | ||
|
||
test_that("dapply with bigint type", { | ||
df <- createDataFrame( | ||
list(list(1380742793415240, 1, "1"), list(1380742793415240, 2, "2"), | ||
list(1380742793415240, 3, "3")), c("a", "b", "c")) | ||
schema <- structType(structField("a", "bigint"), structField("b", "bigint"), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This one tests bigint There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. actually, I'm not sure. |
||
structField("c", "string"), structField("d", "bigint")) | ||
df1 <- dapply( | ||
df, | ||
function(x) { | ||
y <- x[x[1] > 1, ] | ||
y <- cbind(y, y[1] + 1L) | ||
}, | ||
schema) | ||
result <- collect(df1) | ||
expect_equal(result$a[1], 1380742793415240) | ||
}) | ||
|
||
test_that("catalog APIs, listTables, listColumns, listFunctions", { | ||
tb <- listTables() | ||
count <- count(tables()) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and another thing, there is no
bigint
in Rso I'm not sure how we would hit this path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When specifying schema with
bigint
, we will hit the bigint path. Without this change, it will thrown an error of type mismatch. But as you said, we can't specifybigint
type in R console.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If R doesn't have
bigint
type, we should remove allbigint
related logic. I don't know the history ofbigint
mapping in the Types.R file. Why should we have it since every big number is numeric (Double in the backend)?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@felixcheung Any thoughts? Thanks!
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you are referring to https://github.com/apache/spark/blob/master/R/pkg/R/types.R#L25
like it says,
so
bigint
there is Scala type, not R typeThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. But as you mentioned, we don't know how to trigger the write path on the R side, because both bigint and double are
numeric
. I think we can just remove the test in the R side.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is different though, for PRIMITIVE_TYPES, it is used when you create a schema with structField in R. In this case you can definitely define a column as bigint and then pass a R numeric value to it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When using createDataFrame, R uses
serialize
to send data to the backend. When taking an action, say,collect
, scala side logic refers to the schema field and calls thereadTypedObjects
where the newly added read logic kicks in. When it returns back to R side, the newly added write logic kicks in and R side can interpret it due to the R side read logic. It seems that thewrite
logic in R side is not called, because we don't have specific typebigint
in R. Right?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For completeness purpose, I think we can keep the write logic in R side.