-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-10048][SPARKR] Support arbitrary nested Java array in serde. #8276
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Test build #41127 has finished for PR 8276 at commit
|
@@ -210,22 +213,31 @@ private[spark] object SerDe { | |||
writeType(dos, "void") | |||
} else { | |||
value.getClass.getName match { | |||
case "java.lang.Character" => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure. Just for completeness of handling of primitive types.
@sun-rui Generally, the changes looks good to me, could you add unit tests for ArrayType? Do we want to support create create DataFrame from ArrayType (could be another PR). |
@davies , that will be done in another PR. |
@sun-rui We can do the |
Test build #41188 has finished for PR 8276 at commit
|
BTW does this fix SPARK-9302 as well or does that require |
@shivaram, now ArrayType in a DataFrame is still not supported, as ArrayType's class is something like scala.collection.mutable.WrappedArray$ofRef, it will be passed as a jobj to R side. So some conversion needs to be done to covert it to Java Array, so that SerDe can pass its content to R side. I planned to do it in another PR as https://issues.apache.org/jira/browse/SPARK-10049, do you want me to do it in this PR? As for SPARK-9302, it needs support for both ArrayType and StructType. Once these two types are supported, it should be fixed. |
I see. I think a separate PR for ArrayType is fine, its just harder to review this change if we can't test it. Its ok, I will take one more closer look tomorrow |
Test build #41220 timed out for PR 8276 at commit |
@shivaram, I tried to support ArrayType. By adding code like: Adding support for ArrayType would make this PR big and hard to review. I would be better you could look at this PR and merge it. I know you have a concern as we don't have test cases for this PR. Maybe I can add test cases for SerDe: |
Its fine. Lets not complicate this PR with the |
if (nrow <= 0) { | ||
df <- data.frame() | ||
} else { | ||
df <- data.frame(row.names = c(1 : nrow)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess 1:nrow
is enough here ? (no need for c()
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are correct
I took a more detailed look at the code and I only had some minor comments inline. So I think it looks pretty good but I think taking this opportunity to add some tests to SerDe is a good idea. |
will add test cases. |
Test build #41301 has finished for PR 8276 at commit
|
val obj: Object = row(idx).asInstanceOf[Object] | ||
SerDe.writeObject(dos, obj) | ||
} | ||
val cols = (0 until row.length).map { idx => row(idx).asInstanceOf[Object]}.toArray |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should have one extra space after [Object]
before }
. In fact if the whole map fits on one line you can just use map(idx => ...).toArray
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
@shivaram , test cases for SerDe added. Now the SerDe does not support transferring a list of different element types from R side to JVM side. Let's leave it for future's PR. |
Test build #41445 has finished for PR 8276 at commit
|
Test build #41446 has finished for PR 8276 at commit
|
rebased to master |
Test build #41453 timed out for PR 8276 at commit |
Jenkins, retest this please |
Test build #41462 has finished for PR 8276 at commit
|
Jenkins, retest this please |
Test build #41469 has finished for PR 8276 at commit
|
@sun-rui looks like the test failures are related to this PR |
The test passed on my machine, I don't know the reason. Anyway, add spark context initialization into test_Serde to see if it can pass on Jenkins. |
x <- list(list(1L, 2L, 3L), list(1, 2, 3), | ||
list(TRUE, FALSE), list("a", "b", "c")) | ||
y <- callJStatic("SparkRHandler", "echo", x) | ||
expect_equal(x, y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we also add some tests with empty columns / empty lists (as we have some code paths just to handle these)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added.
Test build #41504 timed out for PR 8276 at commit |
Test build #41518 has finished for PR 8276 at commit
|
Jenkins, retest this please |
LGTM |
Test build #41537 has finished for PR 8276 at commit
|
Alright I'm merging this to master. Note that I'm not porting this branch-1.5 as I think this is a relatively big change and it doesn't have any immediate feature benefits. |
This PR:
from a DataFrame.