[SPARK-8492] [SQL] support binaryType in UnsafeRow#6911
[SPARK-8492] [SQL] support binaryType in UnsafeRow#6911davies wants to merge 7 commits intoapache:masterfrom
Conversation
|
cc @JoshRosen |
|
Test build #35339 has finished for PR 6911 at commit
|
|
Test build #35346 has finished for PR 6911 at commit
|
|
The 2GB row limit isn't an issue since we already implicitly have that limit in |
|
Test build #35357 has finished for PR 6911 at commit
|
|
Test build #35359 has finished for PR 6911 at commit
|
|
Test build #943 has finished for PR 6911 at commit
|
|
Test build #947 has finished for PR 6911 at commit
|
There was a problem hiding this comment.
I guess the precedence is kind of obvious from usage / context, but it wouldn't hurt to add parens to disambiguate the order in which the shifts are applied.
|
This looks good to me overall. I like the idea of storing the length in the fixed-length values section alongside the pointer to the variable-length data. I wonder whether there's a natural point to document / explicitly call out this encoding, though, in order to make it a bit more obvious to any new readers of this file. |
There was a problem hiding this comment.
Do we need to mask out the upper 32 bits before converting to a long? I guess the uppermost bit probably can't be 1 because the offset can't be negative, so I guess we don't need to worry about sign-extension during the shift.
Conflicts: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
|
Test build #35479 has finished for PR 6911 at commit
|
|
Test build #35480 has finished for PR 6911 at commit
|
|
Merged into master |
Support BinaryType in UnsafeRow, just like StringType. Also change the layout of StringType and BinaryType in UnsafeRow, by combining offset and size together as Long, which will limit the size of Row to under 2G (given that fact that any single buffer can not be bigger than 2G in JVM). Author: Davies Liu <davies@databricks.com> Closes apache#6911 from davies/unsafe_bin and squashes the following commits: d68706f [Davies Liu] update comment 519f698 [Davies Liu] address comment 98a964b [Davies Liu] Merge branch 'master' of github.com:apache/spark into unsafe_bin 180b49d [Davies Liu] fix zero-out 22e4c0a [Davies Liu] zero-out padding bytes 6abfe93 [Davies Liu] fix style 447dea0 [Davies Liu] support binaryType in UnsafeRow
Support BinaryType in UnsafeRow, just like StringType.
Also change the layout of StringType and BinaryType in UnsafeRow, by combining offset and size together as Long, which will limit the size of Row to under 2G (given that fact that any single buffer can not be bigger than 2G in JVM).