-
Notifications
You must be signed in to change notification settings - Fork 28.7k
[SPARK-8223][SPARK-8224][SQL] shift left and shift right #7178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Rebase in order to incorporate changes of [SPARK-8770]
Can one of the admins verify this patch? |
# Conflicts: # sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/math.scala
@davies can you review this, if you have time? |
@since(1.5) | ||
def shiftLeft(col, numBits): | ||
"""Shift the the given value numBits left. Returns int for tinyint, smallint and int and | ||
bigint for bigint a. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cannot understand the last sentence
Thanks for your feedback. I removed the type information from the python description and changed it for the dataframe api. I hope it's clear now. One comment to python: The max integer value is the max long value of Java/Scala. Because of that there is no value in specifying the result type for python.
|
In python 3, there is no |
case l: Long => l >> valueRight.asInstanceOf[Integer] | ||
case i: Integer => i >> valueRight.asInstanceOf[Integer] | ||
case s: Short => s >> valueRight.asInstanceOf[Integer] | ||
case b: Byte => b >> valueRight.asInstanceOf[Integer] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use >>>
(keeping the sign bit) or >>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a special Jira ticket for that: https://issues.apache.org/jira/browse/SPARK-8226. Someone created already a PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not pattern matching the data type? instead of the value? Will that cause extra box/unbox for primtives?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might be a good hint. I am going to take a look on the generated code and will come back to this and create maybe a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chenghao-intel I investigated it a little bit, see the gist: https://gist.github.com/tarekauel/6994983b83a51668c5dc . The interesting part is that the match on the value is even faster, did I something wrong?
LGTM. OK to test |
@davies I guess Jenkins didn't got it. |
ok to test |
Test build #993 has started for PR 7178 at commit |
Merged build triggered. |
Merged build started. |
Test build #36365 has started for PR 7178 at commit |
Test build #993 has finished for PR 7178 at commit
|
Test build #36365 has finished for PR 7178 at commit
|
Merged build finished. Test FAILed. |
Merged build triggered. |
Merged build started. |
Test build #36373 has started for PR 7178 at commit |
Test build #36373 has finished for PR 7178 at commit
|
Merged build finished. Test PASSed. |
Merged into master, thanks! |
Jira:
https://issues.apache.org/jira/browse/SPARK-8223
https://issues.apache.org/jira/browse/SPARK-8224
I am aware of #7174 and will update this pr, if it's merged.DoneI don't know if #7034 can simplify this, but we can have a look on it, if it gets merged
@rxin In the Jira ticket the function as no second argument. I added a
numBits
argument that allows to specify the number of bits. I guess this improves the usability. I wanted to addshiftleft(value)
as well, but theselectExpr
dataframe tests crashes, if I have both. I order to do this, I added the following to the functions.scaladef shiftRight(e: Column): Column = ShiftRight(e.expr, lit(1).expr)
, but as I mentioned this doesn't pass tests likedf.selectExpr("shiftRight(a)", ...
(not enough arguments exception).If we need the bitwise shift in order to be hive compatible, I suggest to add
shiftLeft
and something likeshiftLeftX