Skip to content

[SPARK-8223][SPARK-8224][SQL] shift left and shift right #7178

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from

Conversation

tarekbecker
Copy link
Contributor

Jira:
https://issues.apache.org/jira/browse/SPARK-8223
https://issues.apache.org/jira/browse/SPARK-8224

I am aware of #7174 and will update this pr, if it's merged. Done
I don't know if #7034 can simplify this, but we can have a look on it, if it gets merged

@rxin In the Jira ticket the function as no second argument. I added a numBits argument that allows to specify the number of bits. I guess this improves the usability. I wanted to add shiftleft(value) as well, but the selectExpr dataframe tests crashes, if I have both. I order to do this, I added the following to the functions.scala def shiftRight(e: Column): Column = ShiftRight(e.expr, lit(1).expr), but as I mentioned this doesn't pass tests like df.selectExpr("shiftRight(a)", ... (not enough arguments exception).

If we need the bitwise shift in order to be hive compatible, I suggest to add shiftLeft and something like shiftLeftX

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

# Conflicts:
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/math.scala
@tarekbecker
Copy link
Contributor Author

@davies can you review this, if you have time?

@since(1.5)
def shiftLeft(col, numBits):
"""Shift the the given value numBits left. Returns int for tinyint, smallint and int and
bigint for bigint a.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cannot understand the last sentence

@tarekbecker
Copy link
Contributor Author

Thanks for your feedback. I removed the type information from the python description and changed it for the dataframe api. I hope it's clear now.

One comment to python: The max integer value is the max long value of Java/Scala. Because of that there is no value in specifying the result type for python.

>>> type(sqlContext.createDataFrame([(sys.maxint,)], ['a']).select(shiftLeft('a', 1).alias('r')).first().asDict().get('r'))
<type 'int'>

@davies
Copy link
Contributor

davies commented Jul 2, 2015

In python 3, there is no long type. So we always use int in Python for all integral types.

case l: Long => l >> valueRight.asInstanceOf[Integer]
case i: Integer => i >> valueRight.asInstanceOf[Integer]
case s: Short => s >> valueRight.asInstanceOf[Integer]
case b: Byte => b >> valueRight.asInstanceOf[Integer]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use >>> (keeping the sign bit) or >> ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a special Jira ticket for that: https://issues.apache.org/jira/browse/SPARK-8226. Someone created already a PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not pattern matching the data type? instead of the value? Will that cause extra box/unbox for primtives?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might be a good hint. I am going to take a look on the generated code and will come back to this and create maybe a follow-up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chenghao-intel I investigated it a little bit, see the gist: https://gist.github.com/tarekauel/6994983b83a51668c5dc . The interesting part is that the match on the value is even faster, did I something wrong?

@davies
Copy link
Contributor

davies commented Jul 2, 2015

LGTM.

OK to test

@tarekbecker
Copy link
Contributor Author

@davies I guess Jenkins didn't got it.

@davies
Copy link
Contributor

davies commented Jul 2, 2015

ok to test

@SparkQA
Copy link

SparkQA commented Jul 2, 2015

Test build #993 has started for PR 7178 at commit f3f64e6.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented Jul 2, 2015

Test build #36365 has started for PR 7178 at commit f3f64e6.

@SparkQA
Copy link

SparkQA commented Jul 2, 2015

Test build #993 has finished for PR 7178 at commit f3f64e6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Heartbeat(workerId: String, worker: RpcEndpointRef) extends DeployMessage
    • case class RegisteredWorker(master: RpcEndpointRef, masterWebUiUrl: String) extends DeployMessage
    • case class RegisterApplication(appDescription: ApplicationDescription, driver: RpcEndpointRef)
    • case class RegisteredApplication(appId: String, master: RpcEndpointRef) extends DeployMessage
    • case class SubmitDriverResponse(
    • case class KillDriverResponse(
    • case class MasterChanged(master: RpcEndpointRef, masterWebUiUrl: String)
    • class DCT(override val uid: String)
    • class MinMaxScaler(override val uid: String)
    • class PCA (override val uid: String) extends Estimator[PCAModel] with PCAParams
    • class StreamingLinearAlgorithm(object):
    • class StreamingLinearRegressionWithSGD(StreamingLinearAlgorithm):
    • class AnalysisException(Exception):
    • class FlumeUtils(object):
    • case class Cast(child: Expression, dataType: DataType) extends UnaryExpression with Logging
    • trait ExpectsInputTypes
    • trait AutoCastInputTypes
    • abstract class BinaryExpression extends Expression with trees.BinaryNode[Expression]
    • abstract class BinaryOperator extends BinaryExpression
    • abstract class BinaryArithmetic extends BinaryOperator
    • class SpecificOrdering extends $
    • class SpecificProjection extends $
    • final class SpecificRow extends $
    • case class ShiftLeft(left: Expression, right: Expression) extends BinaryExpression
    • case class ShiftRight(left: Expression, right: Expression) extends BinaryExpression
    • case class UnHex(child: Expression) extends UnaryExpression with Serializable
    • case class Crc32(child: Expression)
    • abstract class BinaryComparison extends BinaryOperator with Predicate
    • // compiled class file for the closure here will conflict with the one in callUDF (upper case).

@SparkQA
Copy link

SparkQA commented Jul 2, 2015

Test build #36365 has finished for PR 7178 at commit f3f64e6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Merged build finished. Test FAILed.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented Jul 2, 2015

Test build #36373 has started for PR 7178 at commit 8023bb5.

@SparkQA
Copy link

SparkQA commented Jul 2, 2015

Test build #36373 has finished for PR 7178 at commit 8023bb5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class ShiftLeft(left: Expression, right: Expression) extends BinaryExpression
    • case class ShiftRight(left: Expression, right: Expression) extends BinaryExpression

@AmplabJenkins
Copy link

Merged build finished. Test PASSed.

@davies
Copy link
Contributor

davies commented Jul 2, 2015

Merged into master, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants