[SPARK-8223][SPARK-8224][SQL] shift left and shift right #7178

tarekbecker · 2015-07-02T04:26:43Z

Jira:
https://issues.apache.org/jira/browse/SPARK-8223
https://issues.apache.org/jira/browse/SPARK-8224

~~I am aware of #7174 and will update this pr, if it's merged.~~ Done
I don't know if #7034 can simplify this, but we can take a look at it if it gets merged.

@rxin In the Jira ticket the function has no second argument. I added a numBits argument that specifies the number of bits to shift by; I think this improves usability. I wanted to add shiftleft(value) as well, but the selectExpr DataFrame tests crash if I have both. To do this, I added the following to functions.scala: def shiftRight(e: Column): Column = ShiftRight(e.expr, lit(1).expr). But as I mentioned, this doesn't pass tests like df.selectExpr("shiftRight(a)", ... (not enough arguments exception).

If we need the bitwise shift in order to be Hive compatible, I suggest adding shiftLeft and something like shiftLeftX.


AmplabJenkins · 2015-07-02T04:28:10Z

Can one of the admins verify this patch?


tarekbecker · 2015-07-02T05:34:16Z

@davies can you review this, if you have time?

davies · 2015-07-02T06:03:18Z

python/pyspark/sql/functions.py

+@since(1.5)
+def shiftLeft(col, numBits):
+    """Shift the the given value numBits left. Returns int for tinyint, smallint and int and
+    bigint for bigint a.


I cannot understand the last sentence.

tarekbecker · 2015-07-02T06:57:20Z

Thanks for your feedback. I removed the type information from the Python description and changed it for the DataFrame API. I hope it's clear now.

One note on Python: the maximum int value (sys.maxint) is the same as the maximum long value in Java/Scala, so there is no value in specifying the result type for Python.

>>> import sys
>>> from pyspark.sql.functions import shiftLeft
>>> type(sqlContext.createDataFrame([(sys.maxint,)], ['a']).select(shiftLeft('a', 1).alias('r')).first().asDict().get('r'))
<type 'int'>
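
To illustrate the point about widths, here is a minimal pure-Python sketch of how a left shift behaves on a 64-bit JVM long compared with Python's unbounded int. The helper name shift_left_long is hypothetical and is not part of Spark or of this PR; it only emulates two's-complement wrap-around, on the assumption that the shift is evaluated as a JVM long.

```python
# Hypothetical helper (not Spark API): emulates `value << num_bits` on a
# 64-bit two's-complement long, as the JVM would evaluate it.

def shift_left_long(value, num_bits):
    # The JVM only uses the low 6 bits of the shift distance for a long.
    distance = num_bits & 63
    # Keep the low 64 bits of the (unbounded) Python result.
    masked = (value << distance) & ((1 << 64) - 1)
    # Reinterpret the top bit as the sign bit.
    return masked - (1 << 64) if masked >= (1 << 63) else masked


max_long = 2 ** 63 - 1  # same value as sys.maxint on a 64-bit Python 2 build

# Plain Python keeps growing; the emulated long wraps around.
unbounded = max_long << 1
wrapped = shift_left_long(max_long, 1)
```

Either way the value comes back to Python as a plain int, which is consistent with the observation above that stating a result type in the Python docstring adds little.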

davies · 2015-07-02T07:07:36Z

In Python 3, there is no long type, so we always use int in Python for all integral types.

davies · 2015-07-02T07:09:04Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/math.scala

+          case l: Long => l >> valueRight.asInstanceOf[Integer]
+          case i: Integer => i >> valueRight.asInstanceOf[Integer]
+          case s: Short => s >> valueRight.asInstanceOf[Integer]
+          case b: Byte => b >> valueRight.asInstanceOf[Integer]


Should we use >>> (keeping the sign bit) or >> ?

There is a separate Jira ticket for that: https://issues.apache.org/jira/browse/SPARK-8226. Someone has already created a PR.
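
For reference when comparing the two operators: on the JVM, >> is the arithmetic (sign-propagating) shift and >>> is the logical (zero-filling) shift. The sketch below emulates both in pure Python on fixed-width two's-complement values. The helper names to_signed, shift_right and shift_right_unsigned are hypothetical, chosen for illustration only, and are not the functions added by this PR or by SPARK-8226.

```python
# Hypothetical helpers (not Spark API) emulating JVM right-shift semantics.

def to_signed(value, bits):
    """Wrap a Python int into a signed two's-complement integer of `bits` width."""
    value &= (1 << bits) - 1
    # After masking, the top bit decides the sign.
    return value - (1 << bits) if value >> (bits - 1) else value


def shift_right(value, num_bits, bits=32):
    """Emulate JVM `>>`: copies of the sign bit fill in from the left."""
    # The JVM uses only the low 5 (int) or 6 (long) bits of the distance.
    return to_signed(value, bits) >> (num_bits & (bits - 1))


def shift_right_unsigned(value, num_bits, bits=32):
    """Emulate JVM `>>>`: zeros fill in from the left."""
    unsigned = value & ((1 << bits) - 1)
    return to_signed(unsigned >> (num_bits & (bits - 1)), bits)


# The two operators agree for non-negative inputs and differ for negative ones.
arithmetic = shift_right(-8, 1)
logical = shift_right_unsigned(-8, 1)
```

With a 32-bit width, the arithmetic shift of -8 by one bit stays negative, while the logical shift yields a large positive number because the vacated top bit is filled with zero.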

Why not pattern match on the data type instead of the value? Will that cause extra boxing/unboxing of primitives?

That might be a good hint. I am going to take a look at the generated code, come back to this, and maybe create a follow-up.

@chenghao-intel I investigated it a little bit, see the gist: https://gist.github.com/tarekauel/6994983b83a51668c5dc . The interesting part is that the match on the value is even faster. Did I do something wrong?

davies · 2015-07-02T07:21:20Z

LGTM.

OK to test

tarekbecker · 2015-07-02T07:33:02Z

@davies I guess Jenkins didn't get it.

davies · 2015-07-02T07:34:42Z

ok to test

SparkQA · 2015-07-02T07:35:43Z

Test build #993 has started for PR 7178 at commit f3f64e6.

AmplabJenkins · 2015-07-02T07:38:11Z

Merged build triggered.

AmplabJenkins · 2015-07-02T07:38:20Z

Merged build started.

SparkQA · 2015-07-02T07:39:24Z

Test build #36365 has started for PR 7178 at commit f3f64e6.

SparkQA · 2015-07-02T07:52:10Z

Test build #993 has finished for PR 7178 at commit f3f64e6.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- case class Heartbeat(workerId: String, worker: RpcEndpointRef) extends DeployMessage
- case class RegisteredWorker(master: RpcEndpointRef, masterWebUiUrl: String) extends DeployMessage
- case class RegisterApplication(appDescription: ApplicationDescription, driver: RpcEndpointRef)
- case class RegisteredApplication(appId: String, master: RpcEndpointRef) extends DeployMessage
- case class SubmitDriverResponse(
- case class KillDriverResponse(
- case class MasterChanged(master: RpcEndpointRef, masterWebUiUrl: String)
- class DCT(override val uid: String)
- class MinMaxScaler(override val uid: String)
- class PCA (override val uid: String) extends Estimator[PCAModel] with PCAParams
- class StreamingLinearAlgorithm(object):
- class StreamingLinearRegressionWithSGD(StreamingLinearAlgorithm):
- class AnalysisException(Exception):
- class FlumeUtils(object):
- case class Cast(child: Expression, dataType: DataType) extends UnaryExpression with Logging
- trait ExpectsInputTypes
- trait AutoCastInputTypes
- abstract class BinaryExpression extends Expression with trees.BinaryNode[Expression]
- abstract class BinaryOperator extends BinaryExpression
- abstract class BinaryArithmetic extends BinaryOperator
- class SpecificOrdering extends $
- class SpecificProjection extends $
- final class SpecificRow extends $
- case class ShiftLeft(left: Expression, right: Expression) extends BinaryExpression
- case class ShiftRight(left: Expression, right: Expression) extends BinaryExpression
- case class UnHex(child: Expression) extends UnaryExpression with Serializable
- case class Crc32(child: Expression)
- abstract class BinaryComparison extends BinaryOperator with Predicate
- // compiled class file for the closure here will conflict with the one in callUDF (upper case).

SparkQA · 2015-07-02T07:54:30Z

Test build #36365 has finished for PR 7178 at commit f3f64e6.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-07-02T07:54:37Z

Merged build finished. Test FAILed.

AmplabJenkins · 2015-07-02T08:23:12Z

Merged build triggered.

AmplabJenkins · 2015-07-02T08:23:21Z

Merged build started.

SparkQA · 2015-07-02T08:27:06Z

Test build #36373 has started for PR 7178 at commit 8023bb5.

SparkQA · 2015-07-02T10:08:14Z

Test build #36373 has finished for PR 7178 at commit 8023bb5.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- case class ShiftLeft(left: Expression, right: Expression) extends BinaryExpression
- case class ShiftRight(left: Expression, right: Expression) extends BinaryExpression

AmplabJenkins · 2015-07-02T10:08:52Z

Merged build finished. Test PASSed.

davies · 2015-07-02T17:02:55Z

Merged into master, thanks!

tarekbecker added 3 commits July 1, 2015 21:05

[SPARK-8223][SPARK-8224] right and left bit shift

ac7fe9d

[SPARK-8223][SPARK-8224] docu fix

44ee324

Merge remote-tracking branch 'origin/master' into 8223

9434a28

Rebase in order to incorporate changes of [SPARK-8770]

tarekbecker added 2 commits July 1, 2015 21:45

[SPARK-8223][SPARK-8224] minor fix and style fix

5189690

Merge remote-tracking branch 'origin/master' into 8223

3b56f2a

# Conflicts: # sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/math.scala

davies reviewed Jul 2, 2015

[SPARK-8223][SPARK-8224] removed toString; updated function description

f628706

[SPARK-8223][SPARK-8224] Integer -> Int

f3f64e6

davies reviewed Jul 2, 2015

[SPARK-8223][SPARK-8224] fixed test

8023bb5

asfgit closed this in 5b33381 Jul 2, 2015

zhichao-li mentioned this pull request Jul 3, 2015

[SPARK-8226][SQL]Add function shiftrightunsigned #7035

Closed
