
[SPARK-24834][CORE] use java comparison for float and double #21794


Closed


@bavardage commented Jul 17, 2018

The methods `nanSafeCompareDoubles` and `nanSafeCompareFloats` in `Utils` have
the same semantics as Java's `java.lang.Double.compare` and `java.lang.Float.compare`,
respectively.

What changes were proposed in this pull request?

Remove the two functions from `Utils` and replace the call sites (only two for each function) with the corresponding Java comparison method.

How was this patch tested?

Existing Spark tests should cover this.
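
As a quick illustration (a REPL sketch, not from the PR itself), the JDK comparison already orders NaN the way the `Utils` helpers do: NaN compares equal to itself and greater than everything else.

java.lang.Double.compare(Double.NaN, Double.NaN)              // 0: NaN equals NaN for ordering
java.lang.Double.compare(Double.NaN, Double.PositiveInfinity) // 1: NaN sorts above +Infinity
java.lang.Double.compare(1.0, Double.NaN)                     // -1: everything sorts below NaN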

@bavardage bavardage changed the title [SPARK-24834] use java comparison for float and double [SPARK-24834][spark-core] use java comparison for float and double Jul 17, 2018
@bavardage bavardage changed the title [SPARK-24834][spark-core] use java comparison for float and double [SPARK-24834][CORE] use java comparison for float and double Jul 17, 2018
@AmplabJenkins

Can one of the admins verify this patch?

@bavardage (Author)

cc @JoshRosen, who introduced this code originally, I think.

@bavardage (Author)

There is one way in which the `nanSafeCompare*` methods do differ from the Java built-ins, but it wasn't captured in the test cases (or in the suggestive naming of the methods): the handling of -0.0 vs 0.0.

java.lang.Double.compare(-0.0d, 0.0d) == -1      // the JDK orders -0.0 strictly below 0.0
Utils.nanSafeCompareDoubles(-0.0d, 0.0d) == 0    // the Utils helper treats the two zeros as equal

If that behaviour is indeed required, I could abandon this PR and instead fix the test cases to capture this.

@kiszk (Member) commented Jul 17, 2018

I think this behaviour is required, since Spark SQL does not distinguish 0.0 from -0.0. Am I correct?
cc @gatorsmile @maropu

@kiszk (Member) commented Jul 17, 2018

It would be good to add test cases for this, since it is not covered now.

@srowen (Member) commented Jul 17, 2018

BTW, if it becomes necessary not to change the semantics, I think the methods could at least be streamlined a bit:

// A streamlined equivalent of Utils.nanSafeCompareDoubles that preserves
// the existing semantics: -0.0 and 0.0 compare equal, and NaN sorts
// greater than everything, including itself.
def nanSafeCompareDoubles(x: Double, y: Double): Int = {
  if (x < y) {
    -1
  } else if (x > y) {
    1
  } else if (x == y) {
    // Covers -0.0 vs 0.0, since IEEE-754 == treats the two zeros as equal.
    0
  } else {
    // All three comparisons were false, so at least one operand is NaN.
    if (java.lang.Double.isNaN(x)) {
      if (java.lang.Double.isNaN(y)) {
        0
      } else {
        1
      }
    } else {
      -1
    }
  }
}

More tests can't hurt, either.
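
A quick sanity sketch (illustrative, not from the PR) exercising the tricky inputs against the streamlined method above:

assert(nanSafeCompareDoubles(-0.0d, 0.0d) == 0)        // the two zeros compare equal
assert(nanSafeCompareDoubles(Double.NaN, Double.NaN) == 0)
assert(nanSafeCompareDoubles(Double.NaN, 1.0d) == 1)   // NaN sorts greatest
assert(nanSafeCompareDoubles(1.0d, Double.NaN) == -1)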

@bavardage (Author)

It does seem that Spark currently does distinguish -0.0 and 0.0, at least as far as groupBys go:

scala> case class Thing(x : Float)
defined class Thing

scala> val df = Seq(Thing(0.0f), Thing(-0.0f),Thing(0.0f), Thing(-0.0f),Thing(0.0f), Thing(-0.0f),Thing(0.0f), Thing(-0.0f)).toDF
2018-07-17 13:17:08 WARN  ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException
df: org.apache.spark.sql.DataFrame = [x: float]

scala> df.groupBy($"x").count
res0: org.apache.spark.sql.DataFrame = [x: float, count: bigint]

scala> res0.collect
res1: Array[org.apache.spark.sql.Row] = Array([-0.0,4], [0.0,4])

Doubles are hashed via `doubleToLongBits` (https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala#L338), which gives different bitwise representations for positive and negative zero.
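
The underlying fact, as a REPL sketch (not from the PR): `doubleToLongBits` preserves the IEEE-754 sign bit, so the two zeros produce different hash inputs even though `==` treats them as equal.

java.lang.Double.doubleToLongBits(0.0d)    // 0L
java.lang.Double.doubleToLongBits(-0.0d)   // -9223372036854775808L (only the sign bit set)
-0.0d == 0.0d                              // true: ordinary comparison conflates the zeros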

@srowen (Member) commented Jul 18, 2018

spark-sql suggests that -0.0 and 0.0 are considered the same, though: `SELECT -0.0 == 0.0;` returns true. It's probably essential not to change behavior here, but if performance is the issue, I think the method can be optimized.
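
For reference, a spark-sql session sketch of the query above (output formatting approximate):

spark-sql> SELECT -0.0 == 0.0;
true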

@srowen (Member) commented Aug 17, 2018

I think we'd have to close this due to the behavior change, but would merge an optimization of the existing behavior.

@srowen srowen mentioned this pull request Aug 20, 2018
@asfgit asfgit closed this in b8788b3 Aug 21, 2018
@bavardage (Author)

Yep, fair. The intent, I think, was clarity rather than performance: it's misleading to have a method named 'nan safe' that has no special handling of NaNs. I'll look at opening a different PR that could increase clarity and may have a minor perf benefit.
