
Commit f46a074

sylvinus authored and rxin committed
[SPARK-16740][SQL] Fix Long overflow in LongToUnsafeRowMap
Avoid overflow of Long type causing a NegativeArraySizeException a few lines later.

Unit tests for HashedRelationSuite still pass. I can confirm the python script I included in https://issues.apache.org/jira/browse/SPARK-16740 works fine with this patch. Unfortunately I don't have the knowledge/time to write a Scala test case for HashedRelationSuite right now. As the patch is pretty obvious I hope it can be included without this. Thanks!

Author: Sylvain Zimmer <sylvain@sylvainzimmer.com>

Closes #14373 from sylvinus/master.

(cherry picked from commit 1178d61)
Signed-off-by: Reynold Xin <rxin@databricks.com>
1 parent: 825c837 · commit: f46a074

1 file changed: +2 −1 lines changed

sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala

Lines changed: 2 additions & 1 deletion

@@ -608,7 +608,8 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
   def optimize(): Unit = {
     val range = maxKey - minKey
     // Convert to dense mode if it does not require more memory or could fit within L1 cache
-    if (range < array.length || range < 1024) {
+    // SPARK-16740: Make sure range doesn't overflow if minKey has a large negative value
+    if (range >= 0 && (range < array.length || range < 1024)) {
       try {
         ensureAcquireMemory((range + 1) * 8L)
       } catch {
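
For context, here is a minimal standalone Scala sketch (not part of the patch; the key values and array size are hypothetical) of the overflow the new range >= 0 guard avoids: with a very large negative minKey, maxKey - minKey wraps around to a negative Long, the pre-patch condition still passes, and the later dense-array sizing fails with a NegativeArraySizeException.

object RangeOverflowSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical keys: a huge negative minKey makes the subtraction wrap around.
    val minKey = Long.MinValue + 10L
    val maxKey = 1024L
    val range = maxKey - minKey            // overflows to a negative Long
    println(s"range = $range")

    val arrayLength = 1 << 20              // hypothetical sparse-array size

    // Pre-patch check: a negative range is trivially < arrayLength, so dense
    // mode is attempted and the downstream allocation eventually blows up.
    val prePatch = range < arrayLength || range < 1024

    // Post-patch check: the range >= 0 guard rejects the overflowed value.
    val postPatch = range >= 0 && (range < arrayLength || range < 1024)

    println(s"pre-patch would try dense mode: $prePatch")   // true
    println(s"post-patch tries dense mode:    $postPatch")  // false
  }
}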
