Commit 13a4cdb

Added ManagedBuffer special-case to SizeEstimator.
The SizeEstimator's code for estimating object size yields a size that is dramatically too large for ManagedBuffers. This commit adds a special case to SizeEstimator that just uses the size() method when estimate() is called on a ManagedBuffer. Fixes issue apache#23.
1 parent 560cf18 commit 13a4cdb

File tree

1 file changed (+4, -0)

core/src/main/scala/org/apache/spark/util/SizeEstimator.scala

Lines changed: 4 additions & 0 deletions
@@ -28,6 +28,7 @@ import java.util.concurrent.ConcurrentHashMap
 import scala.collection.mutable.ArrayBuffer
 
 import org.apache.spark.Logging
+import org.apache.spark.network.buffer.ManagedBuffer
 import org.apache.spark.util.collection.OpenHashSet
 
 /**
@@ -171,6 +172,9 @@ private[spark] object SizeEstimator extends Logging {
       // Hadoop JobConfs created in the interpreter have a ClassLoader, which greatly confuses
       // the size estimator since it references the whole REPL. Do nothing in this case. In
       // general all ClassLoaders and Classes will be shared between objects anyway.
+    } else if (obj.isInstanceOf[ManagedBuffer]) {
+      // ManagedBuffers also greatly confuse the size estimator, so just rely on the buffer's size.
+      state.size += obj.asInstanceOf[ManagedBuffer].size()
     } else {
       val classInfo = getClassInfo(cls)
       state.size += classInfo.shellSize
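The pattern in the diff above, short-circuiting graph traversal for a type that already knows its own payload size, can be sketched outside of Spark as follows. This is a minimal illustration, not Spark's actual SizeEstimator; the names `SizedBuffer`, `FileBackedBuffer`, and `TinyEstimator`, and the per-object shell cost, are all hypothetical.

```scala
// Hypothetical sketch: a toy size estimator that walks objects generically,
// but trusts buffer types that report their own size. This mirrors the idea
// of the commit, not Spark's real implementation.
trait SizedBuffer {
  def size(): Long // the buffer's own accounting of its payload, in bytes
}

// A buffer whose data lives elsewhere (e.g. on disk); reflecting over its
// fields would grossly misestimate the payload it represents.
case class FileBackedBuffer(length: Long) extends SizedBuffer {
  def size(): Long = length
}

object TinyEstimator {
  // Crude fixed per-object overhead used when we must walk the object.
  val ShellSize = 16L

  def estimate(obj: AnyRef): Long = obj match {
    // Special case: rely on the buffer's own size() instead of traversing
    // whatever (possibly huge, possibly off-heap) state it references.
    case b: SizedBuffer => b.size()
    // Crude approximation: shell plus two bytes per character.
    case s: String      => ShellSize + 2L * s.length
    // Fallback: just the shell cost.
    case _              => ShellSize
  }
}
```

With this shape, `TinyEstimator.estimate(FileBackedBuffer(1024L))` returns the buffer's declared 1024 bytes rather than a reflective guess, which is the same design choice the commit makes for `ManagedBuffer`.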
