manually inline IntArrayList for BitOps.Writer perf#27
manually inline IntArrayList for BitOps.Writer perf#27
Conversation
|
That's odd because that diff came with improved perf on my machine, at the time. |
|
maybe masked by other improvements? |
isaacl
left a comment
There was a problem hiding this comment.
This patch is negative on my machine.
jmh:run -wi 3 -i 3 -f2 OverallEncodingTest.testEncode
- master (e64faf9)
239.229 ±(99.9%) 47.073 ns/op [Average]
(min, avg, max) = (217.726, 239.229, 259.115), stdev = 16.787
CI (99.9%): [192.156, 286.302] (assumes normal distribution)
- pr/27:
345.767 ±(99.9%) 136.290 ns/op [Average]
(min, avg, max) = (278.333, 345.767, 389.225), stdev = 48.602
CI (99.9%): [209.477, 482.057] (assumes normal distribution)
| final class Writer: | ||
| private val buffer = new IntArrayList() | ||
| final class Writer(initialCapacity: Int = 10): | ||
| private var buffer = Array[Int](initialCapacity) |
There was a problem hiding this comment.
😱 This makes Array of length 1 ([10])
|
So with that bug, it turns out we're essentially benchmarking initialCapacity 10 vs 1. And indeed I see the same improvement from just --- a/src/main/scala/BitOps.scala
+++ b/src/main/scala/BitOps.scala
@@ -96,7 +96,7 @@ object BitOps:
bb.array
private final class IntArrayList:
- private var data = new Array[Int](10)
+ private var data = new Array[Int](1)
private var index = 0
def add(elt: Int): Unit =as from this PR. Now it's still weird that our PCs have different tastes with regard to that value, but it makes no sense to tune this for the fixed input size in the benchmark in any case. |
|
Agree on not optimizing initial array length to this benchmark. But given you see a difference from initial length, I wonder if initial size and growth factor are tuned well. These were copied from ArrayList, but there they are worried about memory usage of a long living object. OTOH, IntArrayList is always transient. So maybe it should have larger initial capacity and/or grow faster? |
|
Or maybe have the encoders provide a guess for the required capacity, probably slightly overestimating, so that the growth strategy would rarely be used? |
so for me, all benchmarks are now improved since
3.1.2~4, exceptOverallEncodingTest.testEncode(withbenchmarks/jmh:run -i 4 -wi 5 -f10):it comes down the the attached diff:
@isaacl thoughts? could be worth keeping as is, because the separate
IntArrayListis a bit clearer. or maybe we should merge this? or maybe you can spot something in the diff? or maybe it's just a thing on my machine?