[WIP]: Fix array growth thresholding #32035
Conversation
Instead of capping array growth at a constant increment (currently 1% of physical RAM), which leads to quadratic total growth cost, we simply lower the _rate_ of growth when we hit a threshold, maintaining the amortized linear growth cost while trading off somewhat more memory usage for much less CPU time spent copying. This commit also increases the triggering threshold from 1% of physical RAM to 30% of physical RAM.
This should be the fastest possible option, I think, at the expense of larger memory use. On my machine, compared with master, this brings the time to `push!` 2^30 elements one at a time onto a 2^30-size array down from 1000s to 18s. The results of that calculation are below.

Benchmark:

```julia
for s in (2^27, 2^30,)
    vs = samerand(s)
    g["push_single_large!", s] = @benchmarkable push!(x, $(samerand())) setup=(x = copy($vs))
    g["push_multiple_large!", s] = @benchmarkable perf_push_multiple!(x, $vs) setup=(x = copy($vs))
end
```

Results on `master`:

```
("push_single_large!", 134217728) => Trial(768.667 ms)
("push_multiple_large!", 134217728) => Trial(6.169 s)
("push_single_large!", 1073741824) => Trial(28.277 s)
("push_multiple_large!", 1073741824) => Trial(1069.542 s)
```

Results after this commit:

```
("push_single_large!", 134217728) => Trial(766.012 ms)
("push_multiple_large!", 134217728) => Trial(2.211 s)
("push_single_large!", 1073741824) => Trial(20.854 s)
("push_multiple_large!", 1073741824) => Trial(18.894 s)
```

(Note though that the very large numbers fluctuate wildly, since the process takes more virtual memory than I have physical memory, and is swapping out to disk.)
- The growth rule has a small cutoff at 10 elements, because for small sizes 1.5x growth triggers very often, and with integer truncation it can stall entirely. (For example, 1*1.5 == 1, 2*1.5 == 3, 3*1.5 == 4.)
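A minimal Julia sketch of that rule (illustration only; the actual change is in the C runtime's `array.c`, and the function name here is made up):

```julia
# Minimal sketch of the growth rule in this PR: double very small arrays,
# grow by 1.5x (truncating, as the C code does) after that.
grow_capacity(curlen) = curlen <= 10 ? curlen * 2 : floor(Int, curlen * 1.5)

# Starting from a capacity of 1, this yields 2, 4, 8, 16, 24, 36, 54, 81, 121, ...
```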
It seems to me there should still be some kind of limit; 1% of RAM is just really stingy. |
In case you didn't see it, I've summarized my latest thoughts in this comment: My main concern is that there should never be a constant-size growth increment; it should always be a scaling growth factor. As long as we do that, I'm happy. I think we could consider lowering that growth factor based on the array size (to address your suggestion), but doing so will still (I think) have implications for the time complexity. And given the benefits of virtual memory, it doesn't seem to actually be so terrible to let arrays grow larger than physical RAM anyway. Interested to hear your thoughts! :) |
Ok, that's a good argument. I'm ok with just doing 1.5x growth for now. I'd also really like to use realloc again. Large allocations are highly likely to be well aligned, so the try-and-check approach seems safe enough to me. |
Isn't the golden ratio supposed to be the optimal growth factor? |
Yes, the golden ratio is optimal, and I suppose we don't mind using floating point, so we could use it. I wonder why it's not often used in practice though. Maybe you want some wiggle room in case the new size doesn't exactly fit for implementation reasons, or in case part of the old space has been allocated to something else? |
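If it helps, here is a rough illustration of the usual argument (a toy model in Julia, assuming an idealized allocator that could hand contiguous freed blocks back to the same array, which real allocators and Julia's GC only approximate):

```julia
# Count how many growth steps it takes before the next block could fit into the
# space freed by all earlier reallocations (idealized model, not Julia's allocator).
function first_reusable_step(factor; start=16.0, steps=64)
    live, freed = start, 0.0
    for n in 1:steps
        request = factor * live        # size of the next, bigger block
        request <= freed && return n   # it would fit into previously freed space
        freed += live                  # the old block is freed after the copy
        live = request
    end
    return nothing                     # never fits within `steps` growths
end

first_reusable_step(1.5)   # a small, finite number of steps
first_reusable_step(2.0)   # nothing: each new block exceeds all previous blocks combined
```

Factors below the golden ratio eventually let the freed space catch up; at 2x it never does.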
Ref #16305 |
Ah yes. That discussion also reminds me that we should grow faster (e.g. 2x) up to some not-too-big, not-too-small threshold like 1000. Under that size wasting memory doesn't matter as much, but doing less reallocation will probably bring measurable speedups. |
Yes, fwiw, we discussed during JuliaCon using some other growth function like |
@JeffBezanson The main reason the golden ratio is typically not used is that arrays are not that frequently grown, and if anything gets allocated in between it would go badly. IIRC, python uses 1.125 because they looked at a bunch of real world code and found that repeated pushes aren't that frequent. |
I agree that it's probably rare to do lots of unpredictable pushes to an array over its lifetime, but the case I worry about is where |
How about using powers of two until the array size is an entire OS page? That should help avoid fragmentation since powers of two are easier to pack, and it makes the growth for small arrays higher, addressing what Jeff is concerned about. Once the array size is ≥ a page, one can use any growth factor at all since it will always be in terms of whole pages. |
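A hedged sketch of that heuristic in Julia (the function name and the hard-coded 4096-byte page size are assumptions for illustration, not anything in this PR):

```julia
const ASSUMED_PAGE_SIZE = 4096  # typical page size; the real thing could query the OS

# Hypothetical capacity policy: powers of two while the array is smaller than a
# page, then any factor (1.5x here), since the underlying allocation is made in
# whole pages at that point anyway.
function page_aware_capacity(curlen::Int, elsz::Int)
    if curlen * elsz < ASSUMED_PAGE_SIZE
        return max(4, curlen * 2)       # stay on powers of two below a page
    else
        return (curlen * 3) >> 1        # 1.5x once we are at least a page
    end
end
```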
Makes sense. Thanks for the explanation! :)
@vtjnash Yeah, I was really excited about that at the time too. But after thinking about it more, I now think it's a bad idea, per the reasons I wrote in #28588 (comment), section "Changing growth factor based on physical RAM": basically, I think that if you shrink (even slightly) the size you grow by after each growth, you will end up with an amortized insertion time bigger than O(1).
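To make that concrete, a toy model in Julia (a sketch, not the runtime's code; `decay` is a made-up knob standing in for "shrink the growth factor after every reallocation"):

```julia
# Toy model: count element copies caused by resizing while pushing n elements,
# for a fixed growth factor versus a factor that decays toward 1 after each
# reallocation.
function avg_copies_per_push(n; factor=1.5, decay=1.0)
    cap, len, copied, f = 4, 0, 0, factor
    while len < n
        if len == cap                          # out of room: reallocate and copy
            copied += len
            cap = max(cap + 1, floor(Int, cap * f))
            f = 1 + (f - 1) * decay            # optionally shrink the factor
        end
        len += 1
    end
    return copied / n
end

avg_copies_per_push(10^7)              # ≈ small constant (~2–3): amortized O(1) per push
avg_copies_per_push(10^7; decay=0.9)   # grows with n: amortized cost is no longer O(1)
```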
This seems like a reasonable idea to me! I'll think about it more. Also, lemme add @tveldhui for some more input. He's thought about this a bit as well. |
Julia is designed to be high performance. To my mind, doubling the array size is most consistent with high performance: it reduces copies per element in situations where realloc doesn't just do MREMAP.

The physical memory limit seems a red herring. There aren't many people who are going to accidentally create one single array that approaches the size of physical memory. Most people working on big interesting problems have many data structures of varying sizes.

But consider the case where someone does have an interesting problem that requires one huge vector. If you picture a log-scale graph of problem sizes, the ones right around physical memory size are a razor-thin transition. If your problem fits in memory, you want to minimize copying for performance reasons. As you approach physical memory size you've got swap and overcommit to fall back on. If you're tackling problems bigger than main memory and determined to use a single Array/Vector, you're going to be relying on NVMe swap or somesuch.

From what I can tell, the main motivation for picking a smaller growth factor than 2x, or having one that trails off, is to allow very large arrays to get slightly closer to the physical RAM limit without OOMing on machines that don't have any swap available. This seems like a very rare use-case to me, and it doesn't make sense to penalize 99.9% of use cases by choosing a lower-performance resizing heuristic to help a tiny fraction of users tackle marginally bigger problems before they OOM. |
Ah, yeah, that's a good point. My main motivation for suggesting 1.5x instead of 2x was to avoid this packing problem leaving holes in memory, but I guess that matters less if we tackle this at the same time as fixing realloc. I'll try to add the realloc fix to this PR, and then maybe get some graphs of memory use and CPU use for different growth factors? |
@tveldhui the counter argument would be that frequent array growth is rare in performance critical work. (And if it matters, you can always manually set capacity). As a result the place memory efficiency matters most is large numbers of small arrays, which to me implies a fairly small growth factor consistently. |
Oscar, for many small vectors I would feel even more strongly about wanting a larger growth factor.

There's obviously a CPU performance vs. memory tradeoff: a smaller growth rate would grow more often (more CPU expensive) in exchange for wasting less memory from overallocating.

But from my perspective, the memory gains aren't that significant from a smaller growth rate. In the 2x growth scenario, if you assume the absolute worst case, every vector would be exactly twice as big as it should be. So in the absolute best case from a memory perspective, you only have a constant factor improvement to make by using a smaller rate.

But you pay the CPU cost _every time_ you grow the array, so the potential cost there is large. And smaller arrays necessarily grow more often, so with many small arrays a lot of your CPU would be spent in the allocator.

So for many small arrays, I would favor a larger growth rate even more strongly, since I wouldn't want to be regrowing them all the time every few inserts.

Does that seem reasonable?
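For concreteness, a rough back-of-the-envelope sketch of that trade-off (a toy calculation in Julia, not a benchmark of the real allocator):

```julia
# For growth factor g and n pushes: reallocations ≈ log_g(n), amortized copies
# ≈ g/(g-1) per push, worst-case over-allocation ≈ (g-1)*n elements.
n = 10^8
for growth in (1.5, 2.0)
    println("factor $growth: ~", ceil(Int, log(growth, n)), " reallocations, ~",
            round(growth / (growth - 1); digits=1), " copies per push, ",
            "worst-case over-allocation ≈ ", round(Int, (growth - 1) * n), " elements")
end
```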
|
I doubt that's directly translatable. In a slow language like Python you don't want to create arrays element-by-element because the interpreter cost kills you---you do everything you can to create arrays all at once. Julia doesn't have that overhead, so there is less disincentive to use `push!`. |
We're implementing an ultra high performance database engine/machine learning system in Julia. I work on database queries. (And for the record I am in love with Julia for this, we are outperforming the very best database engines single-core.) As you evaluate a query you're often doing vector push! to accumulate the results. With meticulous sampling and hairy algorithms you can approximate the expected result size, but that doesn't help you set an appropriate size for the vector of results - if your estimate is slightly under you still have to resize. Or you can go for an upper bound on the result size, which is computationally intensive and can be off by orders of magnitude (e.g. you reserve 100 times more space than needed). Database theory is a sticky wicket. So I disagree very much that frequent array growth is rare and that you can predetermine capacity. Most of our cpu time is spent in queries where we cannot predict in advance how many results will be produced, so a fast push! is critical. |
return alen + inc + a->offset + (jl_arr_xtralloc_limit / es);
} | ||
return newlen; | ||
return curlen <= 10 ? curlen * 2 : curlen * 1.5; |
Not that it matters much, but in #16305 I used `((curlen*3)>>1)` to avoid floating-point conversion here.
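For reference, the two forms agree for non-negative lengths (ignoring overflow of `curlen*3`); a quick Julia check, just as an illustration:

```julia
# The shift form matches the truncated floating-point form for non-negative lengths.
all(n -> (n * 3) >> 1 == floor(Int, n * 1.5), 0:10^6)   # true
```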
return alen + inc + a->offset + (jl_arr_xtralloc_limit / es);
} | ||
return newlen; | ||
return curlen <= 10 ? curlen * 2 : curlen * 1.5; |
Should the threshold be increased to 1000 as per Jeff's comment?
Seems like it should be more like `curlen * elsz <= 4096`, e.g. about the typical page size (or I guess we could get the actual page size with `sysconf`).
@NHDaly any interest in picking this up again? Almost all the arguments are in favor of growing the arrays by a factor less than 2x, and based on this and the previous PR a 1.5x growth factor was deemed a reasonable compromise. |
I hate to be that guy, but I notice that none of the arguments presented here are supported by either memory usage or run time benchmarks... Benchmarks could include building Julia, running the test cases, or some sparse matrix algebra that might use |
I think this is closed via #40453. |
To address problem (1) in #28588 (comment): Fixing the quadratic total insertion time complexity.
In this PR, currently I just always grow at a rate of 1.5x (after the number of elements is greater than 10). Happy to explore other options!
Closes #8269.