close #8269: grow arrays incrementally by factors of 1.5 rather than 2 #16305

Closed
wants to merge 1 commit

Conversation

stevengj
Member

stevengj commented May 10, 2016

As discussed in #8269, many authors seem to consider this to be a good idea. The patch seems trivial.

I tried a quick benchmark, and the function below was _3–15% faster_ after this patch. (I initially reported a barely measurable difference, if anything a 1% slowdown, but that was because I had screwed up the patch.)

function grow(n)
    a = Int[]
    for i = 1:n
        push!(a, i)
    end
    return a
end
@time grow(10^8);

It's apparently faster in both theory and practice, and wastes less memory. Seems worthwhile.
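To spell out the "theory" part, here is a toy model (added purely for illustration; the starting capacity of 4 and the function name are assumptions, not what the C code actually does) that just counts reallocations and copied elements for a given growth factor. Both factors give O(log n) reallocations and O(n) total copying; 1.5 does somewhat more copying in this naive count, but it over-allocates by at most 50% at the end rather than up to 100%, and the hoped-for speed win comes from the allocator being able to reuse freed blocks, as discussed further down.

function growth_model(n, factor)
    # Toy model only: count reallocations and total elements copied when
    # pushing n elements, assuming a starting capacity of 4.
    capacity, reallocs, copied = 4, 0, 0
    while capacity < n
        copied += capacity   # old data is copied on each reallocation
        capacity = max(capacity + 1, floor(Int, capacity * factor))
        reallocs += 1
    end
    return reallocs, copied
end

growth_model(10^8, 2.0)   # ~25 reallocations
growth_model(10^8, 1.5)   # ~43 reallocations, still O(log n)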

@JaredCrean2
Contributor

Should there be some docs about this? I don't see anything in the current docs about the efficiency of functions like push!

@stevengj
Member Author

@JaredCrean2, good point; I added some docs to push!.

@@ -1406,6 +1406,10 @@ julia> push!([1, 2, 3], 4, 5, 6)
6
```

Internally, the storage for the array is increased exponentially (by 50%)
as needed, so that calling `push!` ``n`` times on an empty array
involves only ``O(\log n)`` allocations and ``O(n)`` time.
Contributor

that should be \\log n

Member Author

Thanks, fixed.

@dpsanders
Contributor

Is there a sensible way to try growth factors other than 2 and 1.5?

@stevengj
Member Author

@TotalVerb, probably most C compilers do this, but C programmers traditionally read >> 1 as / 2 anyway. 😉
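For anyone reading along: the idiom in question just computes the extra half-length with a shift; in Julia notation rather than the actual C, something like

len = 1000
newlen = len + (len >> 1)   # same as len + len ÷ 2, i.e. grow by roughly 1.5× in integer arithmetic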

@stevengj
Member Author

stevengj commented May 11, 2016

@dpsanders, you can try anything you want. Honestly, I doubt the cost of the arithmetic is significant here, so you could just try (int) (size * 1.xxx) (though you have to be careful about minimum-size thresholds to make sure that the size strictly increases when it is small). But since a lot of other authors seem to have settled on 1.5 as a good heuristic, and it indeed seems to be better in practice than 2, I'm not sure I'm enthusiastic enough to do that experiment myself.
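Concretely, the guard I mean is something like this (a sketch in Julia; the name and the constants are made up, and the real code is in C):

# Sketch only: grow `size` by an arbitrary factor, forcing strict growth for tiny sizes.
newsize(size, growth) = max(size + 1, floor(Int, size * growth))

newsize(1, 1.3)    # 2 (without the max, floor(1 * 1.3) == 1 and the array would never grow)
newsize(100, 1.3)  # 130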

@stevengj
Member Author

stevengj commented May 11, 2016

Okay, I ran a benchmark of grow(10^8) (using the Benchmarks.jl package with a benchmark time of 30s) for size = 1 + (int) (size * growth) as a function of the growth factor. The results are shown below.

There is a clear penalty for making the growth factor too small, and if anything the optimum growth factor below 2 is around 1.7. However, I would take small differences in timing here with a large grain of salt, because it depends on things like our garbage-collection algorithm (not to mention on the hardware, in this case my 2014 2.15GHz i7 laptop).

1.5 still seems like a reasonable compromise, minimizing over-allocation without sacrificing much if any performance.

[plot: benchmark time for grow(10^8) as a function of the growth factor]

@vchuravy
Member

I've always found that https://github.com/facebook/folly/blob/master/folly/docs/FBVector.md gives a nice rationale for choosing 1.5.

@stevengj
Member Author

stevengj commented May 11, 2016

@vchuravy, that argument, which eventually leads to the golden ratio (as you cited on #8269), is indeed very cute. (But you never want to trust simple performance models too much without checking them on actual hardware, which is almost always vastly more complicated than the theory.)

@mschauer
Contributor

mschauer commented May 11, 2016

Interesting. The statement cited from https://blog.mozilla.org/nnethercote/2014/11/04/please-grow-your-buffers-exponentially/,

"Now, you wouldn’t want to use exactly this value [the golden ratio] to do your memory reallocation, because you’d end up waiting forever to be able to reuse your old memory"

is misleading: "forever" here is only three allocations, since 1 + ϕ = ϕ^2. The next interesting number is χ = 1.8392867552, the golden number corresponding to the tribonacci sequence, with 1 + χ + χ^2 = χ^3, which allows reuse after four steps. Choosing 1.75 (a bit smaller than 1.83, to allow for bookkeeping) also makes sense.

Edit: Whether 1 + χ + χ^2 = χ^3 or 1 + χ + χ^2 = χ^4 is relevant depends on the implementation of realloc.
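For reference, a quick numerical check of the two identities above (nothing more than plugging in the constants):

ϕ = (1 + sqrt(5)) / 2      # golden ratio, root of x^2 = x + 1
χ = 1.83928675521416       # tribonacci constant, root of x^3 = x^2 + x + 1
1 + ϕ ≈ ϕ^2                # true
1 + χ + χ^2 ≈ χ^3          # true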

@jrevels
Member

jrevels commented May 11, 2016

Specifically, it'll be good to see how the benchmarks in the "growth" group fare. Something within 3–15% might be too small a deviation for our current noise tolerance, though.

@nanosoldier runbenchmarks("array", vs = ":master")

@jrevels
Member

jrevels commented May 11, 2016

Reading the most recent status made me realize that ":master" was comparing against stevengj/julia:master, not JuliaLang/julia:master, as it should have been. I just fixed it; this should restart the build here:

@nanosoldier runbenchmarks("array", vs = ":master")

@stevengj
Member Author

stevengj commented May 11, 2016

@jrevels, that is cool stuff. (@nanosoldier please get me lunch.)

@stevengj
Member Author

Technically, shouldn't it compare to the master branch at the commit where this PR was forked? Otherwise you could get spurious performance differences due to changes merged after this PR. (Or are you comparing to this PR rebased/merged onto the current master?)

@jrevels
Member

jrevels commented May 11, 2016

Or are you comparing to this PR rebased/merged onto the current master?

If the merge can be performed automatically (without conflicts), then it will use the merge commit of the PR and compare it against whatever commit you specified with vs. If the merge can't be performed automatically, it will simply use the head commit of the PR instead. There's more documentation on @nanosoldier's behavior here if you're interested.

@stevengj
Member Author

stevengj commented May 11, 2016

@jrevels, any way to see the full error log on nanosoldier? Hopefully the failure is not the fault of this PR, since Travis and AppVeyor passed?

@jrevels
Member

jrevels commented May 11, 2016

any way to see the full error log on nanosoldier?

Not at this time, unfortunately. It seems Nanosoldier just had trouble posting the final "job completed" comment; it wasn't the fault of this PR. Weird, since some other jobs just finished and the comments were correctly posted there... it seems I still have some kinks to work out after the most recent refactor.

Besides that last comment error, the entire job completed and the report was uploaded here. You can play with these same benchmarks locally using BaseBenchmarks.jl.

@KristofferC
Member

It is relevant to note that all the growth benchmarks have a final size of 2^k.

@StefanKarpinski
Member

@vchuravy, that argument, which eventually leads to the golden ratio (as you cited on #8269) is indeed very cute. (But you never want to trust simple performance models too much without checking them on actual hardware, which is almost always vastly more complicated than the theory.)

Doesn't the theoretical argument for the golden ratio being optimal, plus your benchmarks showing that the optimal ratio is around 1.6, argue pretty strongly for the golden ratio actually being a good choice?

@mschauer
Contributor

mschauer commented May 12, 2016

@StefanKarpinski The conclusion of the theoretical arguments really depends on whether the new resized array should fit into the space occupied by all previous resized arrays excluding or including the space occupied by the current array. If it is excluding, then 1.6... is not optimal; on the contrary, it is the smallest factor for which the space of all previous sizes, excluding the current one, is never enough. It took me a moment too. I would go with @stevengj's dictum: "But you never want to trust simple performance models too much without checking them on actual hardware, which is almost always vastly more complicated than the theory."

@jrevels
Member

jrevels commented May 12, 2016

It is relevant to note that all the growth benchmarks have a final size of 2^k.

Good point, it would be better to test more varied sizes...in the meantime, it's probably worth running the whole suite here to test all of our application-based benchmarks (the previous run was only the "array" benchmarks).

@nanosoldier runbenchmarks(ALL, vs = ":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@StefanKarpinski
Member

Isn't "checking them on actual hardware" exactly what @stevengj did?

@JeffBezanson
Member

Looks like the nanosoldier is not happy with this.

@mschauer
Contributor

Simulations are good; it is the theoretical arguments that do not give clear answers. By the way, @stevengj, I think

function grow(n)
    a = Int[]
    b = Int[]
    for i = 1:n
        push!(a, i)
        push!(b, i)
    end
    return a,b
end

would be a relevant test: it ensures at least one array is not at the top of the heap, where growing to the right/upwards is easy.

@stevengj
Member Author

stevengj commented May 13, 2016

@StefanKarpinski, if you take my benchmark as gospel, then 1.7 is even better. However:

  • That is just a single benchmark (on a single machine, for a single version of Julia's gc). As the nanosoldier pointed out, different benchmarks may give different results. That variance means that we don't want to place too much weight on small differences in performance when deciding on a factor.
  • It's not just a question of maximizing speed: The bigger the growth factor, the more memory will get wasted in general.

@stevengj
Member Author

stevengj commented Aug 2, 2016

Do we want to revisit this at some point?

@eschnett
Contributor

eschnett commented Aug 2, 2016

Growing an array to a size of 10^8 must be rare. The common case is probably growing thousands of arrays to sizes of 10^3 or 10^5. I'd benchmark these cases as well before making a decision.
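For concreteness, the kind of benchmark I have in mind is sketched below; the sizes and counts are arbitrary placeholders.

function grow_many(narrays, len)
    # Grow many smaller arrays instead of one huge one.
    total = 0
    for j = 1:narrays
        a = Int[]
        for i = 1:len
            push!(a, i)
        end
        total += length(a)   # use the result so the work can't be skipped
    end
    return total
end

@time grow_many(10^5, 10^3)   # many small arrays
@time grow_many(10^3, 10^5)   # fewer, larger arrays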

@stevengj
Member Author

stevengj commented Aug 2, 2016

Yes, but I should emphasize that it's not completely a question of performance (and there is bound to be some variability between different benchmarks); there's also a space/time tradeoff here.

@eschnett
Contributor

eschnett commented Aug 2, 2016

If the array is large, and on a 64-bit system, the additional space should only be "address space", not "allocated memory". With paging, the system will also be able to reuse previously allocated memory since fragmentation is (in principle) limited to the OS page size, and (in practice) to 128 kByte or so, for GNU malloc.

@stevengj
Member Author

Rebased, re-running benchmarks to see if anything has changed:

@nanosoldier runbenchmarks("array", vs = ":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@oscardssmith
Member

Something worth noting is that none of these have a memory difference of more than 10% in phi's favor. To me, this suggests that either our theory is really wrong, or something is messed up with the implementation. (Or allocated-but-unused memory isn't being counted; is that the case? If so, it should probably be fixed.)

@yuyichao
Contributor

Note that there shouldn't be any expected memory saving in the benchmarks that repeatedly grow arrays, since those numbers do not take memory reuse into account.

@musm
Contributor

musm commented Sep 17, 2020

Bump. It seems the decision was to use 1.5 as a good compromise; any reason this went stale?

@KristofferC
Member

#16305 (comment)

@musm
Contributor

musm commented Sep 17, 2020

Probably worth trying again after four years.

@stevengj
Member Author

stevengj commented Sep 17, 2020

Rebased.

As @JeffBezanson commented in #32035 (comment), we may want to grow by a larger factor for small-size arrays until the length reaches some threshold.
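That is, something along these lines (a sketch only; the 1024-element threshold and the minimum capacity are placeholders, and the real code is in C):

# Hypothetical hybrid rule: double small arrays, grow large ones by ~1.5×.
function newlen_hybrid(len)
    len < 4 && return 4                            # minimum capacity (placeholder)
    return len < 1024 ? 2len : len + (len >> 1)    # placeholder threshold
end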

@stevengj
Member Author

But we could always close this PR in favor of #32035.

@musm
Contributor

musm commented Dec 15, 2020

closing this in favor of #32035

@musm musm closed this Dec 15, 2020
Labels: performance (Must go faster)