Description
openedon Apr 28, 2021
Julia has a generational mark and sweep GC.
One of the properties of a mark and sweep GC is that it needs to go through and mark every reference object in memory to decide if it should be swept whenever the GC runs.
This is very expensive if you have a lot of objects in memory, and the cost is incurred even if those objects are not being freed (or even read).
Which is where the generational aspect comes in.
It applies a heuristic to break the objects in memory into generations.
Things that have been referenced for a long time (in some sense) are likely to continue to be referenced, and so are moved into the older generations.
The generational GC should mark and sweep the young generations very often, and the older generations more rarely.
This should mean if you have some large object with lots of references that are e.g. allocated at the start of your program and kept until the end of the program, it will make the GC slow at first since it is so large and takes so long to mark, but after a little while it will move into the older generations and very rarely be marked, and so cease to have an effect on the time it takes to do normal GCs.
The following examples use
ZonedDateTimes
from TimeZones@1.5.3
which are for now not isbits
though that is being worked on (JuliaTime/TimeZones.jl#271).
You can create a similar example that uses any non-isbits
objects such as mutable objects, things with abstract typed fields, or things with non-isbits fields (such as String
or any Vector
).
This one I had handy.
You can see in this first example the generational GC working.
At first, after the zdts
are allocated there is a sharp jump in the time spent on GC, up to around 90%.
But then after a few rounds, the zdts
have moved into the older generation,
and the time shifts spent on GC goes back down nicely to what it was at the start: around 20%
julia> # A function that creates a decent amount of garbage,
little_burn() = sum(hash, ["trashfire"^min(1000, i) for i in 1:10_000])
little_burn (generic function with 1 method)
julia> @time little_burn();
0.091142 seconds (10.00 k allocations: 82.467 MiB, 20.53% gc time)
julia> using TimeZones
julia> zdts = [now(tz"UTC") for _ in 1:100_000_000];
julia> for i in 1:10
@time little_burn()
end
0.566361 seconds (10.00 k allocations: 82.467 MiB, 89.26% gc time)
0.620691 seconds (10.00 k allocations: 82.467 MiB, 88.32% gc time)
0.051410 seconds (10.00 k allocations: 82.467 MiB)
0.603839 seconds (10.00 k allocations: 82.467 MiB, 90.91% gc time)
0.060557 seconds (10.00 k allocations: 82.467 MiB, 17.61% gc time)
0.066966 seconds (10.00 k allocations: 82.467 MiB, 18.17% gc time)
0.058870 seconds (10.00 k allocations: 82.467 MiB, 19.43% gc time)
0.060740 seconds (10.00 k allocations: 82.467 MiB, 15.33% gc time)
0.064070 seconds (10.00 k allocations: 82.467 MiB, 19.75% gc time)
0.060747 seconds (10.00 k allocations: 82.467 MiB, 21.03% gc time)
0.061915 seconds (10.00 k allocations: 82.467 MiB, 18.41% gc time)
However, it's very easy to run a different function that does not display the generational behavior
burn()
is identical to little_burn
but runs for 10x longer and allocates 10x as much memory.
Like little_burn
it never actually touches the zdts
.
with burn
however the GC time always stays at 60-80%.
This is due to it triggering a full sweep of both generations of our GC.
Thus it keeps needing to mark all of the big and complex zdts
which is slow.
julia> # A function that creates a decent amount of garbage,
burn() = sum(hash, ["trashfire"^min(1000, i) for i in 1:100_000])
burn (generic function with 1 method)
julia> @time burn();
0.913665 seconds (100.00 k allocations: 863.183 MiB, 19.98% gc time)
julia> using TimeZones
julia> zdts = [now(tz"UTC") for _ in 1:100_000_000];
julia> for i in 1:10
@time burn()
end
3.127583 seconds (100.00 k allocations: 863.183 MiB, 78.35% gc time)
2.073914 seconds (100.00 k allocations: 863.183 MiB, 64.82% gc time)
2.355717 seconds (100.00 k allocations: 863.183 MiB, 77.97% gc time)
1.842793 seconds (100.00 k allocations: 863.183 MiB, 66.04% gc time)
2.335667 seconds (100.00 k allocations: 863.183 MiB, 74.24% gc time)
1.804693 seconds (100.00 k allocations: 863.183 MiB, 66.61% gc time)
2.344848 seconds (100.00 k allocations: 863.183 MiB, 73.77% gc time)
2.396741 seconds (100.00 k allocations: 863.183 MiB, 74.44% gc time)
1.883768 seconds (100.00 k allocations: 863.183 MiB, 64.91% gc time)
2.262383 seconds (100.00 k allocations: 863.183 MiB, 77.07% gc time)
1.835220 seconds (100.00 k allocations: 863.183 MiB, 64.84% gc time)
I have seen real code that takes tens of seconds to do a full mark and sweep.
We really don't want to be running that very often. (if we are running that kind of mark and sweep too often it might even be better to OOM error)
It seems we need to do something to the GC to have it do a full mark and sweep less often.
This might be tweaking the heuristics.
It might be adding another generation.
it might be both