Make LLVM's stack growable #644

Merged
merged 6 commits into master from feature/llvm-stack-growing
Oct 23, 2024
Conversation

marvinborner
Member

This enables automatic growing of LLVM's stack once its end is reached (or when an allocation is larger than the remaining space).

We can now have much smaller stack sizes than before. For now, I've set the previous 256M stacks to 1KB (which, after initial tests, seems fine).

Due to some code repetition I wanted to merge the growing logic of regions with this. I've reverted my attempts since #642 removes the duplicated code anyway.
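
In pseudocode terms, the check this adds looks roughly like the following minimal C sketch (grow_stack and stack_allocate are illustrative names, not the actual rts.ll symbols; the growth policy shown, double the used size and add the request, follows the snippet discussed below):

    #include <stdint.h>
    #include <stdlib.h>

    /* Illustrative stack representation; the real layout lives in libraries/llvm/rts.ll. */
    typedef struct {
        uint8_t *base;   /* start of the backing buffer             */
        uint8_t *sp;     /* current stack pointer (bump allocation) */
        uint8_t *limit;  /* end of the backing buffer               */
    } Stack;

    /* Grow the backing buffer so that at least n more bytes fit. */
    static void grow_stack(Stack *s, uint64_t n) {
        uint64_t used    = (uint64_t)(s->sp - s->base);
        uint64_t newSize = used * 2 + n;
        uint8_t *newBase = realloc(s->base, newSize);   /* error handling elided */
        s->base  = newBase;
        s->sp    = newBase + used;
        s->limit = newBase + newSize;
    }

    /* Allocate n bytes, growing the stack once its end is reached. */
    static void *stack_allocate(Stack *s, uint64_t n) {
        if ((uint64_t)(s->limit - s->sp) < n)
            grow_stack(s, n);
        void *result = s->sp;
        s->sp += n;
        return result;
    }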

libraries/llvm/rts.ll (outdated review thread, resolved)
@marvinborner
Member Author

marvinborner commented Oct 16, 2024

[Figure_1: benchmark comparison plot]

The benchmarks use config_llvm.txt similar to effekt-plots. Current master is in the left bar. I can't really explain the sudden change near the end (and in master), but the jumps are fully reproducible.

The hacky script I've used to generate the data: https://gist.github.com/marvinborner/f71e3fdf55548692c7eb7b8d348d4284


%size = sub i64 %intStackPointer, %intBase
%double = mul i64 %size, 2
%newSize = add i64 %double, %n ; TODO: should we be smarter here?
Contributor

As someone who doesn't understand the LLVM runtime, why do we need to add the %n here?

Member Author

It's not strictly related to the runtime, but rather to the growing logic. Since the number of bytes to be allocated (n) could be larger than double the current stack size, I first double the size and then add n to it. We could (should?) of course do something smarter here.

Contributor

@jiribenes jiribenes Oct 16, 2024

Intuitively, I'd expect something like "if we know we have to resize, then new_size <- nextStrictlyBiggestPowerOfTwo(current_size, n)" where the mysterious function is https://llvm.org/doxygen/namespacellvm.html#afb65eef479f0473d0fe1666b80155237 or clz (get the highest bit, choose a number one bigger in binary), but I have to think about it.
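
For illustration, the clz trick as a short, hedged C sketch (a single-argument rounding helper; __builtin_clzll is the GCC/Clang intrinsic, and this is not necessarily what LLVM's NextPowerOf2 does):

    #include <stdint.h>

    /* Smallest power of two strictly greater than x (valid for 0 < x < 2^63):
       take the position of the highest set bit and shift one past it. */
    static uint64_t next_strictly_bigger_power_of_two(uint64_t x) {
        return 1ULL << (64 - __builtin_clzll(x));
    }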

Member Author

@marvinborner marvinborner Oct 16, 2024

Yes, that also sounds good. My thought behind the strategy was that, intuitively, large stack allocations are quite rare (or, in effekt's case, impossible?). So hypothetically, if I allocate 1GB on a 10MB stack, I'd prefer the next stack size to be 20MB + 1GB rather than 2GB.

In hindsight this hypothetical doesn't really make sense because we only use this function to allocate really small sizes 🤷‍♂️

Contributor

@jiribenes jiribenes Oct 18, 2024

I thought about it for a bit and something like the following should be pretty good:

new_size := nextPowerOf2(max(size * 2, size + n + 64))

Rationale:

  • aligning to next power of two is Good ®️
  • if the allocation is huge compared to the current size, it's best to keep at least a little bit (64 bytes) of extra space so that we don't have to allocate again soon

Here's my crappy LLVM impl:

    ; Calculate double of current size
    %double_size = shl i64 %size, 1
    
    ; Calculate size + n + 64 (small buffer)
    %size_plus_n = add i64 %size, %n
    %size_plus_n_buffer = add i64 %size_plus_n, 64
    
    ; Take the maximum of (size * 2) and (size + n + 64)
    ; (llvm.umax is the integer max intrinsic; llvm.maximum is for floats)
    %max_size = call i64 @llvm.umax.i64(i64 %double_size, i64 %size_plus_n_buffer)
    
    ; Round up to the next power of 2 using ctlz:
    ; %floor_pow2 is the largest power of two <= %max_size
    %leading_zeros = call i64 @llvm.ctlz.i64(i64 %max_size, i1 false)
    %shift_amount = sub i64 63, %leading_zeros
    %floor_pow2 = shl i64 1, %shift_amount
    ; keep %max_size if it already is a power of two, otherwise take the next one up
    %is_pow2 = icmp eq i64 %floor_pow2, %max_size
    %ceil_pow2 = shl i64 %floor_pow2, 1
    %newSize = select i1 %is_pow2, i64 %max_size, i64 %ceil_pow2

Of course, perhaps I'm really overthinking this; I'd really need to benchmark.
Also, I'm not sure whether we should do size * 2, size * 1.5 or size * <golden ratio>. Probably again needs to be benchmarked.

To continue thinking about this, it would be really nice to know the "profile" of allocations: what do our allocations actually look like? Can we serialise this somehow and read it later?
(see summary below where I discover that I have no clue how to model this)

WDYT?


btw, @jiribenes tried to do statistics here

I also thought about my assumptions:

  • the number of allocations follows a Pareto distribution
  • the size of an allocation is a power of two and at least 64, and follows a Pareto distribution

but when I even try to simulate them, I can clearly see that they are not true, since they result in very whacky stacks with the function I suggested under the distributions above:

Initial size: 1024 bytes
  Trimmed Mean size: 437134839.21 bytes
  Median size: 524288.00 bytes
  90th percentile size: 1073741824.00 bytes
  Max size: 1073741824.00 bytes

Initial size: 4096 bytes
  Trimmed Mean size: 452582561.59 bytes
  Median size: 2097152.00 bytes
  90th percentile size: 1073741824.00 bytes
  Max size: 1073741824.00 bytes

Initial size: 8192 bytes
  Trimmed Mean size: 461104562.45 bytes
  Median size: 4194304.00 bytes
  90th percentile size: 1073741824.00 bytes
  Max size: 1073741824.00 bytes

Of course, my code is most likely bad, but this is not very encouraging.
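
(For reference, one way such a simulation could be set up is sketched below in C; the actual script is not included here, so the Pareto parameters, the 1 GiB cap, and the loop structure are all guesses.)

    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Inverse-transform sample from a Pareto distribution with scale x_m and shape alpha. */
    static double pareto(double x_m, double alpha) {
        double u = (rand() + 1.0) / ((double)RAND_MAX + 2.0);  /* u in (0, 1) */
        return x_m / pow(u, 1.0 / alpha);
    }

    /* Smallest power of two >= x. */
    static uint64_t next_pow2(uint64_t x) {
        uint64_t p = 1;
        while (p < x) p <<= 1;
        return p;
    }

    /* Growth function suggested above: nextPowerOf2(max(size * 2, size + n + 64)). */
    static uint64_t grow(uint64_t size, uint64_t n) {
        uint64_t a = size * 2, b = size + n + 64;
        return next_pow2(a > b ? a : b);
    }

    int main(void) {
        const uint64_t cap = 1ULL << 30;              /* assumed 1 GiB cap on allocation sizes */
        uint64_t size = 1024, used = 0;               /* initial stack size under test */
        long allocations = (long)pareto(100.0, 1.5);  /* assumption 1: #allocations ~ Pareto */

        for (long i = 0; i < allocations; i++) {
            /* assumption 2: allocation size is a power of two, at least 64, ~ Pareto */
            uint64_t n = next_pow2((uint64_t)pareto(64.0, 1.0));
            if (n > cap) n = cap;
            if (used + n > size) size = grow(size, n);
            used += n;
        }
        printf("final stack size: %llu bytes\n", (unsigned long long)size);
        return 0;
    }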

Member Author

Very interesting, although I agree that maybe you're overthinking this a bit :octocat:

Regarding the profile of the allocations: We (currently?) only use @stackAllocate to make space for pushes of pos/neg, builtin types and other pointers, never for arbitrarily large data allocations. (Maybe @phischu can confirm this?)

As far as I understand, large (de-)allocations only happen when some function has a lot of arguments that need to be stored on the stack (~8-16 bytes each). In nqueens, for example, this leads to stack (de-)allocations of 108 bytes.

To make this clearer, here are the allocation sizes/counts across all generated files in effekt.LLVMTests:

bytes allocated    how often
8                  318
16                 305
24                 1856
32                 101
40                 57
48                 22
56                 24
64                 4
72                 1
80                 1

So I don't think we really need to be creative with the growing logic, since %n will almost never be larger than double the current stack size, especially if we settle on an initial stack size like 1024. Minimal solutions like adding the current size (as I did) or using NextPowerOf2 are probably more than enough.

Member Author

@marvinborner marvinborner Oct 22, 2024

@phischu asked about the stack allocation profile at runtime. Here's the data:

For the entire test suite (test):

bytes allocated    how often    percentage
8                  2918774      28.8
16                 815964       8
24                 5386483      53.1
32                 841972       8.3
40                 169619       1.7
48                 6444         0.06
56                 79           <0.01
64                 2085         0.02
72                 126          <0.01
80                 707          <0.01
96                 8            <0.01
112                1            <0.01

Only the LLVM tests (testOnly effekt.LLVMTests):

bytes allocated    how often    percentage
8                  2918721      28.8
16                 815664       8
24                 5385418      53.1
32                 841852       8.3
40                 169604       1.7
48                 6435         0.06
56                 79           <0.01
64                 2080         0.02
72                 126          <0.01
80                 702          <0.01

@b-studios
Collaborator

Thanks for the benchmarks! Do I read it correctly that triples and tree_explore basically disappear because they are much faster?

@b-studios
Collaborator

Most continuation-heavy benchmarks seem to be much faster, except parsing dollars. That's strange.

@b-studios b-studios requested review from b-studios, phischu and abgruszecki and removed request for abgruszecki and b-studios October 22, 2024 15:04
libraries/llvm/rts.ll (outdated review thread, resolved)
@b-studios b-studios merged commit cb2f439 into master Oct 23, 2024
2 checks passed
@b-studios b-studios deleted the feature/llvm-stack-growing branch October 23, 2024 14:47
@b-studios
Collaborator

Thanks @marvinborner !
