Make LLVM's stack growable #644

Merged, 6 commits, Oct 23, 2024
Changes from 1 commit
59 changes: 46 additions & 13 deletions libraries/llvm/rts.ll
@@ -366,30 +366,63 @@ define private %Reference @newReference(%Stack %stack) alwaysinline {
  ; Stack management

  define private %StackPointer @stackAllocate(%Stack %stack, i64 %n) {
- %stackStackPointer = getelementptr %StackValue, %Stack %stack, i64 0, i32 1, i32 0
- %stackPointer = load %StackPointer, ptr %stackStackPointer
+ %stackPointer_pointer = getelementptr %StackValue, %Stack %stack, i64 0, i32 1, i32 0

- %stackPointer_2 = getelementptr i8, %StackPointer %stackPointer, i64 %n
- store %StackPointer %stackPointer_2, ptr %stackStackPointer
- ret %StackPointer %stackPointer
+ %stackMemory = getelementptr %StackValue, %Stack %stack, i64 0, i32 1
+ %memory = load %Memory, ptr %stackMemory
+
+ %current = extractvalue %Memory %memory, 0
+ %limit = extractvalue %Memory %memory, 2
+
+ %nextStackPointer = getelementptr i8, %StackPointer %current, i64 %n
+ %cmp = icmp ule %StackPointer %nextStackPointer, %limit
+ br i1 %cmp, label %continue, label %realloc
+
+ continue:
+ store %StackPointer %nextStackPointer, ptr %stackPointer_pointer
+ ret %StackPointer %current
+
+ realloc:
+ %base = extractvalue %Memory %memory, 1
+
+ %intStackPointer = ptrtoint %StackPointer %current to i64
+ %intBase = ptrtoint %Base %base to i64
+
+ %size = sub i64 %intStackPointer, %intBase
+ %double = mul i64 %size, 2
+ %newSize = add i64 %double, %n ; TODO: should we be smarter here?
Contributor commented:
As someone who doesn't understand the LLVM runtime, why do we need to add the %n here?

Member, Author commented:
It's not strictly related to the runtime, but rather to the growing logic. Since the number of bytes to be allocated (n) could be larger than double the current stack size, I first double the size and then add n to it. We could (should?) of course do something smarter here.
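
The growth rule described in this reply can be sketched as follows. This is a small Python model of the arithmetic only, not the runtime's LLVM code; `grown_size` and `doubled_only` are hypothetical names, and `size` stands for the bytes currently in use (stack pointer minus base), as in the diff.

```python
def grown_size(size: int, n: int) -> int:
    """Model of the diff's rule: double the used size, then add the request n."""
    return 2 * size + n


def doubled_only(size: int) -> int:
    """Plain doubling, for comparison; it ignores the size of the request."""
    return 2 * size


# A request larger than the doubled stack does not fit after plain doubling...
size, n = 1024, 4096
assert size + n > doubled_only(size)
# ...but it always fits once n is added on top of the doubled size.
assert size + n <= grown_size(size, n)
```

Adding `n` guarantees that a single reallocation is enough for any request, at the cost of sizes that are not powers of two.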

@jiribenes (Contributor) commented on Oct 16, 2024:
Intuitively, I'd expect something like "if we know we have to resize, then new_size <- nextStrictlyBiggestPowerOfTwo(current_size, n)" where the mysterious function is https://llvm.org/doxygen/namespacellvm.html#afb65eef479f0473d0fe1666b80155237 or clz (get the highest bit, choose a number one bigger in binary), but I have to think about it.

@marvinborner (Member, Author) commented on Oct 16, 2024:
Yes, that also sounds good. My thought behind the strategy was that, intuitively, large stack allocations are quite rare (or, in effekt's case, impossible?). So hypothetically, if I allocate 1GB on a 10MB stack, I'd prefer the next stack size to be 20MB + 1GB rather than 2GB.

In hindsight this hypothetical doesn't really make sense because we only use this function to allocate really small sizes 🤷‍♂️
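
The hypothetical above can be checked with a few lines of Python. This is only a sketch: it assumes "MB" and "GB" mean MiB and GiB, the function names are made up for illustration, and `next_power_of_two_above` models the strictly-greater rounding suggested earlier in the thread.

```python
MIB = 1 << 20
GIB = 1 << 30


def double_plus_n(size: int, n: int) -> int:
    """The rule from the diff: 2 * size + n."""
    return 2 * size + n


def next_power_of_two_above(x: int) -> int:
    """Smallest power of two strictly greater than x (x >= 0)."""
    return 1 << x.bit_length()


size, n = 10 * MIB, 1 * GIB
a = double_plus_n(size, n)               # 20 MiB + 1 GiB
b = next_power_of_two_above(size + n)    # rounded up to 2 GiB
print(a // MIB, b // MIB)  # prints "1044 2048"
```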

@jiribenes (Contributor) commented on Oct 18, 2024:
I thought about it for a bit and something like the following should be pretty good:

new_size := nextPowerOf2(max(size * 2, size + n + 64))

Rationale:

  • aligning to next power of two is Good ®️
  • if the allocation is huge compared to the current size, it's best to keep at least a little bit (64 bytes) of extra space so that we don't have to alloc again soon

Here's my crappy LLVM impl:

    ; Calculate double of current size
    %double_size = shl i64 %size, 1

    ; Calculate size + n + 64 (small buffer)
    %size_plus_n = add i64 %size, %n
    %size_plus_n_buffer = add i64 %size_plus_n, 64

    ; Take the unsigned maximum of (size * 2) and (size + n + 64)
    %max_size = call i64 @llvm.umax.i64(i64 %double_size, i64 %size_plus_n_buffer)

    ; Round up to the next power of 2 using ctlz on (max_size - 1);
    ; an exact power of two is kept as is (assumes 64 <= max_size <= 2^63)
    %max_minus_one = sub i64 %max_size, 1
    %leading_zeros = call i64 @llvm.ctlz.i64(i64 %max_minus_one, i1 false)
    %shift_amount = sub i64 64, %leading_zeros
    %newSize = shl i64 1, %shift_amount

Of course, perhaps I'm really overthinking this, I'd really need to benchmark.
Also, I'm not sure whether we should do size * 2, size * 1.5 or size * <golden ratio>. Probably again needs to be benchmarked.
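
For experimenting with the formula without touching the runtime, it can be modelled in Python. This is a sketch under assumptions: `next_power_of_two` rounds up and keeps exact powers of two (the comment does not say whether the rounding should be strict), and `proposed_size` and `slack` are hypothetical names.

```python
def next_power_of_two(x: int) -> int:
    """Smallest power of two >= x, for x >= 1."""
    return 1 << (x - 1).bit_length()


def proposed_size(size: int, n: int, slack: int = 64) -> int:
    """new_size := nextPowerOf2(max(size * 2, size + n + slack))."""
    return next_power_of_two(max(size * 2, size + n + slack))


# Small request: doubling dominates.
assert proposed_size(1024, 8) == 2048
# Huge request: size + n + slack dominates and is rounded up.
assert proposed_size(1024, 4096) == 8192
```

Swapping `size * 2` for another growth factor in `proposed_size` is enough to compare the alternatives mentioned above.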

To continue thinking about this, it would be really nice to know the "profile" of allocations: what do our allocations actually look like? Can we serialise this somehow and then read it later?
(see summary below where I discover that I have no clue how to model this)

WDYT?


btw, @jiribenes tried to do statistics here

I also thought about my assumptions:

  • the number of allocations follows a Pareto distribution
  • the size of an allocation is a power of two and at least 64, and follows a Pareto distribution

but when I try to even simulate them, I can clearly see that they are not true, since they result in very wacky stacks with the function I suggested under the distribution above:

Initial size: 1024 bytes
  Trimmed Mean size: 437134839.21 bytes
  Median size: 524288.00 bytes
  90th percentile size: 1073741824.00 bytes
  Max size: 1073741824.00 bytes

Initial size: 4096 bytes
  Trimmed Mean size: 452582561.59 bytes
  Median size: 2097152.00 bytes
  90th percentile size: 1073741824.00 bytes
  Max size: 1073741824.00 bytes

Initial size: 8192 bytes
  Trimmed Mean size: 461104562.45 bytes
  Median size: 4194304.00 bytes
  90th percentile size: 1073741824.00 bytes
  Max size: 1073741824.00 bytes

Of course, my code is most likely bad, but this is not very encouraging.
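
A simulation along these lines could look like the sketch below. It is not the code that produced the numbers above: the Pareto shape `alpha`, the scaling of the allocation count, the cap on the count, and the fact that the stack only grows (nothing is ever popped) are all assumptions made for illustration.

```python
import random


def next_power_of_two(x: int) -> int:
    """Smallest power of two >= x, for x >= 1."""
    return 1 << (x - 1).bit_length()


def proposed_size(size: int, n: int) -> int:
    """The growth function suggested earlier in this comment."""
    return next_power_of_two(max(size * 2, size + n + 64))


def simulate(initial_size: int, seed: int, alpha: float = 1.5) -> int:
    """Return the final stack capacity after one simulated run."""
    rng = random.Random(seed)
    capacity, used = initial_size, 0
    # Assumption: the number of allocations is Pareto distributed (capped).
    count = min(int(100 * rng.paretovariate(alpha)), 10_000)
    for _ in range(count):
        # Assumption: sizes are powers of two, at least 64, Pareto distributed.
        n = next_power_of_two(int(64 * rng.paretovariate(alpha)))
        if used + n > capacity:
            capacity = proposed_size(used, n)
        used += n
    return capacity


final_sizes = sorted(simulate(1024, seed) for seed in range(1000))
median_size = final_sizes[len(final_sizes) // 2]
```

Because the model never deallocates, it overstates stack growth compared to a real program; it is only useful for comparing growth functions against each other.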

Member, Author commented:
Very interesting, although I agree that maybe you're overthinking this a bit :octocat:

Regarding the profile of the allocations: We (currently?) only use @stackAllocate to make space for pushes of pos/neg, builtin types and other pointers, never for arbitrarily large data allocations. (Maybe @phischu can confirm this?)

As far as I understand, large (de-)allocations only happen when some function has a lot of arguments that need to be stored on the stack (~8-16 bytes each). In nqueens, for example, this leads to stack (de-)allocations of 108 bytes.

To make this more clear: Allocation sizes/amount in all generated files in effekt.llvmtests:

| bytes allocated | how often |
|---:|---:|
| 8 | 318 |
| 16 | 305 |
| 24 | 1856 |
| 32 | 101 |
| 40 | 57 |
| 48 | 22 |
| 56 | 24 |
| 64 | 4 |
| 72 | 1 |
| 80 | 1 |
So I don't think we really need to be creative with the growing logic, since %n will almost never be larger than the doubled current stack size, especially if we settle on an initial stack size like 1024. Minimal solutions like doubling the current size and adding %n (as I did), or using NextPowerOf2, are probably more than enough.

@marvinborner (Member, Author) commented on Oct 22, 2024:
@phischu asked about the stack allocation profile at runtime. Here's the data:

For the entire test suite (test):

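| bytes allocated | how often | percentage |
|---:|---:|---:|
| 8 | 2918774 | 28.8 |
| 16 | 815964 | 8 |
| 24 | 5386483 | 53.1 |
| 32 | 841972 | 8.3 |
| 40 | 169619 | 1.7 |
| 48 | 6444 | 0.06 |
| 56 | 79 | <0.01 |
| 64 | 2085 | 0.02 |
| 72 | 126 | <0.01 |
| 80 | 707 | <0.01 |
| 96 | 8 | <0.01 |
| 112 | 1 | <0.01 |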

Only the LLVM tests (testOnly effekt.LLVMTests):

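| bytes allocated | how often | percentage |
|---:|---:|---:|
| 8 | 2918721 | 28.8 |
| 16 | 815664 | 8 |
| 24 | 5385418 | 53.1 |
| 32 | 841852 | 8.3 |
| 40 | 169604 | 1.7 |
| 48 | 6435 | 0.06 |
| 56 | 79 | <0.01 |
| 64 | 2080 | 0.02 |
| 72 | 126 | <0.01 |
| 80 | 702 | <0.01 |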
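
To put the profile in relation to the growth logic, the numbers for the entire test suite can be summarised with a short script. This is only a sketch; the counts are copied from the first table above, and the variable names are made up for illustration.

```python
# (bytes allocated, how often) for the entire test suite, copied from the table.
profile = [
    (8, 2918774), (16, 815964), (24, 5386483), (32, 841972),
    (40, 169619), (48, 6444), (56, 79), (64, 2085),
    (72, 126), (80, 707), (96, 8), (112, 1),
]

total = sum(count for _, count in profile)
# Average number of bytes requested per call to @stackAllocate.
mean_bytes = sum(size * count for size, count in profile) / total
# Largest single request observed.
largest = max(size for size, _ in profile)
# Fraction of requests that are at most 32 bytes.
share_up_to_32 = sum(count for size, count in profile if size <= 32) / total

# Every observed request is far below the initial stack size of 1024 bytes.
assert largest < 1024
```

With requests this small, `%n` never exceeds the doubled stack size in this profile, so the `+ %n` term only matters for profiles that differ from the one measured here.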


+ %newBase = call ptr @realloc(ptr %base, i64 %newSize)
+ %newLimit = getelementptr i8, %Base %newBase, i64 %newSize
+ %newStackPointer = getelementptr i8, %Base %newBase, i64 %size
+ %newNextStackPointer = getelementptr i8, %StackPointer %newStackPointer, i64 %n
+
+ %base_pointer = getelementptr %StackValue, %Stack %stack, i64 0, i32 1, i32 1
+ %limit_pointer = getelementptr %StackValue, %Stack %stack, i64 0, i32 1, i32 2
+
+ store %StackPointer %newNextStackPointer, ptr %stackPointer_pointer
+ store %Base %newBase, ptr %base_pointer
+ store %Limit %newLimit, ptr %limit_pointer
+
+ ret %StackPointer %newStackPointer
  }

  define private %StackPointer @stackDeallocate(%Stack %stack, i64 %n) {
- %stackStackPointer = getelementptr %StackValue, %Stack %stack, i64 0, i32 1, i32 0
- %stackPointer = load %StackPointer, ptr %stackStackPointer
+ %stackPointer_pointer = getelementptr %StackValue, %Stack %stack, i64 0, i32 1, i32 0
+ %stackPointer = load %StackPointer, ptr %stackPointer_pointer

  %o = sub i64 0, %n
- %stackPointer_2 = getelementptr i8, %StackPointer %stackPointer, i64 %o
- store %StackPointer %stackPointer_2, ptr %stackStackPointer
+ %newStackPointer = getelementptr i8, %StackPointer %stackPointer, i64 %o
+ store %StackPointer %newStackPointer, ptr %stackPointer_pointer

- ret %StackPointer %stackPointer_2
+ ret %StackPointer %newStackPointer
  }

  ; Meta-stack management

  define private %Memory @newMemory() {
- %stackPointer = call %StackPointer @malloc(i64 268435456)
- %limit = getelementptr i8, ptr %stackPointer, i64 268435456
+ %stackPointer = call %StackPointer @malloc(i64 1024)
+ %limit = getelementptr i8, ptr %stackPointer, i64 1024

  %memory.0 = insertvalue %Memory undef, %StackPointer %stackPointer, 0
  %memory.1 = insertvalue %Memory %memory.0, %Base %stackPointer, 1
@@ -403,7 +436,7 @@ define private %Stack @newStack(%Prompt %prompt) {
  ; TODO find actual size of stack
  %stack = call ptr @malloc(i64 120)

- ; TODO initialize to zero and grow later
+ ; TODO initialize to zero?
marvinborner marked this conversation as resolved.
  %stackMemory = call %Memory @newMemory()

  %stack.0 = insertvalue %StackValue undef, %ReferenceCount 0, 0