-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Description
This is a discussion thread on implementing proper tail calls (PTC) in Chakra. For simplicity, only full jit implementation is discussed.
Introduction
Wiki & in Javascript
x86 implementation details of a PTC
Chakra uses caller cleanup for the arguments in the stack. This means caller is required to clean up the stack space for the arguments it pushed into the stack (similar to C calling convention).
Let us take a example:
"use strict"
function foo()
{
bar(1);
}
function bar()
{
return foobar(1,2,3,4); //tail call
}Without PTC, asm code for a tail call to foobar(1,2,3,4) looks as below
return foobar(1,2,3,4)
//#1 allocate the stack for arguments first
(esp).i32 = LEA [(esp).i32-24].i32
//#2 push the arguments
[esp] = MOV 0xXXXXXXXX (undefined)[Undefined].var
[esp + 4] = MOV 0x00000003.var
[esp + 8] = MOV 0x00000005.var
[esp + 12] = MOV 0x00000007.var
[esp + 16] = MOV 0x00000009.var
// push two meta arguments, callinfo (which has a count of arguments) and the function object.
PUSH 268435461 (0x10000005).i32
PUSH s7(esi).var
//#3 Call the function
CALL s21(eax).u32
//#4 Re-adjust the stack to 24+8(2*4 meta arguments)
(esp).i32 = LEA [(esp).i32+32].i32
//#5 Epilog and return
(ebx).i32 = POP
(esi).i32 = POP
(edi).i32 = POP
(esp).i32 = MOV (ebp).i32
(ebp).i32 = POP
RET 0 (0x0).i32, (eax).i32 On implementing PTC asm code:
return foobar(1,2,3,4)
//#2 push the arguments to the caller allocated stack space. Note ebp is a frame pointer.
[ebp + 16] = MOV 0xXXXXXXXX (undefined)[Undefined].var
[ebp + 20] = MOV 0x00000003.var
[ebp + 24] = MOV 0x00000005.var
[ebp + 28] = MOV 0x00000007.var
[ebp + 32] = MOV 0x00000009.var
//push two meta arguments, callinfo (which has number of arguments) and function object.
[ebp + 12] = MOV 268435461 (0x10000005).i32
[ebp + 8] = MOV s7(esi).var
//#4 Execute epilog before the jump, as we want to restore non-volatile registers
(ebx).i32 = [ebp - 16]
(esi).i32 = [ebp - 12]
(edi).i32 = [ebp - 8]
(esp).i32 = [ebp + 4] //this should point to return address
(ebp).i32 = [ebp]
//#3 Jump to callee
JMP s21(eax).u32This looks cool as we avoided a new stack frame allocation. Though there is a caveat, bar does not know the number of arguments passed to it. So bar call frame has no clue for the step 2 whether it has enough space on the stack or not. In above example, foo passed only one argument and it doesn't have space for 3 more arguments to move to the stack. What do we do now? Check in the main path.
CMP [ebp + 12], 4
JLT $helperBranches are expensive, though a smart compiler can move that to a prolog of the function.
Now let us look at the helper. How would this allocate the extra stack space required for arguments?
$helper:
//#7 Now you are mucking with the stack be carefull. Let us burn registers here as it is slow path
eax = <extra args required>
esp = LEA [ebp + eax]
//#6 Mov return address and frame pointer
[esp + 4] = MOV [ebp + 4]
[esp] = MOV [ebp]
//#2 push the arguments to the caller-allocated stack space and additional stack space
//#4 Execute epilog before the jump, as we want to restore non-volatile registers
//#3 Jump to callee
JMP s21(eax).u32Are we done? Not yet, foo which called bar has no clue that bar has changed stack pointer (esp) while doing a PTC to foobar. If foo has any locals referenced by the stack pointer, which is altered now. So it has to check and restore the stack pointer, something like following:
bar(1);
// Save and re-adjust the stack to where it was before the call from a non-volatile register or known stack location.
edi = MOV esp
CALL s21(eax).u32
esp = MOV ediThis is now burning a register in non-PTC position on the main path. Every call we need to burn an extra non-volatile register as the callee can make a PTC. This is bad news for performance.
There are couple of alternate solution to this:
- In case there is not enough stack space, do a CALL instead of PTC which will cleanup the stack space when control is returned to it. This will work though there will be an extra fake frame on the stack which needs to be hidden.
- Change the calling convention in the Chakra runtime to do a callee cleanup. This is riddled with issues as all our built-ins such as Math.sin are C++ helpers which accept variable arguments and MSVC can't do callee cleanup. The solution is to build individual thunks for every built-in in the runtime 😟.
x64 implementation
Chakra uses C++ exception to propagate errors and relies on C++ exception unwinding to catch the errors and report to the users.
Windows x64 ABI mandates setting up the stack pointer (rsp) in the prolog and disallows changing it on the fly. This is to facilitate generating static unwind data, so OS can statically unwind the call frames.
Changing the stack pointer as in x86 is ruled out. We do have to create a whole new frame when the number of arguments to function at PTC position is higher than passed to the call frame. This is again a fake frame which needs to be hidden from stack walking etc.