Skip to content

Implement Proper Tail Call [Discussion] #796

@abchatra

Description

@abchatra

This is a discussion thread on implementing proper tail calls (PTC) in Chakra. For simplicity, only full jit implementation is discussed.

Introduction

Wiki & in Javascript

x86 implementation details of a PTC

Chakra uses caller cleanup for the arguments in the stack. This means caller is required to clean up the stack space for the arguments it pushed into the stack (similar to C calling convention).

Let us take a example:

"use strict"
function foo()
{ 
  bar(1);
}
function bar()
{
  return foobar(1,2,3,4); //tail call
}

Without PTC, asm code for a tail call to foobar(1,2,3,4) looks as below

   return foobar(1,2,3,4)

    //#1 allocate the stack for arguments first
    (esp).i32       =  LEA            [(esp).i32-24].i32

    //#2 push the arguments
    [esp]           = MOV           0xXXXXXXXX (undefined)[Undefined].var
    [esp + 4]       = MOV           0x00000003.var
    [esp + 8]       = MOV           0x00000005.var
    [esp + 12]      = MOV           0x00000007.var
    [esp + 16]      = MOV           0x00000009.var

    // push two meta arguments, callinfo (which has a count of arguments) and the function object. 
                        PUSH           268435461 (0x10000005).i32
                        PUSH           s7(esi).var

    //#3 Call the function
                       CALL           s21(eax).u32

    //#4 Re-adjust the stack to 24+8(2*4 meta arguments)                   
    (esp).i32       =  LEA            [(esp).i32+32].i32

     //#5 Epilog and return
    (ebx).i32       =  POP
    (esi).i32       =  POP
    (edi).i32       =  POP
    (esp).i32       =  MOV            (ebp).i32
    (ebp).i32       =  POP
                       RET            0 (0x0).i32, (eax).i32    

On implementing PTC asm code:

    return foobar(1,2,3,4)

    //#2 push the arguments to the caller allocated stack space. Note ebp is a frame pointer.
    [ebp + 16]      = MOV           0xXXXXXXXX (undefined)[Undefined].var
    [ebp + 20]      = MOV           0x00000003.var
    [ebp + 24]      = MOV           0x00000005.var
    [ebp + 28]      = MOV           0x00000007.var
    [ebp + 32]      = MOV           0x00000009.var

    //push two meta arguments, callinfo (which has number of arguments) and function object. 
    [ebp + 12]      = MOV           268435461 (0x10000005).i32
    [ebp + 8]       = MOV           s7(esi).var


     //#4 Execute epilog before the jump, as we want to restore non-volatile registers
    (ebx).i32       =  [ebp - 16]
    (esi).i32       =  [ebp - 12]
    (edi).i32       =  [ebp - 8]
    (esp).i32       =  [ebp + 4] //this should point to return address
    (ebp).i32       =  [ebp]

    //#3 Jump to callee
                     JMP           s21(eax).u32

This looks cool as we avoided a new stack frame allocation. Though there is a caveat, bar does not know the number of arguments passed to it. So bar call frame has no clue for the step 2 whether it has enough space on the stack or not. In above example, foo passed only one argument and it doesn't have space for 3 more arguments to move to the stack. What do we do now? Check in the main path.

    CMP            [ebp + 12], 4 
    JLT            $helper

Branches are expensive, though a smart compiler can move that to a prolog of the function.

Now let us look at the helper. How would this allocate the extra stack space required for arguments?

    $helper:
    //#7 Now you are mucking with the stack be carefull. Let us burn registers here as it is slow path
     eax            = <extra args required>
     esp            =  LEA        [ebp + eax]

    //#6 Mov return address and frame pointer
    [esp + 4]     =  MOV [ebp + 4]
    [esp]           =  MOV [ebp]

    //#2 push the arguments to the caller-allocated stack space and additional stack space

    //#4 Execute epilog before the jump, as we want to restore non-volatile registers

    //#3 Jump to callee
                     JMP           s21(eax).u32

Are we done? Not yet, foo which called bar has no clue that bar has changed stack pointer (esp) while doing a PTC to foobar. If foo has any locals referenced by the stack pointer, which is altered now. So it has to check and restore the stack pointer, something like following:

bar(1);
   // Save and re-adjust the stack to where it was before the call from a non-volatile register or known stack location.                   
        edi         =  MOV     esp
                          CALL           s21(eax).u32
        esp        =  MOV     edi

This is now burning a register in non-PTC position on the main path. Every call we need to burn an extra non-volatile register as the callee can make a PTC. This is bad news for performance.

There are couple of alternate solution to this:

  • In case there is not enough stack space, do a CALL instead of PTC which will cleanup the stack space when control is returned to it. This will work though there will be an extra fake frame on the stack which needs to be hidden.
  • Change the calling convention in the Chakra runtime to do a callee cleanup. This is riddled with issues as all our built-ins such as Math.sin are C++ helpers which accept variable arguments and MSVC can't do callee cleanup. The solution is to build individual thunks for every built-in in the runtime 😟.

x64 implementation

Chakra uses C++ exception to propagate errors and relies on C++ exception unwinding to catch the errors and report to the users.

Windows x64 ABI mandates setting up the stack pointer (rsp) in the prolog and disallows changing it on the fly. This is to facilitate generating static unwind data, so OS can statically unwind the call frames.

Changing the stack pointer as in x86 is ruled out. We do have to create a whole new frame when the number of arguments to function at PTC position is higher than passed to the call frame. This is again a fake frame which needs to be hidden from stack walking etc.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions