clamp03 commented Oct 14, 2025

Enable interpreter for arm32 softfp.

  • Implement assembly routines for args and return values
  • Fix some minor bugs
  • Tested with simple test cases

clamp03 self-assigned this Oct 14, 2025
dotnet-policy-service bot added the community-contribution label Oct 14, 2025
#ifdef TARGET_64BIT
#define INTERP_STACK_SLOT_SIZE 8 // Alignment of each var offset on the interpreter stack
#else // !TARGET_64BIT
#define INTERP_STACK_SLOT_SIZE 4 // Alignment of each var offset on the interpreter stack
Member

This is a bad idea, I think. It will cause all sorts of mayhem. Is there a particular reason why this needs to happen for your PR to work?

Member

Also, if you change this, StackVal in interpexec.h needs to have its 8-byte elements removed, I believe.
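
To illustrate the concern, here is a minimal C++ sketch. The real StackVal definition in interpexec.h is not shown in this thread, so the union `StackValSketch` and the helper `align_up` below are hypothetical stand-ins, not the runtime's code. The point is that a union with any 8-byte member is 8 bytes wide even on a 32-bit target, so it cannot describe a 4-byte slot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a per-slot value union (not the real StackVal).
union StackValSketch {
    int32_t i;
    int64_t l;   // 8-byte member
    float   f;
    double  d;   // 8-byte member
};

// Round a var offset up to the next multiple of the slot size
// (slotSize is assumed to be a power of two).
constexpr std::size_t align_up(std::size_t offset, std::size_t slotSize) {
    return (offset + slotSize - 1) & ~(slotSize - 1);
}

static_assert(sizeof(StackValSketch) == 8,
              "8-byte members force an 8-byte union");
```

With 4-byte slots an offset such as 12 is already aligned, while 8-byte slots round it up to 16, so the two layouts disagree on where vars live.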

Member Author

clamp03, Oct 14, 2025

> This is a bad idea, I think. It will cause all sorts of mayhem. Is there a particular reason why this needs to happen for your PR to work?

I thought 4 bytes would suit the ARM32 architecture because it matches the interpreter stack slot size to the register size (and saves a little memory). For 8-byte elements, I changed the code to use two stack slots in some places.
If you think it is better to keep the stack slot size at 8 bytes for ARM32 too, I will update it.

> Also, if you change this, StackVal in interpexec.h needs to have its 8-byte elements removed, I believe.

Thank you, I missed that.

Also, do you have a test suite for the interpreter implementation? If so, could you share the tests and how to run them? Thank you.

Member

InterpreterTester and Interpreter.cs were where we started testing before we were able to run the whole test suite.

I'll leave it to one of the interpreter architects to say whether the stack slot size should stay at 8, I just wanted to let you know that it has wide-ranging consequences.

For what it's worth, the mono interpreter has 8-byte stack slots even on arm32.

Member Author

I will check the implementation with InterpreterTester.

Okay. I understand it can have wide-ranging consequences, even though I think there are some benefits for ARM32.
I will revert to an 8-byte stack slot.
Thank you.

Member Author

@kg I found a problem when changing it to an 8-byte stack slot. (Actually, I forgot the implementation details during my long holidays. 🥲)

With an 8-byte stack slot, passing args between compiled methods and the interpreter seems hard. On ARM32, two 4-byte args and one 8-byte arg both occupy two registers. For one 8-byte arg, the values in the two registers must be loaded from or stored to a single stack slot. For two 4-byte args, the values in the two registers are loaded from or stored to two stack slots. In the current implementation, argument passing is handled by the Load_* and Store_* routines in assembly code without any type check. With 8-byte stack slots, I think it would need a type check for every arg (or a separate routine for every case).
How does the mono interpreter solve this? Could you share any ideas?
Thank you.
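
The mismatch described above can be sketched in portable C++. This is only an illustration under assumptions: the register pair r0/r1 is modeled as a two-element array, the low word of a 64-bit value is assumed to sit in the lower register, and the names `Slot8`, `store_two_i32`, `store_one_i64`, and `store_pair_slot4` are hypothetical, not the PR's actual Load_*/Store_* routines.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical model of one 8-byte interpreter stack slot.
struct Slot8 { unsigned char bytes[8]; };

// 8-byte slots, two 4-byte args: each register goes to its own slot.
inline void store_two_i32(const uint32_t regs[2], Slot8* slots) {
    std::memset(slots, 0, 2 * sizeof(Slot8));
    std::memcpy(slots[0].bytes, &regs[0], 4);
    std::memcpy(slots[1].bytes, &regs[1], 4);
}

// 8-byte slots, one 8-byte arg: both registers are merged into one slot.
inline void store_one_i64(const uint32_t regs[2], Slot8* slots) {
    uint64_t value = (static_cast<uint64_t>(regs[1]) << 32) | regs[0];
    std::memcpy(slots[0].bytes, &value, 8);
}

// 4-byte slots: the same copy works for both cases, no type check needed.
inline void store_pair_slot4(const uint32_t regs[2], uint32_t* slots) {
    slots[0] = regs[0];
    slots[1] = regs[1];
}

inline uint32_t load_i32(const Slot8& s) {
    uint32_t v;
    std::memcpy(&v, s.bytes, 4);
    return v;
}

inline uint64_t load_i64(const Slot8& s) {
    uint64_t v;
    std::memcpy(&v, s.bytes, 8);
    return v;
}
```

With 8-byte slots the caller has to pick between `store_two_i32` and `store_one_i64` based on the argument types, which is the type check the comment refers to. With 4-byte slots a single untyped routine covers both.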

Member

I believe the mono interpreter does transitions using hand-written C helpers in most cases, so the C compiler solves the problem for us. @BrzVlad would probably know better though.

Member Author

@kg Thank you. I think a 4-byte stack slot for arm32 isn't such a bad idea. And if I isolate the ARM32 implementation well from the other architectures, I don't think it will have wide-ranging consequences for them. What do you think?

#endif // _MSC_VER

#ifdef TARGET_64BIT
#if defined(TARGET_64BIT) || defined(TARGET_WASM)
Member

I do not think we want some 32-bit platforms to be on the 8-byte interpreter stack alignment plan and other 32-bit platforms to be on the 4-byte interpreter stack alignment plan.

Member Author

Sorry, I am not sure what you mean. As I understand it, you want to use the same 8-byte interpreter stack alignment plan for all platforms, as @kg mentioned earlier. Is that correct?

I think I can put arm32 on the 8-byte interpreter stack by adding more routines for 8-byte and 4-byte values.
Thank you.

Member

> you want to use the same 8-byte interpreter stack alignment plan

Yes, that would be the simplest option.

- Handle copying args / ret value between interpreter stack and native stack
- No Range Expansion for value (>= 8 bytes)
- Terminate current routines and add a routine for the value

clamp03 commented Oct 24, 2025

@kg I tried Interpreter.cs. The TestPreciseInitCctors test in Interpreter.cs fails with the error message "preciseInitCctorsRun should be 1, but is 0" whenever I set any DOTNET_Interpreter option, such as DOTNET_Interpreter=none. It fails on x64 as well as arm32. Could you share how you run it?

@jkotas I found that the DOTNET_Interpreter option turns off TieredCompilation, so the JIT compiler optimizes the method aggressively, as shown below. (If I turn off only the TieredCompilation option, it also fails.) The failures are related to the inlining, CSE, and redundant-branch optimizations. Is this a known issue? I also tested giving the INITCLASS helper the mutatesHeaps property, and that seems to work. Why does the INITCLASS helper have isPure but not the mutatesHeaps property?
Thank you.

public static bool TestPreciseInitCctors()
    {
        if (preciseInitCctorsRun != 0)
        {
            Console.WriteLine("preciseInitCctorsRun should be 0, but is {0}", preciseInitCctorsRun);
            return false;
        }
        MyPreciseInitClass<int>.TriggerCctorClass();
        if (preciseInitCctorsRun != 1)
        {
            Console.WriteLine("preciseInitCctorsRun should be 1, but is {0}", preciseInitCctorsRun);
            return false;
        }
        MyPreciseInitClass<short>.TriggerCctorMethod<int>();
        if (preciseInitCctorsRun != 2)
        {
            Console.WriteLine("TriggerCctorClass should return 2, but is {0}", preciseInitCctorsRun);
            return false;
        }

        object o = new MyPreciseInitClass<double>();
        if (preciseInitCctorsRun != 3)
        {
            Console.WriteLine("TriggerCctorClass should return 3, but is {0}", preciseInitCctorsRun);
            return false;
        }

        MyPreciseInitClass<object>.TriggerCctorClass();
        if (preciseInitCctorsRun != 4)
        {
            Console.WriteLine("preciseInitCctorsRun should be 4 but is {0}", preciseInitCctorsRun);
            return false;
        }
        MyPreciseInitClass<string>.TriggerCctorMethod<object>();
        if (preciseInitCctorsRun != 5)
        {
            Console.WriteLine("TriggerCctorClass should return 5, but is {0}", preciseInitCctorsRun);
            return false;
        }

        o = new MyPreciseInitClass<Type>();
        if (preciseInitCctorsRun != 6)
        {
            Console.WriteLine("TriggerCctorClass should return 6,  but is {0}", preciseInitCctorsRun);
            return false;
        }

        return true;
    }

Generated x64 code, tested with only the DOTNET_TieredCompilation=0 option:

*************** After end code gen, before unwindEmit()
G_M12045_IG01:        ; func=00, offs=0x000000, size=0x0008, bbWeight=1, PerfScore 3.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG

IN0014: 000000 push     rbp
IN0015: 000001 push     rbx
IN0016: 000002 push     rax
IN0017: 000003 lea      rbp, [rsp+0x10]

G_M12045_IG02:        ; offs=0x000008, size=0x000A, bbWeight=1, PerfScore 3.25, gcrefRegs=0000 {}, byrefRegs=0000 {}, BB01 [0000], byref, isz

IN0001: 000008 mov      ebx, dword ptr [(reloc 0x7e83cc85b1d0)]      ; static handle
IN0002: 00000E test     ebx, ebx
IN0003: 000010 jne      SHORT G_M12045_IG04

G_M12045_IG03:        ; offs=0x000012, size=0x0037, bbWeight=0.50, PerfScore 5.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, BB03 [0002], byref, isz

IN0004: 000012 mov      rdi, 0x7E83CE725318      ; MyFirstApp.Program+MyPreciseInitClass`1[int]
IN0005: 00001C call     [CORINFO_HELP_INITCLASS]
IN0006: 000022 mov      rdi, 0x7E83CE11A818      ; System.Int32
IN0007: 00002C call     CORINFO_HELP_NEWSFAST
IN0008: 000031 mov      dword ptr [rax+0x08], ebx
IN0009: 000034 mov      rsi, rax
IN000a: 000037 mov      rdi, 0x7E8448203248      ; 'preciseInitCctorsRun should be 1, but is {0}'
IN000b: 000041 call     [System.Console:WriteLine(System.String,System.Object)]
IN000c: 000047 jmp      SHORT G_M12045_IG05

G_M12045_IG04:        ; offs=0x000049, size=0x0025, bbWeight=0.50, PerfScore 2.88, gcrefRegs=0000 {}, byrefRegs=0000 {}, BB02 [0001], byref

IN000d: 000049 mov      rdi, 0x7E83CE11A818      ; System.Int32
IN000e: 000053 call     CORINFO_HELP_NEWSFAST
IN000f: 000058 mov      dword ptr [rax+0x08], ebx
IN0010: 00005B mov      rsi, rax
IN0011: 00005E mov      rdi, 0x7E84482032B8      ; 'preciseInitCctorsRun should be 0, but is {0}'
IN0012: 000068 call     [System.Console:WriteLine(System.String,System.Object)]

G_M12045_IG05:        ; offs=0x00006E, size=0x0002, bbWeight=0.50, PerfScore 0.12, gcrefRegs=0000 {}, byrefRegs=0000 {}, BB16 [0023], byref

IN0013: 00006E xor      eax, eax

G_M12045_IG06:        ; offs=0x000070, size=0x0007, bbWeight=0.50, PerfScore 1.12, epilog, nogc, extend

IN0018: 000070 add      rsp, 8
IN0019: 000074 pop      rbx
IN001a: 000075 pop      rbp
IN001b: 000076 ret
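
The listing appears to load the static counter into ebx once, call the INITCLASS helper, and then report the stale ebx value rather than reloading it. A minimal C++ sketch of that effect follows. The names `g_cctorsRun`, `init_class`, `check_reloads`, and `check_reuses_stale_load` are hypothetical stand-ins for illustration, not JIT or runtime code.

```cpp
#include <cassert>

// Hypothetical stand-ins for the test's static counter and the class-init
// helper, which runs the static constructor and bumps the counter.
static int g_cctorsRun = 0;
inline void init_class() { ++g_cctorsRun; }

// What the C# source asks for: re-read the counter after the call.
inline bool check_reloads() {
    g_cctorsRun = 0;
    if (g_cctorsRun != 0) return false;
    init_class();
    return g_cctorsRun == 1;
}

// What the generated code appears to do: reuse the value loaded before the
// call, as if the helper could not write to memory.
inline bool check_reuses_stale_load() {
    g_cctorsRun = 0;
    int cached = g_cctorsRun;      // like `mov ebx, dword ptr [static]`
    if (cached != 0) return false;
    init_class();                  // assumed side-effect free by the optimizer
    return cached == 1;            // compares the stale value, always false
}
```

Here `check_reloads()` passes and `check_reuses_stale_load()` fails, which matches the "should be 1, but is 0" message.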


kg commented Oct 24, 2025

> @kg I tried Interpreter.cs. The TestPreciseInitCctors test in Interpreter.cs fails with the error message "preciseInitCctorsRun should be 1, but is 0" whenever I set any DOTNET_Interpreter option, such as DOTNET_Interpreter=none. It fails on x64 as well as arm32. Could you share how you run it?

The correct way to run Interpreter.cs is either by running InterpreterTester (it sets the environment variables) or by setting all the appropriate environment variables correctly.

IIRC you need to set a few vars - disable readytorun, disable tiered compilation, and force on the interpreter for everything inside the Interpreter.dll module. Are you using InterpreterTester? If so, it sounds like maybe your changes have caused a regression somehow. Does it work for you on latest main?


clamp03 commented Oct 24, 2025

Thank you so much.
I ran Interpreter.dll with corerun directly: DOTNET_Interpreter="abc" ./corerun Interpreter.dll.
I will test the interpreter the correct way.

However, I think the test failure looks like a bug in the JIT compiler.


jkotas commented Oct 24, 2025

> However, I think the test failure looks like a bug in the JIT compiler.

Yes, this looks like a bug. Opened #121066

Labels

area-CodeGen-Interpreter-coreclr, area-VM-coreclr, community-contribution
