Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RISC-V] Add quirks for riscv to R2RDump #101683

Merged
merged 5 commits into from
May 16, 2024
Merged
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
233 changes: 233 additions & 0 deletions src/coreclr/tools/r2rdump/CoreDisTools.cs
Original file line number Diff line number Diff line change
Expand Up @@ -359,6 +359,10 @@ public int GetInstruction(RuntimeFunction rtf, int imageOffset, int rtfOffset, o
case Machine.ArmThumb2:
break;

case Machine.RiscV64:
ProbeRiscV64Quirks(rtf, imageOffset, rtfOffset, ref fixedTranslatedLine);
break;

default:
break;
}
Expand Down Expand Up @@ -1209,6 +1213,235 @@ private static bool IsAnotherRuntimeFunctionWithinMethod(int rva, RuntimeFunctio
return false;
}

/// <summary>
/// Improves disassembler output for RiscV64 by adding comments at the end of instructions.
/// </summary>
/// <param name="rtf">Runtime function</param>
/// <param name="imageOffset">Offset within the image byte array</param>
/// <param name="rtfOffset">Offset within the runtime function</param>
/// <param name="instruction">Textual representation of the instruction</param>
private void ProbeRiscV64Quirks(RuntimeFunction rtf, int imageOffset, int rtfOffset, ref string instruction)
{
const int InstructionSize = 4;
uint instr = BitConverter.ToUInt32(_reader.Image, imageOffset + rtfOffset);

/*
Pattern:
auipc
addi
ld
jarl
Copy link
Member

@clamp03 clamp03 May 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update jarl to jalr.
Can you share code locations in runtime where you checked the patterns?

+ Please don't force-push. #100962 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no a strict line of code which checks this pattern.

Function ProbeRiscV64Quirks is looking for jalr instruction. If it finds it, then it goes back through the assembly code to calculate callee address. It is more analytical approach, than the fixed pattern. Unfortunately, instructions calculating this address do not always follow each other, sometimes unrelated instruction appears between them. Other problem was with multiple-wrapped calle, then more than one ld or lw are used to calculate callee address.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well. you cannot give codes in runtime for the patterns.
However, in other archs, their implementation is different from yours. And it looks like they have some strict patterns.
And if this code should be checked manually without any tests, I cannot review whether this is correct implementation or not.
More, I don't know well about r2rdump. I am so sorry.

@jkotas @ivdiazsa Could you review this PR?
Thank you.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know well about r2rdump

The point of this code in r2rdump is to make the disassembly output more human readable. Nothing in the rest of the runtime depends on it. There are no hard and fast rules for how it should behave.

@SzpejnaDawid Could you please share some examples for what the output looks like with this change, including the problematic cases like more than one ld or lw are used?

Copy link
Contributor Author

@SzpejnaDawid SzpejnaDawid May 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that System.Private.CoreLib.dll has some interesting examples:

System.Exception Interop.GetExceptionForIoErrno(Interop+ErrorInfo, string, bool)
...
14811c: 00e00397    auipc   t2, 3584
148120: f443b383    ld      t2, -188(t2)
148124: 000380e7    jalr    t2 // WRITE_BARRIER (HELPER)
...
148528: 00e0cf17    auipc   t5, 3596
14852c: 550f0f13    addi    t5, t5, 1360
148530: 00000593    li      a1, 0
148534: 000f3603    ld      a2, 0(t5)
148538: 000600e7    jalr    a2 // System.Exception Interop.GetIOException(Interop+ErrorInfo, string) (METHOD_ENTRY_DEF_TOKEN)
...
148840: 00e10f17    auipc   t5, 3600
148844: a20f0f13    addi    t5, t5, -1504
148848: 000f3583    ld      a1, 0(t5)
14884c: 000580e7    jalr    a1 // string System.SR.GetResourceString(string) (METHOD_ENTRY_DEF_TOKEN)
...

int Interop+Globalization.GetCalendars(string, System.Globalization.CalendarId[], int)
...
14935c: 00e22517    auipc   a0, 3618
149360: c5450513    addi    a0, a0, -940
149364: 00053503    ld      a0, 0(a0)
149368: 00053683    ld      a3, 0(a0)
14936c: fe443503    ld      a0, -28(s0)
149370: fec42603    lw      a2, -20(s0)
149374: fdc43583    ld      a1, -36(s0)
149378: fd443f03    ld      t5, -44(s0)
14937c: 000680e7    jalr    a3 // int Interop+Globalization.<GetCalendars>g____PInvoke|0_0(ushort*, System.Globalization.CalendarId*, int) (INDIRECT_PINVOKE_TARGET)

I know that my way of calculation callee address is not fixed as for arm. Nevertheless, the only way to be sure how jumps are generated is to read a code of coregen2 in order to see how it generates riscv instructions. But the profit and loss ratio of this approach is unfavorable, imo.

, where addi and ld may be repeated many times.
Instructions that have no effect on the jump
address calculation are skipped.
*/
if (IsRiscV64JalrInstruction(instr))
{
AnalyzeRiscV64Itype(instr, out uint rd, out uint rs1, out int imm);
uint register = rs1;
int immValue = imm;

int target = 0;
bool isFound = false;
int currentInstrOffset = rtfOffset - InstructionSize;
// skip all subsequent ld,lw instructions until first non-ld,lw instruction
do
{
instr = BitConverter.ToUInt32(_reader.Image, imageOffset + currentInstrOffset);

if (IsRiscV64LdInstruction(instr) || IsRiscV64LwInstruction(instr))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am just curious. In your examples, check for lw instruction is not necessary. Could you give an example with lw instruction? Thank you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As i know there is no function call which uses lw to calculate jump address. However, this check for lw is necessary because this first if is trying to skip all ld and lw instructions which are used to push arguments (likely). For example

int Interop+Globalization.GetCalendars(string, System.Globalization.CalendarId[], int)
...
14935c: 00e22517    auipc   a0, 3618
149360: c5450513    addi    a0, a0, -940
149364: 00053503    ld      a0, 0(a0)
149368: 00053683    ld      a3, 0(a0)
14936c: fe443503    ld      a0, -28(s0)
149370: fec42603    lw      a2, -20(s0)
149374: fdc43583    ld      a1, -36(s0)
149378: fd443f03    ld      t5, -44(s0)
14937c: 000680e7    jalr    a3 // <calle name>
....

bool Interop+Globalization.EnumCalendarInfo(IntPtr, string, System.Globalization.CalendarId, System.Globalization.CalendarDataType, IntPtr)n
...
149470: 00e22517    auipc   a0, 3618
149474: b5050513    addi    a0, a0, -1200
149478: 00053503    ld      a0, 0(a0)
14947c: 00053783    ld      a5, 0(a0)
149480: fe843583    ld      a1, -24(s0)
149484: ff442603    lw      a2, -12(s0)
149488: fe043503    ld      a0, -32(s0)
14948c: fd843703    ld      a4, -40(s0)
149490: ff042683    lw      a3, -16(s0)
149494: fd043f03    ld      t5, -48(s0)
149498: 000780e7    jalr    a5 // <calle name>
...

void Interop+Globalization.InitOrdinalCasingPage(int, char*)
...
149634: 00e22517    auipc   a0, 3618
149638: 91c50513    addi    a0, a0, -1764
14963c: 00053503    ld      a0, 0(a0)
149640: 00053603    ld      a2, 0(a0)
149644: ff443583    ld      a1, -12(s0)
149648: ffc42503    lw      a0, -4(s0)
14964c: fec43f03    ld      t5, -20(s0)
149650: 000600e7    jalr    a2 // <calle name>
...

int Interop+Globalization.CompareString(IntPtr, char*, int, char*, int, System.Globalization.CompareOptions)
...
149aa4: 00e21517    auipc   a0, 3617
149aa8: 4bc50513    addi    a0, a0, 1212
149aac: 00053503    ld      a0, 0(a0)
149ab0: 00053803    ld      a6, 0(a0)
149ab4: ffc42783    lw      a5, -4(s0)
149ab8: fec43503    ld      a0, -20(s0)
149abc: fe443583    ld      a1, -28(s0)
149ac0: ff842603    lw      a2, -8(s0)
149ac4: fdc43683    ld      a3, -36(s0)
149ac8: ff442703    lw      a4, -12(s0)
149acc: fd443f03    ld      t5, -44(s0)
149ad0: 000800e7    jalr    a6 // <calle name>

As you can see, the sequence of lw and ld instructions is not fixed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well. IMO, lw instructions can be skipped in line 1334 in StaticAnalyzeRiscV64Assembly. Am I wrong?

else
{
    // check if "register" is calculated using an unsupported instruction
    uint rd = (instr >> 7) & 0b_11111U;
    if (rd == register)
    {
        return false;
    }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we want to skip lw in StaticAnalyzeRiscV64Assembly, we will have to skip ld too. In other case, we would have a problem, for example, with this:

void Interop+Globalization.InitOrdinalCasingPage(int, char*)
...
149634: 00e22517    auipc   a0, 3618
149638: 91c50513    addi    a0, a0, -1764
14963c: 00053503    ld      a0, 0(a0)
149640: 00053603    ld      a2, 0(a0)
149644: ff443583    ld      a1, -12(s0)
149648: ffc42503    lw      a0, -4(s0)
14964c: fec43f03    ld      t5, -20(s0)
149650: 000600e7    jalr    a2 // <calle name>

Copy link
Member

@clamp03 clamp03 May 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. You are right. However, actually what I want are removing 'lw' and implementing simpler like #102146. Could you check loongarch PR and share your idea? Thank you.

Copy link
Contributor Author

@SzpejnaDawid SzpejnaDawid May 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I simplified code which calculates callee address

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no problem :D

{
AnalyzeRiscV64Itype(instr, out rd, out rs1, out imm);
if (rd == register)
{
immValue = imm;
register = rs1;
}
}
else
{
// frist non-ld instruction
jkotas marked this conversation as resolved.
Show resolved Hide resolved
isFound = StaticAnalyzeRiscV64Assembly(
rtf.StartAddress + currentInstrOffset,
currentInstrOffset,
imageOffset,
register,
ref target);
break;
}

currentInstrOffset -= InstructionSize;
} while (currentInstrOffset > 0);

if (isFound)
{
if (!TryGetImportCellName(target + immValue, out string targetName) || string.IsNullOrWhiteSpace(targetName))
{
return;
}

StringBuilder updatedInstruction = new();
updatedInstruction.Append(instruction);
updatedInstruction.Append(" // ");
updatedInstruction.Append(targetName);
instruction = updatedInstruction.ToString();
}
}

}

/// <summary>
/// Counts the value stored in a given register.
/// </summary>
/// <param name="currentPC">Program Counter of analyzed instruction</param>
/// <param name="currentInstrOffset">Offset within the runtime function of analyzed instruction</param>
/// <param name="imageOffset">Offset within the image byte array</param>
/// <param name="register">Register whose value is being searched for</param>
/// <param name="registerValue">Found value of the register</param>
/// <returns>If the value stored in a given register is countable, the function returns true. Otherwise false</returns>
private bool StaticAnalyzeRiscV64Assembly(int currentPC, int currentInstrOffset, int imageOffset, uint register, ref int registerValue)
{
const int InstructionSize = 4;

// if currentInstrOffset is less than 0 then it is the end of this analyzed method
if (currentInstrOffset < 0)
{
return false;
}

do
{
uint instr = BitConverter.ToUInt32(_reader.Image, imageOffset + currentInstrOffset);
if (IsRiscV64AddiInstruction(instr))
{
AnalyzeRiscV64Itype(instr, out uint rd, out uint rs1, out int imm);
if (rd == register)
SzpejnaDawid marked this conversation as resolved.
Show resolved Hide resolved
{
int rs1Value = 0;
bool returnValue = StaticAnalyzeRiscV64Assembly(currentPC - InstructionSize, currentInstrOffset - InstructionSize, imageOffset, rs1, ref rs1Value);
registerValue = rs1Value + imm;
return returnValue;
}
}
else if (IsRiscV64AuipcInstruction(instr))
{
AnalyzeRiscV64Utype(instr, out uint rd, out int imm);
if (rd == register)
{
registerValue = currentPC + imm;
return true;
}
}
else
{
// check if "register" is calculated using an unsupported instruction
uint rd = (instr >> 7) & 0b_11111U;
Copy link
Member

@clamp03 clamp03 May 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If instr is BType, JType or SType like sd which 11-7 bits are offset. It can return false when immediate value is same to register. I think this pattern is not produced in our runtime. I just want to check it is intended?
Thank you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my honest opinion, i have never seen a situation when an instruction, like sd, would broke the pattern which is acceptable by ProbeRiscV64Quirks(...). So I assume that it is not produced in the runtime.

if (rd == register)
{
return false;
}
}

currentInstrOffset -= InstructionSize;
currentPC -= InstructionSize;
} while (currentInstrOffset > 0);

return false;
}

/// <summary>
/// Checks if instruction is auipc.
/// </summary>
/// <param name="instruction">Assembly code of instruction</param>
/// <returns>It returns true if instruction is auipc. Otherwise false</returns>
private bool IsRiscV64AuipcInstruction(uint instruction)
{
const uint OpcodeAuipc = 0b_0010111;
return (instruction & 0x7f) == OpcodeAuipc;
}

/// <summary>
/// Checks if instruction is jalr.
/// </summary>
/// <param name="instruction">Assembly code of instruction</param>
/// <returns>It returns true if instruction is jalr. Otherwise false</returns>
private bool IsRiscV64JalrInstruction(uint instruction)
{
const uint OpcodeJalr = 0b_1100111;
const uint Funct3Jalr = 0b_000;
return (instruction & 0x7f) == OpcodeJalr &&
((instruction >> 12) & 0b_111) == Funct3Jalr;
}

/// <summary>
/// Checks if instruction is addi.
/// </summary>
/// <param name="instruction">Assembly code of instruction</param>
/// <returns>It returns true if instruction is addi. Otherwise false</returns>
private bool IsRiscV64AddiInstruction(uint instruction)
{
const uint OpcodeAddi = 0b_0010011;
const uint Funct3Addi = 0b_000;
return (instruction & 0x7f) == OpcodeAddi &&
((instruction >> 12) & 0b_111) == Funct3Addi;
}

/// <summary>
/// Checks if instruction is ld.
/// </summary>
/// <param name="instruction">Assembly code of instruction</param>
/// <returns>It returns true if instruction is ld. Otherwise false</returns>
private bool IsRiscV64LdInstruction(uint instruction)
{
const uint OpcodeLd = 0b_0000011;
const uint Funct3Ld = 0b_011;
return (instruction & 0x7f) == OpcodeLd &&
((instruction >> 12) & 0b_111) == Funct3Ld;
}

/// <summary>
/// Checks if instruction is lw.
/// </summary>
/// <param name="instruction">Assembly code of instruction</param>
/// <returns>It returns true if instruction is lw. Otherwise false</returns>
private bool IsRiscV64LwInstruction(uint instruction)
{
const uint OpcodeLw = 0b_0000011;
const uint Funct3Lw = 0b_010;
return (instruction & 0x7f) == OpcodeLw &&
((instruction >> 12) & 0b_111) == Funct3Lw;
}

/// <summary>
/// Retrieves output register and immediate value from U-type instruction.
/// </summary>
/// <param name="instruction">Assembly code of instruction</param>
/// <param name="rd">Output register</param>
/// <param name="imm">Immediate value</param>
private void AnalyzeRiscV64Utype(uint instruction, out uint rd, out int imm)
{
// U-type 31 12 11 7 6 0
// [ imm ] [ rd ] [ opcode ]
rd = (instruction >> 7) & 0b_11111U;
imm = unchecked((int)(instruction & (0xfffff << 12)));
}

/// <summary>
/// Retrieves output register, resource register and immediate value from U-type instruction.
/// </summary>
/// <param name="instruction">Assembly code of instruction</param>
/// <param name="rd">Output register</param>
/// <param name="rs1">Resource register</param>
/// <param name="imm">Immediate value</param>
private void AnalyzeRiscV64Itype(uint instruction, out uint rd, out uint rs1, out int imm)
{
// I-type 31 20 19 15 14 12 11 7 6 0
// [ imm ] [ rs1 ] [ funct3 ] [ rd ] [ opcode ]
rd = (instruction >> 7) & 0b_11111U;
rs1 = (instruction >> 15) & 0b_11111U;
imm = unchecked((int)instruction) >> 20;
}

/// <summary>
/// Determine whether a given character is an ASCII digit.
/// </summary>
Expand Down
Loading