Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RISC-V] Add quirks for riscv to R2RDump #101683

Merged
merged 5 commits into from
May 16, 2024
Merged

Conversation

SzpejnaDawid
Copy link
Contributor

This PR adds a functionality ProbeRiscV64Quirks(...) to R2RDump in order to provide target function labels in the form of comments at the end of any jalr instructions. it mimics a similar solution for x64 and arm.

Part of #84834, cc @dotnet/samsung

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-R2RDump-coreclr Ready-to-run image dump tool label Apr 29, 2024
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Apr 29, 2024
@SzpejnaDawid
Copy link
Contributor Author

@dotnet-policy-service agree company="Samsung"

@am11
Copy link
Member

am11 commented Apr 29, 2024

cc @LuckyXu-HF, @shushanhf LA64 could use a similar approach.

src/coreclr/tools/r2rdump/CoreDisTools.cs Outdated Show resolved Hide resolved
src/coreclr/tools/r2rdump/CoreDisTools.cs Outdated Show resolved Hide resolved
src/coreclr/tools/r2rdump/CoreDisTools.cs Outdated Show resolved Hide resolved
src/coreclr/tools/r2rdump/CoreDisTools.cs Outdated Show resolved Hide resolved
@clamp03 clamp03 added the arch-riscv Related to the RISC-V architecture label Apr 30, 2024
Copy link
Member

@clamp03 clamp03 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you test this PR? Thank you.

src/coreclr/tools/r2rdump/CoreDisTools.cs Outdated Show resolved Hide resolved
src/coreclr/tools/r2rdump/CoreDisTools.cs Outdated Show resolved Hide resolved
@LuckyXu-HF
Copy link
Contributor

cc @LuckyXu-HF, @shushanhf LA64 could use a similar approach.

Thanks so much for remind us! We will implement it next week as we will start a holiday tomorrow.
(Currently we left a TODO-LoongArch64 note for ProbeLoongArch64Quirks() as we temporarily not using this function. Anyway, we will need this sooner or later.)

src/coreclr/tools/r2rdump/CoreDisTools.cs Outdated Show resolved Hide resolved
src/coreclr/tools/r2rdump/CoreDisTools.cs Outdated Show resolved Hide resolved
src/coreclr/tools/r2rdump/CoreDisTools.cs Outdated Show resolved Hide resolved
@SzpejnaDawid
Copy link
Contributor Author

How do you test this PR? Thank you.

It wasn't straightforward so i tested it manually. I crossgened four libraries (incl. System.Private.CoreLib.dll) for riscv64 and x64 and compared calle's names between r2rdumps. r2rdumps files for x64 have more comments than riscv64 because mov instructions are also labeled. I only compared call and jalr instructions.

auipc
addi
ld
jarl
Copy link
Member

@clamp03 clamp03 May 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update jarl to jalr.
Can you share code locations in runtime where you checked the patterns?

+ Please don't force-push. #100962 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no a strict line of code which checks this pattern.

Function ProbeRiscV64Quirks is looking for jalr instruction. If it finds it, then it goes back through the assembly code to calculate callee address. It is more analytical approach, than the fixed pattern. Unfortunately, instructions calculating this address do not always follow each other, sometimes unrelated instruction appears between them. Other problem was with multiple-wrapped calle, then more than one ld or lw are used to calculate callee address.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well. you cannot give codes in runtime for the patterns.
However, in other archs, their implementation is different from yours. And it looks like they have some strict patterns.
And if this code should be checked manually without any tests, I cannot review whether this is correct implementation or not.
More, I don't know well about r2rdump. I am so sorry.

@jkotas @ivdiazsa Could you review this PR?
Thank you.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know well about r2rdump

The point of this code in r2rdump is to make the disassembly output more human readable. Nothing in the rest of the runtime depends on it. There are no hard and fast rules for how it should behave.

@SzpejnaDawid Could you please share some examples for what the output looks like with this change, including the problematic cases like more than one ld or lw are used?

Copy link
Contributor Author

@SzpejnaDawid SzpejnaDawid May 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that System.Private.CoreLib.dll has some interesting examples:

System.Exception Interop.GetExceptionForIoErrno(Interop+ErrorInfo, string, bool)
...
14811c: 00e00397    auipc   t2, 3584
148120: f443b383    ld      t2, -188(t2)
148124: 000380e7    jalr    t2 // WRITE_BARRIER (HELPER)
...
148528: 00e0cf17    auipc   t5, 3596
14852c: 550f0f13    addi    t5, t5, 1360
148530: 00000593    li      a1, 0
148534: 000f3603    ld      a2, 0(t5)
148538: 000600e7    jalr    a2 // System.Exception Interop.GetIOException(Interop+ErrorInfo, string) (METHOD_ENTRY_DEF_TOKEN)
...
148840: 00e10f17    auipc   t5, 3600
148844: a20f0f13    addi    t5, t5, -1504
148848: 000f3583    ld      a1, 0(t5)
14884c: 000580e7    jalr    a1 // string System.SR.GetResourceString(string) (METHOD_ENTRY_DEF_TOKEN)
...

int Interop+Globalization.GetCalendars(string, System.Globalization.CalendarId[], int)
...
14935c: 00e22517    auipc   a0, 3618
149360: c5450513    addi    a0, a0, -940
149364: 00053503    ld      a0, 0(a0)
149368: 00053683    ld      a3, 0(a0)
14936c: fe443503    ld      a0, -28(s0)
149370: fec42603    lw      a2, -20(s0)
149374: fdc43583    ld      a1, -36(s0)
149378: fd443f03    ld      t5, -44(s0)
14937c: 000680e7    jalr    a3 // int Interop+Globalization.<GetCalendars>g____PInvoke|0_0(ushort*, System.Globalization.CalendarId*, int) (INDIRECT_PINVOKE_TARGET)

I know that my way of calculation callee address is not fixed as for arm. Nevertheless, the only way to be sure how jumps are generated is to read a code of coregen2 in order to see how it generates riscv instructions. But the profit and loss ratio of this approach is unfavorable, imo.

{
instr = BitConverter.ToUInt32(_reader.Image, imageOffset + currentInstrOffset);

if (IsRiscV64LdInstruction(instr) || IsRiscV64LwInstruction(instr))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am just curious. In your examples, check for lw instruction is not necessary. Could you give an example with lw instruction? Thank you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As i know there is no function call which uses lw to calculate jump address. However, this check for lw is necessary because this first if is trying to skip all ld and lw instructions which are used to push arguments (likely). For example

int Interop+Globalization.GetCalendars(string, System.Globalization.CalendarId[], int)
...
14935c: 00e22517    auipc   a0, 3618
149360: c5450513    addi    a0, a0, -940
149364: 00053503    ld      a0, 0(a0)
149368: 00053683    ld      a3, 0(a0)
14936c: fe443503    ld      a0, -28(s0)
149370: fec42603    lw      a2, -20(s0)
149374: fdc43583    ld      a1, -36(s0)
149378: fd443f03    ld      t5, -44(s0)
14937c: 000680e7    jalr    a3 // <calle name>
....

bool Interop+Globalization.EnumCalendarInfo(IntPtr, string, System.Globalization.CalendarId, System.Globalization.CalendarDataType, IntPtr)n
...
149470: 00e22517    auipc   a0, 3618
149474: b5050513    addi    a0, a0, -1200
149478: 00053503    ld      a0, 0(a0)
14947c: 00053783    ld      a5, 0(a0)
149480: fe843583    ld      a1, -24(s0)
149484: ff442603    lw      a2, -12(s0)
149488: fe043503    ld      a0, -32(s0)
14948c: fd843703    ld      a4, -40(s0)
149490: ff042683    lw      a3, -16(s0)
149494: fd043f03    ld      t5, -48(s0)
149498: 000780e7    jalr    a5 // <calle name>
...

void Interop+Globalization.InitOrdinalCasingPage(int, char*)
...
149634: 00e22517    auipc   a0, 3618
149638: 91c50513    addi    a0, a0, -1764
14963c: 00053503    ld      a0, 0(a0)
149640: 00053603    ld      a2, 0(a0)
149644: ff443583    ld      a1, -12(s0)
149648: ffc42503    lw      a0, -4(s0)
14964c: fec43f03    ld      t5, -20(s0)
149650: 000600e7    jalr    a2 // <calle name>
...

int Interop+Globalization.CompareString(IntPtr, char*, int, char*, int, System.Globalization.CompareOptions)
...
149aa4: 00e21517    auipc   a0, 3617
149aa8: 4bc50513    addi    a0, a0, 1212
149aac: 00053503    ld      a0, 0(a0)
149ab0: 00053803    ld      a6, 0(a0)
149ab4: ffc42783    lw      a5, -4(s0)
149ab8: fec43503    ld      a0, -20(s0)
149abc: fe443583    ld      a1, -28(s0)
149ac0: ff842603    lw      a2, -8(s0)
149ac4: fdc43683    ld      a3, -36(s0)
149ac8: ff442703    lw      a4, -12(s0)
149acc: fd443f03    ld      t5, -44(s0)
149ad0: 000800e7    jalr    a6 // <calle name>

As you can see, the sequence of lw and ld instructions is not fixed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well. IMO, lw instructions can be skipped in line 1334 in StaticAnalyzeRiscV64Assembly. Am I wrong?

else
{
    // check if "register" is calculated using an unsupported instruction
    uint rd = (instr >> 7) & 0b_11111U;
    if (rd == register)
    {
        return false;
    }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we want to skip lw in StaticAnalyzeRiscV64Assembly, we will have to skip ld too. In other case, we would have a problem, for example, with this:

void Interop+Globalization.InitOrdinalCasingPage(int, char*)
...
149634: 00e22517    auipc   a0, 3618
149638: 91c50513    addi    a0, a0, -1764
14963c: 00053503    ld      a0, 0(a0)
149640: 00053603    ld      a2, 0(a0)
149644: ff443583    ld      a1, -12(s0)
149648: ffc42503    lw      a0, -4(s0)
14964c: fec43f03    ld      t5, -20(s0)
149650: 000600e7    jalr    a2 // <calle name>

Copy link
Member

@clamp03 clamp03 May 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. You are right. However, actually what I want are removing 'lw' and implementing simpler like #102146. Could you check loongarch PR and share your idea? Thank you.

Copy link
Contributor Author

@SzpejnaDawid SzpejnaDawid May 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I simplified code which calculates callee address

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no problem :D

else
{
// check if "register" is calculated using an unsupported instruction
uint rd = (instr >> 7) & 0b_11111U;
Copy link
Member

@clamp03 clamp03 May 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If instr is BType, JType or SType like sd which 11-7 bits are offset. It can return false when immediate value is same to register. I think this pattern is not produced in our runtime. I just want to check it is intended?
Thank you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my honest opinion, i have never seen a situation when an instruction, like sd, would broke the pattern which is acceptable by ProbeRiscV64Quirks(...). So I assume that it is not produced in the runtime.

AnalyzeRiscV64Itype(instr, out rd, out rs1, out imm);
if (rd == register)
{
target =+ imm;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Have you seen a pattern which needs addition for addi instruction? IMO, it doesn't generate two addis for an address calculation. And it doesn't need both immediate values in ld and addi because ld and addi has the same bit size (signed 12 bits) for immediate value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are patterns which use addi. You can find them here, link. As you can see, last three examples show patterns which use addi. Only the first one uses ld to add an immediate value instead of addi

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean you don't need target += imm. target = imm is enough because address is calculated with imm in auipc and one imm in ld or addi in patterns. If I am wrong, please let me know

Copy link
Contributor Author

@SzpejnaDawid SzpejnaDawid May 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you that target = imm is enough. My explanation for target += imm is that i wanted to cover even those patterns which could exist and i didn't meet them. If you think that we shouldn't worry so much for the future I can change it to target = imm

Copy link
Member

@clamp03 clamp03 May 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, we don't need to worry such cases. If there are such cases, it is a bug we need to fix or optimization candidates. :)
If target already has a value when we check addi, it means a value is set from other addi or ld before.

  • We don't need to generate multiple addi for an address calculation.
  • And ld and addi have the same offset bits. So addi is redundant code in case of that target is set by ld before.

However, if you want, you can leave it. That is what I set it a Nit.
Thank you.

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you

@jkotas jkotas merged commit dafaf4a into dotnet:main May 16, 2024
87 checks passed
Ruihan-Yin pushed a commit to Ruihan-Yin/runtime that referenced this pull request May 30, 2024
* [RISC-V] Add quirks for riscv

* [RISC-V] minimize code
@github-actions github-actions bot locked and limited conversation to collaborators Jun 16, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-riscv Related to the RISC-V architecture area-R2RDump-coreclr Ready-to-run image dump tool community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants