-
Notifications
You must be signed in to change notification settings - Fork 14.1k
[X86] Fix overflow with large stack probes on x86-64 #113219
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@llvm/pr-subscribers-backend-x86 Author: None (mconst) ChangesWhen emitting an inline stack probe loop, we can't use SUBri to calculate the loop bound if it doesn't fit in a 32-bit (possibly sign-extended) immediate. Fixes #113218. Full diff: https://github.com/llvm/llvm-project/pull/113219.diff 2 Files Affected:
diff --git a/llvm/lib/Target/X86/X86FrameLowering.cpp b/llvm/lib/Target/X86/X86FrameLowering.cpp
index a35b04606e595d..a083f5919837f1 100644
--- a/llvm/lib/Target/X86/X86FrameLowering.cpp
+++ b/llvm/lib/Target/X86/X86FrameLowering.cpp
@@ -798,18 +798,43 @@ void X86FrameLowering::emitStackProbeInlineGenericLoop(
: Is64Bit ? X86::R11D
: X86::EAX;
- BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::COPY), FinalStackProbed)
- .addReg(StackPtr)
- .setMIFlag(MachineInstr::FrameSetup);
-
// save loop bound
{
- const unsigned BoundOffset = alignDown(Offset, StackProbeSize);
- const unsigned SUBOpc = getSUBriOpcode(Uses64BitFramePtr);
- BuildMI(MBB, MBBI, DL, TII.get(SUBOpc), FinalStackProbed)
- .addReg(FinalStackProbed)
- .addImm(BoundOffset)
- .setMIFlag(MachineInstr::FrameSetup);
+ const uint64_t BoundOffset = alignDown(Offset, StackProbeSize);
+
+ // Can we calculate the loop bound using SUB with a 32-bit immediate?
+ // Note that the immediate gets sign-extended when used with a 64-bit
+ // register, so in that case we only have 31 bits to work with.
+ bool canUseSub =
+ Uses64BitFramePtr ? isUInt<31>(BoundOffset) : isUInt<32>(BoundOffset);
+
+ if (canUseSub) {
+ const unsigned SUBOpc = getSUBriOpcode(Uses64BitFramePtr);
+
+ BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::COPY), FinalStackProbed)
+ .addReg(StackPtr)
+ .setMIFlag(MachineInstr::FrameSetup);
+ BuildMI(MBB, MBBI, DL, TII.get(SUBOpc), FinalStackProbed)
+ .addReg(FinalStackProbed)
+ .addImm(BoundOffset)
+ .setMIFlag(MachineInstr::FrameSetup);
+ } else if (Uses64BitFramePtr) {
+ BuildMI(MBB, MBBI, DL, TII.get(X86::MOV64ri), FinalStackProbed)
+ .addImm(-BoundOffset)
+ .setMIFlag(MachineInstr::FrameSetup);
+ BuildMI(MBB, MBBI, DL, TII.get(X86::ADD64rr), FinalStackProbed)
+ .addReg(FinalStackProbed)
+ .addReg(StackPtr)
+ .setMIFlag(MachineInstr::FrameSetup);
+ } else {
+ // We're being asked to probe a stack frame that's 4 GiB or larger,
+ // but our stack pointer is only 32 bits.
+ DiagnosticInfoResourceLimit Diag(MF.getFunction(),
+ "probed stack frame size", BoundOffset,
+ 0xffffffff, DS_Error, DK_ResourceLimit);
+ MF.getFunction().getContext().diagnose(Diag);
+ return;
+ }
// while in the loop, use loop-invariant reg for CFI,
// instead of the stack pointer, which changes during the loop
diff --git a/llvm/test/CodeGen/X86/stack-clash-huge.ll b/llvm/test/CodeGen/X86/stack-clash-huge.ll
new file mode 100644
index 00000000000000..03f028dfc25067
--- /dev/null
+++ b/llvm/test/CodeGen/X86/stack-clash-huge.ll
@@ -0,0 +1,84 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --no_x86_scrub_sp
+; RUN: llc -mtriple=x86_64-linux-android < %s | FileCheck -check-prefix=CHECK-X64 %s
+; RUN: llc -mtriple=i686-linux-android < %s | FileCheck -check-prefix=CHECK-X86 %s
+; RUN: llc -mtriple=x86_64-linux-gnux32 < %s | FileCheck -check-prefix=CHECK-X32 %s
+
+define i32 @foo() local_unnamed_addr #0 {
+; CHECK-X64-LABEL: foo:
+; CHECK-X64: # %bb.0:
+; CHECK-X64-NEXT: movabsq $-2399997952, %r11 # imm = 0xFFFFFFFF70F2F000
+; CHECK-X64-NEXT: addq %rsp, %r11
+; CHECK-X64-NEXT: .cfi_def_cfa_register %r11
+; CHECK-X64-NEXT: .cfi_adjust_cfa_offset 2399997952
+; CHECK-X64-NEXT: .LBB0_1: # =>This Inner Loop Header: Depth=1
+; CHECK-X64-NEXT: subq $4096, %rsp # imm = 0x1000
+; CHECK-X64-NEXT: movq $0, (%rsp)
+; CHECK-X64-NEXT: cmpq %r11, %rsp
+; CHECK-X64-NEXT: jne .LBB0_1
+; CHECK-X64-NEXT: # %bb.2:
+; CHECK-X64-NEXT: subq $1928, %rsp # imm = 0x788
+; CHECK-X64-NEXT: .cfi_def_cfa_register %rsp
+; CHECK-X64-NEXT: .cfi_def_cfa_offset 2399999888
+; CHECK-X64-NEXT: movl $1, 264(%rsp)
+; CHECK-X64-NEXT: movl $1, 28664(%rsp)
+; CHECK-X64-NEXT: movl -128(%rsp), %eax
+; CHECK-X64-NEXT: movl $2399999880, %ecx # imm = 0x8F0D1788
+; CHECK-X64-NEXT: addq %rcx, %rsp
+; CHECK-X64-NEXT: .cfi_def_cfa_offset 8
+; CHECK-X64-NEXT: retq
+;
+; CHECK-X86-LABEL: foo:
+; CHECK-X86: # %bb.0:
+; CHECK-X86-NEXT: movl %esp, %eax
+; CHECK-X86-NEXT: subl $2399997952, %eax # imm = 0x8F0D1000
+; CHECK-X86-NEXT: .cfi_def_cfa_register %eax
+; CHECK-X86-NEXT: .cfi_adjust_cfa_offset 2399997952
+; CHECK-X86-NEXT: .LBB0_1: # =>This Inner Loop Header: Depth=1
+; CHECK-X86-NEXT: subl $4096, %esp # imm = 0x1000
+; CHECK-X86-NEXT: movl $0, (%esp)
+; CHECK-X86-NEXT: cmpl %eax, %esp
+; CHECK-X86-NEXT: jne .LBB0_1
+; CHECK-X86-NEXT: # %bb.2:
+; CHECK-X86-NEXT: subl $2060, %esp # imm = 0x80C
+; CHECK-X86-NEXT: .cfi_def_cfa_register %esp
+; CHECK-X86-NEXT: .cfi_def_cfa_offset 2400000016
+; CHECK-X86-NEXT: movl $1, 392(%esp)
+; CHECK-X86-NEXT: movl $1, 28792(%esp)
+; CHECK-X86-NEXT: movl (%esp), %eax
+; CHECK-X86-NEXT: movl $2400000012, %ecx # imm = 0x8F0D180C
+; CHECK-X86-NEXT: addl %ecx, %esp
+; CHECK-X86-NEXT: .cfi_def_cfa_offset 4
+; CHECK-X86-NEXT: retl
+;
+; CHECK-X32-LABEL: foo:
+; CHECK-X32: # %bb.0:
+; CHECK-X32-NEXT: movl %esp, %r11d
+; CHECK-X32-NEXT: subl $2399997952, %r11d # imm = 0x8F0D1000
+; CHECK-X32-NEXT: .cfi_def_cfa_register %r11
+; CHECK-X32-NEXT: .cfi_adjust_cfa_offset 2399997952
+; CHECK-X32-NEXT: .LBB0_1: # =>This Inner Loop Header: Depth=1
+; CHECK-X32-NEXT: subl $4096, %esp # imm = 0x1000
+; CHECK-X32-NEXT: movq $0, (%esp)
+; CHECK-X32-NEXT: cmpl %r11d, %esp
+; CHECK-X32-NEXT: jne .LBB0_1
+; CHECK-X32-NEXT: # %bb.2:
+; CHECK-X32-NEXT: subl $1928, %esp # imm = 0x788
+; CHECK-X32-NEXT: .cfi_def_cfa_register %rsp
+; CHECK-X32-NEXT: .cfi_def_cfa_offset 2399999888
+; CHECK-X32-NEXT: movl $1, 264(%esp)
+; CHECK-X32-NEXT: movl $1, 28664(%esp)
+; CHECK-X32-NEXT: movl -128(%esp), %eax
+; CHECK-X32-NEXT: movl $2399999880, %ecx # imm = 0x8F0D1788
+; CHECK-X32-NEXT: addq %rcx, %esp
+; CHECK-X32-NEXT: .cfi_def_cfa_offset 8
+; CHECK-X32-NEXT: retq
+ %a = alloca i32, i64 600000000, align 16
+ %b0 = getelementptr inbounds i32, ptr %a, i64 98
+ %b1 = getelementptr inbounds i32, ptr %a, i64 7198
+ store volatile i32 1, ptr %b0
+ store volatile i32 1, ptr %b1
+ %c = load volatile i32, ptr %a
+ ret i32 %c
+}
+
+attributes #0 = {"probe-stack"="inline-asm"}
|
.setMIFlag(MachineInstr::FrameSetup); | ||
} else { | ||
// We're being asked to probe a stack frame that's 4 GiB or larger, | ||
// but our stack pointer is only 32 bits. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little concerned about printing an error for valid code... maybe less of an issue here than in other contexts, but generally, stack overflows should be a runtime error.
Maybe we can just set the size of the allocation to 2^32-1, and just let the OS generate a stack overflow trap?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking at this!
Sure, trapping at runtime sounds fine, although I'd kind of like to make it an unconditional trap rather than allocating less than the user requested and relying on it to fail. (It's not likely, but you could imagine a situation where the user's code is elsewhere in the address space and the entire lower 4 GiB is available for the stack. The stack pointer would also have to be unaligned.)
Would you replace the error with a warning, or just skip it entirely and rely on the existing -Wframe-larger-than
warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's realistic to worry about an 32-bit x86 application with an address-space larger than 4GB; I don't think it's actually possible to setup a configuration like that. But an explicit trap is probably okay too.
A warning has basically the same issues as an error: optimizations can cause it to trigger in dynamically unreachable code, and the user is left with a nonsense diagnostic that can only be fixed by blindly refactoring their code until the compiler stops complaining. So general policy is that we don't emit warnings from the backend for runtime issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good! I've updated the code.
Ping. (Current version looks okay to me, but I want a second opinion from an x86 backend expert.) |
; CHECK-X32-NEXT: movl $1, 28664(%esp) | ||
; CHECK-X32-NEXT: movl -128(%esp), %eax | ||
; CHECK-X32-NEXT: movl $2399999880, %ecx # imm = 0x8F0D1788 | ||
; CHECK-X32-NEXT: addq %rcx, %esp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bad instruction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking at this!
I think this is an unrelated bug. LLVM generates the same invalid instruction even without my changes -- you can reproduce it by compiling the following code with -mx32 -fomit-frame-pointer
:
void foo() {
char x[0xa0000000];
}
https://godbolt.org/z/E41ecqPna
If you want, I can try to fix this, but maybe it might make more sense to do that as a separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, it's not from the modified code. Feel free to fix it or file an issue otherwise.
// We're being asked to probe a stack frame that's 4 GiB or larger, | ||
// but our stack pointer is only 32 bits. This might be unreachable | ||
// code, so don't complain now; just trap if it's reached at runtime. | ||
BuildMI(MBB, MBBI, DL, TII.get(X86::TRAP)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should add a test case for it if don't complain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
; CHECK-X32-NEXT: movl $1, 28664(%esp) | ||
; CHECK-X32-NEXT: movl -128(%esp), %eax | ||
; CHECK-X32-NEXT: movl $2399999880, %ecx # imm = 0x8F0D1788 | ||
; CHECK-X32-NEXT: addq %rcx, %esp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, it's not from the modified code. Feel free to fix it or file an issue otherwise.
Ping |
@mconst do you have any update? Otherwise, I can help merge it. |
Nope, I think this is ready to merge! |
@mconst Congratulations on having your first Pull Request (PR) merged into the LLVM Project! Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR. Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues. How to do this, and the rest of the post-merge process, is covered in detail here. If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again. If you don't get any reports, no action is required from you. Your changes are working as expected, well done! |
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/16/builds/12423 Here is the relevant piece of the build log for the reference
|
The buildbot failure looks like it's from the bad instruction you pointed out earlier (which actually gets detected automatically when expensive checks are enabled). I think the fix should be pretty simple -- I'll try writing up a PR for it soon. |
@mconst is the fix imminent? currently the expensive checks bots are failing due to this. I'd also suggest you add |
I should have the PR ready within a couple of hours at the latest, although it'll probably still take time for it to be reviewed and merged. If that's too long, feel free to revert this for now! Sure, I'll add |
Okay, I've submitted the fix as #124041. |
`X86FrameLowering::emitSPUpdate()` assumes that 64-bit targets use a 64-bit stack pointer, but that's not true on x32. When checking the stack pointer size, we need to look at `Uses64BitFramePtr` rather than `Is64Bit`. This avoids generating invalid instructions like `add esp, rcx`. For impossibly-large stack frames (4 GiB or larger with a 32-bit stack pointer), we were also generating invalid instructions like `mov eax, 5000000000`. The inline stack probe code already had a check for that situation; I've moved the check into `emitSPUpdate()`, so any attempt to allocate a 4 GiB stack frame with a 32-bit stack pointer will now trap rather than adjusting ESP by the wrong amount. This also fixes the "can't have 32-bit 16GB stack frame" assertion, which used to be triggerable by user code but is now correct. To help catch situations like this in the future, I've added `-verify-machineinstrs` to the stack clash tests that generate large stack frames. This fixes the expensive-checks buildbot failure caused by #113219.
When emitting an inline stack probe loop, we can't use SUBri to calculate the loop bound if it doesn't fit in a 32-bit (possibly sign-extended) immediate.
Fixes #113218.