AArch64: Clear hasSideEffects on AUT and AUTPAC. #141330

pcc
Contributor

@pcc pcc commented May 24, 2025

With hasSideEffects=1 we unnecessarily inhibit certain backend
optimizations, such as code motion, which leads to slightly less
efficient codegen when the pseudo instructions are used frequently, e.g.
in the PFP use case. Setting hasSideEffects=1 does not cause the
instructions to be kept when their result is unused, because the
underlying intrinsics are IntrNoMem. Users who need the trapping side
effect without a use should keep the result alive by another means
(e.g. llvm.fake.use).
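
As a rough illustration of the "kept only if used" rule described above, here is a toy C++ model of a dead-code sweep. It is a sketch under stated assumptions, not LLVM code: `ToyValue`, `isTriviallyDead`, and `sweep` are hypothetical names, and the real IR-level logic considers more than these two properties.

```cpp
#include <string>
#include <vector>

// Hypothetical toy value; this is not an LLVM class.
struct ToyValue {
  std::string name;
  bool hasSideEffects; // false models a call to an IntrNoMem-style intrinsic
  int numUses;         // number of other values that consume the result
};

// Assumed rule: a value may be deleted when nothing uses its result
// and it is not marked as having side effects.
bool isTriviallyDead(const ToyValue &v) {
  return !v.hasSideEffects && v.numUses == 0;
}

// Returns the names of the values that survive one dead-code sweep.
std::vector<std::string> sweep(const std::vector<ToyValue> &values) {
  std::vector<std::string> kept;
  for (const ToyValue &v : values)
    if (!isTriviallyDead(v))
      kept.push_back(v.name);
  return kept;
}
```

In this model an authentication whose result has no uses is swept away, while one whose result is consumed by a side-effecting placeholder (standing in for something like llvm.fake.use) survives.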

Created using spr 1.3.6-beta.1
@llvmbot
Member

llvmbot commented May 24, 2025

@llvm/pr-subscribers-backend-aarch64

Author: Peter Collingbourne (pcc)

Full diff: https://github.com/llvm/llvm-project/pull/141330.diff

1 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.td (+2-2)
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.td b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
index 61055a66e8858..5674721be90c4 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -1943,7 +1943,7 @@ let Predicates = [HasPAuth] in {
   def AUT : Pseudo<(outs), (ins i32imm:$Key, i64imm:$Disc, GPR64noip:$AddrDisc),
                    []>, Sched<[WriteI, ReadI]> {
     let isCodeGenOnly = 1;
-    let hasSideEffects = 1;
+    let hasSideEffects = 0;
     let mayStore = 0;
     let mayLoad = 0;
     let Size = 32;
@@ -1960,7 +1960,7 @@ let Predicates = [HasPAuth] in {
                     i32imm:$PACKey, i64imm:$PACDisc, GPR64noip:$PACAddrDisc),
                []>, Sched<[WriteI, ReadI]> {
     let isCodeGenOnly = 1;
-    let hasSideEffects = 1;
+    let hasSideEffects = 0;
     let mayStore = 0;
     let mayLoad = 0;
     let Size = 48;

@davemgreen
Collaborator

I feel I would expect these instructions to have side-effects, especially with FPAC. Can you give more details about what kind of optimization this enables and why it is safe? I guess you mean that we do not expect the exception to occur, and if it does it does not matter where it happens, only that it does?

@pcc
Contributor Author

pcc commented May 28, 2025

> I feel I would expect these instructions to have side-effects, especially with FPAC. Can you give more details about what kind of optimization this enables and why it is safe? I guess you mean that we do not expect the exception to occur, and if it does it does not matter where it happens, only that it does?

The kinds of optimizations I was seeing were things like merging LDR/LDR into LDP when the pair happened to cross the AUT boundary. I'll see if I can show you an example tomorrow.

In general we only need this exception to happen if the result is used (and by construction it happens before the use), and as you mention it doesn't really matter when it happens relative to the other instructions. This property (the exception is only guaranteed if the result is used) already holds at the IR level, where IntrNoMem allows the intrinsic to be dropped without a use; this change just extends the same idea to the backend.
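
To make the LDR/LDR to LDP point concrete, here is a toy C++ sketch. It assumes a deliberately conservative rule: two loads can be paired only if no instruction between them is marked as having side effects. The real AArch64 passes are considerably more involved, and `ToyInst`, `Kind`, and `canPairLoads` are hypothetical names introduced for illustration only.

```cpp
#include <cstddef>
#include <vector>

enum class Kind { Load, Other };

// Hypothetical toy instruction; this is not an LLVM MachineInstr.
struct ToyInst {
  Kind kind;
  bool hasSideEffects;
};

// Assumed rule: loads at indices first < second may be paired only if
// no instruction strictly between them is marked hasSideEffects.
bool canPairLoads(const std::vector<ToyInst> &block, std::size_t first,
                  std::size_t second) {
  if (first >= second || second >= block.size())
    return false;
  if (block[first].kind != Kind::Load || block[second].kind != Kind::Load)
    return false;
  for (std::size_t i = first + 1; i < second; ++i)
    if (block[i].hasSideEffects)
      return false; // the side-effecting instruction acts as a barrier
  return true;
}
```

For a load, AUT, load sequence, the model permits pairing only when the middle instruction has `hasSideEffects` cleared, which is the effect this change is after.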

@pcc
Contributor Author

pcc commented May 31, 2025

Here is an example extracted from fleetbench. With my PFP patch series and hasSideEffects=0 on AUTxMxN:

0000000000dad294 <_ZNK9grpc_core14filters_detail9StackData5emptyEv>:
  dad294:	aa0003e8 	mov	x8, x0
  dad298:	a9c12909 	ldp	x9, x10, [x8, #16]!
  dad29c:	dac11909 	autda	x9, x8
  dad2a0:	dac1190a 	autda	x10, x8
  dad2a4:	eb0a013f 	cmp	x9, x10
  dad2a8:	54000060 	b.eq	dad2b4 <_ZNK9grpc_core14filters_detail9StackData5emptyEv+0x20>  // b.none
  dad2ac:	2a1f03e0 	mov	w0, wzr
  dad2b0:	d65f03c0 	ret
  dad2b4:	aa0003e8 	mov	x8, x0
  dad2b8:	a9c2a909 	ldp	x9, x10, [x8, #40]!
  dad2bc:	dac11909 	autda	x9, x8
  dad2c0:	dac1190a 	autda	x10, x8
  dad2c4:	eb0a013f 	cmp	x9, x10
  dad2c8:	54000060 	b.eq	dad2d4 <_ZNK9grpc_core14filters_detail9StackData5emptyEv+0x40>  // b.none
  dad2cc:	2a1f03e0 	mov	w0, wzr
  dad2d0:	d65f03c0 	ret
  dad2d4:	aa0003e8 	mov	x8, x0
  dad2d8:	a9c52909 	ldp	x9, x10, [x8, #80]!
  dad2dc:	dac11909 	autda	x9, x8
  dad2e0:	dac1190a 	autda	x10, x8
  dad2e4:	eb0a013f 	cmp	x9, x10
  dad2e8:	54000060 	b.eq	dad2f4 <_ZNK9grpc_core14filters_detail9StackData5emptyEv+0x60>  // b.none
  dad2ec:	2a1f03e0 	mov	w0, wzr
  dad2f0:	d65f03c0 	ret
  dad2f4:	aa0003e8 	mov	x8, x0
  dad2f8:	a9c7a909 	ldp	x9, x10, [x8, #120]!
  dad2fc:	dac11909 	autda	x9, x8
  dad300:	dac1190a 	autda	x10, x8
  dad304:	eb0a013f 	cmp	x9, x10
  dad308:	54000060 	b.eq	dad314 <_ZNK9grpc_core14filters_detail9StackData5emptyEv+0x80>  // b.none
  dad30c:	2a1f03e0 	mov	w0, wzr
  dad310:	d65f03c0 	ret
  dad314:	aa0003e8 	mov	x8, x0
  dad318:	a9ca2909 	ldp	x9, x10, [x8, #160]!
  dad31c:	dac11909 	autda	x9, x8
  dad320:	dac1190a 	autda	x10, x8
  dad324:	eb0a013f 	cmp	x9, x10
  dad328:	54000060 	b.eq	dad334 <_ZNK9grpc_core14filters_detail9StackData5emptyEv+0xa0>  // b.none
  dad32c:	2a1f03e0 	mov	w0, wzr
  dad330:	d65f03c0 	ret
  dad334:	aa0003e8 	mov	x8, x0
  dad338:	a9cba909 	ldp	x9, x10, [x8, #184]!
  dad33c:	dac11909 	autda	x9, x8
  dad340:	dac1190a 	autda	x10, x8
  dad344:	eb0a013f 	cmp	x9, x10
  dad348:	54000060 	b.eq	dad354 <_ZNK9grpc_core14filters_detail9StackData5emptyEv+0xc0>  // b.none
  dad34c:	2a1f03e0 	mov	w0, wzr
  dad350:	d65f03c0 	ret
  dad354:	aa0003e8 	mov	x8, x0
  dad358:	a9ce2909 	ldp	x9, x10, [x8, #224]!
  dad35c:	dac11909 	autda	x9, x8
  dad360:	dac1190a 	autda	x10, x8
  dad364:	eb0a013f 	cmp	x9, x10
  dad368:	54000060 	b.eq	dad374 <_ZNK9grpc_core14filters_detail9StackData5emptyEv+0xe0>  // b.none
  dad36c:	2a1f03e0 	mov	w0, wzr
  dad370:	d65f03c0 	ret
  dad374:	aa0003e8 	mov	x8, x0
  dad378:	a9cfa909 	ldp	x9, x10, [x8, #248]!
  dad37c:	dac11909 	autda	x9, x8
  dad380:	dac1190a 	autda	x10, x8
  dad384:	eb0a013f 	cmp	x9, x10
  dad388:	54000060 	b.eq	dad394 <_ZNK9grpc_core14filters_detail9StackData5emptyEv+0x100>  // b.none
  dad38c:	2a1f03e0 	mov	w0, wzr
  dad390:	d65f03c0 	ret
  dad394:	a9512808 	ldp	x8, x10, [x0, #272]
  dad398:	91044009 	add	x9, x0, #0x110
  dad39c:	dac11928 	autda	x8, x9
  dad3a0:	dac1192a 	autda	x10, x9
  dad3a4:	eb0a011f 	cmp	x8, x10
  dad3a8:	54000060 	b.eq	dad3b4 <_ZNK9grpc_core14filters_detail9StackData5emptyEv+0x120>  // b.none
  dad3ac:	2a1f03e0 	mov	w0, wzr
  dad3b0:	d65f03c0 	ret
  dad3b4:	a952a808 	ldp	x8, x10, [x0, #296]
  dad3b8:	9104a009 	add	x9, x0, #0x128
  dad3bc:	dac11928 	autda	x8, x9
  dad3c0:	dac1192a 	autda	x10, x9
  dad3c4:	eb0a011f 	cmp	x8, x10
  dad3c8:	1a9f17e0 	cset	w0, eq	// eq = none
  dad3cc:	d65f03c0 	ret

The same but with hasSideEffects=1 on AUTxMxN:

0000000000dae788 <_ZNK9grpc_core14filters_detail9StackData5emptyEv>:
  dae788:	aa0003e8 	mov	x8, x0
  dae78c:	f8410d09 	ldr	x9, [x8, #16]!
  dae790:	dac11909 	autda	x9, x8
  dae794:	f940050a 	ldr	x10, [x8, #8]
  dae798:	dac1190a 	autda	x10, x8
  dae79c:	eb0a013f 	cmp	x9, x10
  dae7a0:	54000060 	b.eq	dae7ac <_ZNK9grpc_core14filters_detail9StackData5emptyEv+0x24>  // b.none
  dae7a4:	2a1f03e0 	mov	w0, wzr
  dae7a8:	d65f03c0 	ret
  dae7ac:	aa0003e8 	mov	x8, x0
  dae7b0:	f8428d09 	ldr	x9, [x8, #40]!
  dae7b4:	dac11909 	autda	x9, x8
  dae7b8:	f940050a 	ldr	x10, [x8, #8]
  dae7bc:	dac1190a 	autda	x10, x8
  dae7c0:	eb0a013f 	cmp	x9, x10
  dae7c4:	54000060 	b.eq	dae7d0 <_ZNK9grpc_core14filters_detail9StackData5emptyEv+0x48>  // b.none
  dae7c8:	2a1f03e0 	mov	w0, wzr
  dae7cc:	d65f03c0 	ret
  dae7d0:	aa0003e8 	mov	x8, x0
  dae7d4:	f8450d09 	ldr	x9, [x8, #80]!
  dae7d8:	dac11909 	autda	x9, x8
  dae7dc:	f940050a 	ldr	x10, [x8, #8]
  dae7e0:	dac1190a 	autda	x10, x8
  dae7e4:	eb0a013f 	cmp	x9, x10
  dae7e8:	54000060 	b.eq	dae7f4 <_ZNK9grpc_core14filters_detail9StackData5emptyEv+0x6c>  // b.none
  dae7ec:	2a1f03e0 	mov	w0, wzr
  dae7f0:	d65f03c0 	ret
  dae7f4:	aa0003e8 	mov	x8, x0
  dae7f8:	f8478d09 	ldr	x9, [x8, #120]!
  dae7fc:	dac11909 	autda	x9, x8
  dae800:	f940050a 	ldr	x10, [x8, #8]
  dae804:	dac1190a 	autda	x10, x8
  dae808:	eb0a013f 	cmp	x9, x10
  dae80c:	54000060 	b.eq	dae818 <_ZNK9grpc_core14filters_detail9StackData5emptyEv+0x90>  // b.none
  dae810:	2a1f03e0 	mov	w0, wzr
  dae814:	d65f03c0 	ret
  dae818:	aa0003e8 	mov	x8, x0
  dae81c:	f84a0d09 	ldr	x9, [x8, #160]!
  dae820:	dac11909 	autda	x9, x8
  dae824:	f940050a 	ldr	x10, [x8, #8]
  dae828:	dac1190a 	autda	x10, x8
  dae82c:	eb0a013f 	cmp	x9, x10
  dae830:	54000060 	b.eq	dae83c <_ZNK9grpc_core14filters_detail9StackData5emptyEv+0xb4>  // b.none
  dae834:	2a1f03e0 	mov	w0, wzr
  dae838:	d65f03c0 	ret
  dae83c:	aa0003e8 	mov	x8, x0
  dae840:	f84b8d09 	ldr	x9, [x8, #184]!
  dae844:	dac11909 	autda	x9, x8
  dae848:	f940050a 	ldr	x10, [x8, #8]
  dae84c:	dac1190a 	autda	x10, x8
  dae850:	eb0a013f 	cmp	x9, x10
  dae854:	54000060 	b.eq	dae860 <_ZNK9grpc_core14filters_detail9StackData5emptyEv+0xd8>  // b.none
  dae858:	2a1f03e0 	mov	w0, wzr
  dae85c:	d65f03c0 	ret
  dae860:	aa0003e8 	mov	x8, x0
  dae864:	f84e0d09 	ldr	x9, [x8, #224]!
  dae868:	dac11909 	autda	x9, x8
  dae86c:	f940050a 	ldr	x10, [x8, #8]
  dae870:	dac1190a 	autda	x10, x8
  dae874:	eb0a013f 	cmp	x9, x10
  dae878:	54000060 	b.eq	dae884 <_ZNK9grpc_core14filters_detail9StackData5emptyEv+0xfc>  // b.none
  dae87c:	2a1f03e0 	mov	w0, wzr
  dae880:	d65f03c0 	ret
  dae884:	aa0003e8 	mov	x8, x0
  dae888:	f84f8d09 	ldr	x9, [x8, #248]!
  dae88c:	dac11909 	autda	x9, x8
  dae890:	f940050a 	ldr	x10, [x8, #8]
  dae894:	dac1190a 	autda	x10, x8
  dae898:	eb0a013f 	cmp	x9, x10
  dae89c:	54000060 	b.eq	dae8a8 <_ZNK9grpc_core14filters_detail9StackData5emptyEv+0x120>  // b.none
  dae8a0:	2a1f03e0 	mov	w0, wzr
  dae8a4:	d65f03c0 	ret
  dae8a8:	f9408808 	ldr	x8, [x0, #272]
  dae8ac:	91044009 	add	x9, x0, #0x110
  dae8b0:	dac11928 	autda	x8, x9
  dae8b4:	f9408c0a 	ldr	x10, [x0, #280]
  dae8b8:	dac1192a 	autda	x10, x9
  dae8bc:	eb0a011f 	cmp	x8, x10
  dae8c0:	54000060 	b.eq	dae8cc <_ZNK9grpc_core14filters_detail9StackData5emptyEv+0x144>  // b.none
  dae8c4:	2a1f03e0 	mov	w0, wzr
  dae8c8:	d65f03c0 	ret
  dae8cc:	f9409408 	ldr	x8, [x0, #296]
  dae8d0:	9104a009 	add	x9, x0, #0x128
  dae8d4:	dac11928 	autda	x8, x9
  dae8d8:	f940980a 	ldr	x10, [x0, #304]
  dae8dc:	dac1192a 	autda	x10, x9
  dae8e0:	eb0a011f 	cmp	x8, x10
  dae8e4:	1a9f17e0 	cset	w0, eq	// eq = none
  dae8e8:	d65f03c0 	ret
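
The size difference between the two listings can be checked from the first and last instruction addresses alone. The sketch below is only arithmetic on the addresses shown above; `instCount` and the constant names are hypothetical helpers, and the only assumption is the fixed 4-byte AArch64 instruction size.

```cpp
#include <cstdint>

// AArch64 instructions are a fixed 4 bytes wide.
constexpr std::uint64_t kInstBytes = 4;

// Number of instructions from the first to the last address, inclusive.
constexpr std::uint64_t instCount(std::uint64_t first, std::uint64_t last) {
  return (last - first) / kInstBytes + 1;
}

// First and last instruction addresses taken from the two listings.
constexpr std::uint64_t kWithoutSideEffects = instCount(0xdad294, 0xdad3cc);
constexpr std::uint64_t kWithSideEffects = instCount(0xdae788, 0xdae8e8);

static_assert(kWithoutSideEffects == 79, "hasSideEffects=0 listing");
static_assert(kWithSideEffects == 89, "hasSideEffects=1 listing");
static_assert(kWithSideEffects - kWithoutSideEffects == 10,
              "one extra LDR per compared pair");
```

That is 79 instructions versus 89: one instruction saved for each of the ten pointer pairs the function compares, since each LDR/LDR pair becomes a single LDP.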
