merge main into amd-staging #784

z1-cciauto · 2025-12-07T20:06:22Z

No description provided.

setzucc with a memory operand is the same as setcc, but the latter is shorter.

This was marked as deprecated in 2022, but only in a comment. Switch it to an error to make the deprecation visible and to stop generating it. The error message will be removed in a follow-up; this seemed the easiest way for folks to understand the resulting compilation errors. The change required to move to the new form is minimal.

…170994) Bitwidths greater than 64 are not supported by `arith-to-apfloat`.

This change makes inlining logic in the translator simpler and more consistent by (a) extending the inlining concept to include CExpression ops, which by definition are inlined if and only if they reside within an ExpressionOp, and (b) concentrating all inlining decisions in `shouldBeInlined()` to make sure that ops get the same decision when queried as operations and as operands.

The op was not added to `hasDeferredEmission()` when introduced by f17abc2, causing incorrect translation.

…k_wait (llvm#161086)

This is to address llvm#146145.

The issue before was that, for `std::atomic::wait/notify`, we only supported `uint64_t` going through the native `ulock_wait` directly. Any other type goes through the global contention table's `atomic`, increasing the chances of spurious wakeups.

This PR tries to allow any type of size 4 or 8 to go directly to `ulock_wait`.

This PR is just a proof of concept. If we like this idea, I can go further to update the Linux/FreeBSD branch and add ABI macros so the existing behaviours are preserved under the stable ABI.

Here are some benchmark results:

```
Benchmark                                                      Time         CPU    Time Old    Time New     CPU Old     CPU New
-------------------------------------------------------------------------------------------------------------------------------
BM_stop_token_single_thread_reg_unreg_callback/1024         -0.1113     -0.1165       51519       45785       51397       45408
BM_stop_token_single_thread_reg_unreg_callback/4096         -0.2727     -0.1447      249685      181608      211865      181203
BM_stop_token_single_thread_reg_unreg_callback/65536        -0.1241     -0.1237     3308930     2898396     3300986     2892608
BM_stop_token_single_thread_reg_unreg_callback/262144       +0.0335     -0.1920    13237682    13681632    13208849    10673254
OVERALL_GEOMEAN                                             -0.1254     -0.1447           0           0           0           0
```

```
Benchmark                                                                           Time         CPU    Time Old    Time New     CPU Old     CPU New
----------------------------------------------------------------------------------------------------------------------------------------------------
BM_1_atomic_1_waiter_1_notifier<KeepNotifying, NumHighPrioTasks<0>>/65536        -0.3344     -0.2424     5960741     3967212     5232250     3964085
BM_1_atomic_1_waiter_1_notifier<KeepNotifying, NumHighPrioTasks<0>>/131072       -0.1474     -0.1475     9144356     7796745     9137547     7790193
BM_1_atomic_1_waiter_1_notifier<KeepNotifying, NumHighPrioTasks<0>>/262144       -0.1336     -0.1340    18333441    15883805    18323711    15868500
OVERALL_GEOMEAN                                                                  -0.2107     -0.1761           0           0           0           0
```

```
Benchmark                                                                                                      Time         CPU    Time Old    Time New     CPU Old     CPU New
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<2>, NumHighPrioTasks<0>>/16384         +0.2321     -0.0081      836618     1030772      833197      826476
BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<2>, NumHighPrioTasks<0>>/32768         -0.3034     -0.1329     2182721     1520569     1747211     1515028
BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<2>, NumHighPrioTasks<0>>/65536         -0.0924     -0.0924     3389098     3075897     3378486     3066448
BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<8>, NumHighPrioTasks<0>>/4096          +0.0464     +0.0474      664233      695080      657736      688892
BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<8>, NumHighPrioTasks<0>>/8192          -0.0279     -0.0267     1336041     1298794     1324270     1288953
BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<8>, NumHighPrioTasks<0>>/16384         +0.0270     +0.0304     2543004     2611786     2517471     2593975
BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<32>, NumHighPrioTasks<0>>/1024         +0.0423     +0.0941      473621      493657      325604      356245
BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<32>, NumHighPrioTasks<0>>/2048         +0.0420     +0.0675      906266      944349      636253      679169
BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<32>, NumHighPrioTasks<0>>/4096         +0.0359     +0.0378     1761584     1824783     1015092     1053439
OVERALL_GEOMEAN                                                                                             -0.0097     -0.0007           0           0           0           0
```

```
Benchmark                                                                                               Time         CPU    Time Old    Time New     CPU Old     CPU New
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<2>, NumHighPrioTasks<0>>/4096        -0.0990     -0.1001      371100      334370      369984      332955
BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<2>, NumHighPrioTasks<0>>/8192        -0.0305     -0.0314      698228      676908      696418      674585
BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<2>, NumHighPrioTasks<0>>/16384       -0.0258     -0.0268     1383530     1347894     1380665     1343680
BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<8>, NumHighPrioTasks<0>>/1024        +0.0465     +0.4702      937821      981388      472087      694082
BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<8>, NumHighPrioTasks<0>>/2048        +0.1596     +0.9140     1704819     1976899      616419     1179852
BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<8>, NumHighPrioTasks<0>>/4096        -0.1018     -0.2316     3793976     3407609     1912209     1469331
BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<32>, NumHighPrioTasks<0>>/256        +0.0395     +0.5818    30102662    31292982      174650      276270
BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<32>, NumHighPrioTasks<0>>/512        -0.0065     +1.2860    33079634    32863968      162150      370680
BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<32>, NumHighPrioTasks<0>>/1024       -0.0325     +0.4683    36581740    35392385      282320      414520
OVERALL_GEOMEAN                                                                                      -0.0084     +0.2878           0           0           0           0
```

---------

Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
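The size-based dispatch described in the llvm#161086 message can be sketched as a compile-time check. This is only an illustration of the stated rule (size 4 or 8 goes to the native wait, everything else to the contention table); the names `goes_to_native_wait`, `WaitPath`, and `select_wait_path` are hypothetical and are not libc++ identifiers, and the real implementation may apply further constraints.

```cpp
#include <cstdint>

// Hypothetical helper, not libc++ code: models only the rule stated in the
// commit message, namely that types of size 4 or 8 may use the native wait.
template <class T>
constexpr bool goes_to_native_wait() {
  return sizeof(T) == 4 || sizeof(T) == 8;
}

// Hypothetical labels for the two paths the message describes.
enum class WaitPath { Native, ContentionTable };

// Picks the path for a given value type at compile time.
template <class T>
constexpr WaitPath select_wait_path() {
  return goes_to_native_wait<T>() ? WaitPath::Native
                                  : WaitPath::ContentionTable;
}
```

Under this rule a `std::int32_t` or `std::uint64_t` would take the native path, while a `std::uint8_t` would still fall back to the contention table.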

…1033) [CommandLine.cpp](https://github.com/llvm/llvm-project/blob/fb0400fe1f1f9e83f3148db8ce2c72ab5bc6728e/llvm/lib/Support/CommandLine.cpp#L940) treats single quotes as literal characters on Windows, so the argument is parsed as a check named `' -*,llvm-namespace-comment '`, which matches no existing checks, so no checks are enabled via the command line. Previously, the test passed because it fell back to the root `.clang-tidy` configuration, which enables `llvm-*`.

This patch adds Clang support for speculative devirtualization and integrates the related pass into the pass pipeline. It's building on the LLVM backend implementation from PR llvm#159048.

Speculative devirtualization transforms an indirect call (the virtual function) to a guarded direct call. It is guarded by a comparison of the virtual function pointer to the expected target. This optimization is still safe without LTO because it doesn't do direct calls, it's conditional according to the function ptr.

This optimization:
- Opt-in: Disabled by default, enabled via `-fdevirtualize-speculatively`
- Works in non-LTO mode
- Handles publicly-visible objects.
- Uses guarded devirtualization with fallback to indirect calls when the speculation is incorrect.

For this C++ example:

```
class Base {
public:
  __attribute__((noinline)) virtual void virtual_function1() { asm volatile("NOP"); }
  virtual void virtual_function2() { asm volatile("NOP"); }
};

class Derived : public Base {
public:
  void virtual_function2() override { asm volatile("NOP"); }
};

__attribute__((noinline)) void foo(Base *BV) { BV->virtual_function1(); }

void bar() {
  Base *b = new Derived();
  foo(b);
}
```

Here is the IR without enabling speculative devirtualization:

```
define dso_local void @_Z3fooP4Base(ptr noundef %BV) local_unnamed_addr #0 {
entry:
  %vtable = load ptr, ptr %BV, align 8, !tbaa !6
  %0 = load ptr, ptr %vtable, align 8
  tail call void %0(ptr noundef nonnull align 8 dereferenceable(8) %BV)
  ret void
}
```

IR after enabling speculative devirtualization:

```
define dso_local void @_Z3fooP4Base(ptr noundef %BV) local_unnamed_addr #0 {
entry:
  %vtable = load ptr, ptr %BV, align 8, !tbaa !12
  %0 = load ptr, ptr %vtable, align 8
  %1 = icmp eq ptr %0, @_ZN4Base17virtual_function1Ev
  br i1 %1, label %if.true.direct_targ, label %if.false.orig_indirect, !prof !15

if.true.direct_targ:                              ; preds = %entry
  tail call void @_ZN4Base17virtual_function1Ev(ptr noundef nonnull align 8 dereferenceable(8) %BV)
  br label %if.end.icp

if.false.orig_indirect:                           ; preds = %entry
  tail call void %0(ptr noundef nonnull align 8 dereferenceable(8) %BV)
  br label %if.end.icp

if.end.icp:                                       ; preds = %if.false.orig_indirect, %if.true.direct_targ
  ret void
}
```
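The guarded call in the IR above can also be read at source level. The following is a minimal sketch using plain function pointers rather than real vtables; `expected_target`, `other_target`, and `call_speculatively` are hypothetical names chosen for illustration, and the compiler performs the equivalent transformation on IR, not on source like this.

```cpp
#include <cassert>

// Counters make it observable which function actually ran.
int expected_calls = 0;
int other_calls = 0;

void expected_target() { ++expected_calls; }
void other_target() { ++other_calls; }

using Fn = void (*)();

// Returns true when the speculated direct call was taken.
bool call_speculatively(Fn fp) {
  assert(fp != nullptr);
  if (fp == &expected_target) { // guard: compare pointer to expected target
    expected_target();          // direct call, visible to the inliner
    return true;
  }
  fp();                         // fallback: the original indirect call
  return false;
}
```

Because the fallback branch keeps the original indirect call, a wrong guess costs one comparison and a branch but never changes which function runs.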

…art) (llvm#164124) Replace ExtractLastElement and ExtractLastLanePerPart with more generic and specific ExtractLastLane and ExtractLastPart, which model distinct parts of extracting across parts and lanes. ExtractLastElement == ExtractLastLane(ExtractLastPart) and ExtractLastLanePerPart == ExtractLastLane, the latter clarifying the name of the opcode. A new m_ExtractLastElement matcher is provided for convenience. The patch should be NFC modulo printing changes. PR: llvm#164124
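The stated identity, ExtractLastElement == ExtractLastLane(ExtractLastPart), can be illustrated with a toy model in which an unrolled value is a list of parts and each part is a list of lanes. This is not the VPlan API: the `Part` and `Parts` types and the free functions below are hypothetical stand-ins that only mirror the opcode names.

```cpp
#include <cassert>
#include <vector>

using Part = std::vector<int>;   // lanes of one unrolled part (toy model)
using Parts = std::vector<Part>; // all parts of an unrolled value

// Selects the last part; says nothing about lanes.
const Part &extractLastPart(const Parts &parts) {
  assert(!parts.empty());
  return parts.back();
}

// Selects the last lane of a single part.
int extractLastLane(const Part &part) {
  assert(!part.empty());
  return part.back();
}

// The composition described in the commit message.
int extractLastElement(const Parts &parts) {
  return extractLastLane(extractLastPart(parts));
}
```

Applying `extractLastLane` to each part separately corresponds to what the old ExtractLastLanePerPart name described.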

This PR enables the MLIR execution engine to dump object file as PIC code, which is needed when the object file is later bundled into a dynamic shared library. --------- Co-authored-by: Mehdi Amini <joker.eph@gmail.com>

At the moment AMDGCN flavoured SPIRV uses the SPIRV ABI with some tweaks revolving around passing aggregates as direct. This is problematic in multiple ways:
- it leads to divergence from code compiled for a concrete target, which makes it difficult to debug;
- it incurs a run time cost, when dealing with larger aggregates;
- it incurs a compile time cost, when dealing with larger aggregates.

This patch switches over AMDGCN flavoured SPIRV to implement the AMDGPU ABI (except for dealing with variadic functions, which will be added in the future). One additional complication (and the primary motivation behind the current less than ideal state of affairs) stems from `byref`, which AMDGPU uses, not being expressible in SPIR-V. We deal with this by CodeGen-ing for `byref`, lowering it to the `FuncParamAttr ByVal` in SPIR-V, and restoring it when doing reverse translation from AMDGCN flavoured SPIR-V.

llvm-project/llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp:312:19: warning: unused variable 'EB' [-Wunused-variable]
  312 |     VPBasicBlock *EB = Plan.getExitBlocks().front();
      |                   ^~

This showed up in a non-assertions build.
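A warning like this typically appears when a variable is referenced only inside `assert()`, which expands to nothing when `NDEBUG` is defined. The sketch below shows the general pattern and one common remedy, `[[maybe_unused]]` (C++17); `lastValue` is a hypothetical function, and this is not necessarily how the commit itself resolved the warning.

```cpp
#include <cassert>
#include <vector>

// Hypothetical example function, unrelated to VPlanPredicator.cpp.
int lastValue(const std::vector<int> &values) {
  // `nonEmpty` is only read inside assert(). With NDEBUG defined the assert
  // disappears, so without the attribute -Wunused-variable would fire.
  [[maybe_unused]] const bool nonEmpty = !values.empty();
  assert(nonEmpty && "expected at least one value");
  return values.back();
}
```

Other common options are casting the variable to `void` or moving the expression directly into the assertion.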

Add support for vectorized operations such as `arith.addf ... : vector<4xf4E2M1FN>`. The computation is scalarized: scalar operands are extracted with `vector.to_elements`, multiple scalar computations are performed and the result is inserted back into a vector with `vector.from_elements`.
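The scalarization strategy can be pictured in ordinary C++ as follows. This is only an analogy: `std::array<float, N>` stands in for the MLIR vector type, the loop body stands in for the per-element scalar computation, and `addScalarized` is a hypothetical name rather than anything from the pass.

```cpp
#include <array>
#include <cstddef>

// Element-wise add performed one scalar at a time (hypothetical helper).
template <std::size_t N>
std::array<float, N> addScalarized(const std::array<float, N> &lhs,
                                   const std::array<float, N> &rhs) {
  std::array<float, N> result{};
  for (std::size_t i = 0; i < N; ++i) {
    const float a = lhs[i]; // extract scalar operands from the vectors
    const float b = rhs[i];
    result[i] = a + b;      // scalar computation, written back into place
  }
  return result;
}
```

In the actual lowering, the extraction and re-insertion steps correspond to `vector.to_elements` and `vector.from_elements`.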

z1-cciauto · 2025-12-07T20:07:27Z

PSDB Link: https://compiler-ci.amd.com/job/compiler-psdb-amd-staging/3167

fzou1 and others added 14 commits December 7, 2025 13:26

[X86][APX] Compress setzucc with memory operand to setcc (llvm#170842)

0dff5b5

setzucc with a memory operand is the same as setcc, but the latter is shorter.

[mlir][arith] arith-to-apfloat: Bail on unsupported bitwidth (llvm#…

bdb918e

…170994) Bitwidths greater than 64 are not supported by `arith-to-apfloat`.

[mlir][emitc] Fix bug in dereference translation (llvm#171028)

fb0400f

The op was not added to `hasDeferredEmission()` when introduced by f17abc2, causing incorrect translation.

[MLIR][ExecutionEngine] Enable PIC option (llvm#170995)

11fd760

This PR enables the MLIR execution engine to dump object file as PIC code, which is needed when the object file is later bundled into a dynamic shared library. --------- Co-authored-by: Mehdi Amini <joker.eph@gmail.com>

[VPlan] Fix unused variable warning

7bfdaa5

llvm-project/llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp:312:19: warning: unused variable 'EB' [-Wunused-variable]
  312 |     VPBasicBlock *EB = Plan.getExitBlocks().front();
      |                   ^~

This showed up in a non-assertions build.

merge main into amd-staging

879bde0

z1-cciauto requested review from kuhar and stellaraccident as code owners December 7, 2025 20:06

z1-cciauto requested a review from a team December 7, 2025 20:06

ronlieb merged commit bf18dc9 into amd-staging Dec 8, 2025
17 of 18 checks passed

ronlieb deleted the upstream_merge_202512071506 branch December 8, 2025 01:19

14 participants
