@z1-cciauto
Collaborator

No description provided.

fzou1 and others added 14 commits December 7, 2025 13:26
setzucc with a memory operand is the same as setcc, but the latter is shorter.
This was marked as deprecated in 2022, but only in a comment. Switch to an
error to make the deprecation visible and stop generating the old form. The
error message will be removed in a follow-up; a compilation error just seemed
easier for folks to understand. The change required to move to the new form is
rather minimal.
…170994)

Bitwidths greater than 64 are not supported by `arith-to-apfloat`.
This change makes inlining logic in the translator simpler and more
consistent by

(a) Extending the inlining concept to include CExpression ops, which by
    definition are inlined if and only if they reside within an
    ExpressionOp.

(b) Concentrating all inlining decisions in `shouldBeInlined()` to make
    sure that ops get the same decision when queried as operations and
    as operands.
The op was not added to `hasDeferredEmission()` when introduced by
f17abc2, causing incorrect translation.
…k_wait (llvm#161086)

This is to address llvm#146145

The issue before was that, for `std::atomic::wait/notify`, only
`uint64_t` went through the native `ulock_wait` directly. Any
other type went through the global contention table's `atomic`,
increasing the chance of spurious wakeups. This PR tries to allow any
type of size 4 or 8 to go directly to `ulock_wait`.

This PR is just a proof of concept. If we like this idea, I can go further
and update the Linux/FreeBSD branch and add ABI macros so the existing
behaviours are preserved under the stable ABI.
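
A minimal sketch of the size-based dispatch described above. The names `can_wait_natively_v`, `WaitPath` and `select_wait_path` are hypothetical and are not libc++ identifiers. The requirements of trivial copyability and unique object representations are assumptions added here so that a bytewise comparison makes sense; the description above only states the size condition.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical trait (not libc++'s internal name): a type may go straight to
// the native wait if it is 4 or 8 bytes wide. The two extra conditions are
// assumptions made for this sketch, not taken from the PR.
template <class T>
inline constexpr bool can_wait_natively_v =
    (sizeof(T) == 4 || sizeof(T) == 8) &&
    std::is_trivially_copyable_v<T> &&
    std::has_unique_object_representations_v<T>;

static_assert(can_wait_natively_v<std::uint64_t>);
static_assert(can_wait_natively_v<std::int32_t>);
static_assert(!can_wait_natively_v<char>);

// Hypothetical dispatcher: which path a wait on std::atomic<T> would take.
enum class WaitPath { Native, ContentionTable };

template <class T>
constexpr WaitPath select_wait_path() {
  return can_wait_natively_v<T> ? WaitPath::Native
                                : WaitPath::ContentionTable;
}
```

Under this sketch, 4-byte and 8-byte integer types take the native path, while 1-byte and 2-byte types still fall back to the contention table.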

Here are some benchmark results

```
Benchmark                                                               Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------------------------------------
BM_stop_token_single_thread_reg_unreg_callback/1024                  -0.1113         -0.1165         51519         45785         51397         45408
BM_stop_token_single_thread_reg_unreg_callback/4096                  -0.2727         -0.1447        249685        181608        211865        181203
BM_stop_token_single_thread_reg_unreg_callback/65536                 -0.1241         -0.1237       3308930       2898396       3300986       2892608
BM_stop_token_single_thread_reg_unreg_callback/262144                +0.0335         -0.1920      13237682      13681632      13208849      10673254
OVERALL_GEOMEAN                                                      -0.1254         -0.1447             0             0             0             0
```

```
Benchmark                                                                                    Time             CPU      Time Old      Time New       CPU Old       CPU New
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
BM_1_atomic_1_waiter_1_notifier<KeepNotifying, NumHighPrioTasks<0>>/65536                 -0.3344         -0.2424       5960741       3967212       5232250       3964085
BM_1_atomic_1_waiter_1_notifier<KeepNotifying, NumHighPrioTasks<0>>/131072                -0.1474         -0.1475       9144356       7796745       9137547       7790193
BM_1_atomic_1_waiter_1_notifier<KeepNotifying, NumHighPrioTasks<0>>/262144                -0.1336         -0.1340      18333441      15883805      18323711      15868500
OVERALL_GEOMEAN                                                                           -0.2107         -0.1761             0             0             0             0
```

```
Benchmark                                                                                                             Time             CPU      Time Old      Time New       CPU Old       CPU New
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<2>, NumHighPrioTasks<0>>/16384                +0.2321         -0.0081        836618       1030772        833197        826476
BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<2>, NumHighPrioTasks<0>>/32768                -0.3034         -0.1329       2182721       1520569       1747211       1515028
BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<2>, NumHighPrioTasks<0>>/65536                -0.0924         -0.0924       3389098       3075897       3378486       3066448
BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<8>, NumHighPrioTasks<0>>/4096                 +0.0464         +0.0474        664233        695080        657736        688892
BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<8>, NumHighPrioTasks<0>>/8192                 -0.0279         -0.0267       1336041       1298794       1324270       1288953
BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<8>, NumHighPrioTasks<0>>/16384                +0.0270         +0.0304       2543004       2611786       2517471       2593975
BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<32>, NumHighPrioTasks<0>>/1024                +0.0423         +0.0941        473621        493657        325604        356245
BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<32>, NumHighPrioTasks<0>>/2048                +0.0420         +0.0675        906266        944349        636253        679169
BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<32>, NumHighPrioTasks<0>>/4096                +0.0359         +0.0378       1761584       1824783       1015092       1053439
OVERALL_GEOMEAN                                                                                                    -0.0097         -0.0007             0             0             0             0
```

```
Benchmark                                                                                                        Time             CPU      Time Old      Time New       CPU Old       CPU New
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<2>, NumHighPrioTasks<0>>/4096                 -0.0990         -0.1001        371100        334370        369984        332955
BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<2>, NumHighPrioTasks<0>>/8192                 -0.0305         -0.0314        698228        676908        696418        674585
BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<2>, NumHighPrioTasks<0>>/16384                -0.0258         -0.0268       1383530       1347894       1380665       1343680
BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<8>, NumHighPrioTasks<0>>/1024                 +0.0465         +0.4702        937821        981388        472087        694082
BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<8>, NumHighPrioTasks<0>>/2048                 +0.1596         +0.9140       1704819       1976899        616419       1179852
BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<8>, NumHighPrioTasks<0>>/4096                 -0.1018         -0.2316       3793976       3407609       1912209       1469331
BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<32>, NumHighPrioTasks<0>>/256                 +0.0395         +0.5818      30102662      31292982        174650        276270
BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<32>, NumHighPrioTasks<0>>/512                 -0.0065         +1.2860      33079634      32863968        162150        370680
BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<32>, NumHighPrioTasks<0>>/1024                -0.0325         +0.4683      36581740      35392385        282320        414520
OVERALL_GEOMEAN                                                                                               -0.0084         +0.2878             0             0             0             0
```

---------

Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
…1033)

[CommandLine.cpp](https://github.com/llvm/llvm-project/blob/fb0400fe1f1f9e83f3148db8ce2c72ab5bc6728e/llvm/lib/Support/CommandLine.cpp#L940)
treats single quotes as literal characters on Windows, so the argument is
parsed as a check named `' -*,llvm-namespace-comment '`, which matches
no existing checks, so no checks are enabled via the command line.

Previously, the test passed because it fell back to the root
`.clang-tidy` configuration which enables `llvm-*`.
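
To illustrate the quoting behaviour, here is a simplified tokenizer sketch in which only double quotes group text and single quotes are kept as ordinary characters. It is not the code in CommandLine.cpp: `tokenize` is a hypothetical helper, backslash escapes are ignored, and the command lines used with it are made up for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified tokenizer: spaces separate arguments, double quotes group text
// and are dropped, single quotes are ordinary characters and are kept.
std::vector<std::string> tokenize(const std::string &cmd) {
  std::vector<std::string> args;
  std::string cur;
  bool in_quotes = false;
  bool have_token = false;
  for (char c : cmd) {
    if (c == '"') {
      in_quotes = !in_quotes;  // toggle grouping, do not emit the quote
      have_token = true;
    } else if (c == ' ' && !in_quotes) {
      if (have_token) {        // end of the current argument
        args.push_back(cur);
        cur.clear();
        have_token = false;
      }
    } else {
      cur += c;                // includes any single quote characters
      have_token = true;
    }
  }
  if (have_token)
    args.push_back(cur);
  return args;
}
```

With this model, `-checks='-*,foo'` keeps its single quotes as part of the value, whereas `-checks="-*,foo"` yields the value without quotes.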
This patch adds Clang support for speculative devirtualization and
integrates the related pass into the pass pipeline.
It builds on the LLVM backend implementation from PR llvm#159048.
Speculative devirtualization transforms an indirect call (the virtual
function) into a guarded direct call.
The guard is a comparison of the virtual function pointer against the
expected target.
This optimization is still safe without LTO because the direct call is not
unconditional: it is taken only when the function pointer matches.
This optimization:
- Is opt-in: disabled by default, enabled via `-fdevirtualize-speculatively`.
- Works in non-LTO mode.
- Handles publicly visible objects.
- Uses guarded devirtualization with a fallback to indirect calls when the
speculation is incorrect.

For this C++ example:
```
class Base {
public:
    __attribute__((noinline))
    virtual void virtual_function1() { asm volatile("NOP"); }
    virtual void virtual_function2() { asm volatile("NOP"); }
};
class Derived : public Base {
public:
    void virtual_function2() override { asm volatile("NOP"); }
};
__attribute__((noinline))
void foo(Base *BV) {
    BV->virtual_function1();
}
void bar() {
    Base *b = new Derived();
    foo(b);
}
```
Here is the IR without enabling speculative devirtualization:
```
define dso_local void @_Z3fooP4Base(ptr noundef %BV) local_unnamed_addr #0 {
entry:
  %vtable = load ptr, ptr %BV, align 8, !tbaa !6
  %0 = load ptr, ptr %vtable, align 8
  tail call void %0(ptr noundef nonnull align 8 dereferenceable(8) %BV)
  ret void
}
```
IR after enabling speculative devirtualization:
```
define dso_local void @_Z3fooP4Base(ptr noundef %BV) local_unnamed_addr #0 {
entry:
  %vtable = load ptr, ptr %BV, align 8, !tbaa !12
  %0 = load ptr, ptr %vtable, align 8
  %1 = icmp eq ptr %0, @_ZN4Base17virtual_function1Ev
  br i1 %1, label %if.true.direct_targ, label %if.false.orig_indirect, !prof !15

if.true.direct_targ:                              ; preds = %entry
  tail call void @_ZN4Base17virtual_function1Ev(ptr noundef nonnull align 8 dereferenceable(8) %BV)
  br label %if.end.icp

if.false.orig_indirect:                           ; preds = %entry
  tail call void %0(ptr noundef nonnull align 8 dereferenceable(8) %BV)
  br label %if.end.icp

if.end.icp:                                       ; preds = %if.false.orig_indirect, %if.true.direct_targ
  ret void
}
```
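
The same guard can be sketched at source level. This is only a hand-written analogue of the IR above, assuming a hand-rolled function-pointer slot in place of a compiler-generated vtable; `Object`, `base_impl`, `derived_impl`, `Path` and `call_speculatively` are hypothetical names.

```cpp
#include <cassert>

// Hand-rolled stand-in for a vtable slot; real vtables are compiler-generated.
struct Object;
using Method = int (*)(Object *);

struct Object {
  Method slot;  // plays the role of the loaded virtual function pointer
};

int base_impl(Object *) { return 1; }     // the speculated (expected) target
int derived_impl(Object *) { return 2; }  // some other implementation

// Records which path was taken, for illustration only.
enum class Path { Direct, Indirect };

int call_speculatively(Object *obj, Path &taken) {
  Method fn = obj->slot;      // load the function pointer
  if (fn == &base_impl) {     // guard: compare against the expected target
    taken = Path::Direct;
    return base_impl(obj);    // direct call, a candidate for inlining
  }
  taken = Path::Indirect;
  return fn(obj);             // fallback: the original indirect call
}
```

When the guess is wrong, the call still reaches the correct function through the fallback branch, which is why the transformation does not need whole-program knowledge.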
…art) (llvm#164124)

Replace ExtractLastElement and ExtractLastLanePerPart with more generic
and specific ExtractLastLane and ExtractLastPart, which model distinct
parts of extracting across parts and lanes. ExtractLastElement ==
ExtractLastLane(ExtractLastPart) and ExtractLastLanePerPart ==
ExtractLastLane, the latter clarifying the name of the opcode. A new
m_ExtractLastElement matcher is provided for convenience.
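
The composition can be pictured with a toy model, assuming an unrolled value is represented as a list of parts, each holding a list of lanes. The container types and function names below are hypothetical and only mirror the semantics described above; the real opcodes operate on VPlan values, not on containers.

```cpp
#include <cassert>
#include <vector>

using Part = std::vector<int>;    // one unrolled part: a vector of lanes
using Parts = std::vector<Part>;  // all parts of an unrolled value

// Toy model of ExtractLastPart: select the final part.
const Part &extract_last_part(const Parts &parts) { return parts.back(); }

// Toy model of ExtractLastLane: select the final lane of a single part.
int extract_last_lane(const Part &part) { return part.back(); }

// The old ExtractLastElement, expressed as the composition of the two.
int extract_last_element(const Parts &parts) {
  return extract_last_lane(extract_last_part(parts));
}
```

Applying `extract_last_lane` to each part on its own corresponds to what the old ExtractLastLanePerPart name described.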

The patch should be NFC modulo printing changes.

PR: llvm#164124
This PR enables the MLIR execution engine to dump the object file as PIC
code, which is needed when the object file is later bundled into a dynamic
shared library.

---------

Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
At the moment AMDGCN flavoured SPIRV uses the SPIRV ABI with some tweaks
revolving around passing aggregates as direct. This is problematic in
multiple ways:

- it leads to divergence from code compiled for a concrete target, which
makes it difficult to debug;
- it incurs a run time cost, when dealing with larger aggregates;
- it incurs a compile time cost, when dealing with larger aggregates.

This patch switches over AMDGCN flavoured SPIRV to implement the AMDGPU
ABI (except for dealing with variadic functions, which will be added in
the future). One additional complication (and the primary motivation
behind the current less-than-ideal state of affairs) stems from `byref`,
which AMDGPU uses but which is not expressible in SPIR-V. We deal with this by
CodeGen-ing for `byref`, lowering it to the `FuncParamAttr ByVal` in
SPIR-V, and restoring it when doing reverse translation from AMDGCN
flavoured SPIR-V.
llvm-project/llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp:312:19: warning: unused variable 'EB' [-Wunused-variable]
  312 |     VPBasicBlock *EB = Plan.getExitBlocks().front();
      |                   ^~

This showed up in a non-assertions build.
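
This class of warning typically appears when a variable is only read inside `assert()`, which expands to nothing when `NDEBUG` is defined. The sketch below shows one common remedy, `[[maybe_unused]]`; the commit message does not say which fix was applied, and `first_or_zero` is a hypothetical function.

```cpp
#include <cassert>
#include <vector>

int first_or_zero(const std::vector<int> &values) {
  // Only read inside assert(). With NDEBUG the assert expands to nothing,
  // so without the attribute this would trigger -Wunused-variable.
  [[maybe_unused]] const bool non_empty = !values.empty();
  assert(non_empty && "expected at least one value");
  return values.empty() ? 0 : values.front();
}
```

Another option is to move the expression into the `assert` itself so that no named variable remains.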
Add support for vectorized operations such as `arith.addf ... :
vector<4xf4E2M1FN>`. The computation is scalarized: scalar operands are
extracted with `vector.to_elements`, multiple scalar computations are
performed and the result is inserted back into a vector with
`vector.from_elements`.
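
The scalarization strategy can be sketched outside MLIR as follows. This is an analogy only: `scalar_add` is a hypothetical stand-in for the per-element computation, plain `float` stands in for the narrow float type, and array indexing stands in for `vector.to_elements` and `vector.from_elements`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar kernel standing in for the per-element computation.
float scalar_add(float a, float b) { return a + b; }

template <std::size_t N>
std::array<float, N> vector_add_scalarized(const std::array<float, N> &lhs,
                                           const std::array<float, N> &rhs) {
  std::array<float, N> result{};
  for (std::size_t i = 0; i < N; ++i) {
    // Extract element i of each operand, compute on scalars, then place
    // the scalar result back into position i of the result vector.
    result[i] = scalar_add(lhs[i], rhs[i]);
  }
  return result;
}
```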
@z1-cciauto z1-cciauto requested a review from a team December 7, 2025 20:06

@ronlieb ronlieb merged commit bf18dc9 into amd-staging Dec 8, 2025
17 of 18 checks passed
@ronlieb ronlieb deleted the upstream_merge_202512071506 branch December 8, 2025 01:19