[MTE] [NFC] use vector to collect globals to tag (#120283) #142330

Draft
wants to merge 8 commits into
base: main
@jofrn jofrn commented Jun 2, 2025

[MTE] [NFC] use vector to collect globals to tag (#120283)

The same pattern caused test failures in the HWASan pass, so it is brittle.
Let's go for the easier approach.

[DOCS] Rename LLVM Security Group to LLVM Security Response Group. (#116986)

Rename LLVM Security Group to LLVM Security Response Group. Take the
opportunity to canonicalise security group and Security Group to LLVM
Security Response Group.

At the 2024-11-19 LLVM Security Group meeting [1] we discussed that in
practice the LLVM Security Group was performing an incident response
role, but it was not proactively adding additional testing, fuzzing and
hardening. We do not want projects that use LLVM to see the LLVM
Security Group as guaranteeing security for LLVM.

We decided that it would be useful to rename the group to LLVM Security
Response Group as that reflects the work that it is doing.

There may be a case for a proactive security group with a different
remit, but this is out of scope of this commit.

[1] https://discourse.llvm.org/t/llvm-security-group-public-sync-ups/62735/32

[DOCS] Remove bullet point on improving security over time. (#116980)

Remove the 6th bullet point "Strive to improve security over time, for
example by adding additional testing, fuzzing and hardening after fixing
issues."

At the security group meeting on 2024-11-19 we discussed the role the
security group was performing in practice. We are in effect acting as a
security response group, dealing with issues raised via the process
given in the LLVM Security group page. We are not proactively adding
additional testing, fuzzing and hardening. While this could be considered
an aspirational goal, it may give the implication that the LLVM Security
Group is handling or at worst guaranteeing security for the LLVM project
when in practice it is not.

Meeting notes:

https://discourse.llvm.org/t/llvm-security-group-public-sync-ups/62735/32

[Github] Add LLVM Premerge Checks to the watchlist (#120230)

LLVM Premerge Checks is running on the new GCP cluster. Tracking its
metrics will allow us to determine the stability of the presubmit and
make sure the new infra is working as intended.


Signed-off-by: Nathan Gauër brioche@google.com

[SPIR-V] Fix issue #120078 and simplifies parsing of floating point decoration tips in demangled function name (#120128)

This PR fixes #120078 and simplifies the parsing of the demangled
function name that aims to detect a tip for floating point decorations.
The simplification also fixes a complaint raised when building with
LLVM_USE_SANITIZER=Address.

[AArch64] Prevent unnecessary truncation in bool vector reduce code generation (#120096)

Prevent unnecessarily truncating results of 128 bit wide vector
comparisons to 64 bit wide vector values in boolean vector reduce
operations.

[LoopVectorize] Enable more early exit vectorisation tests (#117008)

PR #112138 introduced initial support for dispatching to
multiple exit blocks via split middle blocks. This patch
fixes a few issues so that we can enable more tests to use
the new enable-early-exit-vectorization flag. Fixes are:

  1. The code to bail out for any loop live-out values happens
    too late. This is because collectUsersInExitBlocks ignores
    induction variables, which get dealt with in fixupIVUsers.
    I've moved the check much earlier in processLoop by looking
    for outside users of loop-defined values.
  2. We shouldn't yet be interleaving when vectorising loops
    with uncountable early exits, since we've not added support
    for this yet.
  3. Similarly, we also shouldn't be creating vector epilogues.
  4. Similarly, we shouldn't enable tail-folding.
  5. The existing implementation doesn't yet support loops
    that require scalar epilogues, although I plan to add that
    as part of PR #88385 ([LoopVectorize] Add support for vectorisation of more early exit loops).
  6. The new split middle blocks weren't being added to the
    parent loop.

[flang][HLFIR] fix FORALL issue 120190 (#120236)

Fix #120190.

The hlfir.forall lowering code was not properly checking for forall
index references in the mask value computation before trying to hoist it:
it was only looking at the ops directly nested in the hlfir.forall_mask
region, but not at the operations indirectly nested. This triggered
bogus hoisting in #120190, leading to undefined behavior (reference to
uninitialized data). Without the fix, the added regression test dies at
compile time with a dominance error.

Fix this by doing a deep walk of the region operations instead. Also
clean up the region cloning to use without_terminator.

[llvm][RISCV] Set ScalableVector stack id in proper place (#117862)

Without this patch, the ScalableVector frame index property is used
before it is assigned. More precisely, take a look at
RISCVFrameLowering::assignCalleeSavedSpillSlots. In this function we
divide callee-saved registers into scalar and vector ones, based on the
ScalableVector property of their frame indexes:

  ...
  const auto &UnmanagedCSI = getUnmanagedCSI(*MF, CSI);
  const auto &RVVCSI = getRVVCalleeSavedInfo(*MF, CSI);
  ...

But we assign ScalableVector property several lines below:

  ...
  auto storeRegToStackSlot = [&](decltype(UnmanagedCSI) CSInfo) {
    for (auto &CS : CSInfo) {
      // Insert the spill to the stack frame.
      Register Reg = CS.getReg();
      const TargetRegisterClass *RC = TRI->getMinimalPhysRegClass(Reg);
      TII.storeRegToStackSlot(MBB, MI, Reg, !MBB.isLiveIn(Reg),
                              CS.getFrameIdx(), RC, TRI, Register());
    }
  };
  storeRegToStackSlot(UnmanagedCSI);
  ...

Because of this, the list of RVV callee-saved registers is always empty.
Currently this problem doesn't show up, but if you slightly change the
code and, for example, put some instructions between the scalar and
vector spills, the resulting code will be ill-formed.
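
The ordering problem can be illustrated with a small standalone sketch. It does not use the LLVM API: SpillSlot, selectScalable and tagVectorSlots are hypothetical names that stand in for the frame-index property, the RVV filter and the later property assignment.

```cpp
#include <vector>

// Hypothetical stand-in for a callee-saved spill slot; not an LLVM type.
struct SpillSlot {
  int reg;
  bool isVectorReg;
  bool scalableStackId = false; // models the ScalableVector stack id
};

// Models the filter: keeps only slots that are already tagged as scalable.
std::vector<SpillSlot> selectScalable(const std::vector<SpillSlot> &slots) {
  std::vector<SpillSlot> out;
  for (const SpillSlot &s : slots)
    if (s.scalableStackId)
      out.push_back(s);
  return out;
}

// Models the assignment that has to run before the filter is queried.
void tagVectorSlots(std::vector<SpillSlot> &slots) {
  for (SpillSlot &s : slots)
    if (s.isVectorReg)
      s.scalableStackId = true;
}
```

Calling selectScalable before tagVectorSlots returns an empty list even when vector registers are present; calling it afterwards returns them.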

[LV] Fixup check lines after 13107cb.

[lldb][NFC] clang-format MainLoopPosix.cpp

Since AIX support is about to change this.

[Clang] Implement CWG2813: Class member access with prvalues (#120223)

This is a rebase of #95112 with my own feedback applied, as @MitalAshok has
been inactive for a while.
It's fairly important that this makes clang 20, as it is a blocker for #107451.


CWG2813

prvalue.member_fn(expression-list) now will not materialize a temporary
for prvalue if member_fn is an explicit object member function, and
prvalue will bind directly to the object parameter.

The E1 in E1.static_member is now a discarded-value expression, so if E1
was a call to a [[nodiscard]] function, there will now be a warning.
This also affects C++98 with [[gnu::warn_unused_result]] functions.

This should not affect C where TemporaryMaterializationConversion is a
no-op.

Closes #100314
Fixes #100341


Co-authored-by: Mital Ashok mital@mitalashok.co.uk

[lldb] Add lldb/source/Host/posix/MainLoopPosix.cpp to git blame ignores

[VFABI] Add support for vector functions that return struct types (#119000)

This patch updates the VFABIDemangler to support vector functions that
return struct types. For example, a vector variant of sincos that
returns a vector of sine values and a vector of cosine values within a
struct.

This patch also adds some helpers for vectorizing types (including
struct types). Some of these are used in the VFABIDemangler, and
others will be used in subsequent patches, so this patch simply adds
tests for them.

[X86] combineKSHIFT - fold kshiftr(kshiftr/extract_subvector(X,C1),C2) --> kshiftr(X,C1+C2) (#115528)

Merge serial KSHIFTR nodes, possibly separated by EXTRACT_SUBVECTOR, to allow mask instructions to be computed in parallel.
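
A scalar sketch of why the two shifts can be merged, assuming a 16-bit mask and a shift that yields zero for counts of 16 or more. It only models the arithmetic, not the DAG combine or the EXTRACT_SUBVECTOR case, and kshiftr16, shiftSerial and shiftMerged are illustrative names.

```cpp
#include <cstdint>

// Scalar model of a logical right shift on a 16-bit mask. Counts of 16 or
// more produce zero; this is an assumption made to keep the model total.
constexpr std::uint16_t kshiftr16(std::uint16_t mask, unsigned count) {
  return count >= 16 ? std::uint16_t{0}
                     : static_cast<std::uint16_t>(mask >> count);
}

// Two dependent shifts: the second has to wait for the first.
constexpr std::uint16_t shiftSerial(std::uint16_t mask, unsigned c1,
                                    unsigned c2) {
  return kshiftr16(kshiftr16(mask, c1), c2);
}

// One shift by the summed amount: depends only on the original mask.
constexpr std::uint16_t shiftMerged(std::uint16_t mask, unsigned c1,
                                    unsigned c2) {
  return kshiftr16(mask, c1 + c2);
}

static_assert(shiftSerial(0xF0F0, 4, 4) == shiftMerged(0xF0F0, 4, 4),
              "serial and merged shifts agree");
```

Because the merged form reads only the original mask, it no longer sits behind the first shift in the dependency chain.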

[gn build] Port 1ee740a

[github/CODEOWNERS] Add yota9 as BOLT reviewer

[ARM] Reduce loop unroll when low overhead branching is available (#120065)

For processors with low overhead branching (LOB), runtime unrolling the
innermost loop is often detrimental to performance. In these cases the
loop remainder gets unrolled into a series of compare-and-jump blocks,
which in deeply nested loops get executed multiple times, negating the
benefits of LOB.

This is particularly noticeable when the loop trip count of the innermost
loop varies within the outer loop, such as in the case of triangular
matrix decompositions.

In these cases we will prefer to not unroll the innermost loop, with the
intention for it to be executed as a low overhead loop.

Add support for single reductions in ComplexDeinterleavingPass (#112875)

The Complex Deinterleaving pass assumes that all values emitted will
result in complex numbers. This patch aims to remove that assumption and
add support for emitting just the real or imaginary components, not
both.

Reland [Clang] skip default argument instantiation for non-defining friend declarations to meet [dcl.fct.default] p4 (#115487)

This fixes a crash when instantiating default arguments for templated
friend function declarations which lack a definition.
There are implementation limits which prevent us from finding the
pattern for such functions, and this makes it difficult to
set up the instantiation scope for the function parameters.

This patch skips instantiating the default argument in these cases,
which causes a minor regression in error recovery, but otherwise avoids
the crash.

The previous attempt #113777 accidentally skipped all default argument
constructions, causing some regressions. This patch resolves that by
moving the guard to InstantiateDefaultArgument() where the handling of
templates takes place.

Fixes #113324

[AMDGPU] Modify Dyn Alloca test to account for Machine-Verifier bug (#120393)

The machine verifier crashes in kernel functions,
but fails gracefully in device functions.

This is due to the buffer resource descriptor selected
during G-ISEL, before the fallback path.
Device functions use $sgpr0_sgpr1_sgpr2_sgpr3,
while kernel functions select $private_rsrc_reg,
about which the machine verifier complains:
$private_rsrc_reg is not a SReg_128 register.

Modify the test case to capture both behaviors. This is related to
#120063.

[clang-tidy] use local config (#120004)

follow up patch for #119948.

[NVPTX] fix nvcl-param-align.ll

fix for f9c8c01

SourceCoverageViewHTML.cpp: Reformat JS

Introduce CounterMappingRegion::isBranch(). NFC.

llvm-cov: Refactor SourceCoverageView::renderBranchView().

NFC except for calculating Total. I've replaced
(uint64_t)+(uint64_t) with (double)+(double).

This is still inexact for large values (above 1LL << 53), but it is expected to prevent possible overflow.
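
The trade-off can be shown with a minimal sketch. totalAsInteger and totalAsDouble are hypothetical helpers, not the llvm-cov code.

```cpp
#include <cstdint>

// Unsigned integer addition wraps modulo 2^64 when the sum overflows.
constexpr std::uint64_t totalAsInteger(std::uint64_t a, std::uint64_t b) {
  return a + b;
}

// Floating-point addition does not wrap, but a double has a 53-bit
// significand, so not every integer above 2^53 is representable.
constexpr double totalAsDouble(std::uint64_t a, std::uint64_t b) {
  return static_cast<double>(a) + static_cast<double>(b);
}
```

With a equal to UINT64_MAX and b equal to 1, the integer total wraps to 0 while the double total stays near 2^64. In exchange, adding 1 to 2^53 in double arithmetic leaves the value unchanged.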

[SCEV] Bail out on mixed int/pointer in SCEVWrapPredicate::implies.

Fixes a crash when trying to extend the pointer start value to a narrow
integer type after b6c29fd.

LLVMContext: rem constexpr to unblock build w/ gcc (#120402)

Address issues observed in buildbots with older GCC versions:
https://lab.llvm.org/buildbot/#/builders/140/builds/13302

[X86] LowerShift - track the number and location of constant shift elements. (#120270)

We have several vector shift lowering strategies that have to analyse
the distribution of non-uniform constant vector shift amounts. At the
moment there is very little sharing of data between these analyses.

This patch creates a SmallDenseMap of the different LEGAL constant shift
amounts used, with a mask of which elements they are used in. So far
I've only updated the shuffle(immshift(x,c1),immshift(x,c2)) lowering
pattern to use it for clarity; there are several more that can be done in
followups. It's hoped that the proposed patch #117980 can be simplified
after this patch as well.

vec_shift6.ll - the existing shuffle(immshift(x,c1),immshift(x,c2))
lowering bails on out of range shift amounts, while this patch now skips
them and treats them as UNDEF - this means we manage to fold more cases
that before would have to lower to a SHL->MUL pattern, including some
legalized cases.
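
A rough standalone sketch of the bookkeeping described above, using std::map in place of SmallDenseMap. The function name, the 32-lane limit and the choice of a 32-bit mask are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// For each legal (in-range) shift amount, record a bit mask of the vector
// lanes that use it. Out-of-range amounts are skipped, which corresponds to
// treating those lanes as undefined.
std::map<unsigned, std::uint32_t>
collectLegalShiftAmounts(const std::vector<unsigned> &amounts,
                         unsigned eltBits) {
  assert(amounts.size() <= 32 && "the lane mask holds at most 32 lanes");
  std::map<unsigned, std::uint32_t> uses;
  for (std::size_t lane = 0; lane < amounts.size(); ++lane) {
    if (amounts[lane] >= eltBits)
      continue; // illegal amount: leave this lane out of every mask
    uses[amounts[lane]] |= std::uint32_t{1} << lane;
  }
  return uses;
}
```

For the amounts {1, 3, 1, 40} on 32-bit elements this yields two entries: amount 1 with lanes 0 and 2, and amount 3 with lane 1. A map with exactly two entries is the case the shuffle-of-two-shifts lowering looks for.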

[TableGen][GISel] Import more "multi-level" patterns (#120332)

Previously, if the destination DAG has an untyped leaf, we would import
the pattern only if that leaf is defined by the top-level source DAG.
This is an unnecessary restriction.

Here is an example of such pattern:

def : Pat<(add (mul v8i16:$vA, v8i16:$vB), v8i16:$vC),
          (VMLADDUHM $vA, $vB, $vC)>;

Previously, it failed to import because add defines neither
$vA nor $vB.

This change reduces the number of skipped patterns as follows:

AArch64: 8695 ->  8548 (-147)
AMDGPU: 11333 -> 11240 (-93)
ARM:     4297 ->  4278 (-1)
PowerPC: 3955 ->  3010 (-945)

Other GISel-enabled targets are unaffected.

[LLVM][AsmPrinter] Add vector ConstantInt/FP support to emitGlobalConstantImpl. (#120077)

This fixes a failure path for fixed length vector globals when
ConstantInt/FP is used to represent splats instead of
ConstantDataVector.

[Exegesis][RISCV] Add RISCV support for llvm-exegesis (#89047)

This patch also makes the following amendments to core exegesis:

  • Added distinction between regular registers aliasing check and
    registers used as memory address in instruction.
  • Added scratch memory space pointer register.
  • General exegesis options were amended:
    * mattr - new option to pass a list of enabled target features

The llvm-exegesis RISCV port is the result of a team effort. Everyone
involved is listed below.
Co-authored-by: Konstantin Vladimirov konstantin.vladimirov@syntacore.com
Co-authored-by: Dmitrii Petrov dmitrii.petrov@syntacore.com
Co-authored-by: Dmitry Bushev dmitry.bushev@syntacore.com
Co-authored-by: Mark Goncharov mark.goncharov@syntacore.com
Co-authored-by: Anastasiya Chernikova anastasiya.chernikova@syntacore.com

[X86] urem-seteq-illegal-types.ll - regenerate VPTERNLOG comment

Fix unused variable warning. NFC.

Revert "[Exegesis][RISCV] Add RISCV support for llvm-exegesis (#89047)"

This reverts commit bc3eee1.

These tests are failing because of no REQUIRES.

[Xtensa] Implement Code Density Option. (#119639)

The Code Density option adds 16-bit encoding for frequently used
instructions.

[InstCombine] Drop samesign flags in foldLogOpOfMaskedICmps_NotAllZeros_BMask_Mixed (#120373)

Counterexamples: https://alive2.llvm.org/ce/z/6Ks8Qz
Closes #120361.

[lldb][AIX] Header Parsing for XCOFF Object File in AIX (#116338)

This PR is in reference to porting LLDB on AIX.

Link to discussions on llvm discourse and github:

  1. https://discourse.llvm.org/t/port-lldb-to-ibm-aix/80640
  2. Extending LLDB to work on AIX #101657

The complete changes for porting are present in this draft PR:
Extending LLDB to work on AIX #102601

Added XCOFF Object File Header Parsing for AIX.

Details about XCOFF file format on AIX:
XCOFF

Reapply "[NFC][AMDGPU] Pre-commit clang and llvm tests for dynamic allocas" (#120410)

This reapplies commit #120063.

A machine-verifier bug was causing a crash in the previous commit.
This has been addressed in
#120393.

[AMDGPU] Use -triple instead of -arch in MC tests

[Python] Use raw string literals for regexes (#120401)

Previously these backslashes were not followed by a valid escape
sequence character so were treated as literal backslashes, which was the
intended behaviour of the code. However, Python as of 3.12 has started
warning about these, so we should use raw string literals for regexes so
that backslashes are always interpreted literally. I've done this for
every regex in this file for consistency, including the ones which do
not contain backslashes.

[mlir][SCF] Unify tileUsingFor and tileReductionUsingFor implementation (#120115)

This patch unifies the tiling implementation for tileUsingFor and
tileReductionUsingFor. This is done by passing an additional option to
SCFTilingOptions, allowing it to set how reduction dimensions should be
tiled. Currently, there are 3 different options for reduction tiling:
FullReduction (old tileUsingFor), PartialReductionOuterReduction (old
tileReductionUsingFor) and PartialReductionOuterParallel
(linalg::tileReductionUsingForall, this isn't implemented in this
patch).

The patch makes tileReductionUsingFor use the tileUsingFor
implementation with the new reduction tiling options.

There are no test changes because the implementation was doing almost
exactly the same thing. This was also tested in IREE (which uses both
these APIs heavily) and there were no test changes.

Revert "[VectorCombine] Combine scalar fneg with insert/extract to vector fneg when length is different" (#120422)

Reverts #115209 - investigating a reported regression

[VPlan] Handle exit phis with multiple operands in addUsersInExitBlocks. (#120260)

Currently the addUsersInExitBlocks incorrectly assumes exit phis only
have a single operand, which may not be the case for loops with early
exits when they share a common exit block.

Also further relax the assertion in fixupIVUsers to allow exit values if
they come from the loop latch/middle.block.

PR: #120260

[OpenMP][Clang] Migrate OpenMP UserDefinedMapper from Clang to OMPIRBuilder (#110001)

This patch migrates the OpenMP UserDefinedMapper codegen from Clang to
the OpenMPIRBuilder. I will be adding further patches in the near future
so that OpenMP dialect in MLIR can make use of these.

[flang] Add UNSIGNED (#113504)

Implement the UNSIGNED extension type and operations under control of a
language feature flag (-funsigned).

This is nearly identical to the UNSIGNED feature that has been available
in Sun Fortran for years, and now implemented in GNU Fortran for
gfortran 15, and proposed for ISO standardization in J3/24-116.txt.

See the new documentation for details; but in short, this is C's
unsigned type, with guaranteed modular arithmetic for +, -, and *, and
the related transformational intrinsic functions SUM & al.
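
Since the description compares the feature to C's unsigned type, the wrap-around behavior can be sketched in C++ terms. This is only an illustration of modular arithmetic on `unsigned`; it is not Fortran code and says nothing about how flang lowers UNSIGNED.

```cpp
#include <limits>

// Largest value of the unsigned type, i.e. 2^N - 1 for an N-bit unsigned.
constexpr unsigned kMax = std::numeric_limits<unsigned>::max();

// Each operation is defined to wrap modulo 2^N instead of overflowing.
constexpr unsigned wrapAdd(unsigned a, unsigned b) { return a + b; }
constexpr unsigned wrapSub(unsigned a, unsigned b) { return a - b; }
constexpr unsigned wrapMul(unsigned a, unsigned b) { return a * b; }

static_assert(wrapAdd(kMax, 1u) == 0u, "kMax + 1 wraps to 0");
static_assert(wrapSub(0u, 1u) == kMax, "0 - 1 wraps to kMax");
static_assert(wrapMul(kMax, 2u) == kMax - 1u, "kMax * 2 wraps to kMax - 1");
```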

Revert "Add support for single reductions in ComplexDeinterleavingPass (#112875)"

This reverts commit b3eede5.

This has been breaking most AArch64 stage2 builds for 4+ hours,
reverting to get the bots back to green.

https://lab.llvm.org/buildbot/#/builders/41/builds/4172
https://lab.llvm.org/buildbot/#/builders/4/builds/4281
https://lab.llvm.org/buildbot/#/builders/199/builds/263
https://lab.llvm.org/buildbot/#/builders/198/builds/334
https://lab.llvm.org/buildbot/#/builders/143/builds/4276
https://lab.llvm.org/buildbot/#/builders/17/builds/4725

[Exegesis][RISCV] Add RISCV support for llvm-exegesis (#120419)

This patch also makes the following amendments to core exegesis:

  • Added distinction between regular registers aliasing check and
    registers used as memory address in instruction.
  • Added scratch memory space pointer register.
  • General exegesis options were amended:
    * mattr - new option to pass a list of enabled target features

The llvm-exegesis RISCV port is the result of a team effort. Everyone
involved is listed below.
Co-authored-by: Konstantin Vladimirov konstantin.vladimirov@syntacore.com
Co-authored-by: Dmitrii Petrov dmitrii.petrov@syntacore.com
Co-authored-by: Dmitry Bushev dmitry.bushev@syntacore.com
Co-authored-by: Mark Goncharov mark.goncharov@syntacore.com
Co-authored-by: Anastasiya Chernikova anastasiya.chernikova@syntacore.com

Fix #110001 build error.

[TableGen][GISel] Improve dead register handling (#120426)

A dead implicit def wasn't marked as dead if it is also an implicit use.
The new approach should also be more straightforward and simplifies
future changes for supporting optional defs and physical register defs.

Pull Request: #120426

[DirectX] Split resource info into type and binding info. NFC (#119773)

This splits the DXILResourceAnalysis pass into TypeAnalysis and
BindingAnalysis passes. The type analysis pass is made immutable and
populated lazily so that it can be used earlier in the pipeline without
needing to carefully maintain the invariants of the binding analysis.

Fixes #118400

[X86] LowerShift - don't prematurely lower to x86 vector shift imm instructions (#120282)

When splitting 2 unique amount shifts to shuffle(shift(x,c1),shift(x,c2)), don't use getTargetVShiftByConstNode directly to lower, use generic shifts to ensure we make use of any further canonicalization: shl(X,1) to add(X,X) etc. - this can have notably better throughput on some x86 targets.

Noticed on #120270

[Clang] Set __cpp_explicit_this_parameter (#107451)

There are not a lot of outstanding known issues
with deducing this (besides #95112), so it
seems reasonable to claim full support.

Fixes #82780

[clang-doc] Use LangOpts when printing types (#120308)

The implementation in the clang-doc serializer failed to take in the
LangOpts from the declaration. As a result, we'd do things like print
_Bool instead of bool, even in C++ code.

Fixes #62970

Reland 2de7881 (#119650) (#120454)

[NFC] Move DroppedVariableStats to its own file and redesign it to be
extensible. (#115563)

Move DroppedVariableStats code to its own file and change the class to
have an extensible design so that we can use it to add dropped
statistics to MIR passes and the instruction selector.

[libc][docs] convert stdio.h to docgen (#120334)

Add info from n3220 and POSIX.1-2024.

[flang][NFC] static assert intrinsic table is sorted (#120399)

This invariant is used below when searching for intrinsic
implementation. Currently, if the map is not sorted, the compiler will
just silently assume there is no such implementation.
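
A minimal sketch of the idea in C++17, assuming a table keyed by name. IntrinsicEntry, kTable, isSortedByName and findIntrinsic are hypothetical and do not reflect the actual flang table layout.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Hypothetical table entry; the real flang table has a different layout.
struct IntrinsicEntry {
  std::string_view name;
  int id;
};

// Compile-time check that entries are in non-decreasing name order.
template <std::size_t N>
constexpr bool isSortedByName(const std::array<IntrinsicEntry, N> &table) {
  for (std::size_t i = 1; i < N; ++i)
    if (table[i].name < table[i - 1].name)
      return false;
  return true;
}

constexpr std::array<IntrinsicEntry, 3> kTable{
    {{"abs", 0}, {"cos", 1}, {"sin", 2}}};

// Breaks the build, rather than the lookup, if the ordering is violated.
static_assert(isSortedByName(kTable), "intrinsic table must stay sorted");

// Binary search; it only gives correct answers on a sorted table.
template <std::size_t N>
constexpr const IntrinsicEntry *
findIntrinsic(const std::array<IntrinsicEntry, N> &table,
              std::string_view name) {
  std::size_t lo = 0, hi = N;
  while (lo < hi) {
    std::size_t mid = lo + (hi - lo) / 2;
    if (table[mid].name < name)
      lo = mid + 1;
    else
      hi = mid;
  }
  return (lo < N && table[lo].name == name) ? &table[lo] : nullptr;
}
```

Without the static_assert, an out-of-order entry would make findIntrinsic return nullptr for a name that is present, which matches the silent failure described above.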

[DirectX] Introduce the DXILResourceAccess pass (#116726)

This pass transforms resource access via llvm.dx.resource.getpointer
into buffer loads and stores.

Fixes #114848.

[lld] Move BPSectionOrderer from MachO to Common for reuse in ELF (#117514)

Add lld/Common/BPSectionOrdererBase from MachO for reuse in ELF

[DirectX] Create symbols for resource handles (#119775)

We need to create symbols with "the original shape of resource and
element type" to put in the resource metadata in order to generate valid
DXIL.

Note that DXC generally doesn't emit an actual symbol outside of library
shaders (it emits an undef of a pointer to the type), but since we have
to deal with opaque pointers we would need a way to smuggle the type
through to match that. Instead, we simply emit symbols for now.

Fixed #116849

Revert "[Exegesis][RISCV] Add RISCV support for llvm-exegesis (#120419)"

This reverts commit 6993d32.

Reason: buildbot breakage
(https://lab.llvm.org/buildbot/#/builders/51/builds/7908)

CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /home/b/sanitizer-aarch64-linux/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -DLLVM_BUILD_STATIC -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-aarch64-linux/build/build_default/tools/llvm-exegesis/lib/RISCV -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/tools/llvm-exegesis/lib/RISCV -I/home/b/sanitizer-aarch64-linux/build/build_default/include -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/include -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/RISCV -I/home/b/sanitizer-aarch64-linux/build/build_default/lib/Target/RISCV -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT tools/llvm-exegesis/lib/RISCV/CMakeFiles/LLVMExegesisRISCV.dir/Target.cpp.o -MF tools/llvm-exegesis/lib/RISCV/CMakeFiles/LLVMExegesisRISCV.dir/Target.cpp.o.d -o tools/llvm-exegesis/lib/RISCV/CMakeFiles/LLVMExegesisRISCV.dir/Target.cpp.o -c /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/tools/llvm-exegesis/lib/RISCV/Target.cpp
In file included from /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/tools/llvm-exegesis/lib/RISCV/Target.cpp:139:
/home/b/sanitizer-aarch64-linux/build/build_default/lib/Target/RISCV/RISCVGenAsmMatcher.inc:239:19: error: unused function 'MatchRegisterName' [-Werror,-Wunused-function]
239 | static MCRegister MatchRegisterName(StringRef Name) {
| ^~~~~~~~~~~~~~~~~
/home/b/sanitizer-aarch64-linux/build/build_default/lib/Target/RISCV/RISCVGenAsmMatcher.inc:568:19: error: unused function 'MatchRegisterAltName' [-Werror,-Wunused-function]
568 | static MCRegister MatchRegisterAltName(StringRef Name) {
| ^~~~~~~~~~~~~~~~~~~~

[AMDGPU][True16][MC] test update for v_ldexp_f16 in true16 (#119313)

This is a NFC change. Update mc test for v_ldexp_f16 in true16 format.

MC source change was done by previous patch and automatically enabled by
t16 pseudo

[AMDGPU][True16][MC] test update for v_subrev_f16 in true16 (#119315)

This is a NFC change. Update mc test for v_subrev_f16 in true16 format.

MC source change was done by previous patch and automatically enabled by
t16 pseudo

Revert "[InstCombine] Infer nuw for gep inbounds from base of object" (#120460)

Reverts #119225 due to the lack of sanitizer support,
large potential of breaking code containing latent UB, non-trivial
localization and investigation, and what seems to be a bad interaction
with msan (a test is in the works).

Related discussions:
#119225 (comment)
#118472 (comment)

[NFC] update gfx12 vop test to use sed instead of grep (#120458)

Changes from #119778 break the
AIX clang ppc64 bot:
https://lab.llvm.org/buildbot/#/builders/64/builds/1714 as grep -o is
not supported on AIX and is not POSIX compatible as per:
https://www.unix.com/man-page/posix/1p/grep/

Co-authored-by: Mark Danial mark.danial@ibm.com

[PhaseOrdering] Update test for #120460

[AMDGPU][True16][MC] true16 for v_pack_b32_f16 (#119630)

Support true16 format for v_pack_b32_f16 in MC.

Since we are replacing v_alignbit_b32 with
v_pack_b32_f16_t16/v_pack_b32_f16_fake16 in Post-GFX11, we have to update
the CodeGen pattern for v_pack_b32_f16_fake16 to get the CodeGen tests
passing. No pattern is modified or created; v_pack_b32_f16 is just
replaced with the fake16 format.

Some of the true16 CodeGen tests are impacted since v_pack_b32_f16
selection is removed in Post-GFX11 while v_pack_b32_f16_t16 is not
yet supported. The CodeGen patch for v_pack_b32_f16_t16 will be done
in a following patch.

[clang] Change initialization of a vector from undef to poison [NFC] (#120446)

It is fully initialized with insertelements.

[driver] Fix sanitizer libc++ runtime linking (#120370)

  1. -f[no-]sanitize-link-c++-runtime is supposed to
    override the default behavior implied by CCCIsCXX
  2. Take into account -nostdlib++ (unblocks [runtimes] Probe for -nostdlib++ and -nostdinc++ with the C compiler #108357)
  3. Fix typo hasFlag vs hasArg.
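
The difference behind item 3 can be sketched with two standalone helpers named after the methods mentioned there. These are simplified models over a list of strings, not the driver's option-parsing API, and the "last occurrence wins" rule is stated here as an assumption about how a positive/negative flag pair is resolved.

```cpp
#include <string>
#include <vector>

// Models "was this option given at all?".
bool hasArg(const std::vector<std::string> &args, const std::string &opt) {
  for (const std::string &a : args)
    if (a == opt)
      return true;
  return false;
}

// Models a positive/negative flag pair: the last occurrence wins, and
// defaultValue is returned when neither form is present.
bool hasFlag(const std::vector<std::string> &args, const std::string &pos,
             const std::string &neg, bool defaultValue) {
  bool value = defaultValue;
  for (const std::string &a : args) {
    if (a == pos)
      value = true;
    else if (a == neg)
      value = false;
  }
  return value;
}
```

Given `-fsanitize-link-c++-runtime -fno-sanitize-link-c++-runtime`, hasArg on the positive form is true while hasFlag is false. That gap is why using one in place of the other changes behavior.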

[gn build] Port 5717a99

[gn build] Port 79e859e

[AMDGPU][MC] Disallow op_sel in some VOP3P dot instructions (#100485)

In v_dot4 and v_dot8 instructions with 4- or 8-bit packed data (e.g.,
v_dot4_u32_u8, v_dot8_u32_u4), the op_sel modifier should not be
allowed.

[MemRef] Migrate away from PointerUnion::{is,get} (NFC) (#120382)

Note that PointerUnion::{is,get} have been soft deprecated in
PointerUnion.h:

// FIXME: Replace the uses of is(), get() and dyn_cast() with
// isa, cast and the llvm::dyn_cast

I'm not touching PointerUnion::dyn_cast for now because it's a bit
complicated; we could blindly migrate it to dyn_cast_if_present, but
we should probably use dyn_cast when the operand is known to be
non-null.

[memprof] Move Frame::hash and hashCallStack to IndexedMemProfData (NFC) (#120365)

Now that IndexedMemProfData::{addFrame,addCallStack} are the only
callers of Frame::hash and hashCallStack, respectively, this patch
moves those functions into IndexedMemProfData and makes them private.
With this patch, we can obtain FrameId and CallStackId only through
addFrame and addCallStack, respectively.

[DirectX] Lower ops after translating metadata (#120157)

Move the DXILOpLoweringPass after DXILTranslateMetadata, and add asserts
in DXILShaderFlags to ensure it isn't scheduled after op lowering. This
will allow us to rely on DirectX intrinsics in the shader flags analysis
rather than having to recover information from lowered operations.

Fixes #120119.

[mlir python] Port Python core code to nanobind. (#118583)

Why? https://nanobind.readthedocs.io/en/latest/why.html says it better
than I can, but my primary motivation for this change is to improve MLIR
IR construction time from JAX.

For a complicated Google-internal LLM model in JAX, this change improves
the MLIR
lowering time by around 5s (out of around 30s), which is a significant
speedup for simply switching binding frameworks.

To a large extent, this is a mechanical change, for instance changing
pybind11::
to nanobind::.

Notes:

  • this PR needs Nanobind 2.4.0, because it needs a bug fix
    (Support overriding static properties defined via def_prop_ro_static. wjakob/nanobind#806) that landed in that
    release.
  • this PR does not port the in-tree dialect extension modules. They can
    be ported in a future PR.
  • I removed the py::sibling() annotations from def_static and def_class
    in PybindAdapters.h. These ask pybind11 to try to form an overload
    with an existing method, but it's not possible to form mixed
    pybind11/nanobind overloads this way and the parent class is now
    defined in nanobind. Better solutions may be possible here.
  • nanobind does not contain an exact equivalent of pybind11's buffer
    protocol support. It was not hard to add a nanobind implementation of a
    similar API.
  • nanobind is pickier about casting to std::vector<bool>, expecting that
    the input is a sequence of bool types, not truthy values. In a couple of
    places I added code to support truthy values during casting.
  • nanobind distinguishes bytes (nb::bytes) from strings (e.g.,
    std::string). This required nb::bytes overloads in a few places.

Revert "[mlir python] Port Python core code to nanobind. (#118583)"

This reverts commit 41bd35b.

Breakage detected, rolling back.

[flang] Don't needlessly instantiate distinct UNSIGNED cases for FINDLOC (#120471)

The FINDLOC runtime doesn't need to distinguish between INTEGER and
UNSIGNED data, so use the code for INTEGER also for UNSIGNED.

[flang][cuda] Using nvvm intrinsics for the syncthread and threadfence families of calls (#120020)

[VPlan] Don't use VPlan ctor taking trip count in most unit tests (NFC).

Update tests to use constructor not passing a trip count VPValue. The
tests don't need that and are simpler as a result.

[libc++] Remove some unused includes (#120219)

[DirectX] TypedUAVLoadAdditionalFormats shader flag (#120477)

Set the TypedUAVLoadAdditionalFormats flag if the shader contains a load
from a multicomponent UAV.

Fixes #114557

[clang-format] Don't change breaking before CtorInitializerColon (#119522)

Don't change breaking before CtorInitializerColon with ColumnLimit: 0.

Fixes #119519.

[clang-format] Fix a bug in annotating arrows after init braces (#119958)

Fixes #59066.

[MemProf] Skip unmatched callers when cloning (#120455)

Don't unnecessarily clone for a caller that wasn't matched to a call
instruction.

This necessitated updating a couple of tests that were either
unnecessarily cloning or unnecessarily processing an allocation and
hinting it not cold.

[MemProf] Add quotes around FileCheck pattern (#120481)

Some bots are failing with 2916352,
likely due to the escapes in the FileCheck pattern. Add extra quotes to
try to fix this.
E.g. https://lab.llvm.org/buildbot/#/builders/46/builds/9442

[llvm][Support] Use __NR_gettid on Linux for compat with older glibc (#120007)

[DirectX] Bug fix for Data Scalarization crash (#118426)

Two bugs here. First calling Inst->getFunction() has undefined
behavior if the instruction is not tracked to a function. I suspect the
replaceAllUsesWith was leaving the GEPs in a weird ghost parent
situation. I switched up the visitor to be able to eraseFromParent as
part of visiting and then everything started working.

The second bug was in DXILFlattenArrays.cpp. I was unaware that you
can have multidimensional arrays of zeroinitializer, and undef so
fixed up the initializer to handle these two cases.

fixes #117273

[mlir][bufferization]-Replace only one use in TensorEmptyElimination (#118958)

In many cases the emptyTensorElimination can not transform or eliminate
the empty tensor which is being inserted into the
SubsetInsertionOpInterface.

Two major reasons for that:

1- Failing when trying to find a legal/suitable insertion point for the
subsetExtract which is about to replace the empty tensor. However, we
may try to handle this issue by moving the needed values which are
responsible for building the subsetExtract near the empty tensor
(which is about to be eliminated), thus increasing the probability of
finding a legal insertion point.

2- The EmptyTensorElimination transform replaces the tensor.empty's uses
all at once in one apply, rather than replacing only the specific use
which was visited in the use-def chain (when traversing from the
tensor.insert_slice). Replacing all the uses of the tensor.empty may
lead to additional read effects after bufferization of the specific
subset extract/subview, which should not be the case.

Both cases may result in many copies in the coming bufferization which
can not be canonicalized.

The first case can be noticed when having a tensor.empty followed by
SubsetInsertionOpInterface (or in simple words tensor.insert_slice),
which have been lowered from tensor/tosa.concat.

The second case can be noticed when a tensor.empty has many uses: the
transformation is applied only once, since all the uses are replaced
at once.

The first commit in the PR only adds the lit tests for the cases shown
above (NFC), to emphasize how the transform works. The coming MRs will
upload slight changes to handle these cases.

In the second commit in this PR, we replace only the specific use
which was visited in the use-def chain (when traversing from the
tensor.insert_slice's source).

[VPlan] Move initial VPlan block creation to constructor. (NFC)

This sets up the initial blocks needed to initialize a VPlan directly
in the constructor. This will allow tracking of all created blocks
directly in VPlan, simplifying block deletion.

[mlir] Add predicates to tablegen-defined properties (#120176)

Give the properties from tablegen a predicate field that holds the
predicate that the property needs to satisfy, if one exists, and hook
that field up to verifier generation.

[memprof] Undrift MemProfRecord (#120138)

This patch undrifts source locations in MemProfRecord before readMemprof
starts the matching process.

The theory of operation is as follows:

  1. Collect the lists of direct calls, one from the IR and the other
    from the profile.

  2. Compute the correspondence (called undrift map in the patch)
    between the two lists with longestCommonSequence.

  3. Apply the undrift map just before readMemprof consumes
    MemProfRecord.

The new function is gated by a flag that is off by default.
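
Step 2 above can be pictured with a small sketch. This is not the patch's code: CallSite, computeUndriftMap, and the use of plain line numbers as locations are hypothetical, and the matching below is a textbook longest-common-subsequence over callee names rather than LLVM's longestCommonSequence helper.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a direct call: a source line and the callee name.
struct CallSite {
  int Line;
  std::string Callee;
};

// Match calls from the profile against calls from the IR by callee name and
// return a map from profile line to IR line (the "undrift map").
std::map<int, int> computeUndriftMap(const std::vector<CallSite> &Profile,
                                     const std::vector<CallSite> &IR) {
  const std::size_t N = Profile.size(), M = IR.size();
  // Len[I][J] is the LCS length of the suffixes Profile[I..] and IR[J..].
  std::vector<std::vector<int>> Len(N + 1, std::vector<int>(M + 1, 0));
  for (std::size_t I = N; I-- > 0;)
    for (std::size_t J = M; J-- > 0;)
      Len[I][J] = Profile[I].Callee == IR[J].Callee
                      ? Len[I + 1][J + 1] + 1
                      : std::max(Len[I + 1][J], Len[I][J + 1]);

  // Walk the table from the front to recover the matched pairs.
  std::map<int, int> Undrift;
  std::size_t I = 0, J = 0;
  while (I < N && J < M) {
    if (Profile[I].Callee == IR[J].Callee) {
      Undrift[Profile[I].Line] = IR[J].Line;
      ++I;
      ++J;
    } else if (Len[I + 1][J] >= Len[I][J + 1]) {
      ++I; // skipping the profile entry loses nothing
    } else {
      ++J; // skipping the IR entry loses nothing
    }
  }
  return Undrift;
}
```

A consumer would then rewrite each profiled location through the returned map before matching, which corresponds to step 3.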

[SLP] Check if instructions exist after vectorization (#120434)

Fixes #120433.

[mlir][IR] Fix bug in AffineExpr simplifier lhs % rhs where lhs = lhs floordiv rhs (#119245)

Fixes an issue where the SimpleAffineExprFlattener would simplify
lhs % rhs to just -(lhs floordiv rhs) * rhs instead of
lhs - (lhs floordiv rhs) * rhs
if lhs happened to be equal to lhs floordiv rhs.

The reported failure case was
(d0, d1) -> (((d1 - (d1 + 2)) floordiv 8) % 8)
from #114654.

Note that many paths that simplify AffineMaps (e.g. the AffineApplyOp
folder and canonicalization) would not observe this bug because of
slightly different paths taken by the code. Slightly different
grouping of the terms could also result in avoiding the bug.

Resolves #114654.
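
For reference, the identity a mod expansion has to preserve is lhs % rhs == lhs - (lhs floordiv rhs) * rhs. The sketch below checks it on the reported map in plain C++. floorDiv, floorMod, and reportedCase are hypothetical helpers written for this illustration; they are not MLIR APIs.

```cpp
#include <cstdint>

// Floor division: rounds toward negative infinity (C++ '/' truncates).
std::int64_t floorDiv(std::int64_t Lhs, std::int64_t Rhs) {
  std::int64_t Q = Lhs / Rhs;
  if ((Lhs % Rhs != 0) && ((Lhs < 0) != (Rhs < 0)))
    --Q;
  return Q;
}

// Mod written as the expansion: lhs - (lhs floordiv rhs) * rhs.
std::int64_t floorMod(std::int64_t Lhs, std::int64_t Rhs) {
  return Lhs - floorDiv(Lhs, Rhs) * Rhs;
}

// The reported map (d0, d1) -> (((d1 - (d1 + 2)) floordiv 8) % 8).
// d0 is unused by the expression, so only d1 is taken here.
std::int64_t reportedCase(std::int64_t D1) {
  return floorMod(floorDiv(D1 - (D1 + 2), 8), 8);
}
```

Here the inner floordiv evaluates to -1 for every d1, so the expected result is 7. Dropping the leading lhs term from the expansion would give 8, which is outside the range of a mod by 8.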

[APINotes] Avoid assertion failure with expensive checks (#120487)

Found assertion failures when using EXPENSIVE_CHECKS and running lit
tests for APINotes:
Assertion `left.first != right.first && "two entries for the same
version"' failed.

It seems like std::is_sorted is verifying that the comparison function
is irreflexive (comp(a,a)=false) when using expensive checks. So we would
get callbacks to the lambda used for comparison, even for vectors with a
single element in APINotesReader::VersionedInfo::VersionedInfo, with
"left" and "right" being the same object. Therefore the assert checking
that we never found equal values would fail.

The fix makes sure that we skip the check for equal values when "left" and
"right" are the same object.

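The shape of the APINotes fix described above can be sketched in isolation. Entry and isSortedByVersion are hypothetical names for this illustration, and the pair of ints only stands in for the versioned entries; the actual code lives in APINotesReader::VersionedInfo.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical (version, payload) pair standing in for a versioned entry.
using Entry = std::pair<int, int>;

bool isSortedByVersion(const std::vector<Entry> &Entries) {
  return std::is_sorted(
      Entries.begin(), Entries.end(),
      [](const Entry &Left, const Entry &Right) {
        // A checking standard library may call comp(x, x) to validate the
        // comparator, so only distinct objects are required to differ.
        assert((&Left == &Right || Left.first != Right.first) &&
               "two entries for the same version");
        return Left.first < Right.first;
      });
}
```

Comparing addresses rather than values keeps the original guarantee intact: two different entries with the same version still trip the assertion.
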
[Exegesis][RISCV] Add RISCV support for llvm-exegesis (#120467)

This patch also makes following amendments to core exegesis:

  • Added distinction between regular registers aliasing check and
    registers used as memory address in instruction.
  • Added scratch memory space pointer register.
  • General exegesis options were amended:
    * mattr - new option to pass a list of enabled target features

The llvm-exegesis RISCV port is the result of a team effort. Everyone
involved is listed below.
Co-authored-by: Konstantin Vladimirov konstantin.vladimirov@syntacore.com
Co-authored-by: Dmitrii Petrov dmitrii.petrov@syntacore.com
Co-authored-by: Dmitry Bushev dmitry.bushev@syntacore.com
Co-authored-by: Mark Goncharov mark.goncharov@syntacore.com
Co-authored-by: Anastasiya Chernikova anastasiya.chernikova@syntacore.com

Original pr: #89047


Co-authored-by: Kazu Hirata kazu@google.com

[AMDGPU][True16][MC] true16 for v_cvt_pknorm_i16/u16_f16 (#119605)

Support true16 format for v_cvt_pknorm_i16/u16_f16 in MC.

[AMDGPU][True16][MC] true16 for v_div_fixup_f16 (#119613)

Support true16 format for v_div_fixup_f16 in MC.

[AMDGPU][True16][MC] true16 for v_minmax/maxmin_f16 (#119586)

Support true16 format for v_minmax/maxmin_f16 in MC.

Since we are replacing v_minmax/maxmin_f16 to v_minmax/maxmin_f16_t16 / v_minmax/maxmin_f16_fake16 in Post-GFX11, have to update the CodeGen
pattern for v_minmax/maxmin_f16 to get CodeGen test passing.

[OpenACC] Implement 'wait' construct

The arguments to this are the same as for the 'wait' clause, so this
reuses all of that infrastructure. So all this has to do is support a
pair of clauses that are already implemented (if and async), plus create
an AST node. This patch does so, and adds proper testing.

[ubsan] Add runtime test for -fsanitize=local-bounds (#120038)

[ubsan] Add -fsanitize-merge (and -fno-sanitize-merge) (#120464)

'-mllvm -ubsan-unique-traps'
(#65972) applies to all UBSan
checks. This patch introduces -fsanitize-merge (defaults to on,
maintaining the status quo behavior) and -fno-sanitize-merge (equivalent
to '-mllvm -ubsan-unique-traps'), with the option to selectively
apply non-merged handlers to a subset of UBSan checks (e.g.,
-fno-sanitize-merge=bool,enum).

N.B. we do not use "trap" in the argument name since
#119302 has generalized
-ubsan-unique-traps to work for non-trap modes (min-rt and regular rt).

This patch does not remove the -ubsan-unique-traps flag; that will
override -f(no-)sanitize-merge.

[Coverage] Resurrect Branch:FalseCnt in SwitchStmt that was pruned in #112694 (#120418)

I missed that FalseCnt for each Case was used to calculate percentage in
the SwitchStmt. At the moment I resurrect them.

In !HasDefaultCase, the pair of Counters shall be [CaseCountSum, FalseCnt]. (Reversal of before #112694)
I think it can be considered as the False count on SwitchStmt.

FalseCnt shall be folded (same as current impl) in the coming
SingleByteCoverage changes, since percentage would not make sense.

Allow CoverageMapping::getCoverageForFile() to show Branches also outside functions (#120416)

Fixes #119952

Revert "[ubsan] Add -fsanitize-merge (and -fno-sanitize-merge) (#120464)"

This reverts commit 7eaf470.

Reason: buildbot breakage (e.g.,
https://lab.llvm.org/buildbot/#/builders/144/builds/14299/steps/6/logs/FAIL__Clang__ubsan-trap-debugloc_c)

[llvm][CodeGen] Intrinsic llvm.powi.* code gen for vector arguments (#118242)

Scalarize vector FPOWI instead of promoting the type. This allows the
scalar FPOWIs to be visited and converted to libcalls before promoting
the type.

FIXME: This should be done in LegalizeVectorOps/LegalizeDAG, but call
lowering needs the unpromoted EVT.

Without this patch, in some backends, such as RISCV64 and LoongArch64,
the i32 type is illegal and will be promoted. This causes exponent type
check to fail when ISD::FPOWI node generates a libcall.

Fix #118079

Revert "[driver] Fix sanitizer libc++ runtime linking (#120370)"

This reverts commit 9af5de3.

Reason: buildbot breakage
(https://lab.llvm.org/buildbot/#/builders/24/builds/3394/steps/10/logs/stdio)
"Unexpectedly Passed Tests (1):
llvm-libc++-shared.cfg.in :: libcxx/language.support/support.dynamic/libcpp_deallocate.sh.cpp"

Reapply "[ubsan] Add -fsanitize-merge (and -fno-sanitize-merge) (#120…464)" (#120511)

This reverts commit 2691b96. This
reapply fixes the buildbot breakage of the original patch, by updating
clang/test/CodeGen/ubsan-trap-debugloc.c to specify -fsanitize-merge
(the default, which is merge, is applied by the driver but not
clang_cc1).

This reapply also expands clang/test/CodeGen/ubsan-trap-merge.c.


Original commit message:
'-mllvm -ubsan-unique-traps'
(#65972) applies to all UBSan
checks. This patch introduces -fsanitize-merge (defaults to on,
maintaining the status quo behavior) and -fno-sanitize-merge (equivalent
to '-mllvm -ubsan-unique-traps'), with the option to selectively
apply non-merged handlers to a subset of UBSan checks (e.g.,
-fno-sanitize-merge=bool,enum).

N.B. we do not use "trap" in the argument name since
#119302 has generalized
-ubsan-unique-traps to work for non-trap modes (min-rt and regular rt).

This patch does not remove the -ubsan-unique-traps flag; that will
override -f(no-)sanitize-merge.

[flang][cuda] Allocate descriptor in managed memory when emboxing device memory (#120485)

When emboxing memory that comes from CUFMemAlloc, we need to allocate
the descriptor in managed memory as it might be passed to a kernel.

[gn] port 8e8692a (RISCV support for llvm-exegesis)

[mlir python] Port Python core code to nanobind. (#120473)

Relands #118583, with a fix for Python 3.8 compatibility. It was not
possible to set the buffer protocol accessors via slots in Python 3.8.

Why? https://nanobind.readthedocs.io/en/latest/why.html says it better
than I can, but my primary motivation for this change is to improve MLIR
IR construction time from JAX.

For a complicated Google-internal LLM model in JAX, this change improves
the MLIR
lowering time by around 5s (out of around 30s), which is a significant
speedup for simply switching binding frameworks.

To a large extent, this is a mechanical change, for instance changing
pybind11:: to nanobind::.

Notes:

  • this PR needs Nanobind 2.4.0, because it needs a bug fix
    (Support overriding static properties defined via def_prop_ro_static. wjakob/nanobind#806) that landed in that
    release.
  • this PR does not port the in-tree dialect extension modules. They can
    be ported in a future PR.
  • I removed the py::sibling() annotations from def_static and def_class
    in PybindAdapters.h. These ask pybind11 to try to form an overload
    with an existing method, but it's not possible to form mixed
    pybind11/nanobind overloads this way and the parent class is now
    defined in nanobind. Better solutions may be possible here.
  • nanobind does not contain an exact equivalent of pybind11's buffer
    protocol support. It was not hard to add a nanobind implementation of a
    similar API.
  • nanobind is pickier about casting to std::vector<bool>, expecting that
    the input is a sequence of bool types, not truthy values. In a couple of
    places I added code to support truthy values during casting.
  • nanobind distinguishes bytes (nb::bytes) from strings (e.g.,
    std::string). This required nb::bytes overloads in a few places.

[RISCV] Custom legalize vp.merge for mask vectors. (#120479)

The default legalization uses vmslt with a vector of XLen to compute a
mask. This doesn't work if the type isn't legal. For fixed vectors it
will scalarize. For scalable vectors it crashes the compiler.

This patch uses an alternate strategy that promotes the i1 vector to an
i8 vector and does the merge. I don't claim this to be the best
lowering. I wrote it quickly almost 3 years ago when a crash was
reported in our downstream.

Fixes #120405.

[Sema] Fix tautological bounds check warning with -fwrapv (#120480)

The tautological bounds check warning added in #120222 does not take
into account whether signed integer overflow is well defined or not,
which could result in a developer removing a bounds check that may not
actually be always false because of different overflow semantics.

int check(const int* foo, unsigned int idx)
{
    return foo + idx < foo;
}
$ clang -O2 -c test.c
test.c:3:19: warning: pointer comparison always evaluates to false [-Wtautological-compare]
    3 |         return foo + idx < foo;
      |                          ^
1 warning generated.

# Bounds check is eliminated without -fwrapv, warning was correct
$ llvm-objdump -dr test.o
...
0000000000000000 <check>:
       0: 31 c0                         xorl    %eax, %eax
       2: c3                            retq
$ clang -O2 -fwrapv -c test.c
test.c:3:19: warning: pointer comparison always evaluates to false [-Wtautological-compare]
    3 |         return foo + idx < foo;
      |                          ^
1 warning generated.

# Bounds check remains, warning was wrong
$ llvm-objdump -dr test.o
0000000000000000 <check>:
       0: 89 f0                         movl    %esi, %eax
       2: 48 8d 0c 87                   leaq    (%rdi,%rax,4), %rcx
       6: 31 c0                         xorl    %eax, %eax
       8: 48 39 f9                      cmpq    %rdi, %rcx
       b: 0f 92 c0                      setb    %al
       e: c3                            retq

[ADT] Add a unittest for the ScopedHashTable class (#120183)

The ScopedHashTable class is particularly used to develop string tables
for parsers and code convertors. For instance, the MLIRGen class from the
toy example for MLIR actively uses this class to define scopes for
declared variables. To demonstrate common use cases for the
ScopedHashTable class as well as to check its behavior in different
situations, the unittest has been added.

Signed-off-by: Pavel Samolysov samolisov@gmail.com

[gn build] Port 1cc926b

[clang-format] Fix a crash caused by commit f03bf8c

[ADT] Fix warnings

This patch fixes warnings of the form:

llvm/unittests/ADT/ScopedHashTableTest.cpp:41:20: error:
'ScopedHashTableScope' may not intend to support class template
argument deduction [-Werror,-Wctad-maybe-unsupported]

[SelectionDAG] Rename SDNode::uses() to users(). (#120499)

This function is most often used in range based loops or algorithms
where the iterator is implicitly dereferenced. The dereference returns
an SDNode * of the user rather than SDUse * so users() is a better name.

I've long been annoyed that we can't write a range based loop over
SDUse when we need getOperandNo. I plan to rename use_iterator to
user_iterator and add a use_iterator that returns SDUse& on dereference.
This will make it more like IR.

[Coroutines][Docs] Add a discussion on the handling of certain parameter attribs (#117183)

ByVal arguments and Swifterror require special handling in the coroutine
passes. The goal of this section is to provide a description of how
these parameter attributes are handled.

[RISCV] Add software pipeliner support (#117546)

This patch adds basic support for MachinePipeliner and disables
it by default.

The functionality should be OK and all llvm-test-suite tests have
passed.

[Clang] Don't assume unexpanded PackExpansions' size when expanding packs (#120380)

CheckParameterPacksForExpansion() previously assumed that template
arguments don't include PackExpansion types when attempting another pack
expansion (i.e. when NumExpansions is present). However, this assumption
doesn't hold for type aliases, whose substitution might involve
unexpanded packs. This can lead to incorrect diagnostics during
substitution because the pack size is not yet determined.

To address this, this patch calculates the minimum pack size (ignoring
unexpanded PackExpansionTypes) and compares it to the previously
expanded size. If the minimum pack size is smaller, then there's still a
chance for future substitution to expand it to a correct size, so we
don't diagnose it too eagerly.

Fixes #61415
Fixes #32252
Fixes #17042

[SelectionDAG] Replace findGlueUse in SelectionDAGISel with SDNode::getGluedUser. NFC (#120512)

[SelectionDAG] Add SDNode::user_begin() and use it in some places (#120509)

Most of these are just places that want the first user and aren't
iterating over the whole list.

While there I changed some use_size() == 1 to hasOneUse() which
is more efficient.

This is part of an effort to rename use_iterator to user_iterator
and provide a use_iterator that dereferences to SDUse&. This patch
helps reduce the diff on later patches.

[RISCV][MCA] Move sifive-x280 tests to directory SiFiveX280 (#120522)

[llvm-mc] --no-exec-stack: replace initSection with switchSection. NFC

AsmParser will call initSection unless -n is specified.
It is not good to call initSection twice.

[NFC] Move DroppedVariableStats code to Analysis (#120502)

This is done because the CodeGen library and Passes library both link
against Analysis, to avoid adding a dependency between CodeGen and
Passes if we want to extend the DroppedVariableStats code for MIR stats
as well, as seen in #120501

[gn build] Port e389492

[LLVM] Update BPF maintainer (#120429)

Nowadays yonghong-song and eddyz87 are more involved with LLVM
BPF development than 4ast, so update the maintainer list to reflect
this.

[LLVM] Move Bigcheese to inactive maintainer for Windows object tools (#120425)

Bigcheese isn't actively working on Windows support in object tools
anymore, so move him to the inactive maintainer list. I'm also not
aware of anyone else who is actively involved in this area currently,
so I'm dropping the category entirely for now.

[LLVM] Update maintainers for binary utilities (#120428)

We currently list jakehehrlich as the maintainer for llvm-objcopy /
ObjCopy, but he hasn't been involved with LLVM for more than 5 years.

Convert the llvm-object category into a broader binary utilities
category and add jh7370 and MaskRay as the new maintainers.

Add a pass to collect dropped var stats for MIR (#120501)

Reland "Add a pass to collect dropped var stats for MIR" (#117044)

I am trying to reland #115566

I also moved the DroppedVariableStats code to the Analysis lib

This is part of a stack of patches with
#120502 being the first one in
the stack

Revert "Add a pass to collect dropped var stats for MIR (#120501)"

This reverts commit 223c764.

Reverted due to buildbot failure:

flang-aarch64-libcxx

Linking CXX shared library lib/libLLVMAnalysis.so.20.0git
FAILED: lib/libLLVMAnalysis.so.20.0git

[RISCV] Add scheduling model for mips p8700 CPU (#119885)

Depends on #119882.

[LLVM] Update ADT/Support maintainers (#120423)

Nominate dwblaikie and kuhar as new maintainers for ADT/Support,
replacing chandlerc.

Revert "[RISCV] Add scheduling model for mips p8700 CPU" (#120537)

Reverts #119885

llvm-project/llvm/lib/Target/RISCV/RISCVSchedMIPSP8700.td:20:5:
error: Processor does not define resources for WriteFCvtF32ToF16
def MIPSP8700Model : SchedMachineModel {

[TOSA] Don't run validation pass on non TOSA operations (#120205)

This commit ensures the validation pass is not run on operations from
other dialects. In doing so, operations from other dialects that, for
example, use types not supported by TOSA don't result in an error.

Signed-off-by: Luke Hutton luke.hutton@arm.com

Reapply "[driver] Fix sanitizer libc++ runtime linking (#120370)" (#120538)

Reland without item 2 from #120370 to avoid breaking libc++ tests.

This reverts commit 60a2f32.

[Clang] Re-write codegen for atomic_test_and_set and atomic_clear (#120449)

Re-write the sema and codegen for the atomic_test_and_set and
atomic_clear builtin functions to go via AtomicExpr, like the other
atomic builtins do. This simplifies the code, because AtomicExpr already
handles things like generating code to dynamically select the memory
ordering, which was duplicated for these builtins. This also fixes a few
crash bugs, one when passing an integer to the pointer argument, and one
when using an array.

This also adds diagnostics for the memory orderings which are not valid
for atomic_clear according to
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html, which
were missing before.

Fixes #111293.
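
As a usage illustration for the two builtins, here is a minimal sketch of a test-and-set flag using the GCC-style spellings __atomic_test_and_set and __atomic_clear, which Clang and GCC both provide. SpinFlag, tryLock, and unlock are hypothetical names for this example and are not part of the patch.

```cpp
// Hypothetical wrapper around a flag manipulated by the atomic builtins.
struct SpinFlag {
  bool Flag = false;
};

// Returns true if the lock was acquired, i.e. the flag was previously clear.
bool tryLock(SpinFlag &F) {
  // The first argument must be a pointer; the builtin returns the old value.
  return !__atomic_test_and_set(&F.Flag, __ATOMIC_ACQUIRE);
}

void unlock(SpinFlag &F) {
  // Per the GCC documentation linked above, the valid orderings for
  // __atomic_clear are RELAXED, SEQ_CST and RELEASE.
  __atomic_clear(&F.Flag, __ATOMIC_RELEASE);
}
```

Passing an acquire-type ordering to __atomic_clear is the kind of call the new diagnostics are meant to flag.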

[lldb][AIX] clang-format changes for ProcessLauncherPosixFork.cpp (#120459)

This PR is in reference to porting LLDB on AIX.

Link to discussions on llvm discourse and github:

  1. https://discourse.llvm.org/t/port-lldb-to-ibm-aix/80640
  2. Extending LLDB to work on AIX #101657
    The complete changes for porting are present in this draft PR:
    Extending LLDB to work on AIX #102601

Added clang-format changes for ProcessLauncherPosixFork.cpp which will
be followed by ptrace changes in:

[AArch64][SME2] Extend getRegAllocationHints for ZPRStridedOrContiguousReg (#119865)

ZPR2StridedOrContiguous loads used by a FORM_TRANSPOSED_REG_TUPLE
pseudo should attempt to assign a strided register to avoid unnecessary
copies, even though this may overlap with the list of SVE callee-saved registers.

[PAC][MC][ELF][AArch64] Support signed TLSDESC (#120010)

Support the following relocations and assembly operators:

  • R_AARCH64_AUTH_TLSDESC_ADR_PAGE21 (:tlsdesc_auth: for adrp)
  • R_AARCH64_AUTH_TLSDESC_LD64_LO12 (:tlsdesc_auth_lo12: for ldr)
  • R_AARCH64_AUTH_TLSDESC_ADD_LO12 (:tlsdesc_auth_lo12: for add)

[X86] LowerShift - directly initialize SmallVector with build vector operands. NFC.

Don't push_back the operands separately.

[X86] ExtendToType - directly initialize SmallVector with build vector operands. NFC.

Don't push_back the operands separately.

[PS5][Driver] Pass user search paths to linker before implicit ones (#119875)

Responsibility for setting up implicit library search paths was recently
transferred to the PS5 driver (#109796). Prior to this, SIE private
patches in lld performed this function. During the transition, I failed
to maintain the order in which implicit and user-supplied search paths
were supplied/considered. This change ensures user-supplied search paths
appear before any implicit ones on the link line.

SIE tracker: TOOLCHAIN-17490

[bazel] port 79e859e

Revert "[NFC] Move DroppedVariableStats code to Analysis (#120502)"

that introduces a circular dependency of analysis -> codegen -> target

This reverts commit e389492.

[gn build] Port cffe22a

[LoopVectorize] Use new single string variant of reportVectorizationFailure (#120414)

[AArch64] Tweak truncate costs for some scalable vector types (#119542)

== We were previously returning an invalid cost when truncating
anything to <vscale x 2 x i1>, which is incorrect since we can
generate perfectly good code for this.

== The costs for truncating legal or unpacked types to predicates
seemed overly optimistic. For example, when truncating
<vscale x 8 x i16> to <vscale x 8 x i1> we typically do
something like

and z0.h, z0.h, #0x1
cmpne p0.h, p0/z, z0.h, #0

I guess it might depend upon whether the input value is
generated in the same block or not and if we can avoid the
inreg zero-extend. However, it feels safe to take the more
conservative cost here.

== The costs for some truncates such as

trunc <vscale x 2 x i32> %a to <vscale x 2 x i16>

were 1, whereas in actual fact they are free and no instructions
are required.

== Also, for this

trunc <vscale x 8 x i32> %a to <vscale x 8 x i16>

it's just a single uzp1 instruction so I reduced the cost to 1.

In general, I've added costs for all cases where the destination
type is legal or unpacked. One unfortunate side effect of this
is the costs for some fixed-width truncates when using SVE now
look too optimistic.

[ARM] Fix BF16 lowering with FullFP16

This adds test coverage for bf16 instructions, making sure that lowering bf16
works with and without +fullfp16.

[Clang] Fix crash in __builtin_assume_aligned (#114217)

The CodeGen for __builtin_assume_aligned assumes that the first argument
is a pointer, so crashes if the int-conversion error is downgraded or
disabled. Emit a non-downgradable error if the argument is not a
pointer, like we currently do for __builtin_launder.

Fixes #110914.
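
For contrast with the rejected case, a well-formed call passes a pointer as the first argument. The sketch below assumes a Clang or GCC compiler; sumAligned16 is a hypothetical function name.

```cpp
#include <cstddef>

// Sum N floats from a buffer the caller promises is 16-byte aligned.
float sumAligned16(const float *Data, std::size_t N) {
  // The first argument is a pointer; the builtin returns void *, which is
  // cast back to the element type. The promise is not checked at run time.
  const float *P =
      static_cast<const float *>(__builtin_assume_aligned(Data, 16));
  float Sum = 0.0f;
  for (std::size_t I = 0; I < N; ++I)
    Sum += P[I];
  return Sum;
}
```

Passing an integer instead of Data is the situation the new non-downgradable error covers.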

[AArch64] Fixup destructive floating-point precision conversions (#118788)

This patch changes the zeroing forms of FCVTXNT, FCVTNT, and
BFCVTNT such that their destination operand is also listed as a dag
input. These narrowing down-conversions leave the even elements of the
destination vector unchanged, regardless of the predicate type.

This patch also makes the merging form of BFCVTNT non-movprfx'able.

[LLParser] Remove redundant code (NFC) (#120478)

ARM: Handle vldrh and vstrh in stack access hooks (#120527)

[AMDGPU] Remove unneeded use of !dag. NFC. (#120546)

[analyzer][NFC] Introduce APSIntPtr, a safe wrapper of APSInt (1/4) (#120435)

In the past, one could create dangling APSInt references in various ways, and these were sometimes assumed to be persisted in the BasicValueFactory.

One should always use BasicValueFactory to create persistent APSInts, that could be used by ConcreteInts or SymIntExprs and similar long-living objects.
If one used a temporary or local variables for this, these would dangle.
To enforce the contract of the analyzer BasicValueFactory and the uses of APSInts, let's have a dedicated strong-type for this.

The idea is that APSIntPtr is always owned by the BasicValueFactory, and that is the only component that can construct it.

These PRs are all NFC - besides fixing dangling APSInt references.
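
The ownership idea can be sketched outside the analyzer. Everything below is hypothetical: StableIntPtr and IntFactory only mimic the relationship described between APSIntPtr and BasicValueFactory, and plain long values stand in for APSInt.

```cpp
#include <set>

class IntFactory; // hypothetical analogue of BasicValueFactory

// Hypothetical analogue of APSIntPtr: a pointer to a value that is known to
// be owned by an IntFactory, because nothing else can construct one.
class StableIntPtr {
  const long *Ptr;
  explicit StableIntPtr(const long *P) : Ptr(P) {}
  friend class IntFactory; // the only component allowed to construct one

public:
  const long &operator*() const { return *Ptr; }
  const long *get() const { return Ptr; }
};

class IntFactory {
  // std::set is node-based, so element addresses stay valid across inserts.
  std::set<long> Storage;

public:
  // Persist V (or reuse the existing copy) and hand out a stable pointer.
  StableIntPtr getValue(long V) {
    return StableIntPtr(&*Storage.insert(V).first);
  }
};
```

With the constructor private, code that tries to wrap the address of a temporary or a local variable fails to compile instead of dangling at run time.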

[LoopVectorizer] Add support for partial reductions (#92418)

Following on from #94499, this
patch adds support to the Loop Vectorizer to emit the partial reduction
intrinsics where they may be beneficial for the target.


Co-authored-by: Samuel Tebbs samuel.tebbs@arm.com

[analyzer][NFC] Migrate nonloc::ConcreteInt to use APSIntPtr (2/4) (#120436)

[analyzer][NFC] Migrate loc::ConcreteInt to use APSIntPtr (3/4) (#120437)

[analyzer][NFC] Migrate {SymInt,IntSym}Expr to use APSIntPtr (4/4) (#120438)

[clang] NFC, simplify the shouldLifetimeExtendThroughPath.

[FMV][AArch64] Emit mangled default version if explicitly specified. (#120022)

Currently we need at least one more version other than the default to
trigger FMV. However we would like a header file declaration

__attribute__((target_version("default"))) void f(void);

to guarantee that there will be f.default

[X86] Put R20/R21/R28/R29 later in GR64 list (#120510)

These registers require an extra byte to encode in certain memory
forms, so putting them later in the list reduces code size when EGPR
is enabled. Also align the GR8, GR16 and GR32 lists to the same order.
Example:

movq (%r20), %r11  # encoding: [0xd5,0x1c,0x8b,0x1c,0x24]
movq (%r22), %r11  # encoding: [0xd5,0x1c,0x8b,0x1e]

[analyzer] Handle [[assume(cond)]] as __builtin_assume(cond) (#116462)

Resolves #100762

Gist of the change:

  1. All the symbol analysis, constraint manager and expression parsing
    logic was already present, but the previous code didn't "visit" the
    expressions within assume(). By parsing those expressions, all of the
    code "just works" by evaluating the SVals, and hence leans on the same
    logic that makes the code with __builtin_assume work
  2. "Ignore" an expression from adding in CFG if it has side-effects (
    similar to CGStmt.cpp (todo add link))
  3. Add additional test case for ternary operator handling and modify
    CFG.cpp's VisitGuardedExpr code for continue-ing if the ProgramPoint
    is a StmtPoint

Co-authored-by: Balazs Benics benicsbalazs@gmail.com

[X86] getShuffleCost - when splitting shuffles, if a whole vector source is just copied we should treat this as free. (#120561)

If the shuffle split results in referencing a single legalised whole vector (i.e. no permutation), then this can be treated as free.

We already do something similar for broadcasts / whole subvector insertion + extraction - it's purely an issue for register allocation.

[AArch64][SVE] Use SVE for scalar FP converts in streaming[-compatible] functions (1/n) (#118505)

In streaming[-compatible] functions, use SVE for scalar FP conversions
to/from integer types. This can help avoid moves between FPRs and GPRs,
which could be costly.

This patch also updates definitions of SCVTF_ZPmZ_StoD and
UCVTF_ZPmZ_StoD to disallow lowering to them from ISD nodes, as doing so
requires creating a [U|S]INT_TO_FP_MERGE_PASSTHRU node with inconsistent
types.

Follow up to #112213.

Note: This PR does not include support for f64 <-> i32 conversions (like
#112564), which needs a bit more work to support.

Reland "[RISCV] Add scheduling model for mips p8700 CPU" (#120550)

This patch introduces a scheduling model for the MIPS p8700, an
out-of-order
RISC-V processor. The model includes pipelines for the following units:

  • 2 Integer Arithmetic/Logical Units (ALU and AL2)
  • Multiply/Divide Unit (MDU)
  • Branch Unit (CTI)
  • Load/Store Unit (LSU)
  • Short Floating-Point Pipe (FPUS)
  • Long Floating-Point Pipe (FPUL)

For additional details, refer to the official product page:
https://mips.com/products/hardware/p8700/.

Also adds UnsupportedSchedZfhmin to handle cases like
WriteFCvtF16ToF32 that
previously caused build failures.

[Clang][AArch64] Add signed index/offset variants of sve2p1 qword stores (#120549)

This patch adds signed offset/index variants to the SVE2p1 quadword
store intrinsics, in accordance with
ARM-software/acle#359.

[lldb][AIX] GetOpt support in AIX (#120574)

This PR is in reference to porting LLDB on AIX.

Link to discussions on llvm discourse and github:

  1. https://discourse.llvm.org/t/port-lldb-to-ibm-aix/80640
  2. Extending LLDB to work on AIX #101657
    The complet

jofrn added 8 commits June 2, 2025 00:15
Vector types on atomics are assumed to be invalid by the verifier. However,
this type can be valid if it is lowered by codegen.

commit-id:72529270
`load atomic <1 x T>` is not valid. This change legalizes
vector types of atomic load via scalarization in SelectionDAG
so that it can, for example, translate from `v1i32` to `i32`.

commit-id:5c36cc8c
When lowering atomic <1 x T> vector types with floats, selection can fail since
this pattern is unsupported. To support this, floats can be cast to
an integer type of the same size.

commit-id:f9d761c5
Unaligned atomic vectors with size >1 are lowered to calls.
Adding their tests separately here.

commit-id:a06a5cc6
Vector types of 2 elements must be widened. This change does this
for vector types of atomic load in SelectionDAG
so that it can translate aligned vectors of >1 size.

commit-id:2894ccd1
This change adds patterns to optimize out an extra MOV
present after widening the atomic load.

commit-id:45989503
This commit casts floats to ints in an atomic load during AtomicExpand to support
floating point types. It also is required to support 128 bit vectors in SSE/AVX.

commit-id:80b9b6a7
AtomicExpand fails for aligned `load atomic <n x T>` because it
does not find a compatible library call. This change adds appropriate
bitcasts so that the call can be lowered. It also adds support for
128 bit lowering in tablegen to support SSE/AVX.

commit-id:f430c1af

jofrn commented Jun 2, 2025

This stack of pull requests is managed by Graphite. Learn more about stacking.
