[MTE] [NFC] use vector to collect globals to tag (#120283) #142330
[MTE] [NFC] use vector to collect globals to tag (#120283)
The same pattern caused test failures in the HWASan pass, so it is brittle.
Let's go for the easier approach.
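Here is a minimal sketch of the collect-then-mutate pattern. It uses a hypothetical `Global` record kept in a `std::list` and a hypothetical `tagGlobal` step that appends a new, untagged global to the same list. Because the candidates are snapshotted into a `std::vector` first, globals added during tagging are never revisited. This illustrates the idea only and is not the actual MTE/HWASan code.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <vector>

// Hypothetical stand-in for a module global; not LLVM's GlobalVariable.
struct Global {
  std::string Name;
  bool Tagged = false;
};

// Hypothetical tagging step: tags G and appends a new, untagged helper
// global to the same list. If tagAll iterated the list in place, it would
// also visit the helper, tag it, append another one, and never finish.
void tagGlobal(std::list<Global> &Globals, Global &G) {
  G.Tagged = true;
  Globals.push_back(Global{G.Name + ".helper", false});
}

// Collect the globals to tag into a vector first, then mutate the list.
// std::list::push_back does not invalidate the collected pointers.
void tagAll(std::list<Global> &Globals) {
  std::vector<Global *> ToTag;
  for (Global &G : Globals)
    if (!G.Tagged)
      ToTag.push_back(&G);
  for (Global *G : ToTag)
    tagGlobal(Globals, *G);
}
```

Taking the snapshot trades a little memory for a loop whose trip count is fixed before any mutation happens.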
[DOCS] Rename LLVM Security Group to LLVM Security Response Group. (#116986)
Rename LLVM Security Group to LLVM Security Response Group. Take the
opportunity to canonicalise security group and Security Group to LLVM
Security Response Group.
At the 2024-11-19 LLVM Security Group meeting [1] we discussed that in
practice the LLVM Security Group was performing an incident response
role, but it was not proactively adding additional testing, fuzzing and
hardening. We do not want projects that use LLVM to see the LLVM
Security Group as guaranteeing security for LLVM.
We decided that it would be useful to rename the group to LLVM Security
Response Group as that reflects the work that it is doing.
There may be a case for a proactive security group with a different
remit, but this is out of scope of this commit.
[1] https://discourse.llvm.org/t/llvm-security-group-public-sync-ups/62735/32
[DOCS] Remove bullet point on improving security over time. (#116980)
Remove the 6th bullet point "Strive to improve security over time, for
example by adding additional testing, fuzzing and hardening after fixing
issues."
At the security group meeting on 2024-11-19 we discussed the role the
security group was performing in practice. We are in effect acting as a
security response group, dealing with issues raised via the process
given in the LLVM Security group page. We are not proactively adding
additional testing, fuzzing and hardening. While this could be considered
an aspirational goal, it may give the implication that the LLVM Security
Group is handling or at worst guaranteeing security for the LLVM project
when in practice it is not.
Meeting notes:
https://discourse.llvm.org/t/llvm-security-group-public-sync-ups/62735/32
[Github] Add LLVM Premerge Checks to the watchlist (#120230)
LLVM Premerge Checks is running on the new GCP cluster. Tracking its
metrics will allow us to determine the stability of the presubmit and
make sure the new infra is working as intended.
Signed-off-by: Nathan Gauër brioche@google.com
[SPIR-V] Fix issue #120078 and simplifies parsing of floating point decoration tips in demangled function name (#120128)
This PR fixes #120078 and
improves/simplifies parsing of demangled function name that aims to
detect a tip for floating point decorations. The latter improvement
also fixes a complaint from
LLVM_USE_SANITIZER=Address.
[AArch64] Prevent unnecessary truncation in bool vector reduce code generation (#120096)
Prevent unnecessarily truncating results of 128 bit wide vector
comparisons to 64 bit wide vector values in boolean vector reduce
operations.
[LoopVectorize] Enable more early exit vectorisation tests (#117008)
PR #112138 introduced initial support for dispatching to
multiple exit blocks via split middle blocks. This patch
fixes a few issues so that we can enable more tests to use
the new enable-early-exit-vectorization flag. Fixes are:
too late. This is because collectUsersInExitBlocks ignores
induction variables, which get dealt with in fixupIVUsers.
I've moved the check much earlier in processLoop by looking
for outside users of loop-defined values.
with uncountable early exits, since we've not added support
for this yet.
that require scalar epilogues, although I plan to add that
as part of PR [LoopVectorize] Add support for vectorisation of more early exit loops #88385.
parent loop.
[flang][HLFIR] fix FORALL issue 120190 (#120236)
Fix #120190.
The hlfir.forall lowering code was not properly checking for forall
index references in the mask value computation before trying to hoist it: it
was only looking at the ops directly nested in the hlfir.forall_mask
region, not at the operations indirectly nested. This triggered
bogus hoisting in #120190, leading to undefined behavior (reference to
uninitialized data). The added regression test would die at compile time
with a dominance error.
Fix this by doing a deep walk of the region operations instead. Also
clean up the region cloning to use without_terminator.
[llvm][RISCV] Set ScalableVector stack id in proper place (#117862)
Without this patch, the ScalableVector frame index property is used before
it is assigned. More precisely, let's take a look at
RISCVFrameLowering::assignCalleeSavedSpillSlots. In this function we
divide callee-saved registers into scalar and vector ones, based on the
ScalableVector property of their frame indexes:
But we assign the ScalableVector property several lines below:
Because of this, the list of RVV callee-saved registers is always empty.
Currently this problem doesn't show up, but if you slightly change the
code and, for example, put some instructions between the scalar and vector
spills, the resulting code will be ill-formed.
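The following minimal sketch makes the ordering problem concrete. It uses hypothetical types (`SpillSlot`, `Partition`, `partitionSlots`, `assignStackIds`) rather than the real RISCVFrameLowering API. Partitioning on a property that has not been assigned yet leaves the vector list empty. Assigning the property first gives the expected split.

```cpp
#include <cassert>
#include <vector>

// Hypothetical callee-saved spill slot. IsScalableVector plays the role of
// the ScalableVector stack ID on the frame index.
struct SpillSlot {
  bool IsVectorReg;              // register class, known up front
  bool IsScalableVector = false; // property that must be assigned first
};

struct Partition {
  std::vector<SpillSlot *> Scalar, Vector;
};

// Split slots by the ScalableVector property.
Partition partitionSlots(std::vector<SpillSlot> &Slots) {
  Partition P;
  for (SpillSlot &S : Slots)
    (S.IsScalableVector ? P.Vector : P.Scalar).push_back(&S);
  return P;
}

// Assign the property from the register class.
void assignStackIds(std::vector<SpillSlot> &Slots) {
  for (SpillSlot &S : Slots)
    S.IsScalableVector = S.IsVectorReg;
}
```

If `partitionSlots` is called before `assignStackIds`, every slot lands in `Scalar`. This mirrors the empty RVV list described above.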
[LV] Fixup check lines after 13107cb.
[lldb][NFC] clang-format MainLoopPosix.cpp
Since AIX support is about to change this.
[Clang] Implement CWG2813: Class member access with prvalues (#120223)
This is a rebase of #95112 with my own feedback applied, as @MitalAshok has
been inactive for a while.
It's fairly important this makes clang 20 as it is a blocker for #107451
CWG2813
prvalue.member_fn(expression-list) now will not materialize a temporary
for prvalue if member_fn is an explicit object member function, and
prvalue will bind directly to the object parameter.
The E1 in E1.static_member is now a discarded-value expression, so if E1
was a call to a [[nodiscard]] function, there will now be a warning.
This also affects C++98 with [[gnu::warn_unused_result]] functions.
This should not affect C where TemporaryMaterializationConversion is a
no-op.
Closes #100314
Fixes #100341
Co-authored-by: Mital Ashok mital@mitalashok.co.uk
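Here is a small C++17 illustration of the discarded-value change, using hypothetical names (`Widget`, `makeWidget`, `kindOfFreshWidget`). Runtime behavior is the same before and after the patch. Per the description above, what changes is that clang now treats `makeWidget()` in `makeWidget().kind` as a discarded-value expression, so it may warn about the ignored `[[nodiscard]]` result.

```cpp
#include <cassert>

struct Widget {
  static constexpr int kind = 7; // static data member (implicitly inline in C++17)
  int id;
};

// Hypothetical factory whose result should not be ignored.
[[nodiscard]] Widget makeWidget() { return Widget{42}; }

int kindOfFreshWidget() {
  // E1.static_member: makeWidget() is still evaluated, but its value is
  // discarded, which is what can now trigger a [[nodiscard]] warning.
  return makeWidget().kind;
}
```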
[lldb] Add lldb/source/Host/posix/MainLoopPosix.cpp to git blame ignores
[VFABI] Add support for vector functions that return struct types (#119000)
This patch updates the VFABIDemangler
to support vector functions that return struct types, for example a vector variant of
sincos that returns a vector of sine values and a vector of cosine values within a
struct.
This patch also adds some helpers for vectorizing types (including
struct types). Some of these are used in the VFABIDemangler, and others
will be used in subsequent patches, so this patch simply adds
tests for them.
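The shape of a struct-returning vector function can be sketched in plain C++. `SinCos`, `SinCos4`, `sincosScalar`, and `sincos4` are hypothetical names, and `std::array` stands in for real vector registers. The sketch only shows the idea of returning one struct whose fields are vectors. It does not model the VFABI name mangling or calling convention.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Scalar form: one struct holding a sine and a cosine.
struct SinCos {
  double Sin, Cos;
};

SinCos sincosScalar(double X) { return {std::sin(X), std::cos(X)}; }

// "Vectorized" form over 4 lanes: each struct field becomes a vector.
struct SinCos4 {
  std::array<double, 4> Sin, Cos;
};

SinCos4 sincos4(const std::array<double, 4> &X) {
  SinCos4 R{};
  for (int Lane = 0; Lane < 4; ++Lane) {
    SinCos S = sincosScalar(X[Lane]);
    R.Sin[Lane] = S.Sin;
    R.Cos[Lane] = S.Cos;
  }
  return R;
}
```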
[X86] combineKSHIFT - fold kshiftr(kshiftr/extract_subvector(X,C1),C2) --> kshiftr(X,C1+C2) (#115528)
Merge serial KSHIFTR nodes, possibly separated by EXTRACT_SUBVECTOR, to allow mask instructions to be computed in parallel.
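The fold relies on a simple identity for logical right shifts of a mask. Shifting by C1 and then by C2 is the same as one shift by C1+C2, as long as the total stays below the mask width. Extracting the upper subvector acts like a further right shift. The sketch below checks this on a 16-lane mask, modeling each lane as one bit. The helper names are hypothetical models, not X86 DAG nodes.

```cpp
#include <cassert>
#include <cstdint>

// Model of KSHIFTR on a 16-lane mask: lane i of the result is lane i+C of X.
uint16_t kshiftr16(uint16_t X, unsigned C) {
  return C >= 16 ? 0 : static_cast<uint16_t>(X >> C);
}

// Model of KSHIFTR on an 8-lane mask.
uint8_t kshiftr8(uint8_t X, unsigned C) {
  return C >= 8 ? 0 : static_cast<uint8_t>(X >> C);
}

// Model of EXTRACT_SUBVECTOR taking lanes [8, 16) of a 16-lane mask.
uint8_t extractUpper8(uint16_t X) { return static_cast<uint8_t>(X >> 8); }
```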
[gn build] Port 1ee740a
[github/CODEOWNERS] Add yota9 as BOLT reviewer
[ARM] Reduce loop unroll when low overhead branching is available (#120065)
For processors with low overhead branching (LOB), runtime unrolling the
innermost loop is often detrimental to performance. In these cases the
loop remainder gets unrolled into a series of compare-and-jump blocks,
which in deeply nested loops get executed multiple times, negating the
benefits of LOB.
This is particularly noticeable when the loop trip count of the innermost
loop varies within the outer loop, such as in the case of triangular
matrix decompositions.
In these cases we will prefer to not unroll the innermost loop, with the
intention for it to be executed as a low overhead loop.
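For reference, this is the kind of loop nest described above: a forward-substitution solve for a lower-triangular system, where the inner trip count `I` changes on every outer iteration. The function name and row-major layout are assumptions. The sketch only shows the loop shape, since the unrolling decision itself is made by the ARM cost model.

```cpp
#include <cassert>
#include <vector>

// Solve L * x = b for a lower-triangular, row-major N x N matrix L.
std::vector<double> forwardSubstitute(const std::vector<double> &L,
                                      const std::vector<double> &B, int N) {
  std::vector<double> X(N);
  for (int I = 0; I < N; ++I) {
    double Sum = B[I];
    // The inner trip count is I (0, 1, 2, ...), so a runtime-unrolled version
    // would run its remainder blocks on most outer iterations.
    for (int J = 0; J < I; ++J)
      Sum -= L[I * N + J] * X[J];
    X[I] = Sum / L[I * N + I];
  }
  return X;
}
```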
Add support for single reductions in ComplexDeinterleavingPass (#112875)
The Complex Deinterleaving pass assumes that all values emitted will
result in complex numbers; this patch aims to remove that assumption and
adds support for emitting just the real or imaginary components, not
both.
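As an illustration of a reduction that needs only one component, the loop below accumulates just the real part of a complex dot product. It assumes interleaved `float` pairs (re, im, re, im, ...) as the input layout, and the function name is hypothetical. Whether a given loop is actually matched depends on the pass and the target.

```cpp
#include <cassert>
#include <cstddef>

// A and B hold N complex numbers as interleaved (re, im) float pairs.
// Returns only Re(sum_k A[k] * B[k]); the imaginary part is never formed.
float realDotProduct(const float *A, const float *B, std::size_t N) {
  float Re = 0.0f;
  for (std::size_t K = 0; K < N; ++K) {
    float ARe = A[2 * K], AIm = A[2 * K + 1];
    float BRe = B[2 * K], BIm = B[2 * K + 1];
    Re += ARe * BRe - AIm * BIm;
  }
  return Re;
}
```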
Reland [Clang] skip default argument instantiation for non-defining friend declarations to meet [dcl.fct.default] p4 (#115487)
This fixes a crash when instantiating default arguments for templated
friend function declarations which lack a definition.
There are implementation limits which prevent us from finding the
pattern for such functions, and this causes difficulties
setting up the instantiation scope for the function parameters.
This patch skips instantiating the default argument in these cases,
which causes a minor regression in error recovery, but otherwise avoids
the crash.
The previous attempt #113777 accidentally skipped all default argument
constructions, causing some regressions. This patch resolves that by
moving the guard to InstantiateDefaultArgument() where the handling of
templates takes place.
Fixes #113324
[AMDGPU] Modify Dyn Alloca test to account for Machine-Verifier bug (#120393)
Machine-Verifier crashes in kernel functions,
but fails gracefully in device functions.
This is due to the buffer resource descriptor selected
during G-ISEL, before the fallback path.
Device functions use $sgpr0_sgpr1_sgpr2_sgpr3, while kernel functions
select $private_rsrc_reg, where the machine verifier complains:
"$private_rsrc_reg is not a SReg_128 register".
Modify the test case to capture both behaviors; this is related to
#120063.
[clang-tidy] use local config (#120004)
Follow-up patch for #119948.
[NVPTX] fix nvcl-param-align.ll
fix for f9c8c01
SourceCoverageViewHTML.cpp: Reformat JS
Introduce CounterMappingRegion::isBranch(). NFC.
llvm-cov: Refactor SourceCoverageView::renderBranchView().
NFC except for calculating Total. I've replaced (uint64_t)+(uint64_t)
with (double)+(double). This is still inexact with large numbers
(beyond 1LL << 53) but is expected to prevent possible overflow.
[SCEV] Bail out on mixed int/pointer in SCEVWrapPredicate::implies.
Fixes a crash when trying to extend the pointer start value to a narrow
integer type after b6c29fd.
LLVMContext: rem constexpr to unblock build w/ gcc (#120402)
Address issues observed in buildbots with older GCC versions:
https://lab.llvm.org/buildbot/#/builders/140/builds/13302
[X86] LowerShift - track the number and location of constant shift elements. (#120270)
We have several vector shift lowering strategies that have to analyse
the distribution of non-uniform constant vector shift amounts; at the
moment there is very little sharing of data between these analyses.
This patch creates a SmallDenseMap of the different LEGAL constant shift
amounts used, with a mask of which elements they are used in. So far
I've only updated the shuffle(immshift(x,c1),immshift(x,c2)) lowering
pattern to use it for clarity; there are several more that can be done in
follow-ups. It's hoped that the proposed patch #117980 can be simplified
after this patch as well.
vec_shift6.ll - the existing shuffle(immshift(x,c1),immshift(x,c2))
lowering bails on out-of-range shift amounts, while this patch now skips
them and treats them as UNDEF. This means we manage to fold more cases
that before would have had to lower to a SHL->MUL pattern, including some
legalized cases.
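Here is a simplified model of the bookkeeping described above. It assumes a hypothetical `collectShiftAmounts` helper over a plain array of per-element shift amounts with at most 64 elements; the real code works on a build vector with a SmallDenseMap. Each in-range amount maps to a bitmask of the elements that use it. Out-of-range amounts are skipped, mirroring the UNDEF treatment.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Map each legal (in-range) constant shift amount to a mask of the element
// indices that use it. Amounts >= EltBits are skipped, i.e. treated as UNDEF.
// Assumes at most 64 elements so the mask fits in a uint64_t.
std::map<unsigned, uint64_t>
collectShiftAmounts(const std::vector<unsigned> &Amounts, unsigned EltBits) {
  std::map<unsigned, uint64_t> UniqueAmounts;
  for (unsigned I = 0; I < Amounts.size(); ++I)
    if (Amounts[I] < EltBits)
      UniqueAmounts[Amounts[I]] |= uint64_t(1) << I;
  return UniqueAmounts;
}
```

With exactly two entries in the map, a lowering such as shuffle(immshift(x,c1),immshift(x,c2)) can read both amounts and their element masks directly.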
[TableGen][GISel] Import more "multi-level" patterns (#120332)
Previously, if the destination DAG has an untyped leaf, we would import
the pattern only if that leaf is defined by the top-level source DAG.
This is an unnecessary restriction.
Here is an example of such a pattern:
Previously, it failed to import because add defines neither $vA
nor $vB.
This change reduces the number of skipped patterns as follows:
Other GISel-enabled targets are unaffected.
[LLVM][AsmPrinter] Add vector ConstantInt/FP support to emitGlobalConstantImpl. (#120077)
This fixes a failure path for fixed length vector globals when
ConstantInt/FP is used to represent splats instead of
ConstantDataVector.
[Exegesis][RISCV] Add RISCV support for llvm-exegesis (#89047)
This patch also makes the following amendments to core exegesis:
registers used as memory address in instruction.
* mattr - new option to pass a list of enabled target features
The llvm-exegesis RISCV port is the result of a team effort. Everyone
involved is listed below.
Co-authored-by: Konstantin Vladimirov konstantin.vladimirov@syntacore.com
Co-authored-by: Dmitrii Petrov dmitrii.petrov@syntacore.com
Co-authored-by: Dmitry Bushev dmitry.bushev@syntacore.com
Co-authored-by: Mark Goncharov mark.goncharov@syntacore.com
Co-authored-by: Anastasiya Chernikova anastasiya.chernikova@syntacore.com
[X86] urem-seteq-illegal-types.ll - regenerate VPTERNLOG comment
Fix unused variable warning. NFC.
Revert "[Exegesis][RISCV] Add RISCV support for llvm-exegesis (#89047)"
This reverts commit bc3eee1.
These tests are failing because of a missing
REQUIRES directive.
[Xtensa] Implement Code Density Option. (#119639)
The Code Density option adds 16-bit encoding for frequently used
instructions.
[InstCombine] Drop samesign flags in foldLogOpOfMaskedICmps_NotAllZeros_BMask_Mixed (#120373)
Counterexamples: https://alive2.llvm.org/ce/z/6Ks8Qz
Closes #120361.
[lldb][AIX] Header Parsing for XCOFF Object File in AIX (#116338)
This PR is in reference to porting LLDB on AIX.
Link to discussions on llvm discourse and github:
The complete changes for porting are present in this draft PR:
Extending LLDB to work on AIX #102601
Added XCOFF Object File Header Parsing for AIX.
Details about the XCOFF file format on AIX: XCOFF
Reapply "[NFC][AMDGPU] Pre-commit clang and llvm tests for dynamic allocas" (#120410)
This reapplies commit #120063.
A machine-verifier bug was causing a crash in the previous commit.
This has been addressed in
#120393.
[AMDGPU] Use -triple instead of -arch in MC tests
[Python] Use raw string literals for regexes (#120401)
Previously these backslashes were not followed by a valid escape
sequence character so were treated as literal backslashes, which was the
intended behaviour of the code. However, Python as of 3.12 has started
warning about these, so we should use raw string literals for regexes so
that backslashes are always interpreted literally. I've done this for
every regex in this file for consistency, including the ones which do
not contain backslashes.
[mlir][SCF] Unify tileUsingFor and tileReductionUsingFor implementation (#120115)
This patch unifies the tiling implementation for tileUsingFor and
tileReductionUsingFor. This is done by passing an additional option to
SCFTilingOptions, allowing it to set how reduction dimensions should be
tiled. Currently, there are 3 different options for reduction tiling:
FullReduction (old tileUsingFor), PartialReductionOuterReduction (old
tileReductionUsingFor) and PartialReductionOuterParallel
(linalg::tileReductionUsingForall, this isn't implemented in this
patch).
The patch makes tileReductionUsingFor use the tileUsingFor
implementation with the new reduction tiling options.
There are no test changes because the implementation was doing almost
exactly the same thing. This was also tested in IREE (which uses both
these APIs heavily) and there were no test changes.
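The difference between the two reduction strategies can be sketched on a 1-D sum with a scalar loop nest. This is only an analogy: the function names are hypothetical, and the real transform works on MLIR TilingInterface ops. FullReduction tiles the reduction loop and keeps accumulating into the final result. PartialReductionOuterReduction accumulates into a tile-sized buffer of partial results and merges them in a final step.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// FullReduction-style: tile the reduction loop, accumulate into the result.
long sumFullReduction(const std::vector<long> &A, std::size_t Tile) {
  long Acc = 0;
  for (std::size_t T = 0; T < A.size(); T += Tile)
    for (std::size_t I = T; I < std::min(T + Tile, A.size()); ++I)
      Acc += A[I];
  return Acc;
}

// PartialReductionOuterReduction-style: each position within a tile has its
// own partial accumulator; the partials are merged after the tiled loop.
long sumPartialReduction(const std::vector<long> &A, std::size_t Tile) {
  std::vector<long> Partial(Tile, 0);
  for (std::size_t T = 0; T < A.size(); T += Tile)
    for (std::size_t I = T; I < std::min(T + Tile, A.size()); ++I)
      Partial[I - T] += A[I];
  long Acc = 0;
  for (long P : Partial) // final merge step
    Acc += P;
  return Acc;
}
```

Both functions compute the same sum. They differ only in where the intermediate accumulation lives, which is why routing both APIs through one implementation with a strategy option is plausible.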
Revert "[VectorCombine] Combine scalar fneg with insert/extract to vector fneg when length is different" (#120422)
Reverts #115209 - investigating a reported regression
[VPlan] Handle exit phis with multiple operands in addUsersInExitBlocks. (#120260)
Currently the addUsersInExitBlocks incorrectly assumes exit phis only
have a single operand, which may not be the case for loops with early
exits when they share a common exit block.
Also further relax the assertion in fixupIVUsers to allow exit values if
they come from the loop latch/middle.block.
PR: #120260
[OpenMP][Clang] Migrate OpenMP UserDefinedMapper from Clang to OMPIRBuilder (#110001)
This patch migrates the OpenMP UserDefinedMapper codegen from Clang to
the OpenMPIRBuilder. I will be adding further patches in the near future
so that OpenMP dialect in MLIR can make use of these.
[flang] Add UNSIGNED (#113504)
Implement the UNSIGNED extension type and operations under control of a
language feature flag (-funsigned).
This is nearly identical to the UNSIGNED feature that has been available
in Sun Fortran for years, and now implemented in GNU Fortran for
gfortran 15, and proposed for ISO standardization in J3/24-116.txt.
See the new documentation for details; but in short, this is C's
unsigned type, with guaranteed modular arithmetic for +, -, and *, and
the related transformational intrinsic functions SUM et al.
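Since the feature is described as C's unsigned type with guaranteed modular arithmetic, the same guarantee can be shown in C++: for a 32-bit unsigned type, +, - and * wrap modulo 2^32. This is a C++ illustration only, not flang syntax or flang's runtime.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned 32-bit arithmetic is defined to wrap modulo 2^32.
uint32_t addWrap(uint32_t A, uint32_t B) { return A + B; }
uint32_t subWrap(uint32_t A, uint32_t B) { return A - B; }
uint32_t mulWrap(uint32_t A, uint32_t B) { return A * B; }
```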
Revert "Add support for single reductions in ComplexDeinterleavingPass (#112875)"
This reverts commit b3eede5.
This has been breaking most AArch64 stage2 builds for 4+ hours,
reverting to get the bots back to green.
https://lab.llvm.org/buildbot/#/builders/41/builds/4172
https://lab.llvm.org/buildbot/#/builders/4/builds/4281
https://lab.llvm.org/buildbot/#/builders/199/builds/263
https://lab.llvm.org/buildbot/#/builders/198/builds/334
https://lab.llvm.org/buildbot/#/builders/143/builds/4276
https://lab.llvm.org/buildbot/#/builders/17/builds/4725
[Exegesis][RISCV] Add RISCV support for llvm-exegesis (#120419)
This patch also makes the following amendments to core exegesis:
registers used as memory address in instruction.
* mattr - new option to pass a list of enabled target features
The llvm-exegesis RISCV port is the result of a team effort. Everyone
involved is listed below.
Co-authored-by: Konstantin Vladimirov konstantin.vladimirov@syntacore.com
Co-authored-by: Dmitrii Petrov dmitrii.petrov@syntacore.com
Co-authored-by: Dmitry Bushev dmitry.bushev@syntacore.com
Co-authored-by: Mark Goncharov mark.goncharov@syntacore.com
Co-authored-by: Anastasiya Chernikova anastasiya.chernikova@syntacore.com
Fix #110001 build error.
[TableGen][GISel] Improve dead register handling (#120426)
A dead implicit def wasn't marked as dead if it is also an implicit use.
The new approach should also be more straightforward and simplifies
future changes for supporting optional defs and physical register defs.
Pull Request: #120426
[DirectX] Split resource info into type and binding info. NFC (#119773)
This splits the DXILResourceAnalysis pass into TypeAnalysis and
BindingAnalysis passes. The type analysis pass is made immutable and
populated lazily so that it can be used earlier in the pipeline without
needing to carefully maintain the invariants of the binding analysis.
Fixes #118400
[X86] LowerShift - don't prematurely lower to x86 vector shift imm instructions (#120282)
When splitting 2 unique amount shifts to shuffle(shift(x,c1),shift(x,c2)), don't use getTargetVShiftByConstNode directly to lower, use generic shifts to ensure we make use of any further canonicalization: shl(X,1) to add(X,X) etc. - this can have notably better throughput on some x86 targets.
Noticed on #120270
[Clang] Set __cpp_explicit_this_parameter (#107451)
There are not a lot of outstanding known issues
with deducing this (besides #95112), so it
seems reasonable to claim full support.
Fixes #82780
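With the feature-test macro defined, code can opt into deducing this portably. The minimal sketch below uses a hypothetical `Counter` type. It relies on an explicit object parameter when `__cpp_explicit_this_parameter` is at least 202110L and otherwise falls back to const/non-const overloads. Both branches behave the same.

```cpp
#include <cassert>

struct Counter {
  int Value = 0;
#if defined(__cpp_explicit_this_parameter) && __cpp_explicit_this_parameter >= 202110L
  // Deducing this: one template serves both const and non-const objects.
  template <typename Self> auto &get(this Self &S) { return S.Value; }
#else
  // Fallback without the feature: spell out both overloads.
  int &get() { return Value; }
  const int &get() const { return Value; }
#endif
};
```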
[clang-doc] Use LangOpts when printing types (#120308)
The implementation in the clang-doc serializer failed to take in the
LangOpts from the declaration. As a result, we'd do things like print
_Bool instead of bool, even in C++ code.
Fixes #62970
Reland 2de7881 (#119650) (#120454)
[NFC] Move DroppedVariableStats to its own file and redesign it to be
extensible. (#115563)
Move DroppedVariableStats code to its own file and change the class to
have an extensible design so that we can use it to add dropped
statistics to MIR passes and the instruction selector.
[libc][docs] convert stdio.h to docgen (#120334)
Add info from n3220 and POSIX.1-2024.
[flang][NFC] static assert intrinsic table is sorted (#120399)
This invariant is used below when searching for intrinsic
implementation. Currently, if the map is not sorted, the compiler will
just silently assume there is no such implementation.
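Here is a minimal C++17 sketch of the idea, using a hypothetical `IntrinsicEntry` table rather than flang's actual intrinsic tables. A constexpr sortedness check turns a silently failing binary search into a compile-time error.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>

struct IntrinsicEntry {
  std::string_view Name;
  int Id;
};

// Hypothetical table; the binary search below requires it sorted by Name.
constexpr std::array<IntrinsicEntry, 4> Table{{
    {"abs", 1}, {"max", 2}, {"min", 3}, {"sqrt", 4}}};

template <std::size_t N>
constexpr bool isSortedByName(const std::array<IntrinsicEntry, N> &T) {
  for (std::size_t I = 1; I < N; ++I)
    if (!(T[I - 1].Name < T[I].Name))
      return false;
  return true;
}

// Without this check, an unsorted table would only make lookups fail silently.
static_assert(isSortedByName(Table), "intrinsic table must be sorted by name");

// Returns the entry's Id, or -1 if Name is not in the table.
int lookupIntrinsic(std::string_view Name) {
  auto It = std::lower_bound(
      Table.begin(), Table.end(), Name,
      [](const IntrinsicEntry &E, std::string_view Key) { return E.Name < Key; });
  return (It != Table.end() && It->Name == Name) ? It->Id : -1;
}
```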
[DirectX] Introduce the DXILResourceAccess pass (#116726)
This pass transforms resource access via
llvm.dx.resource.getpointer
into buffer loads and stores.
Fixes #114848.
[lld] Move BPSectionOrderer from MachO to Common for reuse in ELF (#117514)
Add lld/Common/BPSectionOrdererBase from MachO for reuse in ELF
[DirectX] Create symbols for resource handles (#119775)
We need to create symbols with "the original shape of resource and
element type" to put in the resource metadata in order to generate valid
DXIL.
Note that DXC generally doesn't emit an actual symbol outside of library
shaders (it emits an undef of a pointer to the type), but since we have
to deal with opaque pointers we would need a way to smuggle the type
through to match that. Instead, we simply emit symbols for now.
Fixed #116849
Revert "[Exegesis][RISCV] Add RISCV support for llvm-exegesis (#120419)"
This reverts commit 6993d32.
Reason: buildbot breakage
(https://lab.llvm.org/buildbot/#/builders/51/builds/7908)
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /home/b/sanitizer-aarch64-linux/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -DLLVM_BUILD_STATIC -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-aarch64-linux/build/build_default/tools/llvm-exegesis/lib/RISCV -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/tools/llvm-exegesis/lib/RISCV -I/home/b/sanitizer-aarch64-linux/build/build_default/include -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/include -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/RISCV -I/home/b/sanitizer-aarch64-linux/build/build_default/lib/Target/RISCV -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT tools/llvm-exegesis/lib/RISCV/CMakeFiles/LLVMExegesisRISCV.dir/Target.cpp.o -MF tools/llvm-exegesis/lib/RISCV/CMakeFiles/LLVMExegesisRISCV.dir/Target.cpp.o.d -o tools/llvm-exegesis/lib/RISCV/CMakeFiles/LLVMExegesisRISCV.dir/Target.cpp.o -c /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/tools/llvm-exegesis/lib/RISCV/Target.cpp
In file included from /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/tools/llvm-exegesis/lib/RISCV/Target.cpp:139:
/home/b/sanitizer-aarch64-linux/build/build_default/lib/Target/RISCV/RISCVGenAsmMatcher.inc:239:19: error: unused function 'MatchRegisterName' [-Werror,-Wunused-function]
239 | static MCRegister MatchRegisterName(StringRef Name) {
| ^~~~~~~~~~~~~~~~~
/home/b/sanitizer-aarch64-linux/build/build_default/lib/Target/RISCV/RISCVGenAsmMatcher.inc:568:19: error: unused function 'MatchRegisterAltName' [-Werror,-Wunused-function]
568 | static MCRegister MatchRegisterAltName(StringRef Name) {
| ^~~~~~~~~~~~~~~~~~~~
[AMDGPU][True16][MC] test update for v_ldexp_f16 in true16 (#119313)
This is an NFC change. Update the MC test for v_ldexp_f16 in true16 format.
The MC source change was done by a previous patch and automatically enabled by
the t16 pseudo.
[AMDGPU][True16][MC] test update for v_subrev_f16 in true16 (#119315)
This is an NFC change. Update the MC test for v_subrev_f16 in true16 format.
The MC source change was done by a previous patch and automatically enabled by
the t16 pseudo.
Revert "[InstCombine] Infer nuw for gep inbounds from base of object" (#120460)
Reverts #119225 due to the lack of sanitizer support,
large potential of breaking code containing latent UB, non-trivial
localization and investigation, and what seems to be a bad interaction
with msan (a test is in the works).
Related discussions:
#119225 (comment)
#118472 (comment)
[NFC] update gfx12 vop test to use sed instead of grep (#120458)
Changes from #119778 break the
AIX clang ppc64 bot:
https://lab.llvm.org/buildbot/#/builders/64/builds/1714
because grep -o is not supported on AIX and is not POSIX compatible, as per:
https://www.unix.com/man-page/posix/1p/grep/
Co-authored-by: Mark Danial mark.danial@ibm.com
[PhaseOrdering] Update test for #120460
[AMDGPU][True16][MC] true16 for v_pack_b32_f16 (#119630)
Support true16 format for v_pack_b32_f16 in MC.
Since we are replacing v_alignbit_b32 with
v_pack_b32_f16_t16/v_pack_b32_f16_fake16
in post-GFX11, we have to update the CodeGen pattern for
v_pack_b32_f16_fake16 to get the CodeGen tests passing. No pattern is
modified or created; we just replace v_pack_b32_f16 with the fake16 format.
Some of the true16 CodeGen tests are impacted since v_pack_b32_f16
selection is removed in post-GFX11 while v_pack_b32_f16_t16
is not yet supported. The CodeGen patch for v_pack_b32_f16_t16
will be done in the following patch.
[clang] Change initialization of a vector from undef to poison [NFC] (#120446)
It is fully initialized with insertelements.
[driver] Fix sanitizer libc++ runtime linking (#120370)
Override the default behavior implied by CCCIsCXX.
[gn build] Port 5717a99
[gn build] Port 79e859e
[AMDGPU][MC] Disallow op_sel in some VOP3P dot instructions (#100485)
In v_dot4 and v_dot8 instructions with 4- or 8-bit packed data (e.g.,
v_dot4_u32_u8, v_dot8_u32_u4), the op_sel modifier should not be
allowed.
[MemRef] Migrate away from PointerUnion::{is,get} (NFC) (#120382)
Note that PointerUnion::{is,get} have been soft deprecated in
PointerUnion.h:
// FIXME: Replace the uses of is(), get() and dyn_cast() with
// isa, cast and the llvm::dyn_cast
I'm not touching PointerUnion::dyn_cast for now because it's a bit
complicated; we could blindly migrate it to dyn_cast_if_present, but
we should probably use dyn_cast when the operand is known to be
non-null.
[memprof] Move Frame::hash and hashCallStack to IndexedMemProfData (NFC) (#120365)
Now that IndexedMemProfData::{addFrame,addCallStack} are the only
callers of Frame::hash and hashCallStack, respectively, this patch
moves those functions into IndexedMemProfData and makes them private.
With this patch, we can obtain FrameId and CallStackId only through
addFrame and addCallStack, respectively.
[DirectX] Lower ops after translating metadata (#120157)
Move the DXILOpLoweringPass after DXILTranslateMetadata, and add asserts
in DXILShaderFlags to ensure it isn't scheduled after op lowering. This
will allow us to rely on DirectX intrinsics in the shader flags analysis
rather than having to recover information from lowered operations.
Fixes #120119.
[mlir python] Port Python core code to nanobind. (#118583)
Why? https://nanobind.readthedocs.io/en/latest/why.html says it better
than I can, but my primary motivation for this change is to improve MLIR
IR construction time from JAX.
For a complicated Google-internal LLM model in JAX, this change improves
the MLIR lowering time by around 5s (out of around 30s), which is a significant
speedup for simply switching binding frameworks.
To a large extent, this is a mechanical change, for instance changing
pybind11:: to nanobind::.
Notes:
(Support overriding static properties defined via def_prop_ro_static. wjakob/nanobind#806) that landed in that
release.
be ported in a future PR.
in PybindAdapters.h. These ask pybind11 to try to form an overload
with an existing method, but it's not possible to form mixed
pybind11/nanobind overloads this way, and the parent class is now
defined in nanobind. Better solutions may be possible here.
protocol support. It was not hard to add a nanobind implementation of a
similar API.
the input is a sequence of bool types, not truthy values. In a couple of
places I added code to support truthy values during casting.
nb::bytes) from strings (e.g., std::string). This required nb::bytes overloads in a few places.
Revert "[mlir python] Port Python core code to nanobind. (#118583)"
This reverts commit 41bd35b.
Breakage detected, rolling back.
[flang] Don't needlessly instantiate distinct UNSIGNED cases for FINDLOC (#120471)
The FINDLOC runtime doesn't need to distinguish between INTEGER and
UNSIGNED data, so use the code for INTEGER also for UNSIGNED.
[flang][cuda] Using nvvm intrinsics for the syncthread and threadfence families of calls (#120020)
[VPlan] Don't use VPlan ctor taking trip count in most unit tests (NFC).
Update tests to use constructor not passing a trip count VPValue. The
tests don't need that and are simpler as a result.
[libc++] Remove some unused includes (#120219)
[DirectX] TypedUAVLoadAdditionalFormats shader flag (#120477)
Set the TypedUAVLoadAdditionalFormats flag if the shader contains a load
from a multicomponent UAV.
Fixes #114557
[clang-format] Don't change breaking before CtorInitializerColon (#119522)
Don't change breaking before CtorInitializerColon with
ColumnLimit: 0.
Fixes #119519.
[clang-format] Fix a bug in annotating arrows after init braces (#119958)
Fixes #59066.
[MemProf] Skip unmatched callers when cloning (#120455)
Don't unnecessarily clone for a caller that wasn't matched to a call
instruction.
This necessitated updating a couple of tests that were either
unnecessarily cloning or unnecessarily processing an allocation and
hinting it not cold.
[MemProf] Add quotes around FileCheck pattern (#120481)
Some bots are failing with 2916352,
likely due to the escapes in the FileCheck pattern. Add extra quotes to
try to fix this.
E.g. https://lab.llvm.org/buildbot/#/builders/46/builds/9442
[llvm][Support] Use __NR_gettid on Linux for compat with older glibc (#120007)
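Here is a minimal Linux-only sketch of the approach named in the title. Calling the gettid system call through syscall(2) with `__NR_gettid` works with any glibc, whereas the `gettid()` wrapper only exists in glibc 2.30 and later. The function name is hypothetical, and this is not the code in llvm/lib/Support.

```cpp
#include <cassert>
#include <sys/syscall.h>
#include <unistd.h>

// Returns the kernel thread ID of the calling thread (Linux only).
long currentThreadId() { return syscall(__NR_gettid); }
```

On the main thread the kernel thread ID equals the process ID, which makes for an easy sanity check.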
[DirectX] Bug fix for Data Scalarization crash (#118426)
Two bugs here. First, calling Inst->getFunction() has undefined
behavior if the instruction is not tracked to a function. I suspect the
replaceAllUsesWith was leaving the GEPs in a weird ghost parent
situation. I switched up the visitor to be able to call eraseFromParent
as part of visiting, and then everything started working.
The second bug was in DXILFlattenArrays.cpp. I was unaware that you
can have multidimensional arrays of zeroinitializer and undef, so I
fixed up the initializer to handle these two cases.
Fixes #117273
[mlir][bufferization]-Replace only one use in TensorEmptyElimination (#118958)
In many cases the emptyTensorElimination cannot transform or eliminate
the empty tensor which is being inserted into the
SubsetInsertionOpInterface. There are two major reasons for that:
1- Failing when trying to find a legal/suitable insertion point for the
subsetExtract which is about to replace the empty tensor. However, we
may try to handle this issue by moving the needed values which are
responsible for building the subsetExtract near the empty tensor
(which is about to be eliminated), thus increasing the probability of
finding a legal insertion point.
2- The EmptyTensorElimination transform replaces the tensor.empty's uses
all at once in one apply, rather than replacing only the specific use
which was visited in the use-def chain (when traversing from the
tensor.insert_slice). This scenario of replacing all the uses of the
tensor.empty may lead to additional read effects after bufferization
of the specific subset extract/subview, which should not be the case.
Both cases may result in many copies in the subsequent bufferization which
cannot be canonicalized.
The first case can be noticed when having a tensor.empty followed by SubsetInsertionOpInterface (or in simple words tensor.insert_slice), which has been lowered from tensor/tosa.concat.
The second case can be noticed when having a tensor.empty with many uses, leading to the transformation being applied only once, since all the uses have been replaced at once.
The first commit in the PR only adds the lit tests for the cases shown above (NFC), to emphasize how the transform works; upcoming MRs will upload slight changes to handle these cases.
In the second commit in this PR, we want to replace only the specific use that was visited in the use-def chain (when traversing from the tensor.insert_slice's source).
[VPlan] Move initial VPlan block creation to constructor. (NFC)
This sets up the initial blocks needed to initialize a VPlan directly
in the constructor. This will allow tracking of all created blocks
directly in VPlan, simplifying block deletion.
[mlir] Add predicates to tablegen-defined properties (#120176)
Give the properties from tablegen a predicate field that holds the predicate that the property needs to satisfy, if one exists, and hook that field up to verifier generation.
[memprof] Undrift MemProfRecord (#120138)
This patch undrifts source locations in MemProfRecord before readMemprof
starts the matching process.
The theory of operation is as follows:
Collect the lists of direct calls, one from the IR and the other
from the profile.
Compute the correspondence (called undrift map in the patch)
between the two lists with longestCommonSequence.
Apply the undrift map just before readMemprof consumes
MemProfRecord.
The new functionality is gated by a flag that is off by default.
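The undrift step can be sketched as follows, assuming each direct call is reduced to a (callee name, source location) pair. The alignment uses a textbook longest-common-subsequence table over callee names; CallSite and computeUndriftMap are hypothetical names, and this is not the patch's longestCommonSequence implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical direct call site: callee name plus a source location
// (e.g. a line offset within the caller).
struct CallSite {
  std::string callee;
  unsigned loc;
};

// Aligns profile call sites with IR call sites via an LCS over callee names
// and returns a map from (possibly drifted) profile locations to IR locations.
std::map<unsigned, unsigned> computeUndriftMap(const std::vector<CallSite> &profile,
                                               const std::vector<CallSite> &ir) {
  const std::size_t n = profile.size(), m = ir.size();
  // dp[i][j] = LCS length of profile[i..] and ir[j..].
  std::vector<std::vector<unsigned>> dp(n + 1, std::vector<unsigned>(m + 1, 0));
  for (std::size_t i = n; i-- > 0;)
    for (std::size_t j = m; j-- > 0;)
      dp[i][j] = profile[i].callee == ir[j].callee
                     ? dp[i + 1][j + 1] + 1
                     : std::max(dp[i + 1][j], dp[i][j + 1]);

  // Walk the table forward to recover the matched pairs.
  std::map<unsigned, unsigned> undrift;
  std::size_t i = 0, j = 0;
  while (i < n && j < m) {
    if (profile[i].callee == ir[j].callee) {
      undrift[profile[i].loc] = ir[j].loc; // matched call: record the new location
      ++i;
      ++j;
    } else if (dp[i + 1][j] >= dp[i][j + 1]) {
      ++i; // skip a profile call that has no counterpart on this path
    } else {
      ++j; // skip an IR call that has no counterpart on this path
    }
  }
  return undrift;
}
```

Calls that appear only on one side (for example a newly added call in the IR) are simply left out of the map.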
[SLP] Check if instructions exist after vectorization (#120434)
Fixes #120433.
[mlir][IR] Fix bug in AffineExpr simplifier lhs % rhs where lhs = lhs floordiv rhs (#119245)
Fixes an issue where the SimpleAffineExprFlattener would simplify lhs % rhs to just -(lhs floordiv rhs) instead of lhs - (lhs floordiv rhs) if lhs happened to be equal to lhs floordiv rhs.
The reported failure case was (d0, d1) -> (((d1 - (d1 + 2)) floordiv 8) % 8) from #114654.
Note that many paths that simplify AffineMaps (e.g. the AffineApplyOp folder and canonicalization) would not observe this bug because of slightly different paths taken by the code. Slightly different grouping of the terms could also result in avoiding the bug.
Resolves #114654.
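As a sanity check on the reported failure case, the sketch below evaluates the map directly, assuming the usual affine semantics: floordiv rounds toward negative infinity, and mod with a positive divisor always yields a value in [0, rhs). The helper names are hypothetical and this is not MLIR's implementation. Since d1 - (d1 + 2) is always -2, the map evaluates to 7 for every input, which is the value any correct simplification must preserve.

```cpp
#include <cassert>
#include <cstdint>

// Floor division as used by affine floordiv (rounds toward negative infinity).
int64_t floorDiv(int64_t lhs, int64_t rhs) {
  int64_t q = lhs / rhs; // C++ integer division truncates toward zero
  if (lhs % rhs != 0 && ((lhs < 0) != (rhs < 0)))
    --q; // adjust the truncated quotient down to the floor
  return q;
}

// Affine mod for a positive rhs: lhs - rhs * (lhs floordiv rhs), in [0, rhs).
int64_t affineMod(int64_t lhs, int64_t rhs) {
  return lhs - rhs * floorDiv(lhs, rhs);
}

// Evaluates (d0, d1) -> (((d1 - (d1 + 2)) floordiv 8) % 8).
int64_t reportedMap(int64_t /*d0*/, int64_t d1) {
  int64_t lhs = floorDiv(d1 - (d1 + 2), 8); // -2 floordiv 8 == -1
  return affineMod(lhs, 8);                 // -1 mod 8 == 7
}
```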
[APINotes] Avoid assertion failure with expensive checks (#120487)
Found assertion failures when using EXPENSIVE_CHECKS and running lit
tests for APINotes:
Assertion `left.first != right.first && "two entries for the same
version"' failed.
It seems like std::is_sorted is verifying that the comparison function is irreflexive (comp(a, a) == false) when using expensive checks. So we would get callbacks to the lambda used for comparison, even for vectors with a single element in APINotesReader::VersionedInfo::VersionedInfo, with "left" and "right" being the same object. Therefore the assert checking that we never found equal values would fail.
The fix makes sure that we skip the check for equal values when "left" and "right" are the same object.
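The shape of the fix can be sketched as follows, assuming a simplified entry type (a version number paired with some info) rather than the real APINotes types; Entry and isSortedByVersion are hypothetical names. The comparator only asserts on duplicate versions when left and right are distinct objects, so an irreflexivity probe such as comp(a, a) made by a checked std::is_sorted no longer trips the assertion.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a versioned entry: (version, info).
using Entry = std::pair<int, std::string>;

bool isSortedByVersion(const std::vector<Entry> &entries) {
  return std::is_sorted(entries.begin(), entries.end(),
                        [](const Entry &left, const Entry &right) {
                          // A checked std::is_sorted may call comp(a, a) to
                          // verify irreflexivity; only check for duplicate
                          // versions when comparing two different objects.
                          if (&left != &right)
                            assert(left.first != right.first &&
                                   "two entries for the same version");
                          return left.first < right.first;
                        });
}
```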
[Exegesis][RISCV] Add RISCV support for llvm-exegesis (#120467)
This patch also makes the following amendments to core exegesis:
registers used as memory address in instruction.
* mattr - new option to pass a list of enabled target features
The llvm-exegesis RISC-V port is the result of a team effort. Everyone involved is listed below.
Co-authored-by: Konstantin Vladimirov konstantin.vladimirov@syntacore.com
Co-authored-by: Dmitrii Petrov dmitrii.petrov@syntacore.com
Co-authored-by: Dmitry Bushev dmitry.bushev@syntacore.com
Co-authored-by: Mark Goncharov mark.goncharov@syntacore.com
Co-authored-by: Anastasiya Chernikova anastasiya.chernikova@syntacore.com
Original pr: #89047
Co-authored-by: Kazu Hirata kazu@google.com
[AMDGPU][True16][MC] true16 for v_cvt_pknorm_i16/u16_f16 (#119605)
Support true16 format for v_cvt_pknorm_i16/u16_f16 in MC.
[AMDGPU][True16][MC] true16 for v_div_fixup_f16 (#119613)
Support true16 format for v_div_fixup_f16 in MC.
[AMDGPU][True16][MC] true16 for v_minmax/maxmin_f16 (#119586)
Support true16 format for v_minmax/maxmin_f16 in MC.
Since we are replacing v_minmax/maxmin_f16 with v_minmax/maxmin_f16_t16 / v_minmax/maxmin_f16_fake16 in post-GFX11, we have to update the CodeGen pattern for v_minmax/maxmin_f16 to get the CodeGen test passing.
[OpenACC] Implement 'wait' construct
The arguments to this are the same as for the 'wait' clause, so this
reuses all of that infrastructure. So all this has to do is support a
pair of clauses that are already implemented (if and async), plus create
an AST node. This patch does so, and adds proper testing.
[ubsan] Add runtime test for -fsanitize=local-bounds (#120038)
[ubsan] Add -fsanitize-merge (and -fno-sanitize-merge) (#120464)
'-mllvm -ubsan-unique-traps' (#65972) applies to all UBSan checks. This patch introduces -fsanitize-merge (defaults to on, maintaining the status quo behavior) and -fno-sanitize-merge (equivalent to '-mllvm -ubsan-unique-traps'), with the option to selectively apply non-merged handlers to a subset of UBSan checks (e.g., -fno-sanitize-merge=bool,enum).
N.B. we do not use "trap" in the argument name since
#119302 has generalized
-ubsan-unique-traps to work for non-trap modes (min-rt and regular rt).
This patch does not remove the -ubsan-unique-traps flag; that will
override -f(no-)sanitize-merge.
[Coverage] Resurrect Branch:FalseCnt in SwitchStmt that was pruned in #112694 (#120418)
I missed that FalseCnt for each Case was used to calculate the percentage in the SwitchStmt. For now, I resurrect them.
In !HasDefaultCase, the pair of Counters shall be [CaseCountSum, FalseCnt]. (Reversal of before #112694.) I think it can be considered as the False count on SwitchStmt.
FalseCnt shall be folded (same as the current implementation) in the coming SingleByteCoverage changes, since the percentage would not make sense.
Allow CoverageMapping::getCoverageForFile() to show Branches also outside functions (#120416)
Fixes #119952
Revert "[ubsan] Add -fsanitize-merge (and -fno-sanitize-merge) (#120464)"
This reverts commit 7eaf470.
Reason: buildbot breakage (e.g.,
https://lab.llvm.org/buildbot/#/builders/144/builds/14299/steps/6/logs/FAIL__Clang__ubsan-trap-debugloc_c)
[llvm][CodeGen] Intrinsic llvm.powi.* code gen for vector arguments (#118242)
Scalarize vector FPOWI instead of promoting the type. This allows the
scalar FPOWIs to be visited and converted to libcalls before promoting
the type.
FIXME: This should be done in LegalizeVectorOps/LegalizeDAG, but call
lowering needs the unpromoted EVT.
Without this patch, in some backends, such as RISCV64 and LoongArch64,
the i32 type is illegal and will be promoted. This causes exponent type
check to fail when ISD::FPOWI node generates a libcall.
Fix #118079
Revert "[driver] Fix sanitizer libc++ runtime linking (#120370)"
This reverts commit 9af5de3.
Reason: buildbot breakage
(https://lab.llvm.org/buildbot/#/builders/24/builds/3394/steps/10/logs/stdio)
"Unexpectedly Passed Tests (1):
llvm-libc++-shared.cfg.in :: libcxx/language.support/support.dynamic/libcpp_deallocate.sh.cpp"
Reapply "[ubsan] Add -fsanitize-merge (and -fno-sanitize-merge) (#120…464)" (#120511)
This reverts commit 2691b96. This
reapply fixes the buildbot breakage of the original patch, by updating
clang/test/CodeGen/ubsan-trap-debugloc.c to specify -fsanitize-merge
(the default, which is merge, is applied by the driver but not
clang_cc1).
This reapply also expands clang/test/CodeGen/ubsan-trap-merge.c.
Original commit message:
'-mllvm -ubsan-unique-traps' (#65972) applies to all UBSan checks. This patch introduces -fsanitize-merge (defaults to on, maintaining the status quo behavior) and -fno-sanitize-merge (equivalent to '-mllvm -ubsan-unique-traps'), with the option to selectively apply non-merged handlers to a subset of UBSan checks (e.g., -fno-sanitize-merge=bool,enum).
N.B. we do not use "trap" in the argument name since
#119302 has generalized
-ubsan-unique-traps to work for non-trap modes (min-rt and regular rt).
This patch does not remove the -ubsan-unique-traps flag; that will
override -f(no-)sanitize-merge.
[flang][cuda] Allocate descriptor in managed memory when emboxing device memory (#120485)
When emboxing memory that comes from CUFMemAlloc, we need to allocate
the descriptor in managed memory, as it might be passed to a kernel.
[gn] port 8e8692a (RISCV support for llvm-exegesis)
[mlir python] Port Python core code to nanobind. (#120473)
Relands #118583, with a fix for Python 3.8 compatibility. It was not
possible to set the buffer protocol accessors via slots in Python 3.8.
Why? https://nanobind.readthedocs.io/en/latest/why.html says it better
than I can, but my primary motivation for this change is to improve MLIR
IR construction time from JAX.
For a complicated Google-internal LLM model in JAX, this change improves the MLIR lowering time by around 5s (out of around 30s), which is a significant speedup for simply switching binding frameworks.
To a large extent, this is a mechanical change, for instance changing pybind11:: to nanobind::.
Notes:
(Support overriding static properties defined via def_prop_ro_static. wjakob/nanobind#806) that landed in that
release.
be ported in a future PR.
in PybindAdapters.h. These ask pybind11 to try to form an overload with an existing method, but it's not possible to form mixed pybind11/nanobind overloads this way and the parent class is now defined in nanobind. Better solutions may be possible here.
protocol support. It was not hard to add a nanobind implementation of a
similar API.
the input is a sequence of bool types, not truthy values. In a couple of
places I added code to support truthy values during casting.
nb::bytes) from strings (e.g., std::string). This required nb::bytes overloads in a few places.
[RISCV] Custom legalize vp.merge for mask vectors. (#120479)
The default legalization uses vmslt with a vector of XLen to compute a
mask. This doesn't work if the type isn't legal. For fixed vectors it
will scalarize. For scalable vectors it crashes the compiler.
This patch uses an alternate strategy that promotes the i1 vector to an
i8 vector and does the merge. I don't claim this to be the best
lowering. I wrote it quickly almost 3 years ago when a crash was
reported in our downstream.
Fixes #120405.
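The promote-to-i8 strategy can be modelled lane by lane in plain C++. This is a scalar sketch, not the RISC-V lowering itself: it assumes the llvm.vp.merge semantics in which lanes below the explicit vector length (EVL) select on_true or on_false according to the mask, and lanes at or above it take on_false. The i1 operands are zero-extended to uint8_t, merged as bytes, then turned back into a mask by comparing against zero. vpMerge and vpMergeMaskViaI8 are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference lane-wise semantics of vp.merge (as assumed above).
template <typename T>
std::vector<T> vpMerge(const std::vector<bool> &mask, const std::vector<T> &onTrue,
                       const std::vector<T> &onFalse, std::size_t evl) {
  std::vector<T> result(onFalse); // lanes >= evl keep on_false
  for (std::size_t i = 0; i < evl && i < result.size(); ++i)
    if (mask[i])
      result[i] = onTrue[i];
  return result;
}

// Mask-vector merge via promotion: i1 -> i8, merge on bytes, then i8 != 0 -> i1.
std::vector<bool> vpMergeMaskViaI8(const std::vector<bool> &mask,
                                   const std::vector<bool> &onTrue,
                                   const std::vector<bool> &onFalse, std::size_t evl) {
  std::vector<uint8_t> t(onTrue.begin(), onTrue.end()); // zero-extend each lane
  std::vector<uint8_t> f(onFalse.begin(), onFalse.end());
  std::vector<uint8_t> merged = vpMerge(mask, t, f, evl);
  std::vector<bool> result(merged.size());
  for (std::size_t i = 0; i < merged.size(); ++i)
    result[i] = merged[i] != 0; // compare back to a mask
  return result;
}
```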
[Sema] Fix tautological bounds check warning with -fwrapv (#120480)
The tautological bounds check warning added in #120222 does not take
into account whether signed integer overflow is well defined or not,
which could result in a developer removing a bounds check that may not
actually be always false because of different overflow semantics.
[ADT] Add a unittest for the ScopedHashTable class (#120183)
The ScopedHashTable class is used in particular to develop string tables for parsers and code converters. For instance, the MLIRGen class from the toy example for MLIR actively uses this class to define scopes for declared variables. This unittest has been added to demonstrate common use cases for the ScopedHashTable class and to check its behavior in different situations.
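To illustrate the behavior such a test exercises, here is a minimal, self-contained model of a scoped symbol table built on the standard library. It is not llvm::ScopedHashTable, whose API differs; ScopedTable and its methods are hypothetical. Inserting in an inner scope shadows outer bindings, and leaving the scope restores them, which is the pattern used for variable declarations in a parser or IR generator.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical scoped table: each key maps to a stack of values, and each
// scope records which keys it inserted so they can be popped on exit.
class ScopedTable {
public:
  void pushScope() { scopes.emplace_back(); }

  void popScope() {
    for (const std::string &key : scopes.back()) {
      auto it = table.find(key);
      it->second.pop_back(); // drop this scope's binding
      if (it->second.empty())
        table.erase(it);
    }
    scopes.pop_back();
  }

  // Inserts into the innermost scope (one must be active), shadowing outer bindings.
  void insert(const std::string &key, int value) {
    table[key].push_back(value);
    scopes.back().push_back(key);
  }

  // Returns the innermost visible binding, if any.
  std::optional<int> lookup(const std::string &key) const {
    auto it = table.find(key);
    if (it == table.end())
      return std::nullopt;
    return it->second.back();
  }

private:
  std::unordered_map<std::string, std::vector<int>> table;
  std::vector<std::vector<std::string>> scopes;
};
```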
Signed-off-by: Pavel Samolysov samolisov@gmail.com
[gn build] Port 1cc926b
[clang-format] Fix a crash caused by commit f03bf8c
[ADT] Fix warnings
This patch fixes warnings of the form:
llvm/unittests/ADT/ScopedHashTableTest.cpp:41:20: error:
'ScopedHashTableScope' may not intend to support class template
argument deduction [-Werror,-Wctad-maybe-unsupported]
[SelectionDAG] Rename SDNode::uses() to users(). (#120499)
This function is most often used in range based loops or algorithms
where the iterator is implicitly dereferenced. The dereference returns
an SDNode * of the user rather than SDUse * so users() is a better name.
I've long been annoyed that we can't write a range based loop over
SDUse when we need getOperandNo. I plan to rename use_iterator to
user_iterator and add a use_iterator that returns SDUse& on dereference.
This will make it more like IR.
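The distinction behind the rename can be shown with a toy model. All types here are hypothetical stand-ins, not the SelectionDAG classes: iterating users() yields the user node itself, while iterating uses() yields the use edge, which also records which operand slot of the user it occupies (the information getOperandNo provides). A node that uses a value twice appears twice in users().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node;

// Hypothetical use edge: the using node and the operand slot it occupies.
struct Use {
  Node *user;
  unsigned operandNo;
  Node *getUser() const { return user; }
  unsigned getOperandNo() const { return operandNo; }
};

struct Node {
  std::vector<Node *> operands;
  std::vector<Use> useList; // one edge per operand slot that refers to this node

  void addOperand(Node *op) {
    op->useList.push_back({this, static_cast<unsigned>(operands.size())});
    operands.push_back(op);
  }

  // Analogue of users(): the nodes that use this one.
  std::vector<Node *> users() const {
    std::vector<Node *> result;
    for (const Use &u : useList)
      result.push_back(u.getUser());
    return result;
  }

  // Analogue of uses(): the edges themselves, exposing getOperandNo().
  const std::vector<Use> &uses() const { return useList; }
};
```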
[Coroutines][Docs] Add a discussion on the handling of certain parameter attribs (#117183)
ByVal arguments and Swifterror require special handling in the coroutine
passes. The goal of this section is to provide a description of how
these parameter attributes are handled.
[RISCV] Add software pipeliner support (#117546)
This patch adds basic support for MachinePipeliner and disables it by default.
The functionality should be OK and all llvm-test-suite tests have
passed.
[Clang] Don't assume unexpanded PackExpansions' size when expanding packs (#120380)
CheckParameterPacksForExpansion() previously assumed that template
arguments don't include PackExpansion types when attempting another pack
expansion (i.e. when NumExpansions is present). However, this assumption
doesn't hold for type aliases, whose substitution might involve
unexpanded packs. This can lead to incorrect diagnostics during
substitution because the pack size is not yet determined.
To address this, this patch calculates the minimum pack size (ignoring
unexpanded PackExpansionTypes) and compares it to the previously
expanded size. If the minimum pack size is smaller, then there's still a
chance for future substitution to expand it to a correct size, so we
don't diagnose it too eagerly.
Fixes #61415
Fixes #32252
Fixes #17042
[SelectionDAG] Replace findGlueUse in SelectionDAGISel with SDNode::getGluedUser. NFC (#120512)
[SelectionDAG] Add SDNode::user_begin() and use it in some places (#120509)
Most of these are just places that want the first user and aren't
iterating over the whole list.
While there I changed some use_size() == 1 to hasOneUse() which
is more efficient.
This is part of an effort to rename use_iterator to user_iterator
and provide a use_iterator that dereferences to SDUse&. This patch
helps reduce the diff on later patches.
[RISCV][MCA] Move sifive-x280 tests to directory SiFiveX280 (#120522)
[llvm-mc] --no-exec-stack: replace initSection with switchSection. NFC
AsmParser will call initSection unless -n is specified.
It is not good to call initSection twice.
[NFC] Move DroppedVariableStats code to Analysis (#120502)
This is done because the CodeGen library and Passes library both link
against Analysis, to avoid adding a dependency between CodeGen and
Passes if we want to extend the DroppedVariableStats code for MIR stats
as well, as seen in #120501
[gn build] Port e389492
[LLVM] Update BPF maintainer (#120429)
Nowadays yonghong-song and eddyz87 are more involved with LLVM
BPF development than 4ast, so update the maintainer list to reflect
this.
[LLVM] Move Bigcheese to inactive maintainer for Windows object tools (#120425)
Bigcheese isn't actively working on Windows support in object tools
anymore, so move him to the inactive maintainer list. I'm also not
aware of anyone else who is actively involved in this area currently,
so I'm dropping the category entirely for now.
[LLVM] Update maintainers for binary utilities (#120428)
We currently list jakehehrlich as the maintainer for llvm-objcopy /
ObjCopy, but he hasn't been involved with LLVM for more than 5 years.
Convert the llvm-object category into a broader binary utilities
category and add jh7370 and MaskRay as the new maintainers.
Add a pass to collect dropped var stats for MIR (#120501)
Reland "Add a pass to collect dropped var stats for MIR" (#117044)
I am trying to reland #115566
I also moved the DroppedVariableStats code to the Analysis lib
This is part of a stack of patches with
#120502 being the first one in
the stack
Revert "Add a pass to collect dropped var stats for MIR (#120501)"
This reverts commit 223c764.
Reverted due to buildbot failure:
flang-aarch64-libcxx
Linking CXX shared library lib/libLLVMAnalysis.so.20.0git
FAILED: lib/libLLVMAnalysis.so.20.0git
[RISCV] Add scheduling model for mips p8700 CPU (#119885)
Depends on #119882.
[LLVM] Update ADT/Support maintainers (#120423)
Nominate dwblaikie and kuhar as new maintainers for ADT/Support,
replacing chandlerc.
Revert "[RISCV] Add scheduling model for mips p8700 CPU" (#120537)
Reverts #119885
llvm-project/llvm/lib/Target/RISCV/RISCVSchedMIPSP8700.td:20:5:
error: Processor does not define resources for WriteFCvtF32ToF16
def MIPSP8700Model : SchedMachineModel {
[TOSA] Don't run validation pass on non TOSA operations (#120205)
This commit ensures the validation pass is not run on operations from
other dialects. In doing so, operations from other dialects that, for
example, use types not supported by TOSA don't result in an error.
Signed-off-by: Luke Hutton luke.hutton@arm.com
Reapply "[driver] Fix sanitizer libc++ runtime linking (#120370)" (#120538)
Reland without item 2 from #120370 to avoid breaking libc++ tests.
This reverts commit 60a2f32.
[Clang] Re-write codegen for atomic_test_and_set and atomic_clear (#120449)
Re-write the sema and codegen for the atomic_test_and_set and
atomic_clear builtin functions to go via AtomicExpr, like the other
atomic builtins do. This simplifies the code, because AtomicExpr already
handles things like generating code to dynamically select the memory
ordering, which was duplicated for these builtins. This also fixes a few
crash bugs, one when passing an integer to the pointer argument, and one
when using an array.
This also adds diagnostics for the memory orderings which are not valid
for atomic_clear according to
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html, which
were missing before.
Fixes #111293.
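For reference, a typical use of these builtins is a minimal spinlock. This sketch assumes a GCC-compatible compiler (GCC or Clang) providing __atomic_test_and_set and __atomic_clear; SpinLock is a hypothetical name. It passes only __ATOMIC_RELEASE to __atomic_clear, since per the GCC documentation linked above the valid orderings for it are relaxed, release, and seq_cst.

```cpp
#include <cassert>

// Hypothetical minimal spinlock built on the GCC/Clang atomic flag builtins.
class SpinLock {
public:
  void lock() {
    // __atomic_test_and_set returns the previous state: true means held.
    while (__atomic_test_and_set(&flag, __ATOMIC_ACQUIRE)) {
    }
  }

  // Tries once; returns true if the lock was acquired.
  bool tryLock() { return !__atomic_test_and_set(&flag, __ATOMIC_ACQUIRE); }

  void unlock() {
    // Acquire-style orderings are not valid for __atomic_clear.
    __atomic_clear(&flag, __ATOMIC_RELEASE);
  }

private:
  bool flag = false;
};
```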
[lldb][AIX] clang-format changes for ProcessLauncherPosixFork.cpp (#120459)
This PR is in reference to porting LLDB on AIX.
Link to discussions on llvm discourse and github:
The complete changes for porting are present in this draft PR:
Extending LLDB to work on AIX #102601
Added clang-format changes for ProcessLauncherPosixFork.cpp which will
be followed by ptrace changes in:
[AArch64][SME2] Extend getRegAllocationHints for ZPRStridedOrContiguousReg (#119865)
ZPR2StridedOrContiguous loads used by a FORM_TRANSPOSED_REG_TUPLE
pseudo should attempt to assign a strided register to avoid unnecessary
copies, even though this may overlap with the list of SVE callee-saved registers.
[PAC][MC][ELF][AArch64] Support signed TLSDESC (#120010)
Support the following relocations and assembly operators:
R_AARCH64_AUTH_TLSDESC_ADR_PAGE21 (:tlsdesc_auth: for adrp)
R_AARCH64_AUTH_TLSDESC_LD64_LO12 (:tlsdesc_auth_lo12: for ldr)
R_AARCH64_AUTH_TLSDESC_ADD_LO12 (:tlsdesc_auth_lo12: for add)
[X86] LowerShift - directly initialize SmallVector with build vector operands. NFC.
Don't push_back the operands separately.
[X86] ExtendToType - directly initialize SmallVector with build vector operands. NFC.
Don't push_back the operands separately.
[PS5][Driver] Pass user search paths to linker before implicit ones (#119875)
Responsibility for setting up implicit library search paths was recently
transferred to the PS5 driver (#109796). Prior to this, SIE private
patches in lld performed this function. During the transition, I failed
to maintain the order in which implicit and user-supplied search paths
were supplied/considered. This change ensures user-supplied search paths
appear before any implicit ones on the link line.
SIE tracker: TOOLCHAIN-17490
[bazel] port 79e859e
Revert "[NFC] Move DroppedVariableStats code to Analysis (#120502)"
It introduces a circular dependency of analysis -> codegen -> target.
This reverts commit e389492.
[gn build] Port cffe22a
[LoopVectorize] Use new single string variant of reportVectorizationFailure (#120414)
[AArch64] Tweak truncate costs for some scalable vector types (#119542)
== We were previously returning an invalid cost when truncating
anything to <vscale x 2 x i1>, which is incorrect since we can
generate perfectly good code for this.
== The costs for truncating legal or unpacked types to predicates
seemed overly optimistic. For example, when truncating
<vscale x 8 x i16> to <vscale x 8 x i1> we typically do
something like
and z0.h, z0.h, #0x1
cmpne p0.h, p0/z, z0.h, #0
I guess it might depend upon whether the input value is
generated in the same block or not and if we can avoid the
inreg zero-extend. However, it feels safe to take the more
conservative cost here.
== The costs for some truncates such as
trunc <vscale x 2 x i32> %a to <vscale x 2 x i16>
were 1, whereas in actual fact they are free and no instructions
are required.
== Also, for this
trunc <vscale x 8 x i32> %a to <vscale x 8 x i16>
it's just a single uzp1 instruction so I reduced the cost to 1.
In general, I've added costs for all cases where the destination
type is legal or unpacked. One unfortunate side effect of this
is the costs for some fixed-width truncates when using SVE now
look too optimistic.
[ARM] Fix BF16 lowering with FullFP16
This adds test coverage for bf16 instructions, making sure that lowering bf16
works with and without +fullfp16.
[Clang] Fix crash in __builtin_assume_aligned (#114217)
The CodeGen for __builtin_assume_aligned assumes that the first argument
is a pointer, so crashes if the int-conversion error is downgraded or
disabled. Emit a non-downgradable error if the argument is not a
pointer, like we currently do for __builtin_launder.
Fixes #110914.
[AArch64] Fixup destructive floating-point precision conversions (#118788)
This patch changes the zeroing forms of FCVTXNT, FCVTNT, and BFCVTNT such that their destination operand is also listed as a dag input. These narrowing down-conversions leave the even elements of the destination vector unchanged, regardless of the predicate type.
This patch also makes the merging form of BFCVTNT non-movprfx'able.
FCVTXNT - Arm Developer
FCVTNT - Arm Developer
BFCVTNT - Arm Developer
[LLParser] Remove redundant code (NFC) (#120478)
ARM: Handle vldrh and vstrh in stack access hooks (#120527)
[AMDGPU] Remove unneeded use of !dag. NFC. (#120546)
[analyzer][NFC] Introduce APSIntPtr, a safe wrapper of APSInt (1/4) (#120435)
In the past, one could create dangling APSInt references in various ways; these were sometimes assumed to be persisted in the BasicValueFactory.
One should always use BasicValueFactory to create persistent APSInts, that could be used by ConcreteInts or SymIntExprs and similar long-living objects.
If one used temporaries or local variables for this, they would dangle.
To enforce the contract of the analyzer BasicValueFactory and the uses of APSInts, let's have a dedicated strong-type for this.
The idea is that APSIntPtr is always owned by the BasicValueFactory, and that is the only component that can construct it.
These PRs are all NFC - besides fixing dangling APSInt references.
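The strong-type idea can be sketched with standard C++. The names are hypothetical simplifications (long long stands in for llvm::APSInt, ValueFactory for BasicValueFactory, IntPtr for APSIntPtr): the handle's constructor is private and only the factory, a friend, can create one. The factory interns values in a node-based std::set, so the addresses it hands out stay valid for the factory's lifetime and equal values share one address.

```cpp
#include <cassert>
#include <set>

class ValueFactory;

// Hypothetical non-owning handle to an integer owned by ValueFactory.
class IntPtr {
public:
  const long long &operator*() const { return *ptr; }
  const long long *get() const { return ptr; }

private:
  friend class ValueFactory; // only the factory may construct handles
  explicit IntPtr(const long long *p) : ptr(p) {}
  const long long *ptr;
};

class ValueFactory {
public:
  // Interns the value; std::set nodes never move, so the pointer stays valid.
  IntPtr getValue(long long v) { return IntPtr(&*values.insert(v).first); }

private:
  std::set<long long> values;
};
```

Because IntPtr cannot be built from a temporary or a local variable, code outside the factory has no way to create the kind of dangling reference described above.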
[LoopVectorizer] Add support for partial reductions (#92418)
Following on from #94499, this
patch adds support to the Loop Vectorizer to emit the partial reduction
intrinsics where they may be beneficial for the target.
Co-authored-by: Samuel Tebbs samuel.tebbs@arm.com
[analyzer][NFC] Migrate nonloc::ConcreteInt to use APSIntPtr (2/4) (#120436)
[analyzer][NFC] Migrate loc::ConcreteInt to use APSIntPtr (3/4) (#120437)
[analyzer][NFC] Migrate {SymInt,IntSym}Expr to use APSIntPtr (4/4) (#120438)
[clang] NFC, simplify the shouldLifetimeExtendThroughPath.
[FMV][AArch64] Emit mangled default version if explicitly specified. (#120022)
Currently we need at least one more version other than the default to trigger FMV. However, we would like a header file declaration
__attribute__((target_version("default"))) void f(void);
to guarantee that there will be f.default.
[X86] Put R20/R21/R28/R29 later in GR64 list (#120510)
These registers require an extra byte to encode in certain memory forms, so putting them later in the list will reduce code size when EGPR is enabled. This also aligns the GR8, GR16 and GR32 lists to the same order.
Example:
[analyzer] Handle [[assume(cond)]] as __builtin_assume(cond) (#116462)
Resolves #100762
Gist of the change:
logic was already present, but the previous code didn't "visit" the expressions within assume()
by parsing those expressions, all of the code "just works" by evaluating the SVals, and hence leaning on the same logic that makes the code with __builtin_assume work
similar to CGStmt.cpp (todo add link))
CFG.cpp's VisitGuardedExpr code for continue-ing if the ProgramPoint is a StmtPoint
Co-authored-by: Balazs Benics benicsbalazs@gmail.com
[X86] getShuffleCost - when splitting shuffles, if a whole vector source is just copied we should treat this as free. (#120561)
If the shuffle split results in referencing a single legalised whole vector (i.e. no permutation), then this can be treated as free.
We already do something similar for broadcasts / whole subvector insertion + extraction - it's purely an issue for register allocation.
[AArch64][SVE] Use SVE for scalar FP converts in streaming[-compatible] functions (1/n) (#118505)
In streaming[-compatible] functions, use SVE for scalar FP conversions
to/from integer types. This can help avoid moves between FPRs and GPRs,
which could be costly.
This patch also updates definitions of SCVTF_ZPmZ_StoD and
UCVTF_ZPmZ_StoD to disallow lowering to them from ISD nodes, as doing so
requires creating a [U|S]INT_TO_FP_MERGE_PASSTHRU node with inconsistent
types.
Follow up to #112213.
Note: This PR does not include support for f64 <-> i32 conversions (like
#112564), which needs a bit more work to support.
Reland "[RISCV] Add scheduling model for mips p8700 CPU" (#120550)
This patch introduces a scheduling model for the MIPS p8700, an out-of-order RISC-V processor. The model includes pipelines for the following units:
For additional details, refer to the official product page:
https://mips.com/products/hardware/p8700/.
Also adds UnsupportedSchedZfhmin to handle cases like WriteFCvtF16ToF32 that previously caused build failures.
[Clang][AArch64] Add signed index/offset variants of sve2p1 qword stores (#120549)
This patch adds signed offset/index variants to the SVE2p1 quadword
store intrinsics, in accordance with
ARM-software/acle#359.
[lldb][AIX] GetOpt support in AIX (#120574)
This PR is in reference to porting LLDB on AIX.
Link to discussions on llvm discourse and github:
The complet