Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pull] main from llvm:main #56

Merged
merged 61 commits into from
Dec 10, 2024
Merged

[pull] main from llvm:main #56

merged 61 commits into from
Dec 10, 2024

Conversation

pull[bot]
Copy link

@pull pull bot commented Dec 10, 2024

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.1)

Can you help keep this open source service alive? 💖 Please sponsor : )

nikic and others added 30 commits December 10, 2024 12:52
I believe that this code doesn't care whether the offsets are known to
be inbounds a priori. For the same reason the change is not testable, as
the SCEV based fallback code will look through non-inbounds offsets
anyway. So make it clear that there is no special inbounds requirement
here.
I'm exploring marking microsoft/STL's std::expected as [[nodiscard]],
which affects all functions returning std::expected, including its
own monadic member functions.

As usual, libc++'s test suite contains calls to these member functions
to make sure they compile, but it's discarding the returns. I'm adding
void casts to silence the [[nodiscard]] warnings without altering
what the test is covering.
…ializers in nested anonymous unions (#113049)

Fixes: #112560

This PR create an RecoveryExpr for invalid in-class-initializer.

---------

Signed-off-by: yronglin <yronglin777@gmail.com>
…huffle operand. (#119354)

foldShuffleOfShuffles already handles "shuffle (shuffle x, undef), (shuffle y, undef)" patterns, this patch relaxes the requirement so it can handle cases where only a single operand is a shuffle (and the other can be any other value and will be kept in place).

Fixes #86068
)

This patch adds the following intrinsics:
* 8-bit floating-point convert to half-precision and BFloat16.

  // Variants are also available for: _bf16
  svfloat16_t svcvt1_f16[_mf8]_fpm(svmfloat8_t zn, fpm_t fpm);
  svfloat16_t svcvt2_f16[_mf8]_fpm(svmfloat8_t zn, fpm_t fpm);

* 8-bit floating-point convert to half-precision and BFloat16 (top).

  // Variants are also available for: _bf16
  svfloat16_t svcvtlt1_f16[_mf8]_fpm(svmfloat8_t zn, fpm_t fpm);
  svfloat16_t svcvtlt2_f16[_mf8]_fpm(svmfloat8_t zn, fpm_t fpm);
This patch supplements the fix introduced by PR #119319.
hlfir.elemental codegen optimize-out the final as_expr copy for temps
local to its body, but sometimes, clean-up may have been emitted for
this temp, and the code did not handle that.
This caused #118922 and @113843.

Only elide the copy if the as_expr is the last op.
#119366)

…igibilityType

This is the function we use to diagnose invalid types, so use it for
those checks as well.

NFC.
When expanding a load into two loads, use nuw for the add that computes
the offset from the base of the second load, because the original load
doesn't straddle the address space.

It turns out there's already a dedicated helper function for doing this,
`getObjectPtrOffset`.

This is in target-independent code, however in practice it only seems to
affact WebAssembly code, because WebAssembly load and store
instructions' constant offsets don't perform wrapping, so constant
folding often depends on the nuw flag being present.

This was noticed in the development of #119204.
)

Also switch them over to the new depot runners.
…19320)

This patch limits the permission of pre-commit libc pipelines. It also
adds detailed comments to help future modifications.
Sven has been a long-standing contributor to OpenCL support in Clang and
LLVM, consistently delivering high-quality commits and thorough code
reviews. His deep expertise and proven track record demonstrate his
commitment to advancing the project and maintaining its standards. I
strongly believe Sven would excel as an OpenCL maintainer for Clang,
ensuring the continued growth and reliability of OpenCL within the LLVM
ecosystem.

Unfortunately, due to other commitments I am stepping down from my duty
as an OpenCL maintainer.

Co-authored-by: Anastasia Stulova <astulova@nvidia.com>
Unfortunately, due to other commitments I am no longer able to host this
community meeting.

Co-authored-by: Anastasia Stulova <astulova@nvidia.com>
Including The frozen C++03 headers results in a lot of formatting
changes in the main headers, so this splits these changes into a
separate commit instead.

This is part of
https://discourse.llvm.org/t/rfc-freezing-c-03-headers-in-libc.
We've been having issues with cancelled CI jobs due to Docker-in-Docker
failures (spurious). I tried addressing that by adding a new flavor of
the restarter action, but that clearly hasn't worked out since we see
a lot of these spurious failures.

This PR is an attempt to fix this by changing the mainline CI restarter
used by libc++, not just the workflow job I added for testing. I have to
check this in to test it because this workflow uses the version that's
on the main branch.
…info [NFC] (#119135)

The `poison` values are used to substitute debug information of values
moved from the original header into the preheader that are no longer
available in the former.
They have been out for over 10 days, which only causes confusion
when looking at CI results.
…8189)

This patch fixes a const-qualification on the return type of a method
of `limited_allocator`, which is widely used for testing allocator-aware
containers.
…115713)

* Avoid unnecessary truncation of comparison results in vecreduce_xor
* Optimize generated code for vecreduce_and and vecreduce_or by
comparing against 0.0 to check if all/any of the values are set

Alive2 proof of vecreduce_and and vecreduce_or transformation:
https://alive2.llvm.org/ce/z/SRfPtw
Add getMainOp and getAltOp.
Use `InstructionsState &` instead of `const InstructionsState &`.
Use `!S.isAltShuffle()` instead of `S.MainOp == S.AltOp`.
amy-kwan and others added 28 commits December 10, 2024 11:11
This PR emits implements the ability to emit the PPC version for both
assembly and object files on AIX.
… the VSX FMA mutation pass. (#116071)

The patch fix #116061

The root cause of the assertion is that the FMA mutation pass does not
update the subranges of the live interval for the defined register of
the modified instruction .

it recalculate the live interval of the defined register of xvmaddmdp in
the VSX FMA mutation pass.
…patterns of legalized types (#119363)

In cases where the base/sub vector type in an insert_subvector pattern legalize to the same width through splitting, we can assume that the shuffle becomes free as the legalized vectors will not overlap.

Note this isn't true if the vectors have been widened during legalization (e.g. v2f32 insertion into v4f32 would legalize to v4f32 into v4f32).

Noticed while working on adding processShuffleMasks handling for SK_PermuteTwoSrc.
This patch simplifies the code in two different ways:
* When SVE is available, return `cntd` directly to avoid the need for
bitfield insert.
* When SME is available, check the PSTATE.SM bit of `SVCR` directly
rather than calling `__arm_sme_state`.
Enable `-fstack-clash-protection` for RISCV and stack probe for function
prologues.
We probe the stack by creating a loop that allocates and probe the stack
in ProbeSize chunks.
We emit an unrolled probe loop for small allocations and emit a variable
length probe loop for bigger ones.
Adds initial CMake cache definition that is similar to what we use in
one of our production buidlbots. The goal is to consolidate the
configurations and make them accessible.
This cache file is a first step and to prepare for full pipeline testing
once the new bot comes online.
…p to strings.h (#118899)

docgen relies on the convention that we have a file foo.cpp in
libc/src/\<header\>/. Because the above functions weren't in libc/src/strings/
but rather libc/src/string/, docgen could not find that we had implemented
these.

Rather than add special carve outs to docgen, let's fix up our sources for
these 7 functions to stick with the existing conventions the rest of the
codebase follows.

Link: #118860
Fixes: #118875
So that docgen can find our implementations.

Fixes: #119272
so that docgen can find our definitions.

Also eliminate the enums. POSIX is careful to call these "symbolic constants"
rather than specifically whether they are preprocessor macro defines or not.
Enums are useful to expressing mutual exclusion when the enum values are in
distinct enums which can improve type safety. Our enum values weren't using
that pattern though; they were all in one big anonymous enum.

Link:
https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/pthread.h.html
Fixes: #88997
…more places

We've decided to use vadd as the user instruction where possible for simplicity
in this file. This patch uses vadd as user in more places.
…opt (#119372)

Adds a new mlir-opt test-only pass, -test-vulkan-runner-pipeline, which
runs a set of passes needed for mlir-vulkan-runner, and removes them
from the runner. The tests are changed to invoke mlir-opt with this flag
before invoking the runner. The passes moved are ones concerned with
lowering of the device code prior to serialization to SPIR-V. This is an
incremental step towards moving the entire pipeline to mlir-opt, to
align with other runners (see #73457).
To temporarily address exchange2 perf regression reported in #118556
I disabled the inlining by default, and put it under engineering
option `-flang-simplify-hlfir-sum`.
Clang [defaults to aligning `__int128_t` to 16 bytes], while LLVM
`datalayout` strings [default to aligning `i128` to 8 bytes]. Wasm is
currently using the defaults for both, so it's inconsistent. Fix this by
adding `-i128:128` to Wasm's `datalayout` string so that it aligns
`i128` to 16 bytes too.

This is similar to
[dbad963](dbad963)
for SPARC.

This fixes rust-lang/rust#133991; see that issue for further discussion.

[defaults to aligning `__int128_t` to 16 bytes]:
https://github.com/llvm/llvm-project/blob/f8b4182f076f8fe55f9d5f617b5a25008a77b22f/clang/lib/Basic/TargetInfo.cpp#L77
[default to aligning `i128` to 8 bytes]:
https://llvm.org/docs/LangRef.html#langref-datalayout
A linkonce_odr definition can be omitted in LTO compilation if
`canBeOmittedFromSymbolTable()` is true in all bitcode files.
Test more linkage merge scenarios.

The lo_and_wo symbol tests #111341.
Update root file in DWARF file/line table as soon as we see the first
"#line" directive.

This was moved from "enabledGenDwarfForAssembly", which is called right
before we emit DWARF information. But if the file is empty or contains
expressions that doesn't need DWARF, it is never called, leaving an
original root file and not the file in the "#line" directive.

Add a test checking for this case.

Fixes: #119020
Add tests for the concatenation of boolean vectors bitcast to integers - similar to the MOVMSK pattern.
This PR adds default option below. The new options will come as default
to true and not change the original lowering behavior of pack and unpack
op.
 - lowerPadLikeWithInsertSlice to packOp (with default = true)
 - lowerUnpadLikeWithExtractSlice to unPackOp (with default = true)

The motivation of the PR is finer granular control of the lowering of
pack and unpack Ops. This is useful in particular when we want to
guarantee that there's no additional insertslice and extractslice that
interfere with tiling. With the original lowering pipeline, packOp and
unPackOp may be lowered to insertslice and extractslice when the high
dimensions are unit dimensions and no transpose is invovled. Under such
circumstances, such insert and extract slice ops will block
producer/consumer fusion tile + fuse transforms. With this PR, we will
be able to disable such lowering path and allow consumer fusion to go
through as expected.
Add support for type summaries embedded into the binary.

These embedded summaries will typically be generated by Swift macros,
but can also be generated by any other means.

rdar://115184658
Compared to the python version, this also does type checking and error
handling, so it's slightly longer, however, it's still comfortably
under 500 lines.
MRI is already available where this is instantiated.
When checking for the poison elements in the matches node, need to
consider the register number, when clearing the corresponding mask
element.

Fixes #119393
@pull pull bot added the ⤵️ pull label Dec 10, 2024
@pull pull bot merged commit 0469bb9 into optimizecompile:main Dec 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.