@pratlucas
Contributor

  • Bump version to 20.1.0git (#125067)
  • [SCEV] Check correct value for UB (#124302)
  • workflows/release-binaries: Stop using ccache (#124415)
  • workflows/premerge: Add macOS testing for release branch (#124303)
  • workflows/premerge: Fix condition for macos job (#125237)
  • workflows/premerge: Enable macos builds (#125176)
  • [GlobalISel] Do not run verifier after ResetMachineFunctionPass (#124799)
  • [flang] Build fix (#125087)
  • [flang] Fix building on aarch64 BSD and musl libc after 9d8dc45 (#125183)
  • [analyzer][docs] CSA release notes for clang-20 (#124798)
  • [AArch64] Add MSVC mangling for the __mfp8 type (#124968)
  • [CodeGenPrepare] Replace deleted ext instr with the promoted value. (#71058)
  • Fix false negative when value initializing a field annotated with [[clang::require_field_initialization]] (#124329)
  • [AArch64] PAUTH_PROLOGUE should not be duplicated with PAuthLR (#124775)
  • Allow 'inline' on some declarations in MS compatibility mode (#125250) (#125275)
  • [clang][SME] Account for C++ lambdas in SME builtin diagnostics (#124750)
  • [libcxx] Use _ftelli64/_fseeki64 on Windows (#123128)
  • release/20.x: [Clang][ReleaseNotes] Document -fclang-abi-compat=19 re: #110503 (#125368)
  • Set version to 20.1.0-rc1 (#125367)
  • workflows/premerge: Cancel in progress jobs when a PR is merged (#125329) (#125588)
  • workflows/release-tasks: Re-use release-binaries-all workflow (#125378)
  • [AArch64] Enable vscale_range with +sme (#124466)
  • [AArch64] Disallow vscale x 1 partial reductions (#125252)
  • [offload] gnu::format with variadic template functions is Clang-only (#124406)
  • [offload] [test] Use test compiler ID rather than host (#124408)
  • [asan][test] Attempt to fix suppressions-alloc-dealloc-mismatch.cpp on Darwin (#124987)
  • [asan][android] XFAIL suppressions-alloc-dealloc-mismatch
  • [asan][test] Disable suppressions-alloc-dealloc-mismatch.cpp on Darwin
  • [VPlan] Only use SCEV for live-ins in tryToWiden. (#125436)
  • workflows/release-binaries: Enable PGO (#124442)
  • [flang][runtime] Make sure to link libexecinfo if it exists (#125344)
  • [clang] Support member function poiners in Decl::getFunctionType() (#125077)
  • [FMV][AArch64] Release notes for LLVM20. (#125525)
  • workflows/premerge: Re-enable tests (#125978)
  • [RISCV] Use getSignedConstant for negative values. (#125903)
  • [X86][AVX10] Disable m[no-]avx10.1 and switch m[no-]avx10.2 to alias of 512 bit options (#124511) (#125057)
  • [InstCombine] Fix FMF propagation in foldSelectWithFCmpToFabs (#121580)
  • [llvm] Add CMake flag to compile out the telemetry framework (#124850)
  • [CMake] Fix typo in docstring: telemtry -> telemetry (NFC)
  • [lldb] Add support for gdb-style 'x' packet (#124733)
  • Add info about the gdb x packet into the release notes (#125680)
  • [libc++][TZDB] Fixes %z escaping. (#125399)
  • fix: removes invalid token from LLVM_VERSION_SUFFIX in LIBC namespace (#126193)
  • [TableGen] Reduce size of MatchTableRecord (NFC) (#125221)
  • [TableGen] Don't use inline storage for ReferenceLocs (NFC) (#125231)
  • [clang] Stop parsing warning suppression mappings in driver (#125722)
  • [flang][Driver] When linking with the Fortran runtime also link with libexecinfo (#125998)
  • [LoopVectorize] Fix cost model assert when vectorising calls (#125716)
  • [LoopVectorize] Fix build error (#126218)
  • [libclc] Allow default path when looking for llvm-spirv (#126071)
  • workflows/premerge: Move concurrency definition to workflow level (#126308)
  • [X86] Do not combine LRINT and TRUNC (#125848)
  • [clang] Parse warning-suppression-mapping after setting up diagengine (#125714)
  • [Offload] Stop the RPC server faiilng with more than one GPU (#125982)
  • [LLD][ELF][AArch64] Discard .ARM.attributes sections (#125838)
  • [AArch64] Update feature dep. for Armv9.6 extensions (#125874)
  • [AArch64] Enable AvoidLDAPUR for cpu=generic between armv8.4 and armv9.3. (#125261)
  • [mlir][CMake] Fix dependency on MLIRTestDialect in Transforms tests (#125894)
  • [libc++] Replace __is_trivially_relocatable by is_trivially_copyable (#124970)
  • Allow 128-bit discriminants in DWARF variants (#125578)
  • Fix llvm/test/DebugInfo/Generic/discriminated-union.ll on big-endian targets (#125849)
  • [C++20][Modules][Serialization] Delay marking pending incomplete decl chains until the end of finishPendingActions. (#121245)
  • [ORC] Fix file comment formatting. NFC.
  • [ORC] Drop 'Info' from MachOCompactUnwindInfoSectionName.
  • [ORC] Add minimal-throw-catch.ll regression test for lli -jit-mode=orc.
  • [ORC] Actually use -jit-kind=orc for the new minimal-throw-catch.ll test.
  • [ORC] Rename MachOCompactUnwindSectionName to MachOUnwindInfoSectionName.
  • [ORC] Fix eh-frame record target finding in MachOPlatform.
  • [ORC] Moch MachOPlatform unwind-info fixes.
  • Re-reapply "[ORC] Enable JIT support for the compact-unwind..." with fixes.
  • [ORC-RT] Add a comment explaining the purpose of this testcase. NFC.
  • [ORC] Fix buggy calculation of second-level-page offset in unwind-info.
  • [JITLink] Add a jitlink::Symbol::getSection() convenience method.
  • [JITLink] Handle compact-unwind records that depend on DWARF FDEs.
  • [JITLink] Add missing testcase for compact-unwind-needs-dwarf.
  • [ORC-RT] Use templates to express deeply nested function calls in testcase.
  • [ORC] Add a FIXME. NFC.
  • [ORC] Add ExecutionSession convenience methods to access bootstrap values.
  • [ORC] Force eh-frame use for older Darwins on x86-64 in MachOPlatform, LLJIT.
  • [Mips] Use getSignedConstant() in or combine
  • [ATfL] Stop using timezone database in our build of libc++ (#77)
  • [ATfL] Fix LOG_DIR vs. LOGS_DIR inconsistency (#78)
  • [ATfE] Add scripts to build newlib and llvmlibc overlay packages (#81)
  • [ATfE] Use python executable path for picolibc tests (#80)
  • [SystemZ] Replace SELRMux with COPY in case of identical operands. (#125108)
  • [VPlan] Check VPWidenIntrinsicSC in VPRecipeWithIRFlags::classof.
  • [RISCV] Check isFixedLengthVector before calling getVectorNumElements in getSingleShuffleSrc. (#125455)
  • [C++20] [Modules] Don't diagnose duplicated friend declarations between modules incorrectly
  • [mlir][cmake] Add missing MLIRTestDialect dependencies
  • [ARM] Empty structs are 1-byte for C++ ABI (#124762)
  • [llvm-objcopy] Fix prints wrong path when dump-section output path doesn't exist (#125345)
  • [LLVM][Support] Add new CreateFileError functions (#125906)
  • [clang] Expose -f(no-)strict-overflow as a clang-cl option (#126512)
  • [AArch64] Fix op mask detection in performZExtDeinterleaveShuffleCombine (#126054)
  • [LV] Forget LCSSA phi with new pred before other SCEV invalidation. (#119897)
  • [ELF] --package-metadata: support %[0-9a-fA-F][0-9a-fA-F]
  • [flang] Use clang_target_link_libraries() for clang dependency (#126037)
  • [InstSimplify] Add additional checks when substituting pointers (#125385)
  • [benchmark] Get number of CPUs with sysconf() on Linux (#125603)
  • [libc++] Also provide an alignment assumption for vector in C++03 mode (#124839)
  • [CG][RISCV]Fix shuffling of odd number of input vectors
  • [AArch64][SME] Spill p-regs as z-regs when streaming hazards are possible (#123752)
  • [AArch64][SME] Reduce ptrue count when filling p-regs from z-regs (#125523)
  • [libc++] Make benchmarks dry-run by default on the release branch
  • Revert "[SLP] getSpillCost - fully populate IntrinsicCostAttributes to improve cost analysis." (#124962)
  • [clang] Handle f(no-)strict-overflow, f(no-)wrapv, f(no-)wrapv-pointer like gcc (#126524)
  • [PAC] Do not support some values of branch-protection with ptrauth-returns (#125280)
  • [mlir] Fix MLIRTestDialect dependency in MLIRTestIR
  • [flang] Move FIRSupport dependency to correct place (#125697)
  • [flang][cmake] Fix bcc dependencies (#125822)
  • [ATfL] Add ability to specify OS name, and update help text (#88)
  • [Clang] Fix __{add,remove}_pointer in Objective-C++ (#123678)
  • [Clang] Add width handling for <gpuintrin.h> shuffle helper (#125896)
  • [Clang] Fix test after new argument was added
  • [BOLT,test] Link against a shared object to test PLT (#125625)
  • [Offload] Properly guard modifications to the RPC device array (#126790)
  • Fix false positive of [[clang::require_explicit_initialization]] on copy/move constructors (#126553)
  • [C++20] [Modules] Don't diagnose duplicated declarations in different modules which is not in file scope
  • [AVX10.2] Fix wrong mask casting in some convert intrinsics (#126627)
  • [AVX10.2] Fix wrong intrinsic names after rename (#126390)
  • [DSE] Don't use initializes on byval argument (#126259)
  • [IndVars] Add test for #126012 (NFC)
  • [ScalarEvolution] Handle addrec incoming value in isImpliedViaMerge() (#126236)
  • [ValueTracking] Add additional tests for computeKnownBits on GEPs (NFC)
  • [ValueTracking] Fix bit width handling in computeKnownBits() for GEPs (#125532)
  • [clang-format] Handle C-style cast of member function pointer type (#126340)
  • release/20.x: [llvm-objcopy][ReleaseNotes] Fix prints wrong path when dump-section output path doesn't exist #125345 (#126607)
  • [VPlan] Only skip expansion for SCEVUnknown if it isn't an instruction. (#125235)
  • AMDGPU: Handle gfx950 XDL-write-VGPR-Overlap-Src-AB wait state (#126732)
  • AMDGPU: Handle gfx950 XDL-write-VGPR-VALU-Mem-Exp wait state change (#126727)
  • [InstCombine] Check nowrap flags when folding comparison of GEPs with the same base pointer (#121892)
  • [ORC][LLI] Remove redundant eh-frame registration plugin construction from lli.
  • [RISCV][compiler-rt] drop __riscv_vendor_feature_bits (#126460)
  • Bump version to 20.1.0-rc2 (#126859)
  • [AArch64][DAG] Allow fptos/ui.sat to scalarized. (#126799)
  • [ATfE] Fix typo on python executable reference for picolibc tests (#90)
  • [llvm] [cmake] Expose LLVM_BUILD_TELEMETRY in LLVMConfig.cmake (#126710)
  • [BOLT] Use getMainExecutable() (#126698)
  • Add release note for Armv9.6 updates (#126513)
  • [clang-format] Hanlde qualified type name for QualifierAlignment (#125327)
  • [clang-format] Fix a crash on parsing requires clause (#125021)
  • [ORC][unittests] Remove hard coded 16k page size (#127115)
  • [clang][AST] Handle dependent representation of call to function with explicit object parameter in CallExpr::getBeginLoc() (#126868)
  • libc/cmake: don't fail if LLVM_VERSION_SUFFIX isn't defined (#126359)
  • [reland][DebugInfo] Update DIBuilder insertion to take InsertPosition (#126967)
  • [libc] Move LLVM_LIBC define to __llvm-libc-common.h (#126877)
  • Diagnose the code with trailing comma in the function call. (#125232)
  • [clang-format] Fix annotation of Java/JavaScript keyword extends (#125038)
  • [clang-format] Add ClassHeadName to help annotating StartOfName (#124891)
  • [SLP] Check for PHI nodes (potentially cycles!) when checking dependencies
  • Revert "[SLP] Check for PHI nodes (potentially cycles!) when checking dependencies"
  • [ORC] Switch to singleton pattern for UnwindInfoManager. (#126691)
  • [NFC] [clang] fixed unused variable warning
  • [llvm][Support] Enable dl_iterate_phdr support on OpenBSD and DragonFly (#125186)
  • [libc++][format] Disables the FTM on older MacOS versions. (#126547)
  • [PowerPC] Use getSignedTargetConstant in SelectOptimalAddrMode. (#127305)
  • Use fixed picolibc version on 20.x release branch (#97)
  • [ELF] ICF: replace includeInDynsym with isExported
  • [ELF] Merge exportDynamic/isExported and remove Symbol::includeInDynsym
  • [ELF] Refine isExported/isPreemptible condition
  • [InstCombine] Do not keep samesign when speculatively executing icmps (#127007)
  • [ReleaseNotes][RemoveDIs] Add release note for deprecated insertion methods (#127493)
  • [OpenMP][libomp] Add OpenBSD, NetBSD and DragonFly stdarg handling (#126182)
  • [TBAA] Don't emit pointer-tbaa for void pointers. (#122116)
  • [clang] StmtPrinter: Handle DeclRefExpr to a Decomposition (#125001)
  • [libc++] Fixes (|multi)_set spaceship operator. (#127326)
  • [Hexagon] Explicitly truncate constant in UAddSubO (#127360)
  • [clang-format] Fix a bug in annotating StartOfName (#127545)
  • [libclc] Disable external-calls testing for clspv targets (#127529)
  • [clang] Fix false positive regression for lifetime analysis warning. (#127460)
  • AMDGPU: Handle gfx950 XDL Write-VGPR-VALU-WAW wait state change (#126132)
  • Revert "[libc++] Reduce std::conjunction overhead (#124259)"
  • [libc++] Avoid including <features.h> on arbitrary platforms (#125587)
  • flang: Fix build with latest libc++ (#127362)
  • [LLVM][AArch64] Remove aliases of LSUI instructions (#126072)
  • [libc++][TZDB] Fixes mapping of nonexisting time. (#127330)
  • workflows/release-binaries: Disable Flang on x86_64 macOS (#127216)
  • [libc++] Set feature-test macro __cpp_lib_atomic_float (#127559)
  • [GlobalISel][AArch64] Fix fptoi.sat lowering. (#127901)
  • release/20.x: [Clang] Remove the PackExpansion restrictions for rewrite substitution (#127174)
  • Revert "[C++20][Modules][Serialization] Delay marking pending incompl… (#127136)
  • Backport: [clang] fix P3310 overload resolution flag propagation (#125791)
  • [libc++] Guard include of <features.h> with __has_include (#127691)
  • AMDGPU: Stop emitting an error on illegal addrspacecasts (#127487) (#127751)
  • [CMake][Release] Statically link clang with stage1 runtimes (#127268)
  • [CSKY] Default to unsigned char
  • [clang] Fix preprocessor output from #embed (#126742)
  • [VPlan] Compute cost for binary op VPInstruction with underlying values. #125434
  • [clang] Track function template instantiation from definition (#125266) (#127777)
  • [RISCV] [MachineOutliner] Analyze all candidates (#127659)
  • AMDGPU: Add some release 20 notes (#128136)
  • Add Wasm, RISC-V, BPF, and NVPTX targets back to Windows release packaging (#127794)
  • [CUDA] Add support for sm101 and sm120 target architectures (#127187)
  • [libc++] Fix stray usage of _LIBCPP_HAS_NO_WIDE_CHARACTERS on Windows
  • [libc++] Reduce the dependency of the locale base API on the base system from the headers (#117764)
  • [Clang] Fix cross-lane scan when given divergent lanes (#127703)
  • AMDGPU: Widen f16 minimum/maximum to v2f16 on gfx950 (#128121)
  • [ATfL] Prepare compilers.yaml to help integrating the fresh ATfL build with Spack (#111)
  • (release 20.x non-upstream patch) [LV] Add initial support for vectorizing literal struct return values (#82)
  • (release 20.x non-upstream patch) [clang] Lower non-builtin sincos[f|l] calls to llvm.sincos.* when -fno-math-errno is set (#83)
  • [SLP] Check for PHI nodes (potentially cycles!) when checking dependencies
  • Add armv8r+fp+simd (little endian) variants. (#64) (#112)
  • Add big endian base library variants for aarch64 R Target with strictly aligned. (#68) (#114)
  • [mlir][cmake] Do not export MLIR_MAIN_SRC_DIR and MLIR_INCLUDE_DIR (#125842)
  • [OpenMP] Fix misspelled symbol name (#126120)
  • [C++20] [Modules] handling selectAny attribute for vardecl
  • [DAGCombiner] visitFREEZE: Early exit when N is deleted (#128161)
  • [flang] fix AArch64 PCS for struct following pointer (#127802)
  • [Clang] Handle instantiating captures in addInstantiatedCapturesToScope() (#128478)
  • [Serialization] Update DECL_LAST
  • [X86][DAGCombiner] Skip x87 fp80 values in combineFMulOrFDivWithIntPow2 (#128618)
  • Revert "[clang][OpenCL][CodeGen][AMDGPU] Do not use private as the default AS for when generic is available (#112442)"
  • [Support] Ensure complete type DelimitedScope (#127459)
  • On Windows, remove the UCRT libraries from the release script (#128378)
  • [Hexagon] Add a case to BitTracker for new register class (#128580)
  • Do not treat llvm.fake.use as a debug instruction (#128684)
  • [PowerPC] Update LLVM 20.1.0 Release Notes (#128764)
  • [PPC][MC] Restore support for case-insensitive register names (#128525)
  • [CMake][Release] Enable bolt optimization for clang on Linux (#128090)
  • [CMake][Release] Statically link ZSTD on all OSes (#128554)
  • Bump version to 20.1.0-rc3
  • [ATfL] Get rid of the dot from the version suffix (#119)
  • Add armebv7m_soft_fpv4_sp_d16 variants (#50) (#120)
  • Add armebv7m_soft_nofp variants (#51) (#121)
  • Disable libcxx tests on armebv6m_soft_nofp variants (#66) (#123)
  • [ATfL] Do not overwrite the COMMON_CMAKE_FLAGS variable, extend it instead (#125)
  • Add armebv7a_soft_nofp and armebv7a_hard_vfpv3_d16 variants (#92) (#128)
  • Add armebv7a_soft_vfpv3_d16 variants (#129)
  • (release 20.x non-upstream patch) [LV] Teach the vectorizer to cost and vectorize llvm.sincos intrinsics (#84)
  • Add newlib-nano build script (#106)
  • Add logic to multilib.yaml to handle cases of armebv7a_soft_nofp and armebv7a_hard_vfpv3_d16 variants (#99)
  • Add AArch64 R-profile soft nofp variants (LE and BE) (#102)
  • Remove -mno-unaligned-access flag from AArch32 big endian variants. (#115)
  • Filter Corstone's stdout (#61)
  • Fixes in FVP invocation (#62)
  • Add support for downloading AArch64 versions of FVPs (#73)
  • Update scripts and config for Corstone-310 v11.27 (#76)
  • [ATfE] Disable use of zlib when building the toolchain (#91)
  • [ATfE] Add script for running longer libcxx tests (#94)
  • [ATfE] Add copyright and license banner to build and test scripts (#95)
  • [ATfE] Enable FVP testing in build script if available (#100)
  • [ATfE] Adjust expected FVP file locations based on platform (#105)
  • [ATfE] Update downstream performance patches (#107)
  • [ATfE] Disable use of zstd when building the toolchain (#108)
  • Add newlib-nano as multilib overlay package (#60)
  • Fix cmake syntax in generate_version_txt.cmake. (#96)
  • Disable debug symbols in picolibc builds (#117) (#135)
  • Fix llvmlibc sample, by adding -lm to the link command. (#69)
  • [ATfE] Do not build aarch64r soft_nofp variants with newlib/llvmlibc (#110)
  • Update llvm-project patch files to integrate changes from #126277, #127096 and #127662 (#140)
  • [ATfE] Rebase LLVM performance patch files on 20.x branch

nikic and others added 30 commits February 11, 2025 14:59
This is a test library which is not part of libMLIR, so it should
use normal LINK_LIBS instead of mlir_target_link_libraries.

This fixes an issue introduced in #123910 and follows up on the
fix in #125004, which added the library to DEPENDS, which is not
sufficient.
This library is provided by flang, not MLIR, so it should not be part of
MLIR_LIBS.

Fixes an issue introduced in llvm/llvm-project#120966.

(cherry picked from commit ee76bda)
The Fortran libraries are not part of MLIR, so they should use
target_link_libraries() rather than mlir_target_link_libraries().

This fixes an issue introduced in
llvm/llvm-project#120966.

(cherry picked from commit f9af5c1)
It's a cherry-pick from the arm-software branch.

OS name can be used to specify the target OS, along with the targeted
distribution name and version, e.g., `RHEL10`.
This aligns the builtins with the behavior of implementations that don't
use the builtins.
Summary:
The CUDA implementation has long supported the `width` argument on its
shuffle instructions, which makes it more difficult to replace those
uses with this helper. This patch just correctly implements that for
AMDGPU and NVPTX so it's equivalent to `__shfl_sync` in CUDA. This will
ease porting.

Fortunately these get optimized out correctly when passing in known
widths.

(cherry picked from commit 2d8106c)
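
To illustrate what a `width` argument means for an indexed shuffle, here is a minimal Python simulation. It follows the semantics CUDA documents for `__shfl_sync` (each `width`-sized segment of the warp acts as its own group, and an out-of-range source lane wraps modulo `width`). The function name `shuffle_idx` and the list-based lane model are assumptions made for this sketch; it is not the code from `<gpuintrin.h>`.

```python
def shuffle_idx(values, src_lane, width):
    """Simulate an indexed shuffle across the lanes of one warp/wavefront.

    values   -- list holding each lane's input value (list index == lane id)
    src_lane -- lane index requested by every lane (uniform, for simplicity)
    width    -- logical segment size; a power of two dividing len(values)
    """
    lanes = len(values)
    if width <= 0 or width & (width - 1) or lanes % width:
        raise ValueError("width must be a power of two dividing the lane count")
    result = []
    for lane in range(lanes):
        segment_base = lane & ~(width - 1)  # first lane of this lane's segment
        offset = src_lane % width           # out-of-range indices wrap around
        result.append(values[segment_base + offset])
    return result
```

With eight lanes and `width=4`, lanes 0 to 3 read from lane 1 and lanes 4 to 7 read from lane 5 when `src_lane` is 1. When `width` equals the full lane count the segment base is always 0, which is the case the commit message says gets optimized out for known widths.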
A few tests generate a statically-linked position-independent executable
with `-nostdlib -Wl,--unresolved-symbols=ignore-all -pie` (`%clang`) and
test PLT handling. (--unresolved-symbols=ignore-all suppresses undefined
symbol errors and serves as a convenience hack.)

This relies on an unguaranteed linker behavior: a statically-linked PIE
does not necessarily generate PLT entries.
While current lld generates a PLT entry, it will change to suppress the
PLT entry to simplify internal handling and improve consistency.

(GNU ld is not consistent here: some ports generate a .dynsym entry
while others don't, and while most seem to generate a PLT entry, some
ports use a weird `R_*_NONE` relocation.)

(cherry picked from commit a907008)
Summary:
If the user deallocates an RPC device this can sometimes fail if the RPC
server is still running. This will happen if the modification happens
while the server is still checking it. This patch adds a mutex to guard
modifications to it.

(cherry picked from commit baf7a3c)
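
The pattern described above, a lock shared between the code that modifies the device array and the server loop that reads it, can be sketched in Python as follows. The class `DeviceRegistry` and its methods are hypothetical names for this illustration only; the actual patch is C++ inside the offload runtime and may be structured differently.

```python
import threading


class DeviceRegistry:
    """Hypothetical stand-in for the RPC server's device array."""

    def __init__(self):
        self._lock = threading.Lock()
        self._devices = {}

    def register(self, device_id, handle):
        # Writers take the lock so the server never sees a half-updated table.
        with self._lock:
            self._devices[device_id] = handle

    def deregister(self, device_id):
        with self._lock:
            self._devices.pop(device_id, None)

    def poll_once(self):
        # The server loop takes the same lock while it inspects the devices,
        # so a concurrent deregister cannot pull an entry out from under it.
        with self._lock:
            return sorted(self._devices)
```

Holding the lock for the whole of `poll_once` is the simplest choice; copying the table under the lock and polling the copy would shorten the critical section, but it brings back the risk of touching a device that has just been removed.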
…opy/move constructors (#126553)

Fixes #126490

(cherry picked from commit 90192e8)
… modules which is not in file scope

Close llvm/llvm-project#126373

Although the root problem is that the friend declaration should not be
placed in the incorrect module, let's stay on the safe side and stop
diagnosing declarations that are not in file scope.

(cherry picked from commit 569e94f)
Found during work on #120927. This caused the compiler to silently
ignore half of the mask in the specific intrinsics.

(cherry picked from commit af522c5)
In my previous PR (#123656), which updated the names of AVX10.2
intrinsics and mnemonics, I erroneously deleted `_ph` from a few
intrinsics. This PR corrects that.

(cherry picked from commit 161cfc6)
There are two ways we can fix this problem, depending on how the
semantics of byval and initializes should interact:

* Don't infer initializes on byval arguments. initializes on byval
refers to the original caller memory (or having both attributes is made
a verifier error).
* Infer initializes on byval, but don't use it in DSE. initializes on
byval refers to the callee copy. This matches the semantics of readonly
on byval. This is slightly more powerful, for example, we could do a
backend optimization where byval + initializes will allocate the full
size of byval on the stack but not copy over the parts covered by
initializes.

I went with the second variant here, skipping byval + initializes in DSE
(FunctionAttrs already doesn't propagate initializes past byval). I'm
open to going in the other direction though.

Fixes llvm/llvm-project#126181.

(cherry picked from commit 2d31a12)
… (#126236)

The code already guards against values coming from a previous iteration
using properlyDominates(). However, addrecs are considered to properly
dominate the loop they are defined in.

Handle this special case separately, by checking for expressions that
have computable loop evolution (this should cover cases like a zext of
an addrec as well).

I considered changing the definition of properlyDominates() instead, but
decided against it. The current definition is useful in other contexts,
e.g. when deciding whether an expression is safe to expand in a given
block.

Fixes llvm/llvm-project#126012.

(cherry picked from commit 7aed53e)
These demonstrate miscompiles in the existing code.

(cherry picked from commit 3dc1ef1)
… (#125532)

For GEPs, we have three bit widths involved: The pointer bit width, the
index bit width, and the bit width of the GEP operands.

The correct behavior here is:
* We need to sextOrTrunc the GEP operand to the index width *before*
multiplying by the scale.
* If the index width and pointer width differ, GEP only ever modifies
the low bits. Adds should not overflow into the high bits.

I'm testing this via unit tests because it's a bit tricky to test in IR
with InstCombine canonicalization getting in the way.

(cherry picked from commit 3bd11b5)
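
The two rules above can be checked with plain integer arithmetic. The following Python sketch models bit patterns as unsigned integers; the helper names (`sext_or_trunc`, `gep_offset`, `gep_add`) are invented for this illustration and are not LLVM APIs. It computes concrete values rather than known bits.

```python
def sext_or_trunc(value, from_bits, to_bits):
    """Sign-extend or truncate a from_bits-wide pattern to to_bits."""
    value &= (1 << from_bits) - 1
    if to_bits > from_bits and value >> (from_bits - 1):
        value -= 1 << from_bits          # reinterpret as a negative number
    return value & ((1 << to_bits) - 1)


def gep_offset(operand, operand_bits, index_bits, scale):
    """Convert the operand to the index width first, then scale it."""
    index = sext_or_trunc(operand, operand_bits, index_bits)
    return (index * scale) & ((1 << index_bits) - 1)


def gep_add(pointer, pointer_bits, index_bits, offset):
    """Add an offset that only ever modifies the low index_bits of the pointer."""
    low_mask = (1 << index_bits) - 1
    high = pointer & ((1 << pointer_bits) - 1) & ~low_mask
    low = (pointer + offset) & low_mask  # no carry into the high bits
    return high | low
```

For an `i8` operand of 0x40, a 16-bit index width and a scale of 4, extending first gives 0x100, whereas multiplying in 8 bits first would wrap to 0. With a 64-bit pointer 0x1FFFFFFFF and a 32-bit index width, adding 1 yields 0x100000000: the low half wraps and the high half is left alone.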
…126340)

Fixes #125012.

(cherry picked from commit 8d373ce)
… dump-section output path doesn't exist #125345 (#126607)

Add a release note for the llvm-objcopy fix from #125345, which corrects
the path printed when the dump-section output path doesn't exist.
…n. (#125235)

Update getOrCreateVPValueForSCEVExpr to only skip expansion of
SCEVUnknown if the underlying value isn't an instruction. Instructions
may be defined in a loop and using them without expansion may break
LCSSA form. SCEVExpander will take care of preserving LCSSA if needed.

We could also try to pass LoopInfo, but there are some users of the
function where it won't be available, and the main benefit of skipping
expansion is slightly more concise VPlans.

Note that SCEVExpander is now used to expand SCEVUnknown with floats.
Adjust the check in expandCodeFor to only check the types and casts if
the type of the value is different to the requested type. Otherwise we
crash when trying to expand a float and requesting a float type.

Fixes llvm/llvm-project#121518.

PR: llvm/llvm-project#125235
(cherry picked from commit e258bca)
simpal01 and others added 28 commits February 27, 2025 13:32
Some libcxx tests are failing on this variant, although it is unknown
whether any of the failures are exclusive to this variant.

For now the libcxx tests will remain disabled while we focus on having C
library tests passing for new variants we add.

(cherry picked from arm#66)
…stead (arm#125)

It's a cherry-pick from the arm-software branch.

This will allow us to tweak the CMake configuration.
…nd vectorize llvm.sincos intrinsics (arm#84)

This teaches the loop vectorizer that `llvm.sincos` is trivially
vectorizable. Additionally, this patch updates the cost model to cost
intrinsics that return multiple values correctly. Previously, the cost
model only thought intrinsics that return `VectorType` need scalarizing,
which meant it cost intrinsics that return multiple vectors (that need
scalarizing) way too cheap (giving it the cost of a single function
call).

The `llvm.sincos` intrinsic also has a custom cost when a vector
function library is available, as certain VFs can be expanded (later in
code-gen) to a vector function, reducing the cost to a single call (plus
any loads needed because the vector function returns values via output
pointers).

---

Downstream issue: arm#87
Add newlib-nano build script based on newlib one.
…armebv7a_hard_vfpv3_d16 variants (arm#99)

These variants were missing some logic in multilib.yaml: some of the
flag matching rules existed only for little endian. Thus they had to be
copied for big endian as well.

Besides the changes to multilib.yaml, tests have been added to properly
test the library selection of the variants.

(cherry picked from commit 128f026)
…rm#115)

The big endian library can be selected with or without the
-mno-unaligned-access flag. Currently, selecting the variants below
requires explicitly passing -mno-unaligned-access. This change removes
the flag from the selection criteria, ensuring that the variants below
can be selected regardless of whether -mno-unaligned-access is used.

- armebv7a_soft_nofp
- armebv7a_soft_nofp_exn_rtti
- armebv7a_hard_vfpv3_d16
- armebv7a_hard_vfpv3_d16_exn_rtti
- armebv7a_soft_vfpv3_d16_exn_rtti
- armebv7a_soft_vfpv3_d16

(cherry picked from commit fa1cfc6)
Corstone prints out some informational text to stdout, but this text
conflicts with expectations from tests about the contents of stdout.

This patch filters out this text to send to stdout only what matters.
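
A minimal sketch of this kind of filtering, assuming the informational text can be recognized line by line with regular expressions. The function name and the pattern used in the usage note are hypothetical; the real patterns depend on what the Corstone model prints.

```python
import re


def filter_model_output(lines, noise_patterns):
    """Return only the lines that match none of the noise patterns."""
    compiled = [re.compile(pattern) for pattern in noise_patterns]
    return [
        line
        for line in lines
        if not any(pattern.search(line) for pattern in compiled)
    ]
```

For example, `filter_model_output(captured, [r"^\[model\] "])` would drop every line starting with a made-up `[model] ` prefix and pass the test program's own output through unchanged.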
This patches fixes two bugs:
 - Properly propagate FVP's return code to the caller.
- Prepend `INST=` to the `--application` argument value. This is
required when the application's path contains equal signs (`=`). This
sign is interpreted as a special case inside FVP: its presence in the
path leads to ambiguity. One way to work around this is to do the
prepend. After that, any `=` sign that comes up in the path is treated
as part of the path.

This is the help message from Corstone's `--application`:
> -a, --application FILE application to load, format: -a [INST=]FILE
(use -a INST=FILE for a specific instance, use -a INST*=FILE to match
multiple instances using wildcards e.g. for SMP cores)
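
The two fixes described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the helper names `application_arg` and `run_model` are hypothetical, and the instance name `cpu0` in the usage note is an assumption.

```python
import subprocess


def application_arg(path: str, instance: str) -> str:
    """Build the value passed to --application.

    Prefixing the path with "<instance>=" means the model splits on the
    first "=", so any later "=" is treated as part of the path.
    """
    return f"{instance}={path}"


def run_model(command: list[str]) -> int:
    """Run the model and hand its exit status back to the caller.

    `command` is the full argument list (model executable plus options);
    its contents are up to the caller and are not specified here.
    """
    result = subprocess.run(command)
    return result.returncode
```

With an assumed instance name, `application_arg("/tmp/a=b/app.elf", "cpu0")` gives `cpu0=/tmp/a=b/app.elf`, and a wrapper script would pass the value returned by `run_model` to `sys.exit` so that test failures are not masked.
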
The get_fvps.sh script currently only downloads and installs the Linux64
versions of the FVPs, which does not necessarily match the host. AArch64
versions of the downloads are all available, so this patch adds a simple
check to download the appropriate ones.

This also updates the version of the Corstone-310 FVP to 11.27, and
modifies the script to infer the package names from the URL to reduce
duplication when modifying links.
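
As an illustration of the two ideas in this change, here is a small sketch written in Python rather than in the shell of `get_fvps.sh`. The function names and the example URL are hypothetical; only the approach (check the host architecture, derive the package name from the URL) comes from the description above.

```python
import platform
import posixpath
from typing import Optional
from urllib.parse import urlparse


def host_is_aarch64(machine: Optional[str] = None) -> bool:
    """Return True when the host reports an AArch64 architecture.

    `machine` defaults to platform.machine(); it is a parameter so the
    check can be exercised on any host.
    """
    machine = platform.machine() if machine is None else machine
    return machine.lower() in ("aarch64", "arm64")


def package_name_from_url(url: str) -> str:
    """Derive the archive file name from the last path component of a URL."""
    return posixpath.basename(urlparse(url).path)
```

Deriving the name from the URL means a version bump only requires editing the link itself, for example `package_name_from_url("https://example.com/fvp/FVP_Example_1.0_Linux64.tgz")` yields `FVP_Example_1.0_Linux64.tgz` (a made-up URL).
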
The new version of Corstone-310 FVP model has a few differences compared
to v11.24 which require changes:

* The `ID_ISAR5.PACBTI` parameter has been deprecated and removed.
* The install now includes its own copy of pythonlib and fmtplib, the
paths to which need to be set as environment variables before starting
the model. `PYTHONHOME` should be set to `<install>/python`, and
`LD_LIBRARY_PATH` to `<install>/fmtplib:<install>/python/lib`.
* The boilerplate message printed to stdout has a different date, and
the Info message on stop is no longer present. The regex has been
updated to still match.
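
The environment setup described above might look something like the sketch below. The function name `fvp_environment` is hypothetical, and appending any existing `LD_LIBRARY_PATH` value is an assumption that goes beyond what the description states.

```python
import os
import posixpath
from typing import Mapping, Optional


def fvp_environment(install_dir: str,
                    base_env: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with the model's library paths set.

    PYTHONHOME      -> <install>/python
    LD_LIBRARY_PATH -> <install>/fmtplib:<install>/python/lib
    """
    env = dict(os.environ if base_env is None else base_env)
    env["PYTHONHOME"] = posixpath.join(install_dir, "python")
    lib_paths = [
        posixpath.join(install_dir, "fmtplib"),
        posixpath.join(install_dir, "python", "lib"),
    ]
    # Assumption: keep any pre-existing search path after the model's own.
    existing = env.get("LD_LIBRARY_PATH")
    if existing:
        lib_paths.append(existing)
    env["LD_LIBRARY_PATH"] = ":".join(lib_paths)
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when starting the model, leaving the calling process's own environment untouched.
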
This patch disables the use of zlib when building the toolchain, as
using it creates an extra dependency on the `libzstd` shared library
which is not guaranteed to be present on all supported platforms.
This introduces a new shell script for running ATfE tests using the
`check-cxx` target. These tests are performed using a separate script
due to the considerably long run times needed for completion, as the
libcxx tests need to be executed for each of the underlying C library
variants.
If FVPs have been installed using the `get_fvps.sh` script, then this
patch allows them to be used in the `build.sh` script by setting the
`FVP_INSTALL_DIR` environment variable to the install location.
The `run_fvp.py` script currently only supports running the x86_64
versions of the models, due to the paths containing a directory that
only exists in the x86_64 package. This patch adjusts the paths
depending on the platform, which allows the script to run both x86_64
and AArch64 versions.
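
One way to make such a path depend on the host is a lookup keyed on the machine type, sketched below. The directory names in `PLATFORM_DIRS`, the `models` subdirectory, and the function name are all placeholders chosen for illustration; the actual layout of the FVP packages is not stated here.

```python
import platform
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical per-architecture directory names inside an install tree.
PLATFORM_DIRS = {
    "x86_64": "Linux64_GCC-9.3",
    "aarch64": "Linux64_armv8l_GCC-9.3",
}


def model_path(install_dir: str, model_name: str,
               machine: Optional[str] = None) -> PurePosixPath:
    """Return the model executable path for the given (or host) machine."""
    machine = platform.machine() if machine is None else machine
    try:
        platform_dir = PLATFORM_DIRS[machine]
    except KeyError:
        raise ValueError(f"unsupported host architecture: {machine}") from None
    return PurePosixPath(install_dir) / "models" / platform_dir / model_name
```

Failing loudly on an unknown architecture avoids silently building a path that cannot exist.
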
The `Additional unrolling in LTO` patch has been rebased against an
upstream change, while `Prefer MEMCPY LDM/STM inlining for v7-m` has had
the original commit message restored in order to provide context for the
change.

The filenames now also match the subject of the commit message, as they
have been regenerated using git format-patch.
This patch disables the use of zstd when building the toolchain, as
using it creates an extra dependency on the `libzstd` shared library
which is not guaranteed to be present on all supported platforms.
See
ARM-software/LLVM-embedded-toolchain-for-Arm#628
for context.

This enables newlib-nano as a multilib. I didn't find any release
scripts, so I don't know how to add it to the CI such that it also
generates an overlay package for newlib-nano on release.

Note: this also fixes an issue with the current newlib builds where they
don't get built correctly for anything above ARMv6 as the `-march`,
`-mfloat-abi` and `-mfpu` arguments weren't being passed on to the
newlib compilation configuration. A little unsure why this hasn't popped
up before...
Commit 23f1d75 included `$(LLVM_TOOLCHAIN_C_LIBRARY)`, which
isn't valid CMake syntax – to interpolate a CMake variable you have to
use braces, not parentheses.

This led to a mysterious error from git:

fatal: cannot change to 'rev-parse': No such file or directory

because once `${base_library}` is set to something that doesn't make
sense, `git -C ${${base_library}_SOURCE_DIR} rev-parse HEAD` substitutes
nothing at all for the source directory, and the command collapses to
just `git -C rev-parse` which indeed interprets `rev-parse` as the
directory to change into.
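
The collapse can be reproduced outside CMake. The sketch below uses Python's `shlex.split` as a stand-in for the way an unquoted argument that expands to nothing simply disappears from the command; the function name `expand_git_command` is hypothetical and this is not how CMake itself is implemented.

```python
import shlex


def expand_git_command(source_dir: str) -> list[str]:
    """Substitute the source directory into the command, then tokenize it.

    An empty `source_dir` leaves no token behind, so the argument that
    follows `-C` becomes `rev-parse`.
    """
    command_line = f"git -C {source_dir} rev-parse HEAD"
    return shlex.split(command_line)
```

With a real directory the tokens are `git`, `-C`, the directory, `rev-parse`, `HEAD`; with an empty string the directory token vanishes and git is asked to change into `rev-parse` and run the subcommand `HEAD`.
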
The sample program calls two libm functions (lround and atanf) to turn a
floating-point number into decimal digits. But libm.a is built as a
separate library from libc.a, so linking the sample program fails
because the compile command doesn't use the -lm option. Easily fixed.
…rm#110)

The aarch64r soft_nofp variants will fail to build using newlib. The
aarch64a soft_nofp variants are already disabled in newlib/llvmlibc
builds due to missing support, so the aarch64r variants should be
similarly limited to picolibc.
…27096 and #127662 (arm#140)

We need to incorporate the following upstreamed patches into the 20.x
branch, so they are applied as patch files.

[compiler-rt] Add support for big endian for Arm's __negdf2vfp - (cherry
picked from llvm/llvm-project#127096)

[compiler-rt] Fix tests of _aeabi(idivmod|uidivmod|uldivmod) to support
big endian - (cherry picked from
llvm/llvm-project#126277)

[libcxx] Work around picolibc argv handling in tests - (cherry picked
from llvm/llvm-project#127662)
@pratlucas pratlucas closed this Feb 28, 2025
@pratlucas pratlucas deleted the rebase-perf-patches branch February 28, 2025 15:32