rebase perf patches #141
This is a test library which is not part of libMLIR, so it should use normal LINK_LIBS instead of mlir_target_link_libraries. This fixes an issue introduced in #123910 and follows up on the fix in #125004, which added the library to DEPENDS, which is not sufficient.
This library is provided by flang, not MLIR, so it should not be part of MLIR_LIBS. Fixes an issue introduced in llvm/llvm-project#120966. (cherry picked from commit ee76bda)
The Fortran libraries are not part of MLIR, so they should use target_link_libraries() rather than mlir_target_link_libraries(). This fixes an issue introduced in llvm/llvm-project#120966. (cherry picked from commit f9af5c1)
It's a cherry-pick from the arm-software branch. OS name can be used to specify the target OS, along with the targeted distribution name and version, e.g., `RHEL10`.
This aligns the builtins with how implementations that don't use the builtins work.
…r (#125896) Summary: The CUDA implementation has long supported the `width` argument on its shuffle instructions, which makes it more difficult to replace those uses with this helper. This patch just correctly implements that for AMDGPU and NVPTX so it's equivalent to `__shfl_sync` in CUDA. This will ease porting. Fortunately these get optimized out correctly when passing in known widths. (cherry picked from commit 2d8106c)
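To make the `width` semantics concrete, here is a minimal Python emulation of an indexed shuffle. It is only a sketch of the documented `__shfl_sync` behaviour (each aligned group of `width` lanes acts as its own segment, and the source index wraps inside that segment); the function name `shuffle_idx` and the list-of-lanes model are assumptions for illustration, not the helper added by the patch.

```python
def shuffle_idx(values, idx, width):
    """Emulate an indexed shuffle across one warp/wavefront.

    values[i] is the value held by lane i. width must be a power of two no
    larger than the number of lanes. Each aligned group of `width` lanes is
    treated as an independent segment whose lanes are numbered from 0.
    """
    if width <= 0 or width & (width - 1) or width > len(values):
        raise ValueError("width must be a power of two <= number of lanes")
    result = []
    for lane in range(len(values)):
        segment_base = lane & ~(width - 1)      # first lane of this lane's segment
        source = segment_base + (idx % width)   # wrap idx inside the segment
        result.append(values[source])
    return result
```

With eight lanes holding `0..7`, `shuffle_idx(list(range(8)), 1, 4)` makes lanes 0-3 read lane 1 and lanes 4-7 read lane 5.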
(cherry picked from commit 718cdeb)
…625) A few tests generate a statically-linked position-independent executable with `-nostdlib -Wl,--unresolved-symbols=ignore-all -pie` (`%clang`) and test PLT handling. (`--unresolved-symbols=ignore-all` suppresses undefined symbol errors and serves as a convenience hack.) This relies on an unguaranteed linker behavior: a statically-linked PIE does not necessarily generate PLT entries. While current lld generates a PLT entry, it will change to suppress the PLT entry to simplify internal handling and improve consistency. (GNU ld is not consistent here: some ports generate a .dynsym entry while others don't, and while most seem to generate a PLT entry, some ports use a weird `R_*_NONE` relocation.) (cherry picked from commit a907008)
…rray (#126790) Summary: If the user deallocates an RPC device this can sometimes fail if the RPC server is still running. This will happen if the modification happens while the server is still checking it. This patch adds a mutex to guard modifications to it. (cherry picked from commit baf7a3c)
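The locking idea can be sketched in Python as follows. This is an analogy only: the actual patch is in the offload runtime, and the `DeviceTable` class and its method names are hypothetical. The point it shows is that the server's scan and the user's deallocation take the same mutex, so an entry cannot disappear while it is being checked.

```python
import threading


class DeviceTable:
    """Hypothetical registry shared by a polling server thread and user code."""

    def __init__(self):
        self._lock = threading.Lock()
        self._devices = {}

    def register(self, device_id, handle):
        with self._lock:
            self._devices[device_id] = handle

    def deregister(self, device_id):
        # Blocks until any in-progress scan has finished.
        with self._lock:
            return self._devices.pop(device_id, None)

    def poll_once(self, handler):
        # The lock is held for the whole scan, so no entry is removed mid-iteration.
        with self._lock:
            for device_id, handle in self._devices.items():
                handler(device_id, handle)
```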
…tion]] on copy/move constructors (#126553) Fixes #126490 (cherry picked from commit 90192e8)
… modules which is not in file scope Close llvm/llvm-project#126373 Although the root problem is that we shouldn't place the friend declaration in the incorrect module, let's avoid the bleeding edge by no longer diagnosing declarations that are not in file scope. (cherry picked from commit 569e94f)
Found during work on #120927. This caused the compiler to silently ignore half of the mask in the specific intrinsics. (cherry picked from commit af522c5)
In my previous PR (#123656) to update the names of AVX10.2 intrinsics and mnemonics, I erroneously deleted `_ph` from a few intrinsics. This PR corrects that. (cherry picked from commit 161cfc6)
There are two ways we can fix this problem, depending on how the semantics of byval and initializes should interact: * Don't infer initializes on byval arguments. initializes on byval refers to the original caller memory (or having both attributes is made a verifier error). * Infer initializes on byval, but don't use it in DSE. initializes on byval refers to the callee copy. This matches the semantics of readonly on byval. This is slightly more powerful, for example, we could do a backend optimization where byval + initializes will allocate the full size of byval on the stack but not copy over the parts covered by initializes. I went with the second variant here, skipping byval + initializes in DSE (FunctionAttrs already doesn't propagate initializes past byval). I'm open to going in the other direction though. Fixes llvm/llvm-project#126181. (cherry picked from commit 2d31a12)
(cherry picked from commit ae08969)
… (#126236) The code already guards against values coming from a previous iteration using properlyDominates(). However, addrecs are considered to properly dominate the loop they are defined in. Handle this special case separately, by checking for expressions that have computable loop evolution (this should cover cases like a zext of an addrec as well). I considered changing the definition of properlyDominates() instead, but decided against it. The current definition is useful in other contexts, e.g. when deciding whether an expression is safe to expand in a given block. Fixes llvm/llvm-project#126012. (cherry picked from commit 7aed53e)
These demonstrate miscompiles in the existing code. (cherry picked from commit 3dc1ef1)
… (#125532) For GEPs, we have three bit widths involved: The pointer bit width, the index bit width, and the bit width of the GEP operands. The correct behavior here is: * We need to sextOrTrunc the GEP operand to the index width *before* multiplying by the scale. * If the index width and pointer width differ, GEP only ever modifies the low bits. Adds should not overflow into the high bits. I'm testing this via unit tests because it's a bit tricky to test in IR with InstCombine canonicalization getting in the way. (cherry picked from commit 3bd11b5)
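The two rules above can be modelled with plain integer arithmetic. The following Python sketch is an illustration under assumptions, not the LLVM code: `sext_or_trunc` and `gep_offset` are hypothetical helper names, and all values are treated as unsigned bit patterns.

```python
def sext_or_trunc(value, from_bits, to_bits):
    """Convert a from_bits-wide bit pattern to to_bits, sign-extending when widening."""
    value &= (1 << from_bits) - 1
    if to_bits > from_bits and value >> (from_bits - 1):
        value -= 1 << from_bits  # interpret as negative; masking below extends the sign
    return value & ((1 << to_bits) - 1)


def gep_offset(base, operand, operand_bits, scale, index_bits, pointer_bits):
    """Apply one scaled GEP operand to a pointer value."""
    index_mask = (1 << index_bits) - 1
    # Rule 1: convert the operand to the index width *before* scaling.
    index = sext_or_trunc(operand, operand_bits, index_bits)
    offset = (index * scale) & index_mask
    # Rule 2: only the low index_bits change; the add never carries into the high bits.
    low = (base + offset) & index_mask
    high = base & ~index_mask & ((1 << pointer_bits) - 1)
    return high | low
```

For example, with a 64-bit pointer and a 32-bit index width, adding an 8-bit operand of `0xFF` (that is, -1) scaled by 4 to `0x1_0000_0010` gives `0x1_0000_000C`, leaving the upper 32 bits untouched.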
…126340) Fixes #125012. (cherry picked from commit 8d373ce)
… dump-section output path doesn't exist #125345 (#126607) Add a release note for the llvm-objcopy fix for printing the wrong path when the dump-section output path doesn't exist (#125345).
…n. (#125235) Update getOrCreateVPValueForSCEVExpr to only skip expansion of SCEVUnknown if the underlying value isn't an instruction. Instructions may be defined in a loop and using them without expansion may break LCSSA form. SCEVExpander will take care of preserving LCSSA if needed. We could also try to pass LoopInfo, but there are some users of the function where it won't be available and the main benefit from skipping expansion is slightly more concise VPlans. Note that SCEVExpander is now used to expand SCEVUnknown with floats. Adjust the check in expandCodeFor to only check the types and casts if the type of the value is different to the requested type. Otherwise we crash when trying to expand a float and requesting a float type. Fixes llvm/llvm-project#121518. PR: llvm/llvm-project#125235 (cherry picked from commit e258bca)
Some libcxx tests are failing on this variant, although it is unknown whether any of the failures are exclusive to this variant. For now the libcxx tests will remain disabled while we focus on having C library tests passing for new variants we add. (cherry picked from arm#66)
…stead (arm#125) It's a cherry-pick from the arm-software branch. This will allow us to tweak the CMake configuration.
(cherry picked from arm#98)
…nd vectorize llvm.sincos intrinsics (arm#84) This teaches the loop vectorizer that `llvm.sincos` is trivially vectorizable. Additionally, this patch updates the cost model to cost intrinsics that return multiple values correctly. Previously, the cost model only thought intrinsics that return `VectorType` need scalarizing, which meant it cost intrinsics that return multiple vectors (that need scalarizing) way too cheap (giving it the cost of a single function call). The `llvm.sincos` intrinsic also has a custom cost when a vector function library is available, as certain VFs can be expanded (later in code-gen) to a vector function, reducing the cost to a single call (+ the possible loads from the vector function returns values via output pointers). --- Downstream issue: arm#87
Add newlib-nano build script based on newlib one.
…armebv7a_hard_vfpv3_d16 variants (arm#99) These variants were missing some logic in multilib.yaml: some of the flag matching rules existed only for little endian. Thus they had to be copied for big endian as well. Besides the changes to multilib.yaml, tests have been added to properly test the library selection of the variants. (cherry picked from commit 128f026)
(cherry picked from commit b04c742)
…rm#115) The big endian library can be selected with or without the -mno-unaligned-access flag. Currently, selecting the variants below requires explicitly passing -mno-unaligned-access. This change removes the flag from the selection criteria, ensuring that below variants can be selected regardless of whether -mno-unaligned-access is used. - armebv7a_soft_nofp - armebv7a_soft_nofp_exn_rtti - armebv7a_hard_vfpv3_d16 - armebv7a_hard_vfpv3_d16_exn_rtti - armebv7a_soft_vfpv3_d16_exn_rtti - armebv7a_soft_vfpv3_d16 (cherry picked from commit fa1cfc6)
Corstone prints out some informational text to stdout, but this text conflicts with expectations from tests about the contents of stdout. This patch filters out this text to send to stdout only what matters.
This patch fixes two bugs:
- Properly propagate FVP's return code to the caller.
- Prepend `INST=` to the `--application` argument value. This is required when the application's path contains equal signs (`=`). This sign is interpreted as a special case inside FVP: its presence in the path leads to ambiguity. One way to work around this is to do the prepend. After that, any `=` sign that comes up in the path is treated as part of the path.
This is the help message from Corstone's `--application`:
> -a, --application FILE application to load, format: -a [INST=]FILE (use -a INST=FILE for a specific instance, use -a INST*=FILE to match multiple instances using wildcards e.g. for SMP cores)
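Both fixes can be sketched in a few lines of Python. This is a rough illustration, not the script's actual code: the function names are hypothetical, and `cpu0` is a placeholder instance name that is not taken from the patch.

```python
import subprocess
import sys


def application_argument(image_path, instance="cpu0"):
    # "cpu0" is a placeholder. With an explicit INST= prefix, any later "="
    # in the value belongs to the file path.
    return f"{instance}={image_path}"


def run_model(model_command, image_path, instance="cpu0"):
    argv = [*model_command, "--application",
            application_argument(image_path, instance)]
    completed = subprocess.run(argv)
    # Hand the model's exit status back instead of discarding it.
    return completed.returncode


if __name__ == "__main__":
    # Usage: script.py IMAGE MODEL [MODEL_ARGS...]
    sys.exit(run_model(sys.argv[2:], sys.argv[1]))
```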
The get_fvps.sh script currently only downloads and installs the Linux64 versions of the FVPs, which does not necessarily match the host. AArch64 versions of the downloads are all available, so this patch adds a simple check to download the appropriate ones. This also updates the version of the Corstone-310 FVP to 11.27, and modifies the script to infer the package names from the URL to reduce duplication when modifying links.
The new version of Corstone-310 FVP model has a few differences compared to v11.24 which require changes: * The `ID_ISAR5.PACBTI` parameter has been deprecated and removed. * The install now includes its own copy of pythonlib and fmtplib, the paths to which need to be set as environment variables before starting the model. `PYTHONHOME` should be set to `<install>/python`, and `LD_LIBRARY_PATH` to `<install>/fmtplib:<install>/python/lib`. * The boilerplate message printed to stdout has a different date, and the Info message on stop is no longer present. The regex has been updated to still match.
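A minimal sketch of the environment setup described above, assuming a Linux host. The function name `fvp_environment` is hypothetical, and preserving an existing `LD_LIBRARY_PATH` by appending it is an assumption rather than something the patch states.

```python
import os
import posixpath


def fvp_environment(install_dir, base_env=None):
    """Return a copy of the environment with the model's bundled libraries on the path."""
    env = dict(os.environ if base_env is None else base_env)
    env["PYTHONHOME"] = posixpath.join(install_dir, "python")
    lib_dirs = [
        posixpath.join(install_dir, "fmtplib"),
        posixpath.join(install_dir, "python", "lib"),
    ]
    existing = env.get("LD_LIBRARY_PATH")
    if existing:
        lib_dirs.append(existing)  # assumption: keep whatever the caller already had
    env["LD_LIBRARY_PATH"] = ":".join(lib_dirs)
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.run` when starting the model.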
This patch disables the use of zlib when building the toolchain, as using it creates an extra dependency on the `libzstd` shared library which is not guaranteed to be present on all supported platforms.
This introduces a new shell script for running ATfE tests using the `check-cxx` target. These tests are performed using a separate script due to the considerably long run times needed for completion, as the libcxx tests need to be executed for each of the underlying C library variants.
If FVPs have been installed using the `get_fvps.sh` script, then this patch allows them to be used in the `build.sh` script by setting the `FVP_INSTALL_DIR` environment variable to the install location.
The `run_fvp.py` script currently only supports running the x86_64 versions of the models, due to the paths containing a directory that only exists in the x86_64 package. This patch adjusts the paths depending on the platform, which allows the script to run both x86_64 and AArch64 versions.
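The platform switch could look roughly like the following. This is a sketch only: the directory names in `MODEL_SUBDIRS` are illustrative placeholders, not values confirmed by the patch, and `model_bin_dir` is a hypothetical helper.

```python
import platform
import posixpath

# Placeholder directory names; the real package layout may differ.
MODEL_SUBDIRS = {
    "x86_64": "Linux64_GCC-9.3",
    "aarch64": "Linux64_armv8l_GCC-9.3",
}


def model_bin_dir(models_dir, machine=None):
    """Pick the per-architecture subdirectory that holds the model binaries."""
    machine = machine or platform.machine()
    try:
        subdir = MODEL_SUBDIRS[machine]
    except KeyError:
        raise RuntimeError(f"no FVP package known for host architecture {machine!r}")
    return posixpath.join(models_dir, subdir)
```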
The `Additional unrolling in LTO` patch has been rebased against an upstream change, while `Prefer MEMCPY LDM/STM inlining for v7-m` has had the original commit message restored in order to provide context for the change. The filenames now also match the subject of the commit message, as they have been regenerated using git format-patch.
This patch disables the use of zstd when building the toolchain, as using it creates an extra dependency on the `libzstd` shared library which is not guaranteed to be present on all supported platforms.
See ARM-software/LLVM-embedded-toolchain-for-Arm#628 for context. This enables newlib-nano as a multilib. I didn't find any release scripts, so I don't know how to add it to the CI such that it also generates an overlay package for newlib-nano on release. Note: this also fixes an issue with the current newlib builds where they don't get built correctly for anything above ARMv6 as the `-march`, `-mfloat-abi` and `-mfpu` arguments weren't being passed on to the newlib compilation configuration. A little unsure why this hasn't popped up before...
Commit 23f1d75 included `$(LLVM_TOOLCHAIN_C_LIBRARY)`, which isn't valid cmake syntax – to interpolate a cmake variable you have to use braces, not parentheses. This led to a mysterious error from git, `fatal: cannot change to 'rev-parse': No such file or directory`, because once `${base_library}` is set to something that doesn't make sense, `git -C ${${base_library}_SOURCE_DIR} rev-parse HEAD` substitutes nothing at all for the source directory, and the command collapses to just `git -C rev-parse` which indeed interprets `rev-parse` as the directory to change into.
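The collapse is easy to reproduce outside cmake. The Python sketch below only mimics the relevant behaviour (an undefined variable expands to an empty string, and an empty unquoted argument vanishes from the command line); `expand_git_command` and the `picolibc_SOURCE_DIR` variable name are assumptions for illustration.

```python
def expand_git_command(variables, base_library):
    """Build the git argv the way an unquoted cmake expansion would."""
    # An undefined variable expands to the empty string.
    source_dir = variables.get(f"{base_library}_SOURCE_DIR", "")
    argv = ["git", "-C", source_dir, "rev-parse", "HEAD"]
    # An empty unquoted argument disappears entirely.
    return [arg for arg in argv if arg != ""]


variables = {"picolibc_SOURCE_DIR": "/src/picolibc"}
good = expand_git_command(variables, "picolibc")
bad = expand_git_command(variables, "$(LLVM_TOOLCHAIN_C_LIBRARY)")
```

Here `good` still names the source directory after `-C`, while in `bad` the word following `-C` is `rev-parse`, which git then treats as the directory to change into.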
The sample program calls two libm functions (lround and atanf) to turn a floating-point number into decimal digits. But libm.a is built as a separate library from libc.a, so linking the sample program fails because the compile command doesn't use the -lm option. Easily fixed.
…rm#110) The aarch64r soft_nofp variants will fail to build using newlib. The aarch64a soft_nofp variants are already disabled in newlib/llvmlibc builds due to missing support, so the aarch64r should be similarly limited to picolibc.
…27096 and #127662 (arm#140) We need to incorporate the following upstreamed patches into the 20.x branch, so we apply them as patch files. [compiler-rt] Add support for big endian for Arm's __negdf2vfp - (cherry picked from llvm/llvm-project#127096) [compiler-rt] Fix tests of _aeabi(idivmod|uidivmod|uldivmod) to support big endian - (cherry picked from llvm/llvm-project#126277) [libcxx] Work around picolibc argv handling in tests - (cherry picked from llvm/llvm-project#127662)
`gnu::format` with variadic template functions is Clang-only (#124406)
`foldSelectWithFCmpToFabs` (#121580)
`llvm/test/DebugInfo/Generic/discriminated-union.ll` on big-endian targets (#125849)
`finishPendingActions`. (#121245)
`LLVM_BUILD_TELEMETRY` in `LLVMConfig.cmake` (#126710)
`QualifierAlignment` (#125327)
`__cpp_lib_atomic_float` (#127559)
`combineFMulOrFDivWithIntPow2` (#128618)
`private` as the default AS for when `generic` is available (#112442)