forked from swiftlang/llvm-project
-
Notifications
You must be signed in to change notification settings - Fork 0
Add check inversion for x and xor x, -1 #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
zjaffal
wants to merge
10,000
commits into
llvm.org/main
from
analysis/value_tracking/isKnownInversion/add_inv
Closed
Add check inversion for x and xor x, -1 #1
zjaffal
wants to merge
10,000
commits into
llvm.org/main
from
analysis/value_tracking/isKnownInversion/add_inv
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Use a series of ops in that case, getting us to the right declaration field.
This is a three instruction expansion, and does not depend on zba, so most of the test changes are in base RV32/64I configurations. With zba, this gets immediates such as 14, 28, 30, 56, 60, 62.. which aren't covered by our other expansions.
… in documentation (NFC) (llvm#95003) The checker was renamed at some time ago but the documentation was not updated. The section is now just moved and renamed. The documentation is still very simple and needs improvement.
…elease (llvm#91862) ### Context We have a longstanding performance issue on Windows, where to this day, the default heap allocator is still lockfull. With the number of cores increasing, building and using LLVM with the default Windows heap allocator is sub-optimal. Notably, the ThinLTO link times with LLD are extremely long, and increase proportionally with the number of cores in the machine. In llvm@a6a37a2, I introduced the ability build LLVM with several popular lock-free allocators. Downstream users however have to build their own toolchain with this option, and building an optimal toolchain is a bit tedious and long. Additionally, LLVM is now integrated into Visual Studio, which AFAIK re-distributes the vanilla LLVM binaries/installer. The point being that many users are impacted and might not be aware of this problem, or are unable to build a more optimal version of the toolchain. The symptom before this PR is that most of the CPU time goes to the kernel (darker blue) when linking with ThinLTO:  With this PR, most time is spent in user space (light blue):  On higher core count machines, before this PR, the CPU usage becomes pretty much flat because of contention: <img width="549" alt="VM_176_windows_heap" src="https://github.com/llvm/llvm-project/assets/37383324/f27d5800-ee02-496d-a4e7-88177e0727f0"> With this PR, similarily most CPU time is now used: <img width="549" alt="VM_176_with_rpmalloc" src="https://github.com/llvm/llvm-project/assets/37383324/7d4785dd-94a7-4f06-9b16-aaa4e2e505c8"> ### Changes in this PR The avenue I've taken here is to vendor/re-licence rpmalloc in-tree, and use it when building the Windows 64-bit release. Given the permissive rpmalloc licence, prior discussions with the LLVM foundation and @lattner suggested this vendoring. Rpmalloc's author (@mjansson) kindly agreed to ~~donate~~ re-licence the rpmalloc code in LLVM (please do correct me if I misinterpreted our past communications). I've chosen rpmalloc because it's small and gives the best value overall. The source code is only 4 .c files. Rpmalloc is statically replacing the weak CRT alloc symbols at link time, and has no dynamic patching like mimalloc. As an alternative, there were several unsuccessfull attempts made by Russell Gallop to use SCUDO in the past, please see thread in https://reviews.llvm.org/D86694. If later someone comes up with a PR of similar performance that uses SCUDO, we could then delete this vendored rpmalloc folder. I've added a new cmake flag `LLVM_ENABLE_RPMALLOC` which essentialy sets `LLVM_INTEGRATED_CRT_ALLOC` to the in-tree rpmalloc source. ### Performance The most obvious test is profling a ThinLTO linking step with LLD. I've used a Clang compilation as a testbed, ie. ``` set OPTS=/GS- /D_ITERATOR_DEBUG_LEVEL=0 -Xclang -O3 -fstrict-aliasing -march=native -flto=thin -fwhole-program-vtables -fuse-ld=lld cmake -G Ninja %ROOT%/llvm -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=TRUE -DLLVM_ENABLE_PROJECTS="clang" -DLLVM_ENABLE_PDB=ON -DLLVM_OPTIMIZED_TABLEGEN=ON -DCMAKE_C_COMPILER=clang-cl.exe -DCMAKE_CXX_COMPILER=clang-cl.exe -DCMAKE_LINKER=lld-link.exe -DLLVM_ENABLE_LLD=ON -DCMAKE_CXX_FLAGS="%OPTS%" -DCMAKE_C_FLAGS="%OPTS%" -DLLVM_ENABLE_LTO=THIN ``` I've profiled the linking step with no LTO cache, with Powershell, such as: ``` Measure-Command { lld-link /nologo @CMakeFiles\clang.rsp /out:bin\clang.exe /implib:lib\clang.lib /pdb:bin\clang.pdb /version:0.0 /machine:x64 /STACK:10000000 /DEBUG /OPT:REF /OPT:ICF /INCREMENTAL:NO /subsystem:console /MANIFEST:EMBED,ID=1 }` ``` Timings: | Machine | Allocator | Time to link | |--------|--------|--------| | 16c/32t AMD Ryzen 9 5950X | Windows Heap | 10 min 38 sec | | | **Rpmalloc** | **4 min 11 sec** | | 32c/64t AMD Ryzen Threadripper PRO 3975WX | Windows Heap | 23 min 29 sec | | | **Rpmalloc** | **2 min 11 sec** | | | **Rpmalloc + /threads:64** | **1 min 50 sec** | | 176 vCPU (2 socket) Intel Xeon Platinium 8481C (fixed clock 2.7 GHz) | Windows Heap | 43 min 40 sec | | | **Rpmalloc** | **1 min 45 sec** | This also improves the overall performance when building with clang-cl. I've profiled a regular compilation of clang itself, ie: ``` set OPTS=/GS- /D_ITERATOR_DEBUG_LEVEL=0 /arch:AVX -fuse-ld=lld cmake -G Ninja %ROOT%/llvm -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=TRUE -DLLVM_ENABLE_PROJECTS="clang;lld" -DLLVM_ENABLE_PDB=ON -DLLVM_OPTIMIZED_TABLEGEN=ON -DCMAKE_C_COMPILER=clang-cl.exe -DCMAKE_CXX_COMPILER=clang-cl.exe -DCMAKE_LINKER=lld-link.exe -DLLVM_ENABLE_LLD=ON -DCMAKE_CXX_FLAGS="%OPTS%" -DCMAKE_C_FLAGS="%OPTS%" ``` This saves approx. 30 sec when building on the Threadripper PRO 3975WX: ``` (default Windows Heap) C:\src\git\llvm-project>hyperfine -r 5 -p "make_llvm.bat stage1_test2" "ninja clang -C stage1_test2" Benchmark 1: ninja clang -C stage1_test2 Time (mean ± σ): 392.716 s ± 3.830 s [User: 17734.025 s, System: 1078.674 s] Range (min … max): 390.127 s … 399.449 s 5 runs (rpmalloc) C:\src\git\llvm-project>hyperfine -r 5 -p "make_llvm.bat stage1_test2" "ninja clang -C stage1_test2" Benchmark 1: ninja clang -C stage1_test2 Time (mean ± σ): 360.824 s ± 1.162 s [User: 15148.637 s, System: 905.175 s] Range (min … max): 359.208 s … 362.288 s 5 runs ```
…insic (llvm#95276) Depends on llvm#94830.
…bindings (llvm#96150) The MLIR C and Python Bindings expose various methods from `mlir::OpPrintingFlags` . This PR adds a binding for the `skipRegions` method, which allows to skip the printing of Regions when printing Ops. It also exposes this option as parameter in the python `get_asm` and `print` methods
Note: our baremetal arm configuration compiles this as `--target=arm-none-eabi`, so this code is built in -marm mode. It could be smaller with `--target=armv7-none-eabi -mthumb`. The assembler is valid ARMv5, or THUMB2, but not THUMB(1).
The use of SmallVector here saves 4.7% of heap allocations during the compilation of ConvertExpr.cpp.ii, a preprocessed version of flang/lib/Lower/ConvertExpr.cpp.
Add DwarfRegAlias for VSRPair as it shares dwarfRegNum with the VR registers.
... so move it out of the `implied_features` list, and into the `DefaultExts` list.
SmallPtrSet.h and TimeProfiler.h are unused. CommandLine.h is only needed for the UseNewDbgInfoFormat declare, which can be moved to the places that need it.
…ngth vectors. (llvm#96081) This also fixes the case where an SVE div is incorrectly to be assumed available in non-streaming mode with SME.
llvm#91100) ### Background Doxygen's `\par` command ([link](https://www.doxygen.nl/manual/commands.html#cmdpar)) has an optional argument, which denotes the header of the paragraph started by a given `\par` command. In short, the paragraph command can be used with a heading, or without one. The code block below shows both forms and how the current version of LLVM/Clang parses this code: ``` $ cat test.cpp /// \par User defined paragraph: /// Contents of the paragraph. /// /// \par /// New paragraph under the same heading. /// /// \par /// A second paragraph. class A {}; $ clang++ -cc1 -ast-dump -fcolor-diagnostics -std=c++20 test.cpp `-CXXRecordDecl 0x1530f3a78 <test.cpp:11:1, col:10> col:7 class A definition |-FullComment 0x1530fea38 <line:2:4, line:9:23> | |-ParagraphComment 0x1530fe7e0 <line:2:4> | | `-TextComment 0x1530fe7b8 <col:4> Text=" " | |-BlockCommandComment 0x1530fe800 <col:5, line:3:30> Name="par" | | `-ParagraphComment 0x1530fe878 <line:2:9, line:3:30> | | |-TextComment 0x1530fe828 <line:2:9, col:32> Text=" User defined paragraph:" | | `-TextComment 0x1530fe848 <line:3:4, col:30> Text=" Contents of the paragraph." | |-ParagraphComment 0x1530fe8c0 <line:5:4> | | `-TextComment 0x1530fe898 <col:4> Text=" " | |-BlockCommandComment 0x1530fe8e0 <col:5, line:6:41> Name="par" | | `-ParagraphComment 0x1530fe930 <col:4, col:41> | | `-TextComment 0x1530fe908 <col:4, col:41> Text=" New paragraph under the same heading." | |-ParagraphComment 0x1530fe978 <line:8:4> | | `-TextComment 0x1530fe950 <col:4> Text=" " | `-BlockCommandComment 0x1530fe998 <col:5, line:9:23> Name="par" | `-ParagraphComment 0x1530fe9e8 <col:4, col:23> | `-TextComment 0x1530fe9c0 <col:4, col:23> Text=" A second paragraph." `-CXXRecordDecl 0x1530f3bb0 <line:11:1, col:7> col:7 implicit class A ``` As we can see above, the optional paragraph heading (`"User defined paragraph"`) is not an argument of the `\par` `BlockCommandComment`, but instead a child `TextComment`. For documentation generators like [hdoc](https://hdoc.io/), it would be ideal if we could parse Doxygen documentation comments with these semantics in mind. Currently that's not possible. ### Change This change parses `\par` command according to how Doxygen parses them, making an optional header available as a an argument if it is present. In addition: - AST unit tests are defined to test this functionality when an argument is present, isn't present, with additional spacing, etc. - TableGen is updated with an `IsParCommand` to support this functionality - `lit` tests are updated where needed
Since commit 8d468c1, the header `openmp_wrappers/complex` is hidden behind `openmp_wrappers/complex.h` due to a bug in CMake[^1], so is not actually installed. To test the issue, you can ask `ninja` to generate the file on your build: ``` $ ninja lib/clang/19/include/openmp_wrappers/complex.h [199/199] Copying clang's openmp_wrappers/complex.h... $ ninja lib/clang/19/include/openmp_wrappers/complex ninja: error: unknown target 'lib/clang/19/include/openmp_wrappers/complex', did you mean 'lib/clang/19/include/openmp_wrappers/complex.h'? ``` Re-ordering the entries workarounds the issue. The other option is to revert the cited commit, but I'm not sure which approach is preferred. CC @etcwilde @jdoerfert [^1]: [Here](https://gitlab.kitware.com/cmake/cmake/-/issues/26058) is the CMake report on the issue.
Add additional test with salarized store which caused crashes with earlier versions of llvm#92555.
This reverts commit 6f538f6. Extra tests for crashes discovered when building Chromium have been added in fb86cb7, 3be7312. Original message: This adds a new interface to compute the cost of recipes, VPBasicBlocks, VPRegionBlocks and VPlan, initially falling back to the legacy cost model for all recipes. Follow-up patches will gradually migrate recipes to compute their own costs step-by-step. It also adds getBestPlan function to LVP which computes the cost of all VPlans and picks the most profitable one together with the most profitable VF. The VPlan selected by the VPlan cost model is executed and there is an assert to catch cases where the VPlan cost model and the legacy cost model disagree. Even though I checked a number of different build configurations on AArch64 and X86, there may be some differences that have been missed. Additional discussions and context can be found in @arcbbb's llvm#67647 and llvm#67934 which is an earlier version of the current PR. PR: llvm#92555
The main goal of this PR (and subsequent PRs), is to add more tests with scalable vectors to: * vector-transfer-collapse-inner-most-dims.mlir There's quite a few cases to consider, hence this is split into multiple PRs. In this PR, the very first test for `vector.transfer_write` is complemented with all the possible combinations: * scalable (rather than fixed) unit trailing dim, * dynamic (rather than static) trailing dim in the source memref. To this end, the following tests: * `@leading_scalable_dimension_transfer_write` `@trailing_scalable_one_dim_transfer_write` are replaced with: * `@drop_two_inner_most_dim_scalable_inner_dim` and `@negative_scalable_unit_dim`, respectively. In addition: * "_for_transfer_write" is removed from function names (to reduce noise). In addition, to maintain consistency between the tests for `xfer_read` and `xfer_write`, 2 negative tests for `xfer_read` are also renamed. This is to follow the suggestion made during the review of this PR. Extra comments in "VectorTransforms.cpp" are added to better document the limitations related to scalable vectors and which tests added here excercise. This is a follow-up for: llvm#94490 and llvm#94604 NOTE: This PR is limited to tests for `vector.transfer_write`.
This test occasionally fails on two of the busiest CI bots (asan and matrix), and we can't reproduce it locally. This leads to the hypothesis that the test is timing out (in the sense of the number of "join attempts" performed by this test's driver). This commit doubles the number of iterations performed and also does an NFC refactor of the main test loop so that it can be more easily understood.
…mport name is used (llvm#80169)
… contexts (llvm#95963)" This reverts commit dadf960. The commit caused `TestEarlyProcessLaunch.py` to fail on the macOS bots.
This is the paper that added the 'restrict' keyword. Clang is conforming to the letter of the standard's requirements, so it would be defensible for us to claim full support instead. However, LLVM does not currently support the optimization semantics with restricted local variables or data members, only with restricted pointers declared in function parameters. So we're only claiming partial support because we don't yet take full advantage of what the feature allows.
This change by itself has no measurable effect on the LLDB testsuite. I'm making it in preparation for threading through more errors in the Swift language plugin.
This is de facto an NFC change for Objective-C but will benefit the Swift language plugin.
…th functions (llvm#95159) Part of llvm#93566.
This pass is not used in any pipeline, barely used in tests and not really useful, so drop it. The only place where we "repeat" passes is devirt repetition, and that is done using a separate pass.
…nit (llvm#95580) This checks if the layout of `std::initializer_list` is something Clang can handle much earlier and deduplicates the checks in CodeGen/CGExprAgg.cpp and AST/ExprConstant.cpp Also now diagnose `union initializer_list` (Fixes llvm#95495), bit-field for the size (Fixes a crash that would happen during codegen if it were unnamed), base classes (that wouldn't be initialized) and polymorphic classes (whose vtable pointer wouldn't be initialized).
…lvm#96013) Make LanguageRuntime::GetTypeBitSize return an optional. This should be NFC, though the ObjCLanguageRuntime implementation is (possibly) more defensive against returning 0. I'm not sure if it's possible for both `m_ivar.size` and `m_ivar.offset` to be zero. Previously, we'd return 0 and cache it, only to discard it the next time when finding it in the cache, and recomputing it again. The new code will avoid putting it in the cache in the first place.
This formatter doesn't currently provide much value. It only formats `SourceLocation` and `QualType`. The only formatting it does for `QualType` is call `getAsString()` on it. The main motivator for the removal however is that the formatter implementation can be very slow (since it uses the expression evaluator in non-trivial ways). Not infrequently do we get reports about LLDB being slow when debugging Clang, and it turns out the user was loading `ClangDataFormat.py` in their `.lldbinit` by default. We should eventually develop proper formatters for Clang data-types, but these are currently not ready. So this patch removes them in the meantime to avoid users shooting themselves in the foot, and giving the wrong impression of these being reference implementations.
Fold `mul (uitofp i1 X), Y` to `select i1 X, Y, 0.0` when the `mul` is `nnan` and `nsz` Proof: https://alive2.llvm.org/ce/z/_stiPm
We're ultimately expected to return an APValue simply pointing to the CallExpr, not any useful value. Do that by creating a global variable for the call.
The checks when building a thunk to decide if an arg needed to be cast to/from an integer or redirected via a pointer didn't match how arg types were changed in `canonicalizeThunkType`, this caused LLVM to ICE when using vector types as args due to incorrect types in a call instruction. Instead of duplicating these checks, we should check if the arg type differs between x64 and AArch64 and then cast or redirect as appropriate.
…llvm#96396) Reapply 4a7bf42 which was reverted in 34d44eb Not sure why there are tests elsewhere in clang that rely on the output of clang-format, but they were wrong
…at_provider (llvm#95704) The original implementation of HelperFunctions::consumeHexStyle always sets Style when it returns true, but this is difficult for a compiler to understand since it requires seeing that Str starts with either an "x" or an "X" when starts_with_insensitive("x") return true. In particular, g++ 12 warns that HS may be used uninitialized in the format_provider::format caller. Change HelperFunctions::consumeHexStyle to return an optional HexPrintStyle and to make the fact that Str necessarily starts with an "X" when all other cases do not apply more explicit. This helps both the compiler and the human reader of the code. Co-authored-by: Sven Verdoolaege <sven.verdoolaege@gmail.com>
llvm#95197 and 7500646 eliminated all raw `new MCXXXFragment`. We can now place fragments in a bump allocator. In addition, remove the dead `Kind == FragmentType(~0)` condition. ~CodeViewContext may call `StrTabFragment->destroy()` and need to be reset before `FragmentAllocator.Reset()`. Tested by llvm/test/MC/COFF/cv-compiler-info.ll using asan. Pull Request: llvm#96402
There is only one caller after llvm#95188.
https://reviews.llvm.org/D67249 added content hash (see -fvalidate-ast-input-files-content) using llvm::hash_code (size_t). The hash value is 32-bit on 32-bit systems, which was unintentional. Fix llvm#96379: llvm#96136 switched the hash function to xxh3_64bit but did not update the ContentHash type, leading to mismatch between ASTReader and ASTWriter.
This change is part of this proposal: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 This is part 3 of 4 PRs. It sets the ground work for using the intrinsics in HLSL. Add HLSL frontend apis for `acos`, `asin`, `atan`, `cosh`, `sinh`, and `tanh` llvm#70079 llvm#70080 llvm#70081 llvm#70083 llvm#70084 llvm#95966
…n-constants If f(Y) simplifies to Y, replace with Y. This requires Y to be non-undef. Closes llvm#94719
Follow-up to 05ba5c0. uint32_t is preferred over const MCExpr * in the section stack uses because it should only be evaluated once. Change the paramter type to match.
Functions that have the `nvvm.kernel` attribute should have 0 results.
The `gpu.func` op lowering accounts for memref arguments/results (both "normal" and bare-pointer supported), but the `gpu.return` op lowering did not. The lowering produced invalid IR that did not verify. This commit uses the same lowering strategy as for `func.return` in the `gpu.return` lowering. (The C++ implementation is copied. We may want to share some code between `func` and `gpu` lowerings in the future.)
Define subtarget features for atomic fmin/fmax support. The flat/global support is a real messe. We had float/double support at the beginning in gfx6 and gfx7. gfx8 removed these. gfx10 reintroduced them. gfx11 removed the f64 versions again. gfx9 partially reintroduced them, in gfx90a and gfx940 but only for f64.
These have been replaced with atomicrmw fadd
… == c) == (b != c)` (llvm#94915) resolves llvm#92966 alive proof https://alive2.llvm.org/ce/z/bLAQBS
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
following llvm#94915 (comment) we can move the check from there inside
CheckInversion
and use the code forfoldOrOfInversions
to do the simplification