merge main into amd-staging #784
Merged
`setzucc` with a memory operand is the same as `setcc`, but the latter is shorter.
This was marked as deprecated in 2022, but only in a comment. Switch to an error to make the deprecation visible and to stop generating the deprecated form. The error message will be removed in a follow-up; keeping it for now just makes the compilation errors easier for folks to understand. The change required to move to the new form is rather minimal.
…170994) Bitwidths greater than 64 are not supported by `arith-to-apfloat`.
This change makes the inlining logic in the translator simpler and more consistent by
(a) extending the inlining concept to include CExpression ops, which by definition are inlined if and only if they reside within an ExpressionOp, and
(b) concentrating all inlining decisions in `shouldBeInlined()`, so that ops get the same decision whether they are queried as operations or as operands (a minimal illustrative sketch of such a single decision point follows below).
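Purely as an illustration of (b), here is a minimal, hypothetical C++ sketch of such a single decision point. `FakeOp` and its fields are stand-ins invented for this example; the real translator queries `mlir::Operation` and the EmitC dialect instead.

```cpp
#include <iostream>

// Hypothetical stand-in for an operation; invented for this sketch only.
struct FakeOp {
  bool isCExpression = false;        // would be: op implements CExpression
  bool parentIsExpressionOp = false; // would be: parent is an emitc::ExpressionOp
  bool hasSingleInlinableUse = false;
};

// Single predicate consulted both when emitting an op and when emitting one of
// its operands, so both paths reach the same inlining decision.
bool shouldBeInlined(const FakeOp &op) {
  // CExpression ops are inlined if and only if they live inside an ExpressionOp.
  if (op.isCExpression)
    return op.parentIsExpressionOp;
  // Everything else keeps a single-use style heuristic.
  return op.hasSingleInlinableUse;
}

int main() {
  FakeOp insideExpression{true, true, false};
  FakeOp freeStanding{true, false, false};
  std::cout << std::boolalpha << shouldBeInlined(insideExpression) << ' '
            << shouldBeInlined(freeStanding) << '\n'; // prints: true false
}
```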
The op was not added to `hasDeferredEmission()` when introduced by f17abc2, causing incorrect translation.
…k_wait (llvm#161086) This addresses llvm#146145. Previously, for `std::atomic::wait/notify`, only `uint64_t` was supported to go through the native `ulock_wait` directly; any other type went through the global contention table's `atomic`, increasing the chances of spurious wakeups. This PR allows any type of size 4 or 8 to go directly to `ulock_wait`. The PR is a proof of concept: if the idea is accepted, it can be extended to update the Linux/FreeBSD branches and to add ABI macros so the existing behaviours are preserved under the stable ABI. Here are some benchmark results:
```
Benchmark                                                  Time      CPU   Time Old   Time New    CPU Old    CPU New
----------------------------------------------------------------------------------------------------------------------
BM_stop_token_single_thread_reg_unreg_callback/1024     -0.1113  -0.1165      51519      45785      51397      45408
BM_stop_token_single_thread_reg_unreg_callback/4096     -0.2727  -0.1447     249685     181608     211865     181203
BM_stop_token_single_thread_reg_unreg_callback/65536    -0.1241  -0.1237    3308930    2898396    3300986    2892608
BM_stop_token_single_thread_reg_unreg_callback/262144   +0.0335  -0.1920   13237682   13681632   13208849   10673254
OVERALL_GEOMEAN                                          -0.1254  -0.1447          0          0          0          0
```
```
Benchmark                                                                      Time      CPU   Time Old   Time New    CPU Old    CPU New
------------------------------------------------------------------------------------------------------------------------------------------
BM_1_atomic_1_waiter_1_notifier<KeepNotifying, NumHighPrioTasks<0>>/65536    -0.3344  -0.2424    5960741    3967212    5232250    3964085
BM_1_atomic_1_waiter_1_notifier<KeepNotifying, NumHighPrioTasks<0>>/131072   -0.1474  -0.1475    9144356    7796745    9137547    7790193
BM_1_atomic_1_waiter_1_notifier<KeepNotifying, NumHighPrioTasks<0>>/262144   -0.1336  -0.1340   18333441   15883805   18323711   15868500
OVERALL_GEOMEAN                                                              -0.2107  -0.1761          0          0          0          0
```
```
Benchmark                                                                                              Time      CPU   Time Old   Time New    CPU Old    CPU New
------------------------------------------------------------------------------------------------------------------------------------------------------------------
BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<2>, NumHighPrioTasks<0>>/16384   +0.2321  -0.0081     836618    1030772     833197     826476
BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<2>, NumHighPrioTasks<0>>/32768   -0.3034  -0.1329    2182721    1520569    1747211    1515028
BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<2>, NumHighPrioTasks<0>>/65536   -0.0924  -0.0924    3389098    3075897    3378486    3066448
BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<8>, NumHighPrioTasks<0>>/4096    +0.0464  +0.0474     664233     695080     657736     688892
BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<8>, NumHighPrioTasks<0>>/8192    -0.0279  -0.0267    1336041    1298794    1324270    1288953
BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<8>, NumHighPrioTasks<0>>/16384   +0.0270  +0.0304    2543004    2611786    2517471    2593975
BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<32>, NumHighPrioTasks<0>>/1024   +0.0423  +0.0941     473621     493657     325604     356245
BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<32>, NumHighPrioTasks<0>>/2048   +0.0420  +0.0675     906266     944349     636253     679169
BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<32>, NumHighPrioTasks<0>>/4096   +0.0359  +0.0378    1761584    1824783    1015092    1053439
OVERALL_GEOMEAN                                                                                       -0.0097  -0.0007          0          0          0          0
```
```
Benchmark                                                                                            Time      CPU   Time Old   Time New    CPU Old    CPU New
---------------------------------------------------------------------------------------------------------------------------------------------------------------
BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<2>, NumHighPrioTasks<0>>/4096      -0.0990  -0.1001     371100     334370     369984     332955
BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<2>, NumHighPrioTasks<0>>/8192      -0.0305  -0.0314     698228     676908     696418     674585
BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<2>, NumHighPrioTasks<0>>/16384     -0.0258  -0.0268    1383530    1347894    1380665    1343680
BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<8>, NumHighPrioTasks<0>>/1024      +0.0465  +0.4702     937821     981388     472087     694082
BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<8>, NumHighPrioTasks<0>>/2048      +0.1596  +0.9140    1704819    1976899     616419    1179852
BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<8>, NumHighPrioTasks<0>>/4096      -0.1018  -0.2316    3793976    3407609    1912209    1469331
BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<32>, NumHighPrioTasks<0>>/256      +0.0395  +0.5818   30102662   31292982     174650     276270
BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<32>, NumHighPrioTasks<0>>/512      -0.0065  +1.2860   33079634   32863968     162150     370680
BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<32>, NumHighPrioTasks<0>>/1024     -0.0325  +0.4683   36581740   35392385     282320     414520
OVERALL_GEOMEAN                                                                                     -0.0084  +0.2878          0          0          0          0
```
---------
Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
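For context, here is a small usage example of the affected API (not part of the patch): `std::atomic` wait/notify on a 4-byte type, which under this change would take the native `ulock_wait` path instead of the global contention table.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <thread>

int main() {
  // 4-byte atomic: previously only 8-byte atomics were eligible for ulock_wait.
  std::atomic<std::uint32_t> flag{0};

  std::thread waiter([&] {
    flag.wait(0); // blocks until flag's value is no longer 0
    std::cout << "woken, flag = " << flag.load() << '\n';
  });

  std::this_thread::sleep_for(std::chrono::milliseconds(10));
  flag.store(1);
  flag.notify_one(); // wakes the waiting thread

  waiter.join();
}
```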
…1033) [CommandLine.cpp](https://github.com/llvm/llvm-project/blob/fb0400fe1f1f9e83f3148db8ce2c72ab5bc6728e/llvm/lib/Support/CommandLine.cpp#L940) treats single quotes as literal characters on Windows, so the argument is parsed as a check named `' -*,llvm-namespace-comment '`, which matches no existing check; as a result, no checks were enabled via the command line. Previously, the test passed only because it fell back to the root `.clang-tidy` configuration, which enables `llvm-*`.
This patch adds Clang support for speculative devirtualization and integrates the related pass into the pass pipeline. It builds on the LLVM backend implementation from PR llvm#159048. Speculative devirtualization transforms an indirect call (through the virtual function pointer) into a guarded direct call, where the guard compares the loaded virtual function pointer against the expected target. The optimization is still safe without LTO because the direct call only happens when that comparison succeeds; otherwise it falls back to the original indirect call.

This optimization:
- is opt-in: disabled by default, enabled via `-fdevirtualize-speculatively`;
- works in non-LTO mode;
- handles publicly-visible objects;
- uses guarded devirtualization with a fallback to the indirect call when the speculation is incorrect.

For this C++ example:
```
class Base {
public:
  __attribute__((noinline)) virtual void virtual_function1() { asm volatile("NOP"); }
  virtual void virtual_function2() { asm volatile("NOP"); }
};

class Derived : public Base {
public:
  void virtual_function2() override { asm volatile("NOP"); }
};

__attribute__((noinline)) void foo(Base *BV) { BV->virtual_function1(); }

void bar() {
  Base *b = new Derived();
  foo(b);
}
```
Here is the IR without enabling speculative devirtualization:
```
define dso_local void @_Z3fooP4Base(ptr noundef %BV) local_unnamed_addr #0 {
entry:
  %vtable = load ptr, ptr %BV, align 8, !tbaa !6
  %0 = load ptr, ptr %vtable, align 8
  tail call void %0(ptr noundef nonnull align 8 dereferenceable(8) %BV)
  ret void
}
```
IR after enabling speculative devirtualization:
```
define dso_local void @_Z3fooP4Base(ptr noundef %BV) local_unnamed_addr #0 {
entry:
  %vtable = load ptr, ptr %BV, align 8, !tbaa !12
  %0 = load ptr, ptr %vtable, align 8
  %1 = icmp eq ptr %0, @_ZN4Base17virtual_function1Ev
  br i1 %1, label %if.true.direct_targ, label %if.false.orig_indirect, !prof !15

if.true.direct_targ:                              ; preds = %entry
  tail call void @_ZN4Base17virtual_function1Ev(ptr noundef nonnull align 8 dereferenceable(8) %BV)
  br label %if.end.icp

if.false.orig_indirect:                           ; preds = %entry
  tail call void %0(ptr noundef nonnull align 8 dereferenceable(8) %BV)
  br label %if.end.icp

if.end.icp:                                       ; preds = %if.false.orig_indirect, %if.true.direct_targ
  ret void
}
```
…art) (llvm#164124) Replace ExtractLastElement and ExtractLastLanePerPart with the more generic ExtractLastLane and ExtractLastPart, which separately model the two steps of extracting across parts and across lanes. ExtractLastElement == ExtractLastLane(ExtractLastPart), and ExtractLastLanePerPart == ExtractLastLane, the latter clarifying the name of the opcode. A new m_ExtractLastElement matcher is provided for convenience. The patch should be NFC modulo printing changes. PR: llvm#164124
This PR enables the MLIR execution engine to dump the object file as PIC code, which is needed when the object file is later bundled into a dynamic shared library.
---------
Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
At the moment AMDGCN-flavoured SPIR-V uses the SPIR-V ABI with some tweaks revolving around passing aggregates as direct. This is problematic in multiple ways:
- it leads to divergence from code compiled for a concrete target, which makes it difficult to debug;
- it incurs a run-time cost when dealing with larger aggregates;
- it incurs a compile-time cost when dealing with larger aggregates.

This patch switches AMDGCN-flavoured SPIR-V over to the AMDGPU ABI (except for the handling of variadic functions, which will be added in the future). One additional complication (and the primary motivation behind the current, less-than-ideal state of affairs) stems from `byref`, which AMDGPU uses but which is not expressible in SPIR-V. We deal with this by CodeGen-ing for `byref`, lowering it to `FuncParamAttr ByVal` in SPIR-V, and restoring it when doing reverse translation from AMDGCN-flavoured SPIR-V.
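As a rough illustration of the kind of signature affected (this example is invented for exposition and is not taken from the patch), consider a function taking a larger aggregate; how that parameter is lowered, passed directly versus indirectly via `byref`, is precisely what this ABI switch changes.

```cpp
#include <iostream>

// A larger-than-register aggregate: under the old "pass as direct" tweak this
// is passed by value in the IR signature, while the AMDGPU ABI passes it
// indirectly (byref) instead.
struct BigAggregate {
  float data[64];
};

__attribute__((noinline))
float sum(BigAggregate a) { // the lowering of this parameter is what changes
  float s = 0.0f;
  for (float v : a.data)
    s += v;
  return s;
}

int main() {
  BigAggregate a{};
  a.data[0] = 1.0f;
  a.data[63] = 2.0f;
  std::cout << sum(a) << '\n'; // prints 3
}
```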
```
llvm-project/llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp:312:19: warning: unused variable 'EB' [-Wunused-variable]
  312 |   VPBasicBlock *EB = Plan.getExitBlocks().front();
      |                 ^~
```
This showed up in a non-assertions build.
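The actual fix is not shown here; as a generic, self-contained sketch, this is how such a warning commonly arises and is silenced when a variable is only consumed by an `assert` (so it becomes unused once `NDEBUG` strips the assert):

```cpp
#include <cassert>
#include <vector>

int main() {
  std::vector<int> exitBlocks{312};
  // Only consumed by the assert below, so a -DNDEBUG build would warn about it...
  int *eb = &exitBlocks.front();
  assert(*eb == 312 && "expected a single exit block");
  (void)eb; // ...unless it is explicitly marked used (or folded into the assert)
  return 0;
}
```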
Add support for vectorized operations such as `arith.addf ... : vector<4xf4E2M1FN>`. The computation is scalarized: the scalar operands are extracted with `vector.to_elements`, the scalar computations are performed element by element, and the results are inserted back into a vector with `vector.from_elements`.