forked from llvm/llvm-project
-
Notifications
You must be signed in to change notification settings - Fork 0
Prif dialect #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Prif dialect #2
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…40900) This is the bare minimum to get the intrinsic to compile for AMDGPU, and it's not optimal. We need to follow along closer with the existing G_FMINNUM/G_FMAXNUM with custom lowering to handle the IEEE=0 case better. Just re-use the existing lowering for the old semantics for G_FMINNUM/G_FMAXNUM. This does not change G_FMINNUM/G_FMAXNUM's treatment, nor try to handle the general expansion without an underlying min/max variant (or with G_FMINIMUM/G_FMAXIMUM).
This patch fixes: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8564:20: error: unused variable 'LoopRegionOf' [-Werror,-Wunused-variable]
Our current intrinsic support for barrier intrinsics is confusing and incomplete, with multiple intrinsics mapping to the same instruction and intrinsic names not clearly conveying intrinsic semantics. Further, we lack support for some variants. This change unifies the IR representation to a single consistently named set of intrinsics. - llvm.nvvm.barrier.cta.sync.aligned.all(i32) - llvm.nvvm.barrier.cta.sync.aligned(i32, i32) - llvm.nvvm.barrier.cta.arrive.aligned(i32, i32) - llvm.nvvm.barrier.cta.sync.all(i32) - llvm.nvvm.barrier.cta.sync(i32, i32) - llvm.nvvm.barrier.cta.arrive(i32, i32) The following Auto-Upgrade rules are used to maintain compatibility with IR using the legacy intrinsics: * llvm.nvvm.barrier0 --> llvm.nvvm.barrier.cta.sync.aligned.all(0) * llvm.nvvm.barrier.n --> llvm.nvvm.barrier.cta.sync.aligned.all(x) * llvm.nvvm.bar.sync --> llvm.nvvm.barrier.cta.sync.aligned.all(x) * llvm.nvvm.barrier --> llvm.nvvm.barrier.cta.sync.aligned(x, y) * llvm.nvvm.barrier.sync --> llvm.nvvm.barrier.cta.sync.all(x) * llvm.nvvm.barrier.sync.cnt --> llvm.nvvm.barrier.cta.sync(x, y)
This adds assembler/disassembler support for XSfmmbase 0.6 and related SiFive matrix multiplication extensions based on the spec here https://www.sifive.com/document-file/xsfmm-matrix-extensions-specification Functionality-wise, this is the same as the Zvma extension proposal that SiFive shared with the Attached Matrix Extension Task Group. The extension names and instruction mnemonics have been changed to use vendor prefixes. Note this is a non-conforming extension as the opcodes used here are in the standard opcode space in OP-V or OP-VE. --------- Co-authored-by: Brandon Wu <brandon.wu@sifive.com>
… zero/NaN (llvm#139861) When the only user of select is a fcmp or a fp operation with nnan/nsz, the sign bit of zero/NaN can be ignored. Alive2: https://alive2.llvm.org/ce/z/ZcxeIv Compile-time impact: https://llvm-compile-time-tracker.com/compare.php?from=7add1bcd02b1f72d580bb2e64a1fe4a8bdc085d9&to=cb419c7cbddce778673f3d4b414ed9b8064b8d6e&stat=instructions:u Closes llvm#133367.
…vm#140851) This patch makes getMRVFunctionsTracked return a reference. runIPSCCP, the sole user of getMRVFunctionsTracked, just needs a read-access to the map. The missing "&" is most likely an oversight as two "sibling" functions getTrackedRetVals and getTrackedGlobals return maps by const reference.
This patch enhances the performance of `std::for_each_n` when used with segmented iterators, leading to significant performance improvements, summarized in the tables below. This addresses a subtask of llvm#102817.
While processing members of a record, we try to create new record types as we encounter them, but if this would result in recursion (either because the type points to itself or because it points to a type that points back to the original type) we need to add it to a list for deferred processing. Previously, we issued an error saying this wasn't handled. This change adds the necessary handling.
This patch optimizes `bitset::to_string` by replacing the existing bit-by-bit processing with a more efficient bit traversal strategy. Instead of checking each bit sequentially, we leverage `std::__countr_zero` to efficiently locate the next set bit, skipping over consecutive zero bits. This greatly accelerates the conversion process, especially for sparse `bitset`s where zero bits dominate. To ensure similar improvements for dense `bitset`s, we exploit symmetry by inverting the bit pattern, allowing us to apply the same optimized traversal technique. Even for uniformly distributed `bitset`s, the proposed approach offers measurable performance gains over the existing implementation. Benchmarks demonstrate substantial improvements, achieving up to 13.5x speedup for sparse `bitset`s with `Pr(true bit) = 0.1`, 16.1x for dense `bitset`s with `Pr(true bit) = 0.9`, and 8.3x for uniformly distributed `bitset`s with `Pr(true bit) = 0.5)`.
When using `-no-pie` without a `SECTIONS` command, the linker uses the target's default image base. If `-Ttext=` or `--section-start` specifies an output section address below this base, the result is likely unintended. - With `--no-rosegment`, the PT_LOAD segment covering the ELF header cannot include `.text` if `.text`'s address is too low, causing an `error: output file too large`. - With default `--rosegment`: - If a read-only section (e.g., `.rodata`) exists, a similar `error: output file too large` occurs. - Without read-only sections, the PT_LOAD segment covering the ELF header and program headers includes no sections, which is unusual and likely undesired. This also causes non-ascending PT_LOAD `p_vaddr` values related to the PT_LOAD that overlaps with PT_PHDR (llvm#138584). To prevent these issues, report an error if a section address is below the image base and suggest `--image-base`. This check also applies when `--image-base` is explicitly set but is skipped when a `SECTIONS` command is used. Pull Request: llvm#140187
Fixed TableGen duplicate issues that causes the wrong interrupt attribute from being selected. resolves llvm#140701
This is a scoped helper similar to ApplyDebugLocation that creates a new source location atom group which instructions can be added to. A source atom is a source construct that is "interesting" for debug stepping purposes. We use an atom group number to track the instruction(s) that implement the functionality for the atom, plus backup instructions/source locations. This patch is part of a stack that teaches Clang to generate Key Instructions metadata for C and C++. RFC: https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668 The feature is only functional in LLVM if LLVM is built with CMake flag LLVM_EXPERIMENTAL_KEY_INSTRUCTIONs. Eventually that flag will be removed.
This fixes a warning where a variable assigned in 'if' statement wasn't referenced again, and where else is used when 'if' has returns statement in the if-else statement
This change adds support for lowering BitCastOp
Same as llvm#139907 except there is now a special dovoidwork helper function. Previous approach with assert(f();return success;) failed tests for release builds, so I created a separate helper. Open to suggestions how to solve this more elegantly. Co-authored-by: Arslan Khabutdinov <akhabutdinov@fb.com>
This commit provides definitions of builtins with the generic address space. One concept to consider is the difference between supporting the generic address space from the user's perspective and the requirement for libclc as a compiler implementation detail to define separate generic address space builtins. In practice a target (like NVPTX) might notionally support the generic address space, but it's mapped to the same LLVM target address space as another address space (often the private one). In such cases libclc must be careful not to define both private and generic overloads of the same builtin. We track these two concepts separately, and make the assumption that if the generic address space does clash with another, it's with the private one. We track the concepts separately because there are some builtins such as atomics that are defined for the generic address space but not the private address space.
clang/lib/CodeGen/CGDebugInfo.cpp:153:2: error: extra ';' outside of a function is incompatible with C++98 [-Werror,-Wc++98-compat-extra-semi] 153 | }; | ^ 1 error generated.
Fix for src scale opsel flags encoding and ASM parsing for gfx950 scaled MFMA.
The code to analyze VarDecls for the purpose of ensuring a magic-static isn't present in a 'routine' was getting confused/crashed because we create something that looks like a magic-static during error-recovery, but it is still an invalid decl. This patch causes us to just 'give up' in the case where the vardecl is already invalid. Fixes: llvm#140920
…m#140922) This was an oversight in the original patch series. Without this change, the newly added tests fail assertions.
Followup to llvm#138741. This adds the requested macro to silence `-Wunnecessary-virtual-specifier` when declaring virtual anchor functions in `final` classes, per [LLVM policy](https://llvm.org/docs/CodingStandards.html#provide-a-virtual-method-anchor-for-classes-in-headers). It also cleans up any remaining instances of the warning, allowing us to stop disabling it when we build LLVM.
…outines (llvm#140834) Scalars inside device routines also need to implicitly set the DEVICE attribute, except for function results.
Make sure to cover all the scalable types which are legal, plus splitting. Make sure to cover all instructions. Not duplicating vx testing at this time.
This reverts commit b263c08. Looks like this triggers a crash in one of the Fortran tests. Reverting while I investigate https://lab.llvm.org/buildbot/#/builders/41/builds/6825
As noted in review comment llvm#140922 (comment), this aren't required
/llvm-project/clang-tools-extra/clang-doc/HTMLMustacheGenerator.cpp:86:19: error: loop variable '[Name, FileName]' creates a copy from type 'std::pair<llvm::StringRef, llvm::StringRef> const' [-Werror,-Wrange-loop-construct] for (const auto [Name, FileName] : Partials) ^ /llvm-project/clang-tools-extra/clang-doc/HTMLMustacheGenerator.cpp:86:8: note: use reference type 'std::pair<llvm::StringRef, llvm::StringRef> const &' to prevent copying for (const auto [Name, FileName] : Partials) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ & 1 error generated.
Took xry's idea [^1] to improve the csrxchg instrinsic test case. [^1]: llvm#141037 (comment)
llvm#138227)" This reverts commit 64bb60a. Revert to give more time affected parties to adjust to the change.
I'm working on porting ASan to Wasm/WASI targets, and this is the first part of the change sets. I'll post runtime changes separately. This change makes `-fsanitize=address` available for WASI target by replicating what we do for Emscripten because they share the same memory model.
…riptor (llvm#140151) - define in-memory representation of optional non-flag parameters to `RootDescriptor` - fill in data to parse these params in `parseRootDescriptorParams` - add unit tests Part 3 of llvm#126577
…method (llvm#141127) - we will need to provide a way to dump `RootFlags` for serialization and by using operator overloads we can maintain a consistent interface This is an NFC to allow for llvm#138192 to be more straightforwardly implemented.
…vm#141143) Note: This relands llvm#140615 adding a ".count" suffix to the non-".all" variants. Our current intrinsic support for barrier intrinsics is confusing and incomplete, with multiple intrinsics mapping to the same instruction and intrinsic names not clearly conveying intrinsic semantics. Further, we lack support for some variants. This change unifies the IR representation to a single consistently named set of intrinsics. - llvm.nvvm.barrier.cta.sync.aligned.all(i32) - llvm.nvvm.barrier.cta.sync.aligned.count(i32, i32) - llvm.nvvm.barrier.cta.arrive.aligned.count(i32, i32) - llvm.nvvm.barrier.cta.sync.all(i32) - llvm.nvvm.barrier.cta.sync.count(i32, i32) - llvm.nvvm.barrier.cta.arrive.count(i32, i32) The following Auto-Upgrade rules are used to maintain compatibility with IR using the legacy intrinsics: * llvm.nvvm.barrier0 --> llvm.nvvm.barrier.cta.sync.aligned.all(0) * llvm.nvvm.barrier.n --> llvm.nvvm.barrier.cta.sync.aligned.all(x) * llvm.nvvm.bar.sync --> llvm.nvvm.barrier.cta.sync.aligned.all(x) * llvm.nvvm.barrier --> llvm.nvvm.barrier.cta.sync.aligned.count(x, y) * llvm.nvvm.barrier.sync --> llvm.nvvm.barrier.cta.sync.all(x) * llvm.nvvm.barrier.sync.cnt --> llvm.nvvm.barrier.cta.sync.count(x, y)
…manner. (llvm#141161) This change would enable using `ir2vec::Embedding` which are float vectors in ML Inliner. Co-authored-by: svkeerthy <venkatakeerthy@google.com>
…m#141037) This patch adds support for the `q` constraint: a general-purpose register except for $r0 and $r1 (for the csrxchg instruction) Link: https://gcc.gnu.org/pipermail/gcc-patches/2025-May/684339.html
- `CreateRunInTerminalReverseRequest` is passed a `StringMap` by value, whose keys are owned and deallocated after return. - Target args are wrongly specified as reverse request (launcher) args. - Test case error message did not catch up with a0aa5f8. All runInTerminal tests are passing with this applied.
* Changes the default synthetic symbol names to contain their file address This is a new PR after the first PR (llvm#137512) was reverted because it didn't update the way unnamed symbols were searched in the symbol table, which relied on the index being in the name. This time also added extra test to make sure the symbol is found as expected
…#141188) std::string::rfind accepts anything that can be converted to std::string_view starting in C++17. Since StringRef can be converted to std::string_view, we do not need to create a temporary instance of std::string here.
try_emplace can default-construct values, so we do not need to do so on our own. Plus, try_emplace(Key) is much simpler/shorter than insert({Key, LongValueType()}).
jvillette38
pushed a commit
that referenced
this pull request
Jun 9, 2025
Fixes llvm#123300 What is seen ``` clang-repl> int x = 42; clang-repl> auto capture = [&]() { return x * 2; }; In file included from <<< inputs >>>:1: input_line_4:1:17: error: non-local lambda expression cannot have a capture-default 1 | auto capture = [&]() { return x * 2; }; | ^ zsh: segmentation fault clang-repl --Xcc="-v" (lldb) bt * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x8) * frame #0: 0x0000000107b4f8b8 libclang-cpp.19.1.dylib`clang::IncrementalParser::CleanUpPTU(clang::PartialTranslationUnit&) + 988 frame #1: 0x0000000107b4f1b4 libclang-cpp.19.1.dylib`clang::IncrementalParser::ParseOrWrapTopLevelDecl() + 416 frame #2: 0x0000000107b4fb94 libclang-cpp.19.1.dylib`clang::IncrementalParser::Parse(llvm::StringRef) + 612 frame #3: 0x0000000107b52fec libclang-cpp.19.1.dylib`clang::Interpreter::ParseAndExecute(llvm::StringRef, clang::Value*) + 180 frame #4: 0x0000000100003498 clang-repl`main + 3560 frame llvm#5: 0x000000018d39a0e0 dyld`start + 2360 ``` Though the error is justified, we shouldn't be interested in exiting through a segfault in such cases. The issue is that empty named decls weren't being taken care of resulting into this assert https://github.com/llvm/llvm-project/blob/c1a229252617ed58f943bf3f4698bd8204ee0f04/clang/include/clang/AST/DeclarationName.h#L503 Can also be seen when the example is attempted through xeus-cpp-lite. 
jvillette38
pushed a commit
that referenced
this pull request
Jun 9, 2025
…142952) This was removed in llvm#135343 in favour of making it a format variable, which we do here. This follows the precedent of the `[opt]` and `[artificial]` markers. Before: ``` thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.2 * frame #0: 0x000000010000037c a.out`inlined1() at inline.cpp:4:3 frame #1: 0x000000010000037c a.out`regular() at inline.cpp:6:17 frame #2: 0x00000001000003b8 a.out`inlined2() at inline.cpp:7:43 frame #3: 0x00000001000003b4 a.out`main at inline.cpp:10:3 frame #4: 0x0000000186345be4 dyld`start + 7040 ``` After (note the `[inlined]` markers): ``` thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.2 * frame #0: 0x000000010000037c a.out`inlined1() at inline.cpp:4:3 [inlined] frame #1: 0x000000010000037c a.out`regular() at inline.cpp:6:17 frame #2: 0x00000001000003b8 a.out`inlined2() at inline.cpp:7:43 [inlined] frame #3: 0x00000001000003b4 a.out`main at inline.cpp:10:3 frame #4: 0x0000000186345be4 dyld`start + 7040 ``` rdar://152642178
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Hello,
This PR is a discussion around a dialect to define “prif” operations in Flang. This is a draft and a few operations have been proposed to start with and see if they might be suitable.
The set of operations will be based on what is presented in the PRIF specification, which can be found here: : https://doi.org/10.25344/S4CG6G