Prif dialect #2

JDPailleux · 2025-05-23T06:55:35Z

Hello,

This PR opens a discussion about a dialect for defining PRIF operations in Flang. It is a draft: a few operations are proposed as a starting point, to see whether they are suitable.

The set of operations will be based on what is presented in the PRIF specification, which can be found here: https://doi.org/10.25344/S4CG6G

…40900) This is the bare minimum to get the intrinsic to compile for AMDGPU, and it's not optimal. We need to follow along closer with the existing G_FMINNUM/G_FMAXNUM with custom lowering to handle the IEEE=0 case better. Just re-use the existing lowering for the old semantics for G_FMINNUM/G_FMAXNUM. This does not change G_FMINNUM/G_FMAXNUM's treatment, nor try to handle the general expansion without an underlying min/max variant (or with G_FMINIMUM/G_FMAXIMUM).

Our current intrinsic support for barrier intrinsics is confusing and incomplete, with multiple intrinsics mapping to the same instruction and intrinsic names not clearly conveying intrinsic semantics. Further, we lack support for some variants. This change unifies the IR representation to a single consistently named set of intrinsics:

- llvm.nvvm.barrier.cta.sync.aligned.all(i32)
- llvm.nvvm.barrier.cta.sync.aligned(i32, i32)
- llvm.nvvm.barrier.cta.arrive.aligned(i32, i32)
- llvm.nvvm.barrier.cta.sync.all(i32)
- llvm.nvvm.barrier.cta.sync(i32, i32)
- llvm.nvvm.barrier.cta.arrive(i32, i32)

The following Auto-Upgrade rules are used to maintain compatibility with IR using the legacy intrinsics:

* llvm.nvvm.barrier0 --> llvm.nvvm.barrier.cta.sync.aligned.all(0)
* llvm.nvvm.barrier.n --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.bar.sync --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.barrier --> llvm.nvvm.barrier.cta.sync.aligned(x, y)
* llvm.nvvm.barrier.sync --> llvm.nvvm.barrier.cta.sync.all(x)
* llvm.nvvm.barrier.sync.cnt --> llvm.nvvm.barrier.cta.sync(x, y)

This adds assembler/disassembler support for XSfmmbase 0.6 and related SiFive matrix multiplication extensions based on the spec here https://www.sifive.com/document-file/xsfmm-matrix-extensions-specification Functionality-wise, this is the same as the Zvma extension proposal that SiFive shared with the Attached Matrix Extension Task Group. The extension names and instruction mnemonics have been changed to use vendor prefixes. Note this is a non-conforming extension as the opcodes used here are in the standard opcode space in OP-V or OP-VE. --------- Co-authored-by: Brandon Wu <brandon.wu@sifive.com>

… zero/NaN (llvm#139861) When the only user of a select is an fcmp or an FP operation with nnan/nsz, the sign bit of zero/NaN can be ignored. Alive2: https://alive2.llvm.org/ce/z/ZcxeIv Compile-time impact: https://llvm-compile-time-tracker.com/compare.php?from=7add1bcd02b1f72d580bb2e64a1fe4a8bdc085d9&to=cb419c7cbddce778673f3d4b414ed9b8064b8d6e&stat=instructions:u Closes llvm#133367.

…vm#140851) This patch makes getMRVFunctionsTracked return a reference. runIPSCCP, the sole user of getMRVFunctionsTracked, only needs read access to the map. The missing "&" is most likely an oversight, as the two "sibling" functions getTrackedRetVals and getTrackedGlobals return maps by const reference.
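The difference between returning a tracked map by value and by const reference can be shown with a small sketch. `Tracker`, `getTrackedCopy`, and `getTracked` are hypothetical names for illustration only; they are not the SCCP solver's API.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a solver that tracks per-function state.
class Tracker {
  std::map<std::string, int> Tracked;

public:
  void track(const std::string &Name, int Value) { Tracked[Name] = Value; }

  // By value: every call copies the whole map.
  std::map<std::string, int> getTrackedCopy() const { return Tracked; }

  // By const reference: read-only access to the member, no copy.
  const std::map<std::string, int> &getTracked() const { return Tracked; }
};
```

A caller that only reads the map, like the single user described above, loses nothing by taking the const-reference form.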

While processing members of a record, we try to create new record types as we encounter them, but if this would result in recursion (either because the type points to itself or because it points to a type that points back to the original type) we need to add it to a list for deferred processing. Previously, we issued an error saying this wasn't handled. This change adds the necessary handling.

This patch optimizes `bitset::to_string` by replacing the existing bit-by-bit processing with a more efficient bit traversal strategy. Instead of checking each bit sequentially, we leverage `std::__countr_zero` to efficiently locate the next set bit, skipping over consecutive zero bits. This greatly accelerates the conversion process, especially for sparse `bitset`s where zero bits dominate. To ensure similar improvements for dense `bitset`s, we exploit symmetry by inverting the bit pattern, allowing us to apply the same optimized traversal technique. Even for uniformly distributed `bitset`s, the proposed approach offers measurable performance gains over the existing implementation. Benchmarks demonstrate substantial improvements, achieving up to 13.5x speedup for sparse `bitset`s with `Pr(true bit) = 0.1`, 16.1x for dense `bitset`s with `Pr(true bit) = 0.9`, and 8.3x for uniformly distributed `bitset`s with `Pr(true bit) = 0.5`.

When using `-no-pie` without a `SECTIONS` command, the linker uses the target's default image base. If `-Ttext=` or `--section-start` specifies an output section address below this base, the result is likely unintended.

- With `--no-rosegment`, the PT_LOAD segment covering the ELF header cannot include `.text` if `.text`'s address is too low, causing an `error: output file too large`.
- With default `--rosegment`:
  - If a read-only section (e.g., `.rodata`) exists, a similar `error: output file too large` occurs.
  - Without read-only sections, the PT_LOAD segment covering the ELF header and program headers includes no sections, which is unusual and likely undesired. This also causes non-ascending PT_LOAD `p_vaddr` values related to the PT_LOAD that overlaps with PT_PHDR (llvm#138584).

To prevent these issues, report an error if a section address is below the image base and suggest `--image-base`. This check also applies when `--image-base` is explicitly set but is skipped when a `SECTIONS` command is used.

Pull Request: llvm#140187
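The rule described above can be sketched as a small predicate. `LinkConfig`, `checkSectionAddress`, and the message text are hypothetical and illustrative only; they are not lld's data structures or its actual diagnostic.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical inputs to the check (not lld's real configuration types).
struct LinkConfig {
  bool HasSectionsCommand = false; // a linker script SECTIONS command is present
  std::uint64_t ImageBase = 0;     // target default, or explicit --image-base
};

// Returns an error message if an explicitly placed section (via -Ttext= or
// --section-start) lands below the image base, std::nullopt otherwise.
std::optional<std::string> checkSectionAddress(const LinkConfig &Config,
                                               const std::string &Section,
                                               std::uint64_t Address) {
  if (Config.HasSectionsCommand) // the check is skipped under SECTIONS
    return std::nullopt;
  if (Address >= Config.ImageBase)
    return std::nullopt;
  return "section '" + Section + "' address is below the image base; " +
         "consider --image-base";
}
```

The same comparison covers both the target default and an explicit `--image-base`, since either way it ends up as the one `ImageBase` value being compared against.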

This is a scoped helper similar to ApplyDebugLocation that creates a new source location atom group which instructions can be added to. A source atom is a source construct that is "interesting" for debug stepping purposes. We use an atom group number to track the instruction(s) that implement the functionality for the atom, plus backup instructions/source locations. This patch is part of a stack that teaches Clang to generate Key Instructions metadata for C and C++. RFC: https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668 The feature is only functional in LLVM if LLVM is built with the CMake flag LLVM_EXPERIMENTAL_KEY_INSTRUCTIONS. Eventually that flag will be removed.

Same as llvm#139907, except there is now a special dovoidwork helper function. The previous approach, `assert(f();return success;)`, failed tests in release builds, so I created a separate helper. Open to suggestions on how to solve this more elegantly. Co-authored-by: Arslan Khabutdinov <akhabutdinov@fb.com>

This commit provides definitions of builtins with the generic address space. One concept to consider is the difference between supporting the generic address space from the user's perspective and the requirement for libclc as a compiler implementation detail to define separate generic address space builtins. In practice a target (like NVPTX) might notionally support the generic address space, but it's mapped to the same LLVM target address space as another address space (often the private one). In such cases libclc must be careful not to define both private and generic overloads of the same builtin. We track these two concepts separately, and make the assumption that if the generic address space does clash with another, it's with the private one. We track the concepts separately because there are some builtins such as atomics that are defined for the generic address space but not the private address space.

The code to analyze VarDecls for the purpose of ensuring a magic-static isn't present in a 'routine' was getting confused and crashing, because we create something that looks like a magic-static during error recovery, but it is still an invalid decl. This patch causes us to just 'give up' in the case where the VarDecl is already invalid. Fixes: llvm#140920

Followup to llvm#138741. This adds the requested macro to silence `-Wunnecessary-virtual-specifier` when declaring virtual anchor functions in `final` classes, per [LLVM policy](https://llvm.org/docs/CodingStandards.html#provide-a-virtual-method-anchor-for-classes-in-headers). It also cleans up any remaining instances of the warning, allowing us to stop disabling it when we build LLVM.

```
/llvm-project/clang-tools-extra/clang-doc/HTMLMustacheGenerator.cpp:86:19: error: loop variable '[Name, FileName]' creates a copy from type 'std::pair<llvm::StringRef, llvm::StringRef> const' [-Werror,-Wrange-loop-construct]
  for (const auto [Name, FileName] : Partials)
                  ^
/llvm-project/clang-tools-extra/clang-doc/HTMLMustacheGenerator.cpp:86:8: note: use reference type 'std::pair<llvm::StringRef, llvm::StringRef> const &' to prevent copying
  for (const auto [Name, FileName] : Partials)
       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                  &
1 error generated.
```
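The note's suggested fix is to bind the structured binding by reference. A minimal sketch, assuming `std::string_view` as a stand-in for `llvm::StringRef`; `Partial` and `joinPartials` are hypothetical names, not clang-doc code.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// std::string_view stands in for llvm::StringRef in this sketch.
using Partial = std::pair<std::string_view, std::string_view>;

std::string joinPartials(const std::vector<Partial> &Partials) {
  std::string Out;
  // `const auto &` binds to each element in place; plain `const auto`
  // would copy every pair, which -Wrange-loop-construct reports.
  for (const auto &[Name, FileName] : Partials) {
    Out += Name;
    Out += '=';
    Out += FileName;
    Out += ';';
  }
  return Out;
}
```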

I'm working on porting ASan to Wasm/WASI targets, and this is the first of a series of changes. I'll post runtime changes separately. This change makes `-fsanitize=address` available for the WASI target by replicating what we do for Emscripten, because they share the same memory model.

…riptor (llvm#140151)

- define in-memory representation of optional non-flag parameters to `RootDescriptor`
- fill in data to parse these params in `parseRootDescriptorParams`
- add unit tests

Part 3 of llvm#126577

…method (llvm#141127)

- we will need to provide a way to dump `RootFlags` for serialization, and by using operator overloads we can maintain a consistent interface

This is an NFC to allow for llvm#138192 to be more straightforwardly implemented.

…vm#141143)

Note: This relands llvm#140615 adding a ".count" suffix to the non-".all" variants.

Our current intrinsic support for barrier intrinsics is confusing and incomplete, with multiple intrinsics mapping to the same instruction and intrinsic names not clearly conveying intrinsic semantics. Further, we lack support for some variants. This change unifies the IR representation to a single consistently named set of intrinsics:

- llvm.nvvm.barrier.cta.sync.aligned.all(i32)
- llvm.nvvm.barrier.cta.sync.aligned.count(i32, i32)
- llvm.nvvm.barrier.cta.arrive.aligned.count(i32, i32)
- llvm.nvvm.barrier.cta.sync.all(i32)
- llvm.nvvm.barrier.cta.sync.count(i32, i32)
- llvm.nvvm.barrier.cta.arrive.count(i32, i32)

The following Auto-Upgrade rules are used to maintain compatibility with IR using the legacy intrinsics:

* llvm.nvvm.barrier0 --> llvm.nvvm.barrier.cta.sync.aligned.all(0)
* llvm.nvvm.barrier.n --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.bar.sync --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.barrier --> llvm.nvvm.barrier.cta.sync.aligned.count(x, y)
* llvm.nvvm.barrier.sync --> llvm.nvvm.barrier.cta.sync.all(x)
* llvm.nvvm.barrier.sync.cnt --> llvm.nvvm.barrier.cta.sync.count(x, y)
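The name mapping in the Auto-Upgrade rules can be read as a lookup table. This is an illustrative sketch only: `barrierUpgrades` and `upgradeBarrierName` are hypothetical helpers, and LLVM's real upgrade logic also rewrites the call operands (for example, supplying the constant `0` for `llvm.nvvm.barrier0`), which this table does not model.

```cpp
#include <cassert>
#include <map>
#include <string>

// Legacy NVVM barrier intrinsic name -> replacement name, per the
// Auto-Upgrade rules for the relanded ".count" naming. Names only.
const std::map<std::string, std::string> &barrierUpgrades() {
  static const std::map<std::string, std::string> Table = {
      {"llvm.nvvm.barrier0", "llvm.nvvm.barrier.cta.sync.aligned.all"}, // arg 0
      {"llvm.nvvm.barrier.n", "llvm.nvvm.barrier.cta.sync.aligned.all"},
      {"llvm.nvvm.bar.sync", "llvm.nvvm.barrier.cta.sync.aligned.all"},
      {"llvm.nvvm.barrier", "llvm.nvvm.barrier.cta.sync.aligned.count"},
      {"llvm.nvvm.barrier.sync", "llvm.nvvm.barrier.cta.sync.all"},
      {"llvm.nvvm.barrier.sync.cnt", "llvm.nvvm.barrier.cta.sync.count"},
  };
  return Table;
}

// Returns the upgraded name, or the input unchanged if it is not legacy.
std::string upgradeBarrierName(const std::string &Name) {
  auto It = barrierUpgrades().find(Name);
  return It == barrierUpgrades().end() ? Name : It->second;
}
```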

- `CreateRunInTerminalReverseRequest` is passed a `StringMap` by value, whose keys are owned and deallocated after return.
- Target args are wrongly specified as reverse request (launcher) args.
- Test case error message did not catch up with a0aa5f8.

All runInTerminal tests are passing with this applied.

* Changes the default synthetic symbol names to contain their file address.

This is a new PR after the first PR (llvm#137512) was reverted because it didn't update the way unnamed symbols were searched in the symbol table, which relied on the index being in the name. This time, it also adds an extra test to make sure the symbol is found as expected.

…#141188) std::string::rfind accepts anything that can be converted to std::string_view starting in C++17. Since StringRef can be converted to std::string_view, we do not need to create a temporary instance of std::string here.
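A minimal illustration of the C++17 overload, with `std::string_view` standing in for `StringRef` (the patch relies on `StringRef` converting to `std::string_view`). `lastIndexOf` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Since C++17, std::string::rfind accepts anything convertible to
// std::string_view, so the needle is searched for directly, without
// first building a temporary std::string from it.
std::size_t lastIndexOf(const std::string &Haystack, std::string_view Needle) {
  return Haystack.rfind(Needle);
}
```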

Fixes llvm#123300

What is seen:

```
clang-repl> int x = 42;
clang-repl> auto capture = [&]() { return x * 2; };
In file included from <<< inputs >>>:1:
input_line_4:1:17: error: non-local lambda expression cannot have a capture-default
    1 | auto capture = [&]() { return x * 2; };
      |                 ^
zsh: segmentation fault clang-repl --Xcc="-v"

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
  * frame #0: 0x0000000107b4f8b8 libclang-cpp.19.1.dylib`clang::IncrementalParser::CleanUpPTU(clang::PartialTranslationUnit&) + 988
    frame #1: 0x0000000107b4f1b4 libclang-cpp.19.1.dylib`clang::IncrementalParser::ParseOrWrapTopLevelDecl() + 416
    frame #2: 0x0000000107b4fb94 libclang-cpp.19.1.dylib`clang::IncrementalParser::Parse(llvm::StringRef) + 612
    frame #3: 0x0000000107b52fec libclang-cpp.19.1.dylib`clang::Interpreter::ParseAndExecute(llvm::StringRef, clang::Value*) + 180
    frame #4: 0x0000000100003498 clang-repl`main + 3560
    frame #5: 0x000000018d39a0e0 dyld`start + 2360
```

Though the error is justified, we shouldn't be exiting through a segfault in such cases. The issue is that empty named decls weren't being handled, resulting in this assert: https://github.com/llvm/llvm-project/blob/c1a229252617ed58f943bf3f4698bd8204ee0f04/clang/include/clang/AST/DeclarationName.h#L503

This can also be seen when the example is attempted through xeus-cpp-lite.

![image](https://github.com/user-attachments/assets/9b0e6ead-138e-4b06-9ad9-fcb9f8d5bf6e)

…142952) This was removed in llvm#135343 in favour of making it a format variable, which we do here. This follows the precedent of the `[opt]` and `[artificial]` markers.

Before:

```
thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.2
  * frame #0: 0x000000010000037c a.out`inlined1() at inline.cpp:4:3
    frame #1: 0x000000010000037c a.out`regular() at inline.cpp:6:17
    frame #2: 0x00000001000003b8 a.out`inlined2() at inline.cpp:7:43
    frame #3: 0x00000001000003b4 a.out`main at inline.cpp:10:3
    frame #4: 0x0000000186345be4 dyld`start + 7040
```

After (note the `[inlined]` markers):

```
thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.2
  * frame #0: 0x000000010000037c a.out`inlined1() at inline.cpp:4:3 [inlined]
    frame #1: 0x000000010000037c a.out`regular() at inline.cpp:6:17
    frame #2: 0x00000001000003b8 a.out`inlined2() at inline.cpp:7:43 [inlined]
    frame #3: 0x00000001000003b4 a.out`main at inline.cpp:10:3
    frame #4: 0x0000000186345be4 dyld`start + 7040
```

rdar://152642178

arsenm and others added 30 commits May 21, 2025 17:00

[Vectorize] Fix a warning

a28d753

This patch fixes: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8564:20: error: unused variable 'LoopRegionOf' [-Werror,-Wunused-variable]

[gn build] Port b263c08

ec9c8e9

[libc++] Optimize std::for_each_n for segmented iterators (llvm#135468)

09c266b

This patch enhances the performance of `std::for_each_n` when used with segmented iterators, leading to significant performance improvements, summarized in the tables below. This addresses a subtask of llvm#102817.
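The idea behind the segmented-iterator optimization can be sketched with a hypothetical container made of contiguous blocks. `Segments` and `forEachNSegmented` are illustrative names, and this is not libc++'s implementation, which works through its internal segmented-iterator machinery for types such as `std::deque`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// A hypothetical segmented range: data split across contiguous blocks,
// similar in spirit to std::deque's storage.
using Segments = std::vector<std::vector<int>>;

// Apply Fn to the first N elements. Rather than stepping a flat iterator
// that re-checks segment boundaries on every increment, walk segment by
// segment and run a plain loop over each contiguous block.
template <class F>
std::size_t forEachNSegmented(Segments &Segs, std::size_t N, F Fn) {
  std::size_t Done = 0;
  for (auto &Seg : Segs) {
    if (Done == N)
      break;
    std::size_t Take = std::min(N - Done, Seg.size());
    std::for_each_n(Seg.begin(), Take, Fn); // tight contiguous inner loop
    Done += Take;
  }
  return Done; // number of elements actually visited
}
```

Returning the visited count makes it easy to see when `N` exceeds the total number of elements.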

Add live in for PrivateSegmentSize in GISel path (llvm#139968)

91b2e5b

[clang][TableGen] Fix Duplicate Entries in TableGen (llvm#140828)

385752c

Fixed TableGen duplicate issues that caused the wrong interrupt attribute to be selected. Resolves llvm#140701

[gn build] Port 09c266b

878c976

[CIR][NFC] Fix an unused variable warning (llvm#140783)

9db6c32

This fixes a warning where a variable assigned in an 'if' statement wasn't referenced again, and where 'else' was used after an 'if' branch that returns.

[CIR][LLVMLowering] Upstream Bitcast lowering (llvm#140774)

584616c

This change adds support for lowering BitCastOp

Fix-forward excess ';' from 9459c83 (llvm#134632)

57763b7

```
clang/lib/CodeGen/CGDebugInfo.cpp:153:2: error: extra ';' outside of a function is incompatible with C++98 [-Werror,-Wc++98-compat-extra-semi]
  153 | };
      |  ^
1 error generated.
```

[lldb][lldb-dap][tests] Make sure evaluate test exists with no errors. (

d219a71

llvm#140788)

[AMDGPU] Fix scale opsel flags for scaled MFMA operations (llvm#140183)

e12cbd8

Fix for src scale opsel flags encoding and ASM parsing for gfx950 scaled MFMA.

[RISCV] Support scalable vectors for the zvqdotq lowering paths (llv…

68995b1

…m#140922) This was an oversight in the original patch series. Without this change, the newly added tests fail assertions.

[flang] [cuda] implicitly set DEVICE attribute to scalars in device r…

4042a00

…outines (llvm#140834) Scalars inside device routines also need to implicitly set the DEVICE attribute, except for function results.

[RISCV] Expand zvqdotq partial.reduce test variants

04d10f1

Make sure to cover all the scalable types which are legal, plus splitting. Make sure to cover all instructions. Not duplicating vx testing at this time.

Revert "[VPlan] Move predication to VPlanTransform (NFC). (llvm#128420)"

793bb6b

This reverts commit b263c08. Looks like this triggers a crash in one of the Fortran tests. Reverting while I investigate https://lab.llvm.org/buildbot/#/builders/41/builds/6825

[RISCV] Remove nsw/nuw from zvqdotq tests [nfc]

cbe9af1

As noted in review comment llvm#140922 (comment), these aren't required.

Revert "Add macro to suppress -Wunnecessary-virtual-specifier (llvm#1…

e4e7a7e

…39614)" This reverts commit 0954c9d. It breaks the build when built with gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04).

DamonFool and others added 25 commits May 23, 2025 09:00

[LoongArch][NFC] Improve csrxchg intrinsic test case (llvm#141060)

0635ef8

Took xry's idea [^1] to improve the csrxchg intrinsic test case.

[^1]: llvm#141037 (comment)

Revert "[Modules] Don't fail when an unused textual header is missing. (

720014f

llvm#138227)" This reverts commit 64bb60a. Revert to give affected parties more time to adjust to the change.

[clang-doc] Precommit test for correct conversion function names (llv…

8268794

…m#141168)

[MLInlineAdvisor] Changing creation of TensorSpec in a type agnostic …

73b8330

…manner. (llvm#141161) This change would enable using `ir2vec::Embedding` which are float vectors in ML Inliner. Co-authored-by: svkeerthy <venkatakeerthy@google.com>

[Clang][LoongArch] Add inline asm support for the q constraint (llv…

689342d

…m#141037) This patch adds support for the `q` constraint: a general-purpose register except for $r0 and $r1 (for the csrxchg instruction) Link: https://gcc.gnu.org/pipermail/gcc-patches/2025-May/684339.html

[mlir][docs] Fix typo in PassManagement.md (NFC) (llvm#140891)

8d30c73

[clang][Darwin] Add test checking for tls support on xros target trip…

835bba9

…les (llvm#141170)

[BOLT] Remove redundant control flow statements (NFC) (llvm#141182)

c0e7a59

[lldb] Remove redundant control flow statements (NFC) (llvm#141183)

6979ab2

[mlir] Remove redundant control flow statements (NFC) (llvm#141184)

9c62446

[mlir] Use llvm::stable_sort (NFC) (llvm#141186)

6464238

[AST] Simplify string comparisons (NFC) (llvm#141189)

02ed6d8

[clang][bytecode] Diagnose placement-new'ing to a temporary (llvm#141099

3280596

) ... that's been created in a different evaluation.

[MCA] Use a range-based for loop (NFC) (llvm#141187)

782c066

[llvm] Use *Map::try_emplace (NFC) (llvm#141190)

cc78177

try_emplace can default-construct values, so we do not need to do so on our own. Plus, try_emplace(Key) is much simpler/shorter than insert({Key, LongValueType()}).
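The before/after shape of this cleanup can be shown with `std::map`, which has had `try_emplace` since C++17 (LLVM's `DenseMap` and `StringMap` provide their own `try_emplace`). `Table`, `addKeyVerbose`, and `addKey` are hypothetical names for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using Table = std::map<std::string, std::vector<int>>;

// Before: spell out the value type just to default-construct it.
void addKeyVerbose(Table &T, const std::string &Key) {
  T.insert({Key, std::vector<int>()});
}

// After: try_emplace(Key) default-constructs the mapped value only when
// Key is absent and leaves an existing entry untouched, like insert.
void addKey(Table &T, const std::string &Key) { T.try_emplace(Key); }
```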

[CodeGen] Remove redundant calls to std::unique_ptr<T>::get (NFC) (ll…

3d02834

…vm#141191)

[DRAFT][PRIF] Dialect PRIF operations

bc4bd22

JDPailleux force-pushed the prif-dialect branch from bda0bd4 to bc4bd22 May 23, 2025 07:02

JDPailleux closed this May 23, 2025

JDPailleux deleted the prif-dialect branch May 23, 2025 07:08
