Skip to content

Prif dialect #2

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7,378 commits into from
Closed

Prif dialect #2

wants to merge 7,378 commits into from

Conversation

JDPailleux
Copy link
Member

Hello,

This PR is a discussion around a dialect to define “prif” operations in Flang. This is a draft and a few operations have been proposed to start with and see if they might be suitable.

The set of operations will be based on what is presented in the PRIF specification, which can be found here: : https://doi.org/10.25344/S4CG6G

arsenm and others added 30 commits May 21, 2025 17:00
…40900)

This is the bare minimum to get the intrinsic to compile for AMDGPU,
and it's not optimal. We need to follow along closer with the existing
G_FMINNUM/G_FMAXNUM with custom lowering to handle the IEEE=0 case
better.

Just re-use the existing lowering for the old semantics for
G_FMINNUM/G_FMAXNUM. This does not change G_FMINNUM/G_FMAXNUM's
treatment,
nor try to handle the general expansion without an underlying min/max
variant (or with G_FMINIMUM/G_FMAXIMUM).
This patch fixes:

  llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8564:20: error:
  unused variable 'LoopRegionOf' [-Werror,-Wunused-variable]
Our current intrinsic support for barrier intrinsics is confusing and
incomplete, with multiple intrinsics mapping to the same instruction and
intrinsic names not clearly conveying intrinsic semantics. Further, we
lack support for some variants. This change unifies the IR
representation to a single consistently named set of intrinsics.

- llvm.nvvm.barrier.cta.sync.aligned.all(i32)
- llvm.nvvm.barrier.cta.sync.aligned(i32, i32)
- llvm.nvvm.barrier.cta.arrive.aligned(i32, i32)
- llvm.nvvm.barrier.cta.sync.all(i32)
- llvm.nvvm.barrier.cta.sync(i32, i32)
- llvm.nvvm.barrier.cta.arrive(i32, i32)

The following Auto-Upgrade rules are used to maintain compatibility with
IR using the legacy intrinsics:

* llvm.nvvm.barrier0 --> llvm.nvvm.barrier.cta.sync.aligned.all(0)
* llvm.nvvm.barrier.n --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.bar.sync --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.barrier --> llvm.nvvm.barrier.cta.sync.aligned(x, y)
* llvm.nvvm.barrier.sync --> llvm.nvvm.barrier.cta.sync.all(x)
* llvm.nvvm.barrier.sync.cnt --> llvm.nvvm.barrier.cta.sync(x, y)
This adds assembler/disassembler support for XSfmmbase 0.6 and related
SiFive matrix multiplication extensions based on the spec here
https://www.sifive.com/document-file/xsfmm-matrix-extensions-specification

Functionality-wise, this is the same as the Zvma extension proposal that
SiFive shared with the Attached Matrix Extension Task Group. The
extension names and instruction mnemonics have been changed to use
vendor prefixes.

Note this is a non-conforming extension as the opcodes used here are in
the standard opcode space in OP-V or OP-VE.

---------

Co-authored-by: Brandon Wu <brandon.wu@sifive.com>
…vm#140851)

This patch makes getMRVFunctionsTracked return a reference.
runIPSCCP, the sole user of getMRVFunctionsTracked, just needs a
read-access to the map.

The missing "&" is most likely an oversight as two "sibling" functions
getTrackedRetVals and getTrackedGlobals return maps by const
reference.
This patch enhances the performance of `std::for_each_n` when used with
segmented iterators, leading to significant performance improvements,
summarized in the tables below. This addresses a subtask of
llvm#102817.
While processing members of a record, we try to create new record types
as we encounter them, but if this would result in recursion (either
because the type points to itself or because it points to a type that
points back to the original type) we need to add it to a list for
deferred processing. Previously, we issued an error saying this wasn't
handled. This change adds the necessary handling.
This patch optimizes `bitset::to_string` by replacing the existing bit-by-bit processing with a more efficient
bit traversal strategy. Instead of checking each bit sequentially, we leverage `std::__countr_zero` to efficiently
locate the next set bit, skipping over consecutive zero bits. This greatly accelerates the conversion process,
especially for sparse `bitset`s where zero bits dominate. To ensure similar improvements for dense `bitset`s, we
exploit symmetry by inverting the bit pattern, allowing us to apply the same optimized traversal technique. Even
for uniformly distributed `bitset`s, the proposed approach offers measurable performance gains over the existing
implementation.

Benchmarks demonstrate substantial improvements, achieving up to 13.5x speedup for sparse `bitset`s with
`Pr(true bit) = 0.1`, 16.1x for dense `bitset`s with `Pr(true bit) = 0.9`, and 8.3x for uniformly distributed
`bitset`s with `Pr(true bit) = 0.5)`.
When using `-no-pie` without a `SECTIONS` command, the linker uses the
target's default image base. If `-Ttext=` or `--section-start` specifies
an output section address below this base, the result is likely
unintended.

- With `--no-rosegment`, the PT_LOAD segment covering the ELF header cannot include `.text` if `.text`'s address is too low, causing an `error: output file too large`.
- With default `--rosegment`:
  - If a read-only section (e.g., `.rodata`) exists, a similar `error: output file too large` occurs.
  - Without read-only sections, the PT_LOAD segment covering the ELF header and program headers includes no sections, which is unusual and likely undesired. This also causes non-ascending PT_LOAD `p_vaddr` values related to the PT_LOAD that overlaps with PT_PHDR (llvm#138584).

To prevent these issues, report an error if a section address is below
the image base and suggest `--image-base`. This check also applies when
`--image-base` is explicitly set but is skipped when a `SECTIONS`
command is used.

Pull Request: llvm#140187
Fixed TableGen duplicate issues that causes the wrong interrupt
attribute from being selected.

resolves llvm#140701
This is a scoped helper similar to ApplyDebugLocation that creates a new source
location atom group which instructions can be added to.

A source atom is a source construct that is "interesting" for debug stepping
purposes. We use an atom group number to track the instruction(s) that implement
the functionality for the atom, plus backup instructions/source locations.

This patch is part of a stack that teaches Clang to generate Key Instructions
metadata for C and C++.

RFC:
https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668

The feature is only functional in LLVM if LLVM is built with CMake flag
LLVM_EXPERIMENTAL_KEY_INSTRUCTIONs. Eventually that flag will be removed.
This fixes a warning where a variable assigned in 'if' statement wasn't
referenced again, and where else is used when 'if' has returns statement
in the if-else statement
This change adds support for lowering BitCastOp
Same as llvm#139907 except there
is now a special dovoidwork helper function.
Previous approach with assert(f();return success;) failed tests for
release builds, so I created a separate helper. Open to suggestions how
to solve this more elegantly.

Co-authored-by: Arslan Khabutdinov <akhabutdinov@fb.com>
This commit provides definitions of builtins with the generic address
space.

One concept to consider is the difference between supporting the generic
address space from the user's perspective and the requirement for libclc
as a compiler implementation detail to define separate generic address
space builtins. In practice a target (like NVPTX) might notionally
support the generic address space, but it's mapped to the same LLVM
target address space as another address space (often the private one).

In such cases libclc must be careful not to define both private and
generic overloads of the same builtin. We track these two concepts
separately, and make the assumption that if the generic address space
does clash with another, it's with the private one. We track the
concepts separately because there are some builtins such as atomics that
are defined for the generic address space but not the private address
space.
clang/lib/CodeGen/CGDebugInfo.cpp:153:2: error: extra ';' outside of a function is incompatible with C++98 [-Werror,-Wc++98-compat-extra-semi]
  153 | };
      |  ^
1 error generated.
Fix for src scale opsel flags encoding and ASM parsing for gfx950 scaled MFMA.
The code to analyze VarDecls for the purpose of ensuring a magic-static
isn't present in a 'routine' was getting confused/crashed because we
create something that looks like a magic-static during error-recovery,
but it is still an invalid decl.

This patch causes us to just 'give up' in the case where the vardecl is
already invalid.

Fixes: llvm#140920
…m#140922)

This was an oversight in the original patch series. Without this change,
the newly added tests fail assertions.
Followup to llvm#138741.

This adds the requested macro to silence
`-Wunnecessary-virtual-specifier` when declaring virtual anchor
functions in `final` classes, per [LLVM
policy](https://llvm.org/docs/CodingStandards.html#provide-a-virtual-method-anchor-for-classes-in-headers).

It also cleans up any remaining instances of the warning, allowing us to
stop disabling it when we build LLVM.
…outines (llvm#140834)

Scalars inside device routines also need to implicitly set the DEVICE
attribute, except for function results.
Make sure to cover all the scalable types which are legal, plus
splitting.  Make sure to cover all instructions.  Not duplicating
vx testing at this time.
This reverts commit b263c08.

Looks like this triggers a crash in one of the Fortran tests. Reverting
while I investigate
    https://lab.llvm.org/buildbot/#/builders/41/builds/6825
As noted in review comment llvm#140922 (comment), this aren't required
…39614)"

This reverts commit 0954c9d.

It breaks the build when built with gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04).
DamonFool and others added 25 commits May 23, 2025 09:00
/llvm-project/clang-tools-extra/clang-doc/HTMLMustacheGenerator.cpp:86:19:
error: loop variable '[Name, FileName]' creates a copy from type 'std::pair<llvm::StringRef, llvm::StringRef> const' [-Werror,-Wrange-loop-construct]
  for (const auto [Name, FileName] : Partials)
                  ^
/llvm-project/clang-tools-extra/clang-doc/HTMLMustacheGenerator.cpp:86:8:
note: use reference type 'std::pair<llvm::StringRef, llvm::StringRef> const &' to prevent copying
  for (const auto [Name, FileName] : Partials)
       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                  &
1 error generated.
Took xry's idea [^1] to improve the csrxchg instrinsic test case.

[^1]:
llvm#141037 (comment)
llvm#138227)"

This reverts commit 64bb60a.

Revert to give more time affected parties to adjust to the change.
I'm working on porting ASan to Wasm/WASI targets, and this is the first
part of the change sets. I'll post runtime changes separately.

This change makes `-fsanitize=address` available for WASI target by
replicating what we do for Emscripten because they share the same memory
model.
…riptor (llvm#140151)

- define in-memory representation of optional non-flag parameters to
`RootDescriptor`
- fill in data to parse these params in `parseRootDescriptorParams`
- add unit tests

Part 3 of llvm#126577
…method (llvm#141127)

- we will need to provide a way to dump `RootFlags` for serialization
and by using operator overloads we can maintain a consistent interface

This is an NFC to allow for
llvm#138192 to be more
straightforwardly implemented.
…vm#141143)

Note: This relands llvm#140615 adding a ".count" suffix to the non-".all"
variants.

Our current intrinsic support for barrier intrinsics is confusing and
incomplete, with multiple intrinsics mapping to the same instruction and
intrinsic names not clearly conveying intrinsic semantics. Further, we
lack support for some variants. This change unifies the IR
representation to a single consistently named set of intrinsics.

- llvm.nvvm.barrier.cta.sync.aligned.all(i32)
- llvm.nvvm.barrier.cta.sync.aligned.count(i32, i32)
- llvm.nvvm.barrier.cta.arrive.aligned.count(i32, i32)
- llvm.nvvm.barrier.cta.sync.all(i32)
- llvm.nvvm.barrier.cta.sync.count(i32, i32)
- llvm.nvvm.barrier.cta.arrive.count(i32, i32)

The following Auto-Upgrade rules are used to maintain compatibility with
IR using the legacy intrinsics:

* llvm.nvvm.barrier0 --> llvm.nvvm.barrier.cta.sync.aligned.all(0)
* llvm.nvvm.barrier.n --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.bar.sync --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.barrier --> llvm.nvvm.barrier.cta.sync.aligned.count(x, y)
* llvm.nvvm.barrier.sync --> llvm.nvvm.barrier.cta.sync.all(x)
* llvm.nvvm.barrier.sync.cnt --> llvm.nvvm.barrier.cta.sync.count(x, y)
…manner. (llvm#141161)

This change would enable using `ir2vec::Embedding` which are float
vectors in ML Inliner.

Co-authored-by: svkeerthy <venkatakeerthy@google.com>
…m#141037)

This patch adds support for the `q` constraint:
a general-purpose register except for $r0 and $r1 (for the csrxchg
instruction)

Link: https://gcc.gnu.org/pipermail/gcc-patches/2025-May/684339.html
- `CreateRunInTerminalReverseRequest` is passed a `StringMap` by value,
whose keys are owned and deallocated after return.
- Target args are wrongly specified as reverse request (launcher) args.
- Test case error message did not catch up with a0aa5f8.

All runInTerminal tests are passing with this applied.
)

... that's been created in a different evaluation.
* Changes the default synthetic symbol names to contain their file
address

This is a new PR after the first PR (llvm#137512) was reverted because it
didn't update the way unnamed symbols were searched in the symbol table,
which relied on the index being in the name.

This time also added extra test to make sure the symbol is found as
expected
…#141188)

std::string::rfind accepts anything that can be converted to
std::string_view starting in C++17.  Since StringRef can be converted
to std::string_view, we do not need to create a temporary instance of
std::string here.
try_emplace can default-construct values, so we do not need to do so
on our own.  Plus, try_emplace(Key) is much simpler/shorter than
insert({Key, LongValueType()}).
@JDPailleux JDPailleux closed this May 23, 2025
@JDPailleux JDPailleux deleted the prif-dialect branch May 23, 2025 07:08
jvillette38 pushed a commit that referenced this pull request Jun 9, 2025
Fixes llvm#123300

What is seen 
```
clang-repl> int x = 42;
clang-repl> auto capture = [&]() { return x * 2; };
In file included from <<< inputs >>>:1:
input_line_4:1:17: error: non-local lambda expression cannot have a capture-default
    1 | auto capture = [&]() { return x * 2; };
      |                 ^
zsh: segmentation fault  clang-repl --Xcc="-v"

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
  * frame #0: 0x0000000107b4f8b8 libclang-cpp.19.1.dylib`clang::IncrementalParser::CleanUpPTU(clang::PartialTranslationUnit&) + 988
    frame #1: 0x0000000107b4f1b4 libclang-cpp.19.1.dylib`clang::IncrementalParser::ParseOrWrapTopLevelDecl() + 416
    frame #2: 0x0000000107b4fb94 libclang-cpp.19.1.dylib`clang::IncrementalParser::Parse(llvm::StringRef) + 612
    frame #3: 0x0000000107b52fec libclang-cpp.19.1.dylib`clang::Interpreter::ParseAndExecute(llvm::StringRef, clang::Value*) + 180
    frame #4: 0x0000000100003498 clang-repl`main + 3560
    frame llvm#5: 0x000000018d39a0e0 dyld`start + 2360
```

Though the error is justified, we shouldn't be interested in exiting
through a segfault in such cases.

The issue is that empty named decls weren't being taken care of
resulting into this assert


https://github.com/llvm/llvm-project/blob/c1a229252617ed58f943bf3f4698bd8204ee0f04/clang/include/clang/AST/DeclarationName.h#L503

Can also be seen when the example is attempted through xeus-cpp-lite.


![image](https://github.com/user-attachments/assets/9b0e6ead-138e-4b06-9ad9-fcb9f8d5bf6e)
jvillette38 pushed a commit that referenced this pull request Jun 9, 2025
…142952)

This was removed in llvm#135343 in
favour of making it a format variable, which we do here. This follows
the precedent of the `[opt]` and `[artificial]` markers.

Before:
```
 thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.2
 * frame #0: 0x000000010000037c a.out`inlined1() at inline.cpp:4:3
   frame #1: 0x000000010000037c a.out`regular() at inline.cpp:6:17
   frame #2: 0x00000001000003b8 a.out`inlined2() at inline.cpp:7:43
   frame #3: 0x00000001000003b4 a.out`main at inline.cpp:10:3
   frame #4: 0x0000000186345be4 dyld`start + 7040
```

After (note the `[inlined]` markers):
```
thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.2
* frame #0: 0x000000010000037c a.out`inlined1() at inline.cpp:4:3 [inlined]
  frame #1: 0x000000010000037c a.out`regular() at inline.cpp:6:17
  frame #2: 0x00000001000003b8 a.out`inlined2() at inline.cpp:7:43 [inlined]
  frame #3: 0x00000001000003b4 a.out`main at inline.cpp:10:3
  frame #4: 0x0000000186345be4 dyld`start + 7040
```

rdar://152642178
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.