Skip to content

LLVM and LLVM-SPIRV-Translator pulldown #1509

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1,022 commits into from

Conversation

vladimirlaz
Copy link
Contributor

@vladimirlaz vladimirlaz commented Apr 13, 2020

iclsrc and others added 30 commits April 1, 2020 19:35
If we have a must-tail call the callee and caller need to have matching
ABIs. Part of that is alignment which we might modify when we deduce
alignment of arguments of either. Since we would need to keep them in
sync, which is not as simple, we simply avoid deducing alignment for
arguments of the must-tail caller or callee.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D76673
This reverts commit 0071eaa.
Inputs/noop-main.ll wasn't checked in, so this breaks check-llvm
everywhere.
Use DL & ABI information for better alignment deduction, e.g., if a type
is accessed and the ABI specifies an alignment requirement for such an
access we can use it. This is based on a patch by @lebedev.ri and
inspired by getBaseAlign in Loads.cpp.

Depends on D76673.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D76674
It could happen that we delete the first function in the SCC in the
future so we should be careful accessing `Functions` after the manifest
stage.
While D68850 allowed functions to be deleted I accidentally saved some
version of the function to be used once a suitable prefix was found.
This turned out to be problematic when the occasionally deleted function
is also occasionally modified. The test case is adjusted to resemble the
case in which the problem was found.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D76586
This cannot be triggered right now, as far as I know, but it doesn't
make sense to deduce a constant range on arguments of declarations.
Exposed during testing of AAValueSimplify extensions.
There was a TODO in genericValueTraversal to provide the context
instruction and due to the lack of it users that wanted one just used
something available. Unfortunately, using a fixed instruction is wrong
in the presence of PHIs so we need to update the context instruction
properly.

Reviewed By: uenoku

Differential Revision: https://reviews.llvm.org/D76870
…e or library

Summary:
cmake fails with an error when attempting to evaluate $<TARGET_FILE:tgt>
where `tgt` is defined via an `add_custom_target` and thus the `TYPE`
is `UTILITY`. Requesting a TARGET_FILE only works on an `EXECUTABLE`
or one of a few differetnt types of `X_LIBRARY` (e.g. added via
`add_library` or `add_executable`). The logic as implemented in cmake
is below:

  enum TargetType
  {
    EXECUTABLE,
    STATIC_LIBRARY,
    SHARED_LIBRARY,
    MODULE_LIBRARY,
    OBJECT_LIBRARY,
    UTILITY,
    GLOBAL_TARGET,
    INTERFACE_LIBRARY,
    UNKNOWN_LIBRARY
  };

  if (target->GetType() >= cmStateEnums::OBJECT_LIBRARY &&
      target->GetType() != cmStateEnums::UNKNOWN_LIBRARY) {
    ::reportError(context, content->GetOriginalExpression(),
                  "Target \"" + name +
                    "\" is not an executable or library.");
    return nullptr;
  }

This has always been the case back to at least 3.12 (furthest I
checked) but this is causing a new failure in cmake 3.17 while
evaluating ExternalProjectAdd.

Subscribers: mgorny, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77284
…int’ Intrinsic

The requirement for deopt parameter to be in gc parameter if it can
be modified by GC is very strong and difficult to follow.

The key example of why this can't work:
%p1 = bitcast i8* %p to i8*
statepoint [gc = (%p1)], [deopt = (%p1)]

The optimizer is allowed to replace either use (or both) of %p1 with %p.
If it updates only one of the two (entirely legal), the two sets do not overlap.

So this change removes the strong wording.

Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D77122
…vtable symbol.

In most cases, LLD prints its multiline diagnostic messages starting
additional lines with ">>> ". That greatly helps external tools to parse
the output, simplifying combining several lines of the log back into one
message. The patch fixes the only message I found that does not follow
the common pattern.

Differential Revision: https://reviews.llvm.org/D77132
…ns.h

This is not supported to change anything but allow us to reuse the math
functions separately from the device functions, e.g., source them at
different times. This will be used by the OpenMP overlay.

This also adds two `return` keywords that were missing.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D77238
The math wrapper handling is going to be replaced shortly and
d1705c1 was actually a precursor for
that.
It was added by D76591 for migration purposes (not all
printBranchOperand users have migrated to the overload with `uint64_t Address`).
Now that all have been migrated, the parameter can go away.
  CONFLICT (content): Merge conflict in clang/lib/Sema/Sema.cpp
This commit removes support for building against the system libc++abi,
which was supported on Apple platforms. This is basically never what we
want to do, since libc++ and libc++abi are coupled and building a trunk
libc++ against an older libc++abi can lead to incompatibilities (and
good luck debugging them!). It might have made some sense to support
that when the monorepo did not exist, however I don't think this is
anything but a footgun nowadays.

Furthermore, based on the newly-made assumption that we're building
against the monorepo libc++abi, we can simplify the search path logic
for finding libc++abi.

This area of our build system has a lot of technical debt accumulated,
and it's surprisingly difficult to change. We've tried different things
and failed several times in the past. I did test this change on our
Docker image for the build bots and on Apple platforms, however it is
possible that this breaks some unknown configuration, in which case it
should be fine to revert this (so we can try again!).
Summary:
As noted in documentation, different repetition modes have different trade-offs:

> .. option:: -repetition-mode=[duplicate|loop]
>
>  Specify the repetition mode. `duplicate` will create a large, straight line
>  basic block with `num-repetitions` copies of the snippet. `loop` will wrap
>  the snippet in a loop which will be run `num-repetitions` times. The `loop`
>  mode tends to better hide the effects of the CPU frontend on architectures
>  that cache decoded instructions, but consumes a register for counting
>  iterations.

Indeed. Example:

>>! In D74156#1873657, @lebedev.ri wrote:
> At least for `CMOV`, i'm seeing wildly different results
> |           | Latency | RThroughput |
> | duplicate | 1       | 0.8         |
> | loop      | 2       | 0.6         |
> where latency=1 seems correct, and i'd expect the througput to be close to 1/2 (since there are two execution units).

This isn't great for analysis, at least for schedule model development.

As discussed in excruciating detail in

>>! In D74156#1924514, @gchatelet wrote:
>>>! In D74156#1920632, @lebedev.ri wrote:
>> ... did that explanation of the question i'm having made any sense?
>
> Thx for digging in the conversation !
> Ok it makes more sense now.
>
> I discussed it a bit with @courbet:
>  - We want the analysis tool to stay simple so we'd rather not make it knowledgeable of the repetition mode.
>  - We'd like to still be able to select either repetition mode to dig into special cases
>
> So we could add a third `min` repetition mode that would run both and take the minimum. It could be the default option.
> Would you have some time to look what it would take to add this third mode?

there appears to be an agreement that it is indeed sub-par,
and that we should provide an optional, measurement (not analysis!) -time
way to rectify the situation.

However, the solutions isn't entirely straight-forward.

We can just add an actual 'multiplexer' `MinSnippetRepetitor`, because
if we just concatenate snippets produced by `DuplicateSnippetRepetitor`
and `LoopSnippetRepetitor` and run+measure that, the measurement will
naturally be different from what we'd get by running+measuring
them separately and taking the min.
([[ https://www.wolframalpha.com/input/?i=%28x%2By%29%2F2+%21%3D+min%28x%2C+y%29 | `time(D+L)/2 != min(time(D), time(L))` ]])

Also, it seems best to me to have a single snippet instead of generating
a snippet per repetition mode, since the only difference here is that the
loop repetition mode reserves one register for loop counter.

As far as i can tell, we can either teach `BenchmarkRunner::runConfiguration()`
to produce a single report given multiple repetitors (as in the patch),
or do that one layer higher - don't modify `BenchmarkRunner::runConfiguration()`,
produce multiple reports, don't actually print each one, but aggregate them somehow
and only print the final one.

Initially i've gone ahead with the latter approach, but it didn't look like a natural fit;
the former (as in the diff) does seem like a better fit to me.

There's also a question of the test coverage. It sure currently does work here:
```
$ ./bin/llvm-exegesis --opcode-name=CMOV64rr --mode=inverse_throughput --repetition-mode=duplicate
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-8fb949.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'CMOV64rr RAX RAX R11 i_0x0'
    - 'CMOV64rr RBP RBP R15 i_0x0'
    - 'CMOV64rr RBX RBX RBX i_0x0'
    - 'CMOV64rr RCX RCX RBX i_0x0'
    - 'CMOV64rr RDI RDI R10 i_0x0'
    - 'CMOV64rr RDX RDX RAX i_0x0'
    - 'CMOV64rr RSI RSI RAX i_0x0'
    - 'CMOV64rr R8 R8 R8 i_0x0'
    - 'CMOV64rr R9 R9 RDX i_0x0'
    - 'CMOV64rr R10 R10 RBX i_0x0'
    - 'CMOV64rr R11 R11 R14 i_0x0'
    - 'CMOV64rr R12 R12 R9 i_0x0'
    - 'CMOV64rr R13 R13 R12 i_0x0'
    - 'CMOV64rr R14 R14 R15 i_0x0'
    - 'CMOV64rr R15 R15 R13 i_0x0'
  config:          ''
  register_initial_values:
    - 'RAX=0x0'
    - 'R11=0x0'
    - 'EFLAGS=0x0'
    - 'RBP=0x0'
    - 'R15=0x0'
    - 'RBX=0x0'
    - 'RCX=0x0'
    - 'RDI=0x0'
    - 'R10=0x0'
    - 'RDX=0x0'
    - 'RSI=0x0'
    - 'R8=0x0'
    - 'R9=0x0'
    - 'R14=0x0'
    - 'R12=0x0'
    - 'R13=0x0'
cpu_name:        bdver2
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: inverse_throughput, value: 0.819, per_snippet_value: 12.285 }
error:           ''
info:            instruction has tied variables, using static renaming.
assembled_snippet: 5541574156415541545348B8000000000000000049BB00000000000000004883EC08C7042400000000C7442404000000009D48BD000000000000000049BF000000000000000048BB000000000000000048B9000000000000000048BF000000000000000049BA000000000000000048BA000000000000000048BE000000000000000049B8000000000000000049B9000000000000000049BE000000000000000049BC000000000000000049BD0000000000000000490F40C3490F40EF480F40DB480F40CB490F40FA480F40D0480F40F04D0F40C04C0F40CA4C0F40D34D0F40DE4D0F40E14D0F40EC4D0F40F74D0F40FD490F40C35B415C415D415E415F5DC3
...
$ ./bin/llvm-exegesis --opcode-name=CMOV64rr --mode=inverse_throughput --repetition-mode=loop
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-051eb3.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'CMOV64rr RAX RAX R11 i_0x0'
    - 'CMOV64rr RBP RBP RSI i_0x0'
    - 'CMOV64rr RBX RBX R9 i_0x0'
    - 'CMOV64rr RCX RCX RSI i_0x0'
    - 'CMOV64rr RDI RDI RBP i_0x0'
    - 'CMOV64rr RDX RDX R9 i_0x0'
    - 'CMOV64rr RSI RSI RDI i_0x0'
    - 'CMOV64rr R9 R9 R12 i_0x0'
    - 'CMOV64rr R10 R10 R11 i_0x0'
    - 'CMOV64rr R11 R11 R9 i_0x0'
    - 'CMOV64rr R12 R12 RBP i_0x0'
    - 'CMOV64rr R13 R13 RSI i_0x0'
    - 'CMOV64rr R14 R14 R14 i_0x0'
    - 'CMOV64rr R15 R15 R10 i_0x0'
  config:          ''
  register_initial_values:
    - 'RAX=0x0'
    - 'R11=0x0'
    - 'EFLAGS=0x0'
    - 'RBP=0x0'
    - 'RSI=0x0'
    - 'RBX=0x0'
    - 'R9=0x0'
    - 'RCX=0x0'
    - 'RDI=0x0'
    - 'RDX=0x0'
    - 'R12=0x0'
    - 'R10=0x0'
    - 'R13=0x0'
    - 'R14=0x0'
    - 'R15=0x0'
cpu_name:        bdver2
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: inverse_throughput, value: 0.6083, per_snippet_value: 8.5162 }
error:           ''
info:            instruction has tied variables, using static renaming.
assembled_snippet: 5541574156415541545348B8000000000000000049BB00000000000000004883EC08C7042400000000C7442404000000009D48BD000000000000000048BE000000000000000048BB000000000000000049B9000000000000000048B9000000000000000048BF000000000000000048BA000000000000000049BC000000000000000049BA000000000000000049BD000000000000000049BE000000000000000049BF000000000000000049B80200000000000000490F40C3480F40EE490F40D9480F40CE480F40FD490F40D1480F40F74D0F40CC4D0F40D34D0F40D94C0F40E54C0F40EE4D0F40F64D0F40FA4983C0FF75C25B415C415D415E415F5DC3
...
$ ./bin/llvm-exegesis --opcode-name=CMOV64rr --mode=inverse_throughput --repetition-mode=min
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-c7a47d.o
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-2581f1.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'CMOV64rr RAX RAX R11 i_0x0'
    - 'CMOV64rr RBP RBP R10 i_0x0'
    - 'CMOV64rr RBX RBX R10 i_0x0'
    - 'CMOV64rr RCX RCX RDX i_0x0'
    - 'CMOV64rr RDI RDI RAX i_0x0'
    - 'CMOV64rr RDX RDX R9 i_0x0'
    - 'CMOV64rr RSI RSI RAX i_0x0'
    - 'CMOV64rr R9 R9 RBX i_0x0'
    - 'CMOV64rr R10 R10 R12 i_0x0'
    - 'CMOV64rr R11 R11 RDI i_0x0'
    - 'CMOV64rr R12 R12 RDI i_0x0'
    - 'CMOV64rr R13 R13 RDI i_0x0'
    - 'CMOV64rr R14 R14 R9 i_0x0'
    - 'CMOV64rr R15 R15 RBP i_0x0'
  config:          ''
  register_initial_values:
    - 'RAX=0x0'
    - 'R11=0x0'
    - 'EFLAGS=0x0'
    - 'RBP=0x0'
    - 'R10=0x0'
    - 'RBX=0x0'
    - 'RCX=0x0'
    - 'RDX=0x0'
    - 'RDI=0x0'
    - 'R9=0x0'
    - 'RSI=0x0'
    - 'R12=0x0'
    - 'R13=0x0'
    - 'R14=0x0'
    - 'R15=0x0'
cpu_name:        bdver2
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: inverse_throughput, value: 0.6073, per_snippet_value: 8.5022 }
error:           ''
info:            instruction has tied variables, using static renaming.
assembled_snippet: 5541574156415541545348B8000000000000000049BB00000000000000004883EC08C7042400000000C7442404000000009D48BD000000000000000049BA000000000000000048BB000000000000000048B9000000000000000048BA000000000000000048BF000000000000000049B9000000000000000048BE000000000000000049BC000000000000000049BD000000000000000049BE000000000000000049BF0000000000000000490F40C3490F40EA490F40DA480F40CA480F40F8490F40D1480F40F04C0F40CB4D0F40D44C0F40DF4C0F40E74C0F40EF4D0F40F14C0F40FD490F40C3490F40EA5B415C415D415E415F5DC35541574156415541545348B8000000000000000049BB00000000000000004883EC08C7042400000000C7442404000000009D48BD000000000000000049BA000000000000000048BB000000000000000048B9000000000000000048BA000000000000000048BF000000000000000049B9000000000000000048BE000000000000000049BC000000000000000049BD000000000000000049BE000000000000000049BF000000000000000049B80200000000000000490F40C3490F40EA490F40DA480F40CA480F40F8490F40D1480F40F04C0F40CB4D0F40D44C0F40DF4C0F40E74C0F40EF4D0F40F14C0F40FD4983C0FF75C25B415C415D415E415F5DC3
...
```
but i open to suggestions as to how test that.

I also have gone with the suggestion to default to this new mode.
This was irking me for some time, so i'm happy to finally see progress here.
Looking forward to feedback.

Reviewers: courbet, gchatelet

Reviewed By: courbet, gchatelet

Subscribers: mstojanovic, RKSimon, llvm-commits, courbet, gchatelet

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76921
In d1705c1 (D77238) we accidentally included subsequent changes and
did not only move the code into a new file (which was the intention).
We undo the changes now and re-introduce them with the appropriate test
changes later.
This is a cleanup and normalization patch that also enables reuse with
Flang later on. A follow up will clean up and move the directive ->
clauses mapping.

Differential Revision: https://reviews.llvm.org/D77112
…d/OpenMP`"

This reverts commit c18d559.

Bots have reported uses that need changing, e.g.,
  clang-tools-extra/clang-tidy/openmp/UseDefaultNoneCheck.cp
as reported by
  http://lab.llvm.org:8011/builders/clang-ppc64be-linux/builds/46591
Summary: The assertion is almost correct, but it fails on refs from non-preamble

Reviewers: sammccall

Reviewed By: sammccall

Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, usaxena95, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D77222
Replace pattern getContents().size with universe function call
Summary: For more details about this instruction, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference

Reviewers: craig.topper, RKSimon, LuoYuanke

Reviewed By: craig.topper

Subscribers: mgorny, hiraditya, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D77193
This work prepares us for the overall goal of clean shutdown on user
keyboard interrupt [Ctrl+C].
Summary:
Reason: the option has an effect on preprocessing.

Also see thread: http://lists.llvm.org/pipermail/cfe-dev/2020-March/065014.html

Reviewers: chill, efriedma

Reviewed By: efriedma

Subscribers: efriedma, danielkiss, cfe-commits, kristof.beyls

Tags: #clang

Differential Revision: https://reviews.llvm.org/D77131
kpneal and others added 17 commits April 3, 2020 15:47
…eg in PowerPC special fma compiler builtins"

The new test case causes bot failures.

This reverts commit ba87430.
It is an obvious part of D77326.

It removes some needless deep indentation and some redundant statements.
It prepares the code for a more clean next patch - DWARF index callbacks
in D77327.
… (LVI) Gadgets

Adds a new data structure, ImmutableGraph, and uses RDF to find LVI gadgets and add them to a MachineGadgetGraph.

More specifically, a new X86 machine pass finds Load Value Injection (LVI) gadgets consisting of a load from memory (i.e., SOURCE), and any operation that may transmit the value loaded from memory over a covert channel, or use the value loaded from memory to determine a branch/call target (i.e., SINK).

Also adds a new target feature to X86: +lvi-load-hardening

The feature can be added via the clang CLI using -mlvi-hardening.

Differential Revision: https://reviews.llvm.org/D75936
Graceful lit shutdown on user keyboard interrupt [Ctrl+C] was a
longstanding goal of mine.  After a few refactorings this revision
finally enables it.  We use the following strategy to deal with
KeyboardInterrupt:
https://noswap.com/blog/python-multiprocessing-keyboardinterrupt

Printing of a helpful summary for interrupted runs (just as the one for
completed runs) will be tackled in future revisions.

Reviewed By: serge-sans-paille, rnk

Differential Revision: https://reviews.llvm.org/D77365
Summary:
Linalg makes it possible to interface codegen with externally precompiled HPC libraries. The mechanism to allow such interop uses a normalized ABI and the emission of C interface wrappers.

The mechanism controlling these C interface emission is too aggressive and makes it very easy to obtained undefined symbols for external function (e.g. the ones coming from libm).

This revision uses the newly introduced llvm.emit_c_interface function attribute which allows controlling this behavior at a function granularity. As a consequence LinalgToLLVM does not need to activate the C wrapper emission when adding the StdToLLVM patterns.

Differential Revision: https://reviews.llvm.org/D77364
See rational here: https://reviews.llvm.org/D76173#1922916
Time to compile Attr.h in isolation goes from 2.6s to 1.8s.

Original patch by Johannes, plus some additions from Reid to fix some
clang tooling targets.

Effect on transitive includes is marginal, though:

$ diff -u <(sort thedeps-before.txt) <(sort thedeps-after.txt) \
   | grep '^[-+] ' | sort | uniq -c | sort -nr
    104 -    /usr/local/google/home/rnk/llvm-project/clang/include/clang/AST/OpenMPClause.h
     87 -    /usr/local/google/home/rnk/llvm-project/llvm/include/llvm/Frontend/OpenMP/OMPContext.h
     19 -    /usr/local/google/home/rnk/llvm-project/llvm/include/llvm/ADT/SmallSet.h
     19 -    /usr/local/google/home/rnk/llvm-project/llvm/include/llvm/ADT/SetVector.h
     14 -    /usr/include/c++/9/set
...

Differential Revision: https://reviews.llvm.org/D76184
This replaces SPIR-V assembly tests for translation of FPFastMathMode
decorations with transcoding tests, adding a single SPIR-V assembly test
for the translation of the combination of the Fast and NotNaN flags.
Done automatically using:

```bash
$ for file in $(grep -r "llvm-spirv" --exclude-dir=transcoding --exclude-dir=DebugInfo --exclude="*.sp*" -B 10 -A 10 | grep "llvm-spirv -r" | grep -Po "^[\w\d/.]+(?=:.*$)" | sort | uniq); do git mv $file transcoding/$file; done
```
Signed-off-by: Alexey Sachkov <alexey.sachkov@intel.com>
This patch features two things:
- fixed translation of OpControlBarrier into OpenCL 2.0 built-ins:

  Scope of the operation is now respected and we can either get
  work_group_barrier or sub_group_barrier depending on it

- improved the translation so it handles non-constant 'Memory Semantic'
  operand of the instruction

Signed-off-by: Alexey Sachkov <alexey.sachkov@intel.com>
This extension relaxes the restriction that OpTypeInt must have a width of
32 bits. Ints of arbitrary bit widths can be beneficial on targets that can
exploit narrower widths such as FPGAs.

Signed-off-by: Viktoria Maksimova <viktoria.maksimova@intel.com>
The CreateShuffleVector API only accepts `int` or `uint32_t` ArrayRef
mask arguments.
 - Apply renaming OPT_cuda_gpu_arch_EQ to OPT_offload_arch_EQ
   in clang driver.
 - Apply space changes in AST dump to LIT test
 - Add file-table-tform to check-llvm dependencies

Signed-off-by: Vladimir Lazarev <vladimir.lazarev@intel.com>
@AlexeySotkin
Copy link
Contributor

@vladimirlaz please use the following notation to refer to a SPIRV-LLVM-Translator commit: KhronosGroup/SPIRV-LLVM-Translator@207924d -> KhronosGroup/SPIRV-LLVM-Translator@207924d

@vladimirlaz vladimirlaz deleted the private/vlazarev/pulldown branch April 28, 2020 17:13
jsji pushed a commit that referenced this pull request Jan 11, 2024
Use target triple's subarch component, if present, for setting exact
SPIR-V version for the SPIR-V emission.

Resolves #1509.

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@9f29f97
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.