Skip to content

LLVM and SPIRV-LLVM-Translator pulldown (WW23) #3852

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 300 commits into from
Jun 3, 2021

Conversation

vmaksimo
Copy link
Contributor

Quuxplusone and others added 30 commits May 25, 2021 11:12
cxx20_iterator_traits.compile.pass.cpp actually depends on
implementation details of libc++, which is not great;
but I just left a comment and moved on.
- Currently, the host cpu information is not easily available on z/OS as in other platforms.
- This information is stored in the Communications Vector Table (https://www.ibm.com/docs/en/zos/2.2.0?topic=information-cvt-mapping)

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D102793
Currently, BPF only contains three relocations:
  R_BPF_NONE   for no relocation
  R_BPF_64_64  for LD_imm64 and normal 64-bit data relocation
  R_BPF_64_32  for call insn and normal 32-bit data relocation

Also .BTF and .BTF.ext sections contain symbols in allocated
program and data sections. These two sections reserved 32bit
space to hold the offset relative to the symbol's section.
When LLVM JIT is used, the LLVM ExecutionEngine RuntimeDyld
may attempt to resolve relocations for .BTF and .BTF.ext,
which we want to prevent. So we used R_BPF_NONE for such relocations.

This all works fine until when we try to do linking of
multiple objects.
  . R_BPF_64_64 handling of LD_imm64 vs. normal 64-bit data
    is different, so lld target->relocate() needs more context
    to do a correct job.
  . The same for R_BPF_64_32. More context is needed for
    lld target->relocate() to differentiate call insn vs.
    normal 32-bit data relocation.
  . Since relocations in .BTF and .BTF.ext are set to R_BPF_NONE,
    they will not be relocated properly when multiple .BTF/.BTF.ext
    sections are merged by lld.

This patch intends to address this issue by adding additional
relocation kinds:
  R_BPF_64_ABS64     for normal 64-bit data relocation
  R_BPF_64_ABS32     for normal 32-bit data relocation
  R_BPF_64_NODYLD32  for .BTF and .BTF.ext style relocations.
The old R_BPF_64_{64,32} semantics:
  R_BPF_64_64        for LD_imm64 relocation
  R_BPF_64_32        for call insn relocation

The existing R_BPF_64_64/R_BPF_64_32 mapping to numeric values
is maintained. They are the most common use cases for
bpf programs and we want to maintain backward compatibility
as much as possible.

ExecutionEngine RuntimeDyld BPF relocations are adjusted as well.
R_BPF_64_{ABS64,ABS32} relocations will be resolved properly and
other relocations will be ignored.
Two tests are added for RuntimeDyld. Not handling R_BPF_64_NODYLD32 in
RuntimeDyldELF.cpp will result in "Relocation type not implemented yet!"
fatal error.

FK_SecRel_4 usages in BPFAsmBackend.cpp and BPFELFObjectWriter.cpp
are removed as they are not triggered in BPF backend.
BPF backend used FK_SecRel_8 for LD_imm64 instruction operands.

Differential Revision: https://reviews.llvm.org/D102712
Said function had a few shortfalls:
- didn't set an abort message on Android
- was logged on several lines
- didn't provide extra information like the size requested if OOM'ing

This improves the function to address those points.

Differential Revision: https://reviews.llvm.org/D103034
…to intptr_t

A test in ir.c makes use of casting a void* to an integer type to print it's address. This cast is currently done with the datatype `long` however, which is only guaranteed to be equal to the pointer width on LP64 system. Other platforms may use a length not equal to the pointer width. 64bit Windows as an example uses 32 bit for `long` which does not match the 64 bit pointers.
This also results in clang warning due to `-Wvoid-pointer-to-int-cast`.

Technically speaking, since the test only passes the value 42, it does not cause any issues, but it'd be nice to fix the warning at least.

Differential Revision: https://reviews.llvm.org/D103085
All users of the builder should set an insert point before using the
builder. There should be no need for using InsertPointGuard here.
…s on AVX1

Determined from llvm-mca analysis, AVX1 capable targets have a higher throughput for VPBLENDVB and shuffle ops, making it cheaper to perform shift+shuffle/select shift patterns.
Match whats documented in the Intel AOM - the XMM variant of PSHUFB requires BOTH ports - this was being incorrectly modelled as EITHER port.

Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.
We are using TOCEntry symbols like `LC..0` in TOC loads,
this is hard to read , at least requiring an additional step to figure
out the loaded symbols.

We should print out the name in comments.

Reviewed By: #powerpc, shchenz

Differential Revision: https://reviews.llvm.org/D102949
Removed some of the older raw "MLIRized" versions that are
no longer needed now that the sparse runtime support library
can focus on the proper sparse tensor types rather than the
opague pointer approach of the past. This avoids legacy...

Reviewed By: penpornk

Differential Revision: https://reviews.llvm.org/D102960
All callers pass "false" for the Equality parameter.  Kill the dead code, and update the function block comment.
The parseInputFile function returns an empty unique_ptr to signal an
error, like when the input file doesn't exist, or is malformed. In this
case, the tool should exit immediately rather than segfault by
dereferencing the unique_ptr later.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D102891
Stylistic changes only.
1) Don't pass a parameter just to do an early exit.
2) Use a name which matches actual behavior.
This reverts commit 0bebda1.

Causing "Invalid record" errors.
The 2nd test is based on the fuzzer example in post-commit
comments of D101191 -
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=34661

The 1st test shows that we don't deal with this symmetrically.
We should be able to reduce both examples (possibly in
instsimplify instead of instcombine).
…banks

This function can change regbank for registers which already have a selected
bank. Depending on the instruction where these registers were used it can
cause instruction selection to fail.
A recent fix for problems with ENTRY statement handling didn't
get the case of a procedure dummy argument on an ENTRY statement
in an executable part right; the code presumed that those dummy
arguments would be objects, not entities that might be objects or
procedures.  Fix.

Differential Revision: https://reviews.llvm.org/D103098
llvm-profgen uses profile summary based cold threshold to merge and trim cold context profile. This is to strike a good balance between profile size and performance.

We've been using 99.9% as the cutoff to save profile size without affecting performance. This change switch to use 99.9% instead of 99.9999% as default cold threshold cutoff for llvm-profgen.

Redundant switch csprof-cold-thres is also removed and tests cleaned up.

Differential Revision: https://reviews.llvm.org/D103071
Update the paragraph on generic / indexed_generic to reflect the unification of these operations.

Differential Revision: https://reviews.llvm.org/D102775
Make sure that if SCUDO_DEBUG=1 in tests
then we had the same in the scudo
library itself.

Reviewed By: cryptoad, hctim

Differential Revision: https://reviews.llvm.org/D103061
Cast of signed types to u64 breaks comparison.
Also remove double () around operands.

Reviewed By: cryptoad, hctim

Differential Revision: https://reviews.llvm.org/D103060
…ly infinite loops into finite ones

Nowadays LLVM does not assume that all loops are finite,
so if we want to produce a finite loop from a potentially-infinite one,
we must ensure that the original loop is known to be a finite one.

For this transform, it only matters for arithmetic right-shifts.
For them, either the function or the loop must be known to
be `mustprogress`, or the original value being shifted must be known
to be non-negative (because iff the sign bit was set,
it will never become zero, but will become `-1` in the "end").

It would be really good for alive2 to actually complain about this,
but it currently does not: AliveToolkit/alive2#726
Now that we can fold some transposes into multiplies (CM: A * B^t and RM:
A^t * B), we want to move them around to create the optimal expressions:

* fold away double transposes while still using them to assert the shape
* sink transposes hoping they cancel out
* lift transposes when both operands are transposed

This also modifies the matrix remarks to include the number of exposed
transposes (i.e. transposes that we couldn't fold into a multiply).

The adjustment to the test remarks-inlining is a bit subtle: I am changing the
double transpose to a single transpose so that we don't remove it completely.
More importantly this changes some of the total instruction count, most
notable stores because we can no longer use a vector store.

Differential Revision: https://reviews.llvm.org/D102733
This patch is the third in a series of patches fixing markdown links and references inside the mlir documentation.

This patch addresses all broken references to other markdown files and sections inside the Tutorials folder.

Differential Revision: https://reviews.llvm.org/D103017
…alue

The semantics of select with undefined/poison condition
are not explicitly stated in the LangRef, but this matches
comments in the code and Alive2 appears to concur:
https://alive2.llvm.org/ce/z/KXytmd

We can find this pattern after demanded elements transforms.

As noted in D101191, fuzzers are finding infinite loops because
we may not account for this pattern in other passes.
nicolasvasilache and others added 16 commits May 27, 2021 12:48
…(5/n)

This revision refactors and simplifies the pattern detection logic: thanks to SSA value properties, we can actually look at all the uses of a given value and avoid having to pattern-match specific chains of operations.

A bufferization pattern for subtensor is added and specific inplaceability analysis is implemented for the simple case of subtensor. More advanced use cases will follow.

Differential revision: https://reviews.llvm.org/D102512
WG14 adopted N2645 and WG21 EWG has accepted P2334 in principle (still
subject to full EWG vote + CWG review + plenary vote), which add
support for #elifdef as shorthand for #elif defined and #elifndef as
shorthand for #elif !defined. This patch adds support for the new
preprocessor directives.
For uniform ReplicateRecipes, only the first lane should be used, so
sinking them would mean we have to compute the value of the first lane
multiple times. Also, at the moment, sinking them causes a crash because
the value of the first lane is re-used by all users.

Reported post-commit for D100258.
The vector calling convention dictates that when the vector argument
registers are exhaused, GPRs are used to pass the address via the stack.
When the GPRs themselves are exhausted, at best we would previously
crash with an assertion, and at worst we'd generate incorrect code.

This patch addresses this issue by passing fixed-length vectors via the
stack with their full fixed-length size and aligned to their element
type size. Since the calling convention lowering can't yet handle
scalable vector types, this patch adds a fatal error to make it clear
that we are lacking in this regard.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D102422
DAGCombine's `mergeStoresOfConstantsOrVecElts` optimization is told
whether it's to use vector types and also whether it's to issue a
truncating store. However, the truncating store code path assumes a
scalar integer `ConstantSDNode`, and when using vector types it creates
either a `BUILD_VECTOR` or `CONCAT_VECTORS` to store: neither of which
is a constant.

The `riscv64` target is able to expose a crash here because it switches
on both code paths at the same time. The `f32` is stored as `i32` which
must be promoted to `i64`, necessitating a truncating store.
It also decides later that it prefers a vector store of `v2f32`.

While vector truncating stores are legal, this combine is not able to
emit them. We also don't have a test case. This patch adds an assert to
catch this case more gracefully, and updates one of the caller functions
to the function to turn off the use of truncating stores when preferring
vectors.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103173
We were accidentally leaning on code in lowerLoad which expands
extending loads which should be removed.
* Fix for ParseChecksum function

This is a patch to fix failure, caused by bringing as argument of ParseChecksum
StringRef with non-zero length, but with empty data.

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@b768773
Some FPGA devices and toolchains can support customizable levels or
implementation of pipeline parallelism when mapping a SPIR-V module
to hardware. Through pipeline parallelism, multiple invocations of
a kernel or function can execute concurrently.

This extension adds decorations to request that a kernel or function
support invocations at a specified initiation interval, that multiple
invocations are forbidden from executing concurrently, or that the
kernel or function is limited to a maximum number of concurrent
invocations.

Spec:
https://github.com/KhronosGroup/SPIRV-Registry/blob/38a7e56b9b7feab7fa8b3bea08770f7232b76ed2/extensions/INTEL/SPV_INTEL_fpga_invocation_pipelining_attributes.asciidoc

Signed-off-by: Dmitry Sidorov <dmitry.sidorov@intel.com>

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@e303a5f
This extension adds a type-declaration instruction OpTypeTokenINTEL,
which is an analog of LLVM's type token. This type specifies IDs of
entry points and exit points of certain IL constructs, for example it
can be useful for exception handling representation in IL.

Spec: intel#3788

Signed-off-by: Dmitry Sidorov <dmitry.sidorov@intel.com>

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@091c366
When some global variable contains constant function pointer to a
function and this global variable is referenced from the pointed
function, translation crashed because intializer for global variable was
translated before mapping of OpVariable to LLVM value. It happened
because during translation of initializer in such case the original
variable was met for the second time, but it wasn't mapped to a proper
llvm value.

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@485c8c7
@vmaksimo
Copy link
Contributor Author

/summary:run

Breaks check-llvm on non-linux, see comments on https://reviews.llvm.org/D85085
This reverts commit caae570
and follow-up commit 1546c52.
@vmaksimo
Copy link
Contributor Author

vmaksimo commented Jun 2, 2021

/summary:run

@vmaksimo vmaksimo marked this pull request as ready for review June 2, 2021 16:50
@vladimirlaz vladimirlaz merged commit 541e697 into intel:sycl Jun 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.