Skip to content

[WIP][SYCL] Add support for union types kernel arguments #2206

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1,036 commits into from

Conversation

smanna12
Copy link
Contributor

@smanna12 smanna12 commented Jul 29, 2020

This patch adds support for union types as kernel parameter using existing codes of
"handleScalarType" and renames "handleScalarType" to "handleSimpleType" to
be more generic for scalars, vectors and unions types.

-Add kernel generation, integration header test, and runtime tests.

Signed-off-by: Soumi Manna soumi.manna@intel.com

@smanna12 smanna12 changed the title [WIP]{SYCL] Add support for union types kernel arguments [WIP][SYCL] Add support for union types kernel arguments Jul 29, 2020
@smanna12
Copy link
Contributor Author

smanna12 commented Jul 29, 2020

Patch is incomplete. Work is in progress.

@smanna12 smanna12 changed the title [WIP][SYCL] Add support for union types kernel arguments [WIP][SYCL] [Not Review]Add support for union types kernel arguments Jul 29, 2020
@smanna12 smanna12 changed the title [WIP][SYCL] [Not Review]Add support for union types kernel arguments [WIP][SYCL] Add support for union types kernel arguments Jul 29, 2020
Copy link
Contributor

@Fznamznon Fznamznon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add handling in SyclKernelIntHeaderCreator class.
I would also expect some sema test on kernel generation, integration header test and runtime test.

Comment on lines 948 to 950
} else if (FieldTy->isUnionType()) {
KF_FOR_EACH(handleUnionType, Field, FieldTy);
} else
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If code of "handleScalarType" in "IntHeaderCreator" also suits for unions, we could unify it across scalars, unions and vectors, i.e. here you will have something like

} else if (FieldTy->isScalarType() || FieldTy->isVectorType()
 || FieldTy->isUnionType()) {
KF_FOR_EACH(handleSimpleType, Field, FieldTy);
}

i.e. handleScalarType can be renamed to something more generic (handleSimpleType in my example) and used for scalars, vectors and unions..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Fznamznon, It seems like"handleScalarType" suits for unions. So this is renamed to "handleSimpleType".

andreil99 and others added 25 commits July 31, 2020 15:39
…led when they're being used

This patch fixed the issue that target memory might be deallocated when
they're still being used or before they're used.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D84996
Add the optimizations we have in the SelectionDAG version.
Known non-negative copies all known bits. Any known one other than
the sign bit makes result non-negative.

Differential Revision: https://reviews.llvm.org/D85000
A recent change broke `ninja check-asan` on Darwin by causing an error
during linking of ASan unit tests [1].

Move the addition of `-ObjC` compiler flag outside of the new
`if(COMPILER_RT_STANDALONE_BUILD)` block.  It doesn't add any global
flags (e.g, `${CMAKE_CXX_FLAGS}`) and the decision to add is based
solely on source paths (`${source_rpath}`).

[1] 8b2fcc4, https://reviews.llvm.org/D84466

Differential Revision: https://reviews.llvm.org/D85057
Rather than hardcoding immediate values for 12 different combinations
in a nested pair of switches, we can perform the matched logic
operation on 3 magic constants to calculate the immediate.

Special thanks to this tweet https://twitter.com/rygorous/status/1187034321992871936
for making me realize I could do this.
…/store.

Refer to LangRef http://llvm.org/docs/LangRef.html#llvm-masked-load-intrinsics
'llvm.masked.load/store.*’ intrinsics are overloaded intrinsic, which allow the
load/store data to be a vector of any integer, floating-point or pointer data type.

Therefore, allow pointer data type when checking 'isLegalMaskedLoadStore()'.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D85045
These methods abstract away Error handling when trying to read options that can't be parsed by logging the error automatically and returning None.

Reviewed By: gribozavr2

Differential Revision: https://reviews.llvm.org/D84812
Change to expand all arguments and return values to i64 to follow ABI.
Update regression tests also.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D84581
The patterns were incorrect copies from the FPU code, and are
unnecessary, since there's no extended load for SPE.  Just let LLVM
itself do the work by marking it expand.

Reviewed By: #powerpc, lkail
Differential Revision: https://reviews.llvm.org/D78670
SPE doesn't have a fsel instruction, so don't try to lower to it.

This fixes a "Cannot select: tN: f64 = PPCISD::FSEL tX, tY, tZ" error.

Reviewed By: #powerpc, lkail
Differential Revision: https://reviews.llvm.org/D77773
querying getSCEV() for incomplete phis leads to wrong cache value in `ExprToIVMap`,
because incomplete phis may be simplified to same value before get SCEV expression.

Reviewed By: lebedev.ri, mkazantsev

Differential Revision: https://reviews.llvm.org/D77560
… some code.

Now we try to load and broadcast together for operand 1. Followed
by load and broadcast for operand 1. Previously we tried load
operand 1, load operand 1, broadcast operand 0, broadcast operand 1.

Now we have a single helper that tries load and broadcast for
one operand that we can just call twice.
…d files

When checking for the style of a decl that isn't in the main file, the check will now search for the configuration that the included files uses to gather the style for its decls.

This can be useful to silence warnings in header files that follow a different naming convention without using header-filter to silence all warnings(even from other checks) in the header file.

Reviewed By: aaron.ballman, gribozavr2

Differential Revision: https://reviews.llvm.org/D84814
Patch improves performance of verify-machineinstrs pass up to 10x.
Differential revision: https://reviews.llvm.org/D84105
…f32/v4i32

Minor precursor fix for D66004, but helps the SSE41 tests as well as they run with -disable-peephole
abs() should be rare enough that using value tracking is not going
to be a compile-time cost burden, so use it to reduce a variety of
potential patterns. We do this in DAGCombiner too.

Differential Revision: https://reviews.llvm.org/D85043
tkrasnukha and others added 27 commits August 4, 2020 12:22
…macro definition"

The /Zc:__cplusplus option fixes GTEST_LANG_CXX11 value but not GTEST_HAS_TR1_TUPLE,
so we still need to force the latter off.

Still pass the option since it is required by https://reviews.llvm.org/D78186 too.

Differential Revision: https://reviews.llvm.org/D84023
This patch moves a few spread out smin/smax tests to smin-smax-folds.ll
and adds additional test cases that expose further potential for
folds.
Currently, instruction level fast math flags are not considered when
generating patterns for the machine combiner.

This currently leads to some missed opportunities to generate FMAs in
combination with `#pragma clang fp contract (fast)`.

For example, when building the example below with -O3 for AArch64, no
FMADD is generated. If built with -O2 and the DAGCombiner is used
instead of the MachineCombiner for FMAs, an FMADD is generated.

With this patch, the same code is generated in both cases.

    float madd_contract(float a, float b, float c) {
    #pragma clang fp contract (fast)
      return (a * b) + c;
    }

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D84930
…OR(X,Y) if the relevant elements are known zero.""

[X86][SSE] Shuffle combine blends to OR(X,Y) if the relevant elements are known zero (REAPPLIED)

This allows us to remove the (depth violating) code in getFauxShuffleMask where we were combining the OR(SHUFFLE,SHUFFLE) shuffle inputs as well, and not just the OR().

This is a minor step toward being able to shuffle combine from/to SELECT/BLENDV as a faux shuffle.

Reapplied with fixed signed/unsigned comparisons.
Added patterns so that both SSAT and USAT instructions are generated with shifts. Added corresponding regression tests.

Differential Review: https://reviews.llvm.org/D85120
This option was added a while back, to help improve AA around pointer
phi loops. It looks for phi(gep(phi, const), x) loops, checking if x can
then prove more precise aliasing info.

Differential Revision: https://reviews.llvm.org/D82998
http://lab.llvm.org:8011/builders/clang-cmake-x86_64-avx2-linux/builds/15718/steps/build%20stage%201/logs/stdio:

FAILED: /usr/bin/c++  -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Itools/llvm-readobj -I/home/ssglocal/clang-cmake-x86_64-avx2-linux/clang-cmake-x86_64-avx2-linux/llvm/llvm/tools/llvm-readobj -Iinclude -I/home/ssglocal/clang-cmake-x86_64-avx2-linux/clang-cmake-x86_64-avx2-linux/llvm/llvm/include -march=broadwell -fPIC -fvisibility-inlines-hidden -Werror=date-time -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wno-maybe-uninitialized -Wdelete-non-virtual-dtor -Wno-comment -fdiagnostics-color -ffunction-sections -fdata-sections -O3     -fno-exceptions -fno-rtti -UNDEBUG -std=c++14 -MD -MT tools/llvm-readobj/CMakeFiles/llvm-readobj.dir/ELFDumper.cpp.o -MF tools/llvm-readobj/CMakeFiles/llvm-readobj.dir/ELFDumper.cpp.o.d -o tools/llvm-readobj/CMakeFiles/llvm-readobj.dir/ELFDumper.cpp.o -c /home/ssglocal/clang-cmake-x86_64-avx2-linux/clang-cmake-x86_64-avx2-linux/llvm/llvm/tools/llvm-readobj/ELFDumper.cpp
/home/ssglocal/clang-cmake-x86_64-avx2-linux/clang-cmake-x86_64-avx2-linux/llvm/llvm/tools/llvm-readobj/ELFDumper.cpp: In function ‘llvm::Expected<const llvm::object::Elf_Mips_Options<ELFT>*> readMipsOptions(const uint8_t*, llvm::ArrayRef<unsigned char>&, bool&)’:
/home/ssglocal/clang-cmake-x86_64-avx2-linux/clang-cmake-x86_64-avx2-linux/llvm/llvm/tools/llvm-readobj/ELFDumper.cpp:3374:12: error: parse error in template argument list
     if (O->size < ExpectedSize)

Note: I played with godbolt.org and was able to catch the similar "error in template argument list" error when used gcc 4.9.0 with this code.
Fix: try to introduce a variable to store `O->size`, it helped to me in godbolt.
As discussed in D84949, this removes the constraint to cast since it does not
cause compile time degradation.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D85188
This is the final bit of work to relax the register allocation
requirements when code generating normal LLVM IR, which rarely
care about the result of inactive lanes. By using _PRED nodes
we can make better use of SVE's reversed instructions.

Also removes a redundant parameter from the min/max tests.

Differential Revision: https://reviews.llvm.org/D85142
When mapping an optional value, if the value is <none> and followed
by comments, there will be a parsing error. This patch helps fix this
issue.

e.g.,

When mapping the following YAML,

```
Sections:
  - Name:  blah
    Type:  SHT_foo
    Flags: [[FLAGS=<none>]] ## some comments.
```

the raw value of `ScalarNode` is "<none> " rather than "<none>". We need
to remove the spaces.

Differential Revision: https://reviews.llvm.org/D85180
The CFA is calculated as (SP/FP + offset), but when there are
SVE objects on the stack the SP offset is partly scalable and
should instead be expressed as the DWARF expression:

     SP + offset + scalable_offset * VG

where VG is the Vector Granule register, containing the
number of 64bits 'granules' in a scalable vector.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D84043
This patch adds a CFI entry for each SVE callee saved register
that needs unwind info at an offset from the CFA. The offset is
a DWARF expression because the offset is partly scalable.

The CFI entries only cover a subset of the SVE callee-saves and
only encodes the lower 64-bits, thus implementing the lowest
common denominator ABI. Existing unwinders may support VG but
only restore the lower 64-bits.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D84044
This is a first patch that sweeps over tests to fix
indentation (tabs to spaces). It also adds label checks and
removes redundant matching of `%{{.*}} = `.

The following tests have been fixed:
- arithmetic-ops-to-llvm
- bitwise-ops-to-llvm
- cast-ops-to-llvm
- comparison-ops-to-llvm
- logical-ops-to-llvm (renamed to match the rest)

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D85181
The bug was not noticed because we didn't have a lot of custom type conversions
directly to LLVM dialect.

Differential Revision: https://reviews.llvm.org/D85192
With new LLVM dialect type modeling, the dialect types no longer wrap LLVM IR
types. Therefore, they need to be translated to and from LLVM IR during export
and import. Introduce the relevant functionality for translating types. It is
currently exercised by an ad-hoc type translation roundtripping test that will
be subsumed by the actual translation test when the type system transition is
complete.

Depends On D84339

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D85019
…deling

These are intended to smoothen the transition and may be removed in the future
in favor of more MLIR-compatible APIs. They intentionally have the same
semantics as the existing functions, which must remain stable until the
transition is complete.

Depends On D85019

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D85020
This should probably be moved up to some common area eventually
when there's another user.
A new first-party modeling for LLVM IR types in the LLVM dialect has been
developed in parallel to the existing modeling based on wrapping LLVM `Type *`
instances. It resolves the long-standing problem of modeling identified
structure types, including recursive structures, and enables future removal of
LLVMContext and related locking mechanisms from LLVMDialect.

This commit only switches the modeling by (a) renaming LLVMTypeNew to LLVMType,
(b) removing the old implementaiton of LLVMType, and (c) updating the tests. It
is intentionally minimal. Separate commits will remove the infrastructure built
for the transition and update API uses where appropriate.

Depends On D85020

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D85021
GCC5 seems to dislike generic lambdas calling a method of the class
containing the lambda without explicit `this`.
This is based on the existing code for the non-intrinsic idioms
in InstCombine.

The vector constant constraint is non-obvious: undefs should be
ok in the outer call, but they can't propagate safely from the
inner call in all cases. Example:

https://alive2.llvm.org/ce/z/-2bVbM
  define <2 x i8> @src(<2 x i8> %x) {
  %0:
    %m = umin <2 x i8> %x, { 7, undef }
    %m2 = umin <2 x i8> { 9, 9 }, %m
    ret <2 x i8> %m2
  }
  =>
  define <2 x i8> @tgt(<2 x i8> %x) {
  %0:
    %m = umin <2 x i8> %x, { 7, undef }
    ret <2 x i8> %m
  }
  Transformation doesn't verify!
  ERROR: Value mismatch

  Example:
  <2 x i8> %x = < undef, undef >

  Source:
  <2 x i8> %m = < #x00 (0)	[based on undef value], #x00 (0) >
  <2 x i8> %m2 = < #x00 (0), #x00 (0) >

  Target:
  <2 x i8> %m = < #x07 (7), #x10 (16) >
  Source value: < #x00 (0), #x00 (0) >
  Target value: < #x07 (7), #x10 (16) >
…nsfer_read into full and partial copies.

This revision adds a transformation and a pattern that rewrites a "maybe masked" `vector.transfer_read %view[...], %pad `into a pattern resembling:

```
   %1:3 = scf.if (%inBounds) {
      scf.yield %view : memref<A...>, index, index
    } else {
      %2 = linalg.fill(%extra_alloc, %pad)
      %3 = subview %view [...][...][...]
      linalg.copy(%3, %alloc)
      memref_cast %extra_alloc: memref<B...> to memref<A...>
      scf.yield %4 : memref<A...>, index, index
   }
   %res= vector.transfer_read %1#0[%1#1, %1#2] {masked = [false ... false]}
```
where `extra_alloc` is a top of the function alloca'ed buffer of one vector.

This rewrite makes it possible to realize the "always full tile" abstraction where vector.transfer_read operations are guaranteed to read from a padded full buffer.
The extra work only occurs on the boundary tiles.
Signed-off-by: Soumi Manna <soumi.manna@intel.com>
@smanna12
Copy link
Contributor Author

smanna12 commented Aug 4, 2020

Please do not review. Sorry wrong merge conflicts happen.

@smanna12
Copy link
Contributor Author

smanna12 commented Aug 4, 2020

I have created separate PR here: #2255

@smanna12 smanna12 closed this Aug 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.