-
Notifications
You must be signed in to change notification settings - Fork 769
[WIP][SYCL] Add support for union types kernel arguments #2206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Patch is incomplete. Work is in progress. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add handling in SyclKernelIntHeaderCreator
class.
I would also expect some sema test on kernel generation, integration header test and runtime test.
clang/lib/Sema/SemaSYCL.cpp
Outdated
} else if (FieldTy->isUnionType()) { | ||
KF_FOR_EACH(handleUnionType, Field, FieldTy); | ||
} else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If code of "handleScalarType" in "IntHeaderCreator" also suits for unions, we could unify it across scalars, unions and vectors, i.e. here you will have something like
} else if (FieldTy->isScalarType() || FieldTy->isVectorType()
|| FieldTy->isUnionType()) {
KF_FOR_EACH(handleSimpleType, Field, FieldTy);
}
i.e. handleScalarType
can be renamed to something more generic (handleSimpleType
in my example) and used for scalars, vectors and unions..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Fznamznon, It seems like"handleScalarType" suits for unions. So this is renamed to "handleSimpleType".
…led when they're being used This patch fixed the issue that target memory might be deallocated when they're still being used or before they're used. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D84996
Add the optimizations we have in the SelectionDAG version. Known non-negative copies all known bits. Any known one other than the sign bit makes result non-negative. Differential Revision: https://reviews.llvm.org/D85000
Differential revision: https://reviews.llvm.org/D80802
Differential revision: https://reviews.llvm.org/D81213
A recent change broke `ninja check-asan` on Darwin by causing an error during linking of ASan unit tests [1]. Move the addition of `-ObjC` compiler flag outside of the new `if(COMPILER_RT_STANDALONE_BUILD)` block. It doesn't add any global flags (e.g, `${CMAKE_CXX_FLAGS}`) and the decision to add is based solely on source paths (`${source_rpath}`). [1] 8b2fcc4, https://reviews.llvm.org/D84466 Differential Revision: https://reviews.llvm.org/D85057
actually verified.
Rather than hardcoding immediate values for 12 different combinations in a nested pair of switches, we can perform the matched logic operation on 3 magic constants to calculate the immediate. Special thanks to this tweet https://twitter.com/rygorous/status/1187034321992871936 for making me realize I could do this.
contains no packs. Fixes a regression from 740a164.
…/store. Refer to LangRef http://llvm.org/docs/LangRef.html#llvm-masked-load-intrinsics 'llvm.masked.load/store.*’ intrinsics are overloaded intrinsic, which allow the load/store data to be a vector of any integer, floating-point or pointer data type. Therefore, allow pointer data type when checking 'isLegalMaskedLoadStore()'. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D85045
These methods abstract away Error handling when trying to read options that can't be parsed by logging the error automatically and returning None. Reviewed By: gribozavr2 Differential Revision: https://reviews.llvm.org/D84812
Change to expand all arguments and return values to i64 to follow ABI. Update regression tests also. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D84581
The patterns were incorrect copies from the FPU code, and are unnecessary, since there's no extended load for SPE. Just let LLVM itself do the work by marking it expand. Reviewed By: #powerpc, lkail Differential Revision: https://reviews.llvm.org/D78670
SPE doesn't have a fsel instruction, so don't try to lower to it. This fixes a "Cannot select: tN: f64 = PPCISD::FSEL tX, tY, tZ" error. Reviewed By: #powerpc, lkail Differential Revision: https://reviews.llvm.org/D77773
querying getSCEV() for incomplete phis leads to wrong cache value in `ExprToIVMap`, because incomplete phis may be simplified to same value before get SCEV expression. Reviewed By: lebedev.ri, mkazantsev Differential Revision: https://reviews.llvm.org/D77560
… some code. Now we try to load and broadcast together for operand 1. Followed by load and broadcast for operand 1. Previously we tried load operand 1, load operand 1, broadcast operand 0, broadcast operand 1. Now we have a single helper that tries load and broadcast for one operand that we can just call twice.
…d files When checking for the style of a decl that isn't in the main file, the check will now search for the configuration that the included files uses to gather the style for its decls. This can be useful to silence warnings in header files that follow a different naming convention without using header-filter to silence all warnings(even from other checks) in the header file. Reviewed By: aaron.ballman, gribozavr2 Differential Revision: https://reviews.llvm.org/D84814
Patch improves performance of verify-machineinstrs pass up to 10x. Differential revision: https://reviews.llvm.org/D84105
…f32/v4i32 Minor precursor fix for D66004, but helps the SSE41 tests as well as they run with -disable-peephole
abs() should be rare enough that using value tracking is not going to be a compile-time cost burden, so use it to reduce a variety of potential patterns. We do this in DAGCombiner too. Differential Revision: https://reviews.llvm.org/D85043
…macro definition" The /Zc:__cplusplus option fixes GTEST_LANG_CXX11 value but not GTEST_HAS_TR1_TUPLE, so we still need to force the latter off. Still pass the option since it is required by https://reviews.llvm.org/D78186 too. Differential Revision: https://reviews.llvm.org/D84023
This patch moves a few spread out smin/smax tests to smin-smax-folds.ll and adds additional test cases that expose further potential for folds.
Currently, instruction level fast math flags are not considered when generating patterns for the machine combiner. This currently leads to some missed opportunities to generate FMAs in combination with `#pragma clang fp contract (fast)`. For example, when building the example below with -O3 for AArch64, no FMADD is generated. If built with -O2 and the DAGCombiner is used instead of the MachineCombiner for FMAs, an FMADD is generated. With this patch, the same code is generated in both cases. float madd_contract(float a, float b, float c) { #pragma clang fp contract (fast) return (a * b) + c; } Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D84930
…OR(X,Y) if the relevant elements are known zero."" [X86][SSE] Shuffle combine blends to OR(X,Y) if the relevant elements are known zero (REAPPLIED) This allows us to remove the (depth violating) code in getFauxShuffleMask where we were combining the OR(SHUFFLE,SHUFFLE) shuffle inputs as well, and not just the OR(). This is a minor step toward being able to shuffle combine from/to SELECT/BLENDV as a faux shuffle. Reapplied with fixed signed/unsigned comparisons.
Added patterns so that both SSAT and USAT instructions are generated with shifts. Added corresponding regression tests. Differential Review: https://reviews.llvm.org/D85120
This option was added a while back, to help improve AA around pointer phi loops. It looks for phi(gep(phi, const), x) loops, checking if x can then prove more precise aliasing info. Differential Revision: https://reviews.llvm.org/D82998
http://lab.llvm.org:8011/builders/clang-cmake-x86_64-avx2-linux/builds/15718/steps/build%20stage%201/logs/stdio: FAILED: /usr/bin/c++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Itools/llvm-readobj -I/home/ssglocal/clang-cmake-x86_64-avx2-linux/clang-cmake-x86_64-avx2-linux/llvm/llvm/tools/llvm-readobj -Iinclude -I/home/ssglocal/clang-cmake-x86_64-avx2-linux/clang-cmake-x86_64-avx2-linux/llvm/llvm/include -march=broadwell -fPIC -fvisibility-inlines-hidden -Werror=date-time -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wno-maybe-uninitialized -Wdelete-non-virtual-dtor -Wno-comment -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -fno-exceptions -fno-rtti -UNDEBUG -std=c++14 -MD -MT tools/llvm-readobj/CMakeFiles/llvm-readobj.dir/ELFDumper.cpp.o -MF tools/llvm-readobj/CMakeFiles/llvm-readobj.dir/ELFDumper.cpp.o.d -o tools/llvm-readobj/CMakeFiles/llvm-readobj.dir/ELFDumper.cpp.o -c /home/ssglocal/clang-cmake-x86_64-avx2-linux/clang-cmake-x86_64-avx2-linux/llvm/llvm/tools/llvm-readobj/ELFDumper.cpp /home/ssglocal/clang-cmake-x86_64-avx2-linux/clang-cmake-x86_64-avx2-linux/llvm/llvm/tools/llvm-readobj/ELFDumper.cpp: In function ‘llvm::Expected<const llvm::object::Elf_Mips_Options<ELFT>*> readMipsOptions(const uint8_t*, llvm::ArrayRef<unsigned char>&, bool&)’: /home/ssglocal/clang-cmake-x86_64-avx2-linux/clang-cmake-x86_64-avx2-linux/llvm/llvm/tools/llvm-readobj/ELFDumper.cpp:3374:12: error: parse error in template argument list if (O->size < ExpectedSize) Note: I played with godbolt.org and was able to catch the similar "error in template argument list" error when used gcc 4.9.0 with this code. Fix: try to introduce a variable to store `O->size`, it helped to me in godbolt.
As discussed in D84949, this removes the constraint to cast since it does not cause compile time degradation. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D85188
This is the final bit of work to relax the register allocation requirements when code generating normal LLVM IR, which rarely care about the result of inactive lanes. By using _PRED nodes we can make better use of SVE's reversed instructions. Also removes a redundant parameter from the min/max tests. Differential Revision: https://reviews.llvm.org/D85142
When mapping an optional value, if the value is <none> and followed by comments, there will be a parsing error. This patch helps fix this issue. e.g., When mapping the following YAML, ``` Sections: - Name: blah Type: SHT_foo Flags: [[FLAGS=<none>]] ## some comments. ``` the raw value of `ScalarNode` is "<none> " rather than "<none>". We need to remove the spaces. Differential Revision: https://reviews.llvm.org/D85180
The CFA is calculated as (SP/FP + offset), but when there are SVE objects on the stack the SP offset is partly scalable and should instead be expressed as the DWARF expression: SP + offset + scalable_offset * VG where VG is the Vector Granule register, containing the number of 64bits 'granules' in a scalable vector. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D84043
This patch adds a CFI entry for each SVE callee saved register that needs unwind info at an offset from the CFA. The offset is a DWARF expression because the offset is partly scalable. The CFI entries only cover a subset of the SVE callee-saves and only encodes the lower 64-bits, thus implementing the lowest common denominator ABI. Existing unwinders may support VG but only restore the lower 64-bits. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D84044
This is a first patch that sweeps over tests to fix indentation (tabs to spaces). It also adds label checks and removes redundant matching of `%{{.*}} = `. The following tests have been fixed: - arithmetic-ops-to-llvm - bitwise-ops-to-llvm - cast-ops-to-llvm - comparison-ops-to-llvm - logical-ops-to-llvm (renamed to match the rest) Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D85181
The bug was not noticed because we didn't have a lot of custom type conversions directly to LLVM dialect. Differential Revision: https://reviews.llvm.org/D85192
With new LLVM dialect type modeling, the dialect types no longer wrap LLVM IR types. Therefore, they need to be translated to and from LLVM IR during export and import. Introduce the relevant functionality for translating types. It is currently exercised by an ad-hoc type translation roundtripping test that will be subsumed by the actual translation test when the type system transition is complete. Depends On D84339 Reviewed By: herhut Differential Revision: https://reviews.llvm.org/D85019
…deling These are intended to smoothen the transition and may be removed in the future in favor of more MLIR-compatible APIs. They intentionally have the same semantics as the existing functions, which must remain stable until the transition is complete. Depends On D85019 Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D85020
This should probably be moved up to some common area eventually when there's another user.
A new first-party modeling for LLVM IR types in the LLVM dialect has been developed in parallel to the existing modeling based on wrapping LLVM `Type *` instances. It resolves the long-standing problem of modeling identified structure types, including recursive structures, and enables future removal of LLVMContext and related locking mechanisms from LLVMDialect. This commit only switches the modeling by (a) renaming LLVMTypeNew to LLVMType, (b) removing the old implementaiton of LLVMType, and (c) updating the tests. It is intentionally minimal. Separate commits will remove the infrastructure built for the transition and update API uses where appropriate. Depends On D85020 Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D85021
GCC5 seems to dislike generic lambdas calling a method of the class containing the lambda without explicit `this`.
This is based on the existing code for the non-intrinsic idioms in InstCombine. The vector constant constraint is non-obvious: undefs should be ok in the outer call, but they can't propagate safely from the inner call in all cases. Example: https://alive2.llvm.org/ce/z/-2bVbM define <2 x i8> @src(<2 x i8> %x) { %0: %m = umin <2 x i8> %x, { 7, undef } %m2 = umin <2 x i8> { 9, 9 }, %m ret <2 x i8> %m2 } => define <2 x i8> @tgt(<2 x i8> %x) { %0: %m = umin <2 x i8> %x, { 7, undef } ret <2 x i8> %m } Transformation doesn't verify! ERROR: Value mismatch Example: <2 x i8> %x = < undef, undef > Source: <2 x i8> %m = < #x00 (0) [based on undef value], #x00 (0) > <2 x i8> %m2 = < #x00 (0), #x00 (0) > Target: <2 x i8> %m = < #x07 (7), #x10 (16) > Source value: < #x00 (0), #x00 (0) > Target value: < #x07 (7), #x10 (16) >
…nsfer_read into full and partial copies. This revision adds a transformation and a pattern that rewrites a "maybe masked" `vector.transfer_read %view[...], %pad `into a pattern resembling: ``` %1:3 = scf.if (%inBounds) { scf.yield %view : memref<A...>, index, index } else { %2 = linalg.fill(%extra_alloc, %pad) %3 = subview %view [...][...][...] linalg.copy(%3, %alloc) memref_cast %extra_alloc: memref<B...> to memref<A...> scf.yield %4 : memref<A...>, index, index } %res= vector.transfer_read %1#0[%1#1, %1#2] {masked = [false ... false]} ``` where `extra_alloc` is a top of the function alloca'ed buffer of one vector. This rewrite makes it possible to realize the "always full tile" abstraction where vector.transfer_read operations are guaranteed to read from a padded full buffer. The extra work only occurs on the boundary tiles.
Signed-off-by: Soumi Manna <soumi.manna@intel.com>
Please do not review. Sorry wrong merge conflicts happen. |
I have created separate PR here: #2255 |
This patch adds support for union types as kernel parameter using existing codes of
"handleScalarType" and renames "handleScalarType" to "handleSimpleType" to
be more generic for scalars, vectors and unions types.
-Add kernel generation, integration header test, and runtime tests.
Signed-off-by: Soumi Manna soumi.manna@intel.com