-
Notifications
You must be signed in to change notification settings - Fork 769
Copy attributes of parallel_for kernels to the wrapped versions when rounding up the range #3154
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: rdeodhar <rajiv.deodhar@intel.com>
clang/lib/Sema/SemaSYCL.cpp
Outdated
@@ -306,6 +318,42 @@ static int64_t getIntExprValue(const Expr *E, ASTContext &Ctx) { | |||
return E->getIntegerConstantExpr(Ctx)->getSExtValue(); | |||
} | |||
|
|||
// Collect function attributes related to SYCL | |||
void CollectSYCLAttributes(Sema &S, FunctionDecl *FD, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function should be static
, named collectSYCLAttributes()
, and directly_called
should be DirectlyCalled
per the usual coding conventions.
Also, the name is a bit misleading because this function does more than just collect attributes - it also issues diagnostics and modifies the given function in some cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK on the name changes.
clang/lib/Sema/SemaSYCL.cpp
Outdated
void CollectSYCLAttributes(Sema &S, FunctionDecl *FD, | ||
llvm::SmallPtrSet<Attr *, 4> &Attrs, | ||
bool directly_called = true) { | ||
if (auto *A = FD->getAttr<IntelReqdSubGroupSizeAttr>()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be simplified into:
llvm::copy_if(FD->getAtts(), std::inserter(Attrs), [](Attr *A) {
return isa<IntelReqdSubGroupSizeAttr, ReqdWorkGroupSizeAttr, SYCLIntelKernelArgsRestrictAttr, etc>(A);
});
This is functionally different in that the approach in the patch will only add the attribute to the container once whereas the looping approach will get all of the attributes if any of them have been duplicated.
It may be worth it to consider splitting this into two functions. One that collects all the copies of everything but SYCLIntelUseStallEnableClustersAttr
, and one that uses FD->specific_attrs<SYCLIntelUseStallEnableClustersAttr>()
to loop over just those attributes and handle them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking on this a bit more... it would be best if we were able to use the table generator to implement the logical guts of this function. This way, we could add something to each attribute definition in Attr.td
, like let CopyToKernel = 1;
, to signify which attributes opt into this behavior. Then ClangAttrEmitter.cpp
could be modified to spit out a new .inc file that we #include
within the llvm::copy_if()
lambda. On the assumption that most attributes will not need custom logic like SYCLIntelUseStallEnableClustersAttr
does, this could make it easier to opt attributes into this functionality in the future. WDYT? (To be clear, I'm not suggesting this tablegen work as part of this patch but instead as a follow-up refactoring. So I'm mostly wondering if that idea is worth opening an issue for or not.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am unable to get the llvm::copy_if call to compile. Is there an example of similar usage elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh, I didn't realize that llvm::SmallPtrSet<>
doesn't have the correct interface for its insert()
method -- it doesn't have one accepting an iterator hint. I realized that after noticing that my call to std::inserter()
left off the iterator argument (sorry about that). See https://github.com/intel/llvm/blob/sycl/llvm/include/llvm/ADT/SmallPtrSet.h#L364
I actually think this is a deficiency in the upstream implementation. I think SmallPtrSetImpl
should be given a function like:
std::pair<iterator, bool> insert(iterator, PtrType Ptr) {
return insert(Ptr);
}
where the hint is accepted and ignored.
However, I can understand not wanting to do that as part of this patch, so an alternative suggestion is to do the loop copy manually:
for (Attr *A : FD->getAttrs()) {
if (isa<IntelReqdSubGroupSizeAttr, ReqdWorkGroupSizeAttr, SYCLIntelKernelArgsRestrictAttr, etc>(A))
Attrs.insert(A);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, thanks for the tip.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for following up on the need for an inserter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be some test coverage for the changes?
clang/lib/Sema/SemaSYCL.cpp
Outdated
@@ -3155,6 +3163,62 @@ void Sema::CheckSYCLKernelCall(FunctionDecl *KernelFunc, SourceRange CallLoc, | |||
KernelFunc->setInvalidDecl(); | |||
} | |||
|
|||
// For a wrapped parallel_for, copy attributes from original | |||
// kernel to wrapped kernel. | |||
void Sema::copyAttributes(const CXXRecordDecl *KernelObj) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function name is a bit generic for the functionality it provides. Should it perhaps be copySYCLKernelAttrs()
instead? That's still a bit confusing because there's no mention of copying them to what...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, that's a better name.
clang/lib/Sema/SemaSYCL.cpp
Outdated
// Calculate both names, since Integration headers need both. | ||
std::string CalculatedName, StableName; | ||
std::tie(CalculatedName, StableName) = | ||
constructKernelName(*this, KernelCallerFunc, MC); | ||
StringRef KernelName(getLangOpts().SYCLUnnamedLambda ? StableName | ||
: CalculatedName); | ||
|
||
std::string PF("__pf_kernel_wrapper"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for the local variable here.
clang/lib/Sema/SemaSYCL.cpp
Outdated
bool DirectlyCalled = true) { | ||
if (!FD->hasAttrs()) | ||
return; | ||
for (Attr *A : FD->getAttrs()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the switch from SmallPtrSet
to SmallVector
, this can be switched to an llvm::copy_if()
(using a std::back_inserter()
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, done.
sycl/test/kernel_param/attr.cpp
Outdated
return 0; | ||
} | ||
|
||
// CHECK: define {{.*}}spir_kernel void @{{.*}}__pf_kernel_wrapper{{.*}}reqd_work_group_size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a newline to the end of the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but don't forget to open an issue about the diagnostic behavior from the existing code. Thanks!
@@ -3192,6 +3192,9 @@ def warn_dllimport_dropped_from_inline_function : Warning< | |||
InGroup<IgnoredAttributes>; | |||
def warn_attribute_ignored : Warning<"%0 attribute ignored">, | |||
InGroup<IgnoredAttributes>; | |||
def warn_attribute_on_direct_kernel_callee_only : Warning<"%0 attribute allowed" | |||
" only on function directly called from kernel; attribute ignored">, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only on function -> only on a function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"called from kernel" is also a bit awkward. called from a kernel? called from a kernel function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a FE test? I suspect you'll have to add something to test headers to generate the wrapper kernel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I apologize for the delay in review. I was under the impression I reviewed your latest change. I didn't notice the new commit.
@@ -0,0 +1,50 @@ | |||
// RUN: %clang_cc1 %s -fsyntax-only -ast-dump -fsycl -fsycl-is-device -triple spir64 | FileCheck %s | |||
|
|||
template <typename Name, typename Type> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per new contributing guidelines (https://github.com/intel/llvm/blob/sycl/CONTRIBUTING.md), tests have to include mock headers if required. Is it possible to remove these definitions and modify test to follow guidelines? Please see section "Guidelines for adding DPC++ in-tree LIT tests"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I have moved some definitions into the fake sycl.hpp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks!
…s when rounding up the range (intel#3154) Parallel_for range rounding creates a wrapped version of the original kernel. Kernel attributes need to be copied from original kernel to new wrapped kernel. Signed-off-by: rdeodhar <rajiv.deodhar@intel.com>
Parallel_for range rounding creates a wrapped version of the original kernel. Kernel attributes need to be copied from original kernel to new wrapped kernel.