Skip to content

Copy attributes of parallel_for kernels to the wrapped versions when rounding up the range #3154

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 19 commits into from
Mar 2, 2021

Conversation

rdeodhar
Copy link
Contributor

@rdeodhar rdeodhar commented Feb 3, 2021

Parallel_for range rounding creates a wrapped version of the original kernel. Kernel attributes need to be copied from original kernel to new wrapped kernel.

@rdeodhar rdeodhar marked this pull request as draft February 3, 2021 22:34
@@ -306,6 +318,42 @@ static int64_t getIntExprValue(const Expr *E, ASTContext &Ctx) {
return E->getIntegerConstantExpr(Ctx)->getSExtValue();
}

// Collect function attributes related to SYCL
void CollectSYCLAttributes(Sema &S, FunctionDecl *FD,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function should be static, named collectSYCLAttributes(), and directly_called should be DirectlyCalled per the usual coding conventions.

Also, the name is a bit misleading because this function does more than just collect attributes - it also issues diagnostics and modifies the given function in some cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK on the name changes.

void CollectSYCLAttributes(Sema &S, FunctionDecl *FD,
llvm::SmallPtrSet<Attr *, 4> &Attrs,
bool directly_called = true) {
if (auto *A = FD->getAttr<IntelReqdSubGroupSizeAttr>())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be simplified into:

llvm::copy_if(FD->getAtts(), std::inserter(Attrs), [](Attr *A) {
  return isa<IntelReqdSubGroupSizeAttr, ReqdWorkGroupSizeAttr, SYCLIntelKernelArgsRestrictAttr, etc>(A);
});

This is functionally different in that the approach in the patch will only add the attribute to the container once whereas the looping approach will get all of the attributes if any of them have been duplicated.

It may be worth it to consider splitting this into two functions. One that collects all the copies of everything but SYCLIntelUseStallEnableClustersAttr, and one that uses FD->specific_attrs<SYCLIntelUseStallEnableClustersAttr>() to loop over just those attributes and handle them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking on this a bit more... it would be best if we were able to use the table generator to implement the logical guts of this function. This way, we could add something to each attribute definition in Attr.td, like let CopyToKernel = 1;, to signify which attributes opt into this behavior. Then ClangAttrEmitter.cpp could be modified to spit out a new .inc file that we #include within the llvm::copy_if() lambda. On the assumption that most attributes will not need custom logic like SYCLIntelUseStallEnableClustersAttr does, this could make it easier to opt attributes into this functionality in the future. WDYT? (To be clear, I'm not suggesting this tablegen work as part of this patch but instead as a follow-up refactoring. So I'm mostly wondering if that idea is worth opening an issue for or not.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unable to get the llvm::copy_if call to compile. Is there an example of similar usage elsewhere?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, I didn't realize that llvm::SmallPtrSet<> doesn't have the correct interface for its insert() method -- it doesn't have one accepting an iterator hint. I realized that after noticing that my call to std::inserter() left off the iterator argument (sorry about that). See https://github.com/intel/llvm/blob/sycl/llvm/include/llvm/ADT/SmallPtrSet.h#L364

I actually think this is a deficiency in the upstream implementation. I think SmallPtrSetImpl should be given a function like:

std::pair<iterator, bool> insert(iterator, PtrType Ptr) {
  return insert(Ptr);
}

where the hint is accepted and ignored.

However, I can understand not wanting to do that as part of this patch, so an alternative suggestion is to do the loop copy manually:

for (Attr *A : FD->getAttrs()) {
  if (isa<IntelReqdSubGroupSizeAttr, ReqdWorkGroupSizeAttr, SYCLIntelKernelArgsRestrictAttr, etc>(A))
    Attrs.insert(A);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, thanks for the tip.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for following up on the need for an inserter.

Copy link
Contributor

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be some test coverage for the changes?

@@ -3155,6 +3163,62 @@ void Sema::CheckSYCLKernelCall(FunctionDecl *KernelFunc, SourceRange CallLoc,
KernelFunc->setInvalidDecl();
}

// For a wrapped parallel_for, copy attributes from original
// kernel to wrapped kernel.
void Sema::copyAttributes(const CXXRecordDecl *KernelObj) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function name is a bit generic for the functionality it provides. Should it perhaps be copySYCLKernelAttrs() instead? That's still a bit confusing because there's no mention of copying them to what...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, that's a better name.

// Calculate both names, since Integration headers need both.
std::string CalculatedName, StableName;
std::tie(CalculatedName, StableName) =
constructKernelName(*this, KernelCallerFunc, MC);
StringRef KernelName(getLangOpts().SYCLUnnamedLambda ? StableName
: CalculatedName);

std::string PF("__pf_kernel_wrapper");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for the local variable here.

bool DirectlyCalled = true) {
if (!FD->hasAttrs())
return;
for (Attr *A : FD->getAttrs()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the switch from SmallPtrSet to SmallVector, this can be switched to an llvm::copy_if() (using a std::back_inserter()).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, done.

return 0;
}

// CHECK: define {{.*}}spir_kernel void @{{.*}}__pf_kernel_wrapper{{.*}}reqd_work_group_size
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a newline to the end of the file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

AaronBallman
AaronBallman previously approved these changes Feb 8, 2021
Copy link
Contributor

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but don't forget to open an issue about the diagnostic behavior from the existing code. Thanks!

@rdeodhar rdeodhar marked this pull request as ready for review February 8, 2021 22:50
@rdeodhar rdeodhar requested a review from a team as a code owner February 8, 2021 22:50
@rdeodhar rdeodhar requested a review from v-klochkov February 8, 2021 22:50
@@ -3192,6 +3192,9 @@ def warn_dllimport_dropped_from_inline_function : Warning<
InGroup<IgnoredAttributes>;
def warn_attribute_ignored : Warning<"%0 attribute ignored">,
InGroup<IgnoredAttributes>;
def warn_attribute_on_direct_kernel_callee_only : Warning<"%0 attribute allowed"
" only on function directly called from kernel; attribute ignored">,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only on function -> only on a function

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"called from kernel" is also a bit awkward. called from a kernel? called from a kernel function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

Copy link
Contributor

@elizabethandrews elizabethandrews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a FE test? I suspect you'll have to add something to test headers to generate the wrapper kernel.

AaronBallman
AaronBallman previously approved these changes Feb 10, 2021
Copy link
Contributor

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Contributor

@elizabethandrews elizabethandrews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I apologize for the delay in review. I was under the impression I reviewed your latest change. I didn't notice the new commit.

@@ -0,0 +1,50 @@
// RUN: %clang_cc1 %s -fsyntax-only -ast-dump -fsycl -fsycl-is-device -triple spir64 | FileCheck %s

template <typename Name, typename Type>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per new contributing guidelines (https://github.com/intel/llvm/blob/sycl/CONTRIBUTING.md), tests have to include mock headers if required. Is it possible to remove these definitions and modify test to follow guidelines? Please see section "Guidelines for adding DPC++ in-tree LIT tests"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I have moved some definitions into the fake sycl.hpp.

Copy link
Contributor

@elizabethandrews elizabethandrews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@romanovvlad romanovvlad merged commit 496f9a0 into intel:sycl Mar 2, 2021
bader added a commit that referenced this pull request Mar 3, 2021
… versions when rounding up the range (#3154)"

This reverts commit 496f9a0.
bader added a commit that referenced this pull request Mar 3, 2021
… versions when rounding up the range (#3154)" (#3293)

This reverts commit 496f9a0.
rdeodhar added a commit to rdeodhar/llvm that referenced this pull request Mar 5, 2021
…s when rounding up the range (intel#3154)

Parallel_for range rounding creates a wrapped version of the original kernel.
Kernel attributes need to be copied from original kernel to new wrapped kernel.

Signed-off-by: rdeodhar <rajiv.deodhar@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants