Skip to content

[SYCL] Refactor the way we handle duplicate attribute logic #3224

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Feb 23, 2021

Conversation

AaronBallman
Copy link
Contributor

The existing code does not handle things like redeclarations or
template instantiations properly. This refactoring changes things to
be a bit more like they would be upstream for other attributes. Namely,
it splits out an Add method that can be used for template instantiation
as well as regular attribute handling, and a Merge method to handle
redeclarations.

This patch only handles [[intel::reqd_sub_group_size]] and
[[intel::num_simd_work_items]] currently. Other attributes will be
handled in follow-up work.

The existing code does not handle things like redeclarations or
template instantiations properly. This refactoring changes things to
be a bit more like they would be upstream for other attributes. Namely,
it splits out an Add method that can be used for template instantiation
as well as regular attribute handling, and a Merge method to handle
redeclarations.

This patch only handles [[intel::reqd_sub_group_size]] and
[[intel::num_simd_work_items]] currently. Other attributes will be
handled in follow-up work.
@@ -1393,7 +1393,7 @@ def IntelReqdSubGroupSize: InheritableAttr {
let Spellings = [GNU<"intel_reqd_sub_group_size">,
CXX11<"intel", "reqd_sub_group_size">];
let Args = [ExprArgument<"Value">];
let Subjects = SubjectList<[Function, CXXMethod], ErrorDiag>;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change wasn't strictly necessary in this patch, but I added it as a drive-by fix because I was already touching this attribute. It's an effectively NFC change (the pragma subject test had to be updated but the functionality is identical).

You don't need to add CXXMethod to the list because CXXMethodDecl already inherits from FunctionDecl, so the existing Function subject covers both cases.

@AaronBallman
Copy link
Contributor Author

Note, this change should probably land after #3170 due to the strong likelihood of merge conflicts.

smanna12
smanna12 previously approved these changes Feb 18, 2021
Copy link
Contributor

@smanna12 smanna12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I like this new refactoring work with attributes. This is clean and easy to understand. Thanks.

if (S.LangOpts.SYCLIsHost)
void Sema::AddIntelReqdSubGroupSize(Decl *D, const AttributeCommonInfo &CI,
Expr *E) {
if (LangOpts.SYCLIsHost)
Copy link
Contributor

@elizabethandrews elizabethandrews Feb 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a review comment for another attribute I worked on - Should we not do any semantic analysis for attributes on host. In my PR, after discussions, I ended up handling diagnostics (even if attribute is ignored) on host. However I think this is not the 'normal' behavior for SYCL/FPGA attributes IIRC. We should be consistent with this and so I think its necessary to discuss this now as we start refactoring attributes. Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for raising the question!

My gut instinct is: if the attribute should be treated as an unknown attribute when some option isn't set, then the attribute should get a LangOpts definition in Attr.td (or be a target-specific attribute) and the semantic checking in SemaDeclAttr.cpp should not get any special logic to bail out early beyond what's auto-generated for us. If the attribute is known to a given language mode, then semantic checking should happen for the parts of it that are possible to be checked, and we only elide checks that are senseless. Concretely, this means things like the deprecation, duplicate attribute, and invalid combination of attributes warnings should be diagnosed on host and device, while things that require (say) knowledge of device characteristics that aren't available when doing a host compilation are only checked when doing a device compile. My reason for this is: if you do a host-only compile and it compiles cleanly, you'd be rather surprised to suddenly hear that the attribute you're using is deprecated or conflicts with another attribute when doing a device compile. So I'm assuming that users sometimes find a need to split host vs device compilations and that's why we give them the option. (Is that a faulty assumption?)

That said, I have no idea how much of these decisions are driven by the fact that dpcpp compiles things three times and so the diagnostics have the potential to come out in triplicate. (FWIW, I think that's a usability issue that should be solved -- we should probably be collating all of the diagnostics into one list and removing duplicates before displaying the diagnostics. If device vs host is important to understanding how to fix a diagnostic, we could emit whether it was a host, device, or both as part of this collated list. Or, if users don't need to do separate device and host compilations on their own, I suppose the ideal would be to generate the AST once (and emit all the diagnostics once), then use the AST three times for codegen purposes (so only codegen diagnostics would potentially be duplicated, but those are few and far between).)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for raising the question!

My gut instinct is: if the attribute should be treated as an unknown attribute when some option isn't set, then the attribute should get a LangOpts definition in Attr.td (or be a target-specific attribute) and the semantic checking in SemaDeclAttr.cpp should not get any special logic to bail out early beyond what's auto-generated for us.

I believe this was the behavior originally. The LangOpts was SyclIsDevice, IIRC. It was modified because this resulted in 'unknown attribute' warnings during host compilation phase of triple compilation. For a user unaware of the 3 passes, the diagnostic is just confusing. So we added LangOpts SYCLIsHost to silently ignore the attribute on host.

Thinking about it now, my question above isn't really valid in triple compilation since device compilation will generate the warnings anyway. The only thing enabling diagnostics on host will do is generate multiple diagnostics. The issue only arises if someone does host and device compilation separately as you mentioned.

If the attribute is known to a given language mode, then semantic checking should happen for the parts of it that are possible to be checked, and we only elide checks that are senseless. Concretely, this means things like the deprecation, duplicate attribute, and invalid combination of attributes warnings should be diagnosed on host and device, while things that require (say) knowledge of device characteristics that aren't available when doing a host compilation are only checked when doing a device compile. My reason for this is: if you do a host-only compile and it compiles cleanly, you'd be rather surprised to suddenly hear that the attribute you're using is deprecated or conflicts with another attribute when doing a device compile. So I'm assuming that users sometimes find a need to split host vs device compilations and that's why we give them the option. (Is that a faulty assumption?)

To be honest I don't really know. @erichkeane @premanandrao could you please weigh in?

That said, I have no idea how much of these decisions are driven by the fact that dpcpp compiles things three times and so the diagnostics have the potential to come out in triplicate. (FWIW, I think that's a usability issue that should be solved -- we should probably be collating all of the diagnostics into one list and removing duplicates before displaying the diagnostics. If device vs host is important to understanding how to fix a diagnostic, we could emit whether it was a host, device, or both as part of this collated list. Or, if users don't need to do separate device and host compilations on their own, I suppose the ideal would be to generate the AST once (and emit all the diagnostics once), then use the AST three times for codegen purposes (so only codegen diagnostics would potentially be duplicated, but those are few and far between).)

This makes sense. Off the top of my head I'm not sure how much work is required to do this though. We could potentially discuss this with the architects and work on this for a future release.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, so the reason we never checked in 'host' mode was:

1- We don't want these attributes to make it into the AST when they aren't meaningful/don't do anything. This allows us to not bother checking SyclIsDevice later in the process (which is consistent with some other Clang arch bits).

2- We do not want them to diagnose in Host mode of course, since then the 3-compiles ends up warning on something that isn't the case.

3- We always compile 3 times, so by the time we get to host-mode (the 3rd one), any errors/warnings have been handled already.

In general, I would be OK MOVING the SyclIsHost checks to after the diagnostics (though I don't think it is a particularly high priority item), but I don't think I would want it to be added to the AST, so we would have to make sure we only did the addAttr call in device mode.

as far as collating the results, I don't have a good idea how to do that. First, one of the requirements of SYCL is that the 'host' mode compile be able to be ANY compiler, not just a SYCL compiler. That is why there is almost nothing that does host-compiling. Second, They are different cc1 invocations, so different processes. I don't know how we can teach the diagnostics engine to share between different processes.

We at one point considered disabling warnings on the host-compile entirely, but we immediately ran into the case where conditional compilation resulted in missed, useful warnings. The duplicate diagnostics is considered a necessary evil at that point :/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation @erichkeane.

Personally I believe it isn't necessary to emit these diagnostics on host since the attributes aren't added anyway. But I am not opposed to it either if @AaronBallman thinks its better we do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1- We don't want these attributes to make it into the AST when they aren't meaningful/don't do anything. This allows us to not bother checking SyclIsDevice later in the process (which is consistent with some other Clang arch bits).

I may regret going down this rabbit hole, but... how does this work with tooling like clang-tidy? e.g., when you run clang-tidy, do you get three compilations there as well? The reason I ask is: that could be a reason why we might want to keep these attributes in the AST -- if clang-tidy only runs one compilation mode and that mode drops the attributes, it'd be hard to write useful SYCL checkers.

as far as collating the results, I don't have a good idea how to do that. First, one of the requirements of SYCL is that the 'host' mode compile be able to be ANY compiler, not just a SYCL compiler. That is why there is almost nothing that does host-compiling. Second, They are different cc1 invocations, so different processes. I don't know how we can teach the diagnostics engine to share between different processes.

Oh, ew. I thought we'd be using the integrated cc1 functionality so that we don't have to pay the price to spawn three separate executables (which is a big perf hit on Windows).

We at one point considered disabling warnings on the host-compile entirely, but we immediately ran into the case where conditional compilation resulted in missed, useful warnings. The duplicate diagnostics is considered a necessary evil at that point :/

So conditional compilation is a thing that our users will do? If that's the case, then I think we want to issue the diagnostics in host mode, even if we don't want to attach the attribute to the AST. Though, that does make for a bit of awkwardness -- community has a general rule of thumb that any time an attribute is ignored there be a diagnostic issued about ignoring it (otherwise users think their attribute is doing something useful when it's not). If users do separate compilations of host vs device, then it means they could do only the host compilation with no device compilation... so the attribute would look like it does something meaningful when it doesn't. On the flip side, it would be outright baffling if the user did all three compilation modes and had an "ignored attribute" warning from one of them because they'd get confused by the attribute being applied in the other two modes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, we haven't really considered clang-tidy, so i actually have no idea! Presumably, it would run in the 'first' compile, which is device mode.

As far as the spawning, we ended up finding that doing all 3 in one process is actually an even worse problem, to the point I think we submitted a community patch to only do that integrated CC1 when there was only 1 process to run. You end up running out of resources if you do them all in the same process (in part, because Clang doesn't clean up after itself well, even with -no-disable-free).

On the flip side, it would be outright baffling if the user did all three compilation modes and had an "ignored attribute" warning from one of them because they'd get confused by the attribute being applied in the other two modes.

This is really what we took into consideration. We opted to suppress the diagnostics for that reason.

That said, I think the ONLY way to split the compilation is to do -cc1 directly, so we are/were less worried about it. I could go either way on diagnosing other errors/warnings in host mode, but feel free to do so if you wish (or file a bug for someone to do).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some offline discussion with @erichkeane and some thinking on it, I have the impression that the vast majority of users will not split their host and device compilations but instead pass -fsycl to the compiler and get all three compilations at once. So we should design for that use case.

Based on that, I think my answer here is to drop the attributes from the AST and not do any diagnostic checking for them when in host mode.

I think the best way to do that is from tablegen in Attr.td. We have three LangOpt subclasses, SYCL, SYCLIsDevice, and SYCLIsHost, where SYCL and SYCLIsDevice are confusingly checking the same language option. I think we should make SYCL mean "device or host" (getLangOpts().SYCL), and SYCLIs<mode> mean "SYCL is enabled and compiling for " (getLangOpts().SYCL && getLangOpts().SYCLIs<mode>()). Then I think we should modify the LangOpt class to accept a bit saying "don't diagnose the attribute as being unknown if this bit is set" and set that bit for the SYCLIsDevice and SYCLIsHost subclasses. The end result is: any attribute that's available in SYCL is available in both host and device mode or diagnosed as an unknown attribute if not compiling for either host or device mode; any attribute that's available for device or host mode exclusively (rather than both) will be diagnosed in the given mode, but otherwise silently ignored unless not compiling for SYCL at all. I think this work should be done in a follow-up patch.

WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

think we should make SYCL mean "device or host" (getLangOpts().SYCL), and SYCLIs mean "SYCL is enabled and compiling for " (getLangOpts().SYCL && getLangOpts().SYCLIs()).

To be honest, I thought that was already the case! I think it is the right idea.

Then I think we should modify the LangOpt class to accept a bit saying "don't diagnose the attribute as being unknown if this bit is set" and set that bit for the SYCLIsDevice and SYCLIsHost subclasses.

I'm not sure I see how that fits into LangOpts?

The end result is: any attribute that's available in SYCL is available in both host and device mode or diagnosed as an unknown attribute if not compiling for either host or device mode; any attribute that's available for device or host mode exclusively (rather than both) will be diagnosed in the given mode, but otherwise silently ignored unless not compiling for SYCL at all. I think this work should be done in a follow-up patch.

I like this end-result, and I think it makes a lot of sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, wait... you meant LangOpt the class in Attr.td, didn't you? Yeah, then I think I agree with everything above.

@AaronBallman
Copy link
Contributor Author

I can't really make sense of the Jenkins precommit failures. The issue appears to be related to a timeout and from what I can see of the test files, there's no (direct) reliance on any of the attributes modified in this patch.

@smanna12
Copy link
Contributor

I can't really make sense of the Jenkins precommit failures. The issue appears to be related to a timeout and from what I can see of the test files, there's no (direct) reliance on any of the attributes modified in this patch.

This is a flaky failure i believe. I saw this one of my PR. If you restart testing, it will go away.

@AaronBallman
Copy link
Contributor Author

I can't really make sense of the Jenkins precommit failures. The issue appears to be related to a timeout and from what I can see of the test files, there's no (direct) reliance on any of the attributes modified in this patch.

This is a flaky failure i believe. I saw this one of my PR. If you restart testing, it will go away.

Good suggestion, that seemed to solve it for me. Thanks!

@bader bader merged commit 3c22af9 into intel:sycl Feb 23, 2021
smanna12 added a commit to smanna12/llvm that referenced this pull request Feb 23, 2021
…ates()]] attributes

This patch
 1. refactors two declaration attributes: [[intel::private_copies()] and [[intel::max_replicates()]]
    using intel#3224 as a template to better fit for community standards.
 2. refactors the way we handle duplicate attributes and
    mutually exclusive attributes logic when present on a given declaration.
 3. handles redeclarations or template instantiations properly.
 4. adds test

Signed-off-by: Soumi Manna <soumi.manna@intel.com>
smanna12 added a commit to smanna12/llvm that referenced this pull request Mar 1, 2021
This patch

1. refactors two function attributes:
   [[intel::no_global_work_offset()]] and [[intel::scheduler_target_fmax_mhz()]]
   using intel#3224 as a template to better fit for community standards.
2. refactors the way we handle duplicate attributes on a given declaration.
3. handles redeclarations or template instantiations properly.
4. adds test

Signed-off-by: Soumi Manna <soumi.manna@intel.com>
romanovvlad pushed a commit that referenced this pull request Mar 2, 2021
This patch

1. refactors two function attributes:
   [[intel::no_global_work_offset()]] and [[intel::scheduler_target_fmax_mhz()]]
   using #3224 as a template to better fit for community standards.
2. refactors the way we handle duplicate attributes on a given declaration.
3. handles redeclarations or template instantiations properly.
4. adds test

Signed-off-by: Soumi Manna <soumi.manna@intel.com>
smanna12 added a commit to smanna12/llvm that referenced this pull request Mar 3, 2021
This patch
 1. refactors two function attributes using intel#3224 :
    [[intel::loop_fuse()]] and [[intel::loop_fuse_independent()]]
 2. store expression as ConstantExpr in Semantic Attributes
 3. handles template instantiations properly for duplicate attributes on a given declaration.
 4. adds test
 5. updates codegen codes

Signed-off-by: Soumi Manna <soumi.manna@intel.com>
bader pushed a commit that referenced this pull request Mar 7, 2021
This patch
 1. refactors two function attributes using #3224 :
    [[intel::loop_fuse()]] and [[intel::loop_fuse_independent()]]
 2. store expression as ConstantExpr in Semantic Attributes
 3. handles template instantiations properly for duplicate attributes on a given declaration.
 4. adds test
 5. updates codegen codes

Signed-off-by: Soumi Manna <soumi.manna@intel.com>
smanna12 added a commit to smanna12/llvm that referenced this pull request Mar 9, 2021
This patch

1. refactors FPGA function attribute using intel#3224:
   [[intel::max_global_work_dim()]] attribute to better fit for community standards.
2. refactors the way we handle duplicate attributes and
   mutually exclusive attributes logic when present on a given declaration.
3. handles redeclarations or template instantiations properly.
4. adds tests
5. adds new test cases where the value of 'max_global_work_dim' attribute
   equals to 0, we shall ensure that if max_work_group_size and
   reqd_work_group_size attributes exist, they hold equal values (1, 1, 1).

   Before the new refactoring patch, we silently accepeted this test case:

   struct TRIFuncObjBad {
     [[intel::max_work_group_size(8, 8, 8)]]
     [[cl::reqd_work_group_size(4, 4, 4)]]
     [[intel::max_global_work_dim(0)]]
      void operator()() const {}
   };

Signed-off-by: Soumi Manna <soumi.manna@intel.com>
bader pushed a commit that referenced this pull request Mar 10, 2021
…ates()]] attributes (#3251)

This patch
 1. refactors two declaration attributes using #3224: 
     [[intel::private_copies()] and [[intel::max_replicates()]] to better fit for community standards.
 2. refactors the way we handle duplicate attributes and
    mutually exclusive attributes logic when present on a given declaration.
 3. handles redeclarations or template instantiations properly.
 4. adds test

Signed-off-by: Soumi Manna <soumi.manna@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants