-
Notifications
You must be signed in to change notification settings - Fork 769
[SYCL] Refactor the way we handle duplicate attribute logic #3224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The existing code does not handle things like redeclarations or template instantiations properly. This refactoring changes things to be a bit more like they would be upstream for other attributes. Namely, it splits out an Add method that can be used for template instantiation as well as regular attribute handling, and a Merge method to handle redeclarations. This patch only handles [[intel::reqd_sub_group_size]] and [[intel::num_simd_work_items]] currently. Other attributes will be handled in follow-up work.
@@ -1393,7 +1393,7 @@ def IntelReqdSubGroupSize: InheritableAttr { | |||
let Spellings = [GNU<"intel_reqd_sub_group_size">, | |||
CXX11<"intel", "reqd_sub_group_size">]; | |||
let Args = [ExprArgument<"Value">]; | |||
let Subjects = SubjectList<[Function, CXXMethod], ErrorDiag>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change wasn't strictly necessary in this patch, but I added it as a drive-by fix because I was already touching this attribute. It's an effectively NFC change (the pragma subject test had to be updated but the functionality is identical).
You don't need to add CXXMethod
to the list because CXXMethodDecl
already inherits from FunctionDecl
, so the existing Function
subject covers both cases.
Note, this change should probably land after #3170 due to the strong likelihood of merge conflicts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I like this new refactoring work with attributes. This is clean and easy to understand. Thanks.
if (S.LangOpts.SYCLIsHost) | ||
void Sema::AddIntelReqdSubGroupSize(Decl *D, const AttributeCommonInfo &CI, | ||
Expr *E) { | ||
if (LangOpts.SYCLIsHost) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a review comment for another attribute I worked on - Should we not do any semantic analysis for attributes on host. In my PR, after discussions, I ended up handling diagnostics (even if attribute is ignored) on host. However I think this is not the 'normal' behavior for SYCL/FPGA attributes IIRC. We should be consistent with this and so I think its necessary to discuss this now as we start refactoring attributes. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for raising the question!
My gut instinct is: if the attribute should be treated as an unknown attribute when some option isn't set, then the attribute should get a LangOpts
definition in Attr.td (or be a target-specific attribute) and the semantic checking in SemaDeclAttr.cpp should not get any special logic to bail out early beyond what's auto-generated for us. If the attribute is known to a given language mode, then semantic checking should happen for the parts of it that are possible to be checked, and we only elide checks that are senseless. Concretely, this means things like the deprecation, duplicate attribute, and invalid combination of attributes warnings should be diagnosed on host and device, while things that require (say) knowledge of device characteristics that aren't available when doing a host compilation are only checked when doing a device compile. My reason for this is: if you do a host-only compile and it compiles cleanly, you'd be rather surprised to suddenly hear that the attribute you're using is deprecated or conflicts with another attribute when doing a device compile. So I'm assuming that users sometimes find a need to split host vs device compilations and that's why we give them the option. (Is that a faulty assumption?)
That said, I have no idea how much of these decisions are driven by the fact that dpcpp compiles things three times and so the diagnostics have the potential to come out in triplicate. (FWIW, I think that's a usability issue that should be solved -- we should probably be collating all of the diagnostics into one list and removing duplicates before displaying the diagnostics. If device vs host is important to understanding how to fix a diagnostic, we could emit whether it was a host, device, or both as part of this collated list. Or, if users don't need to do separate device and host compilations on their own, I suppose the ideal would be to generate the AST once (and emit all the diagnostics once), then use the AST three times for codegen purposes (so only codegen diagnostics would potentially be duplicated, but those are few and far between).)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for raising the question!
My gut instinct is: if the attribute should be treated as an unknown attribute when some option isn't set, then the attribute should get a
LangOpts
definition in Attr.td (or be a target-specific attribute) and the semantic checking in SemaDeclAttr.cpp should not get any special logic to bail out early beyond what's auto-generated for us.
I believe this was the behavior originally. The LangOpts was SyclIsDevice, IIRC. It was modified because this resulted in 'unknown attribute' warnings during host compilation phase of triple compilation. For a user unaware of the 3 passes, the diagnostic is just confusing. So we added LangOpts SYCLIsHost to silently ignore the attribute on host.
Thinking about it now, my question above isn't really valid in triple compilation since device compilation will generate the warnings anyway. The only thing enabling diagnostics on host will do is generate multiple diagnostics. The issue only arises if someone does host and device compilation separately as you mentioned.
If the attribute is known to a given language mode, then semantic checking should happen for the parts of it that are possible to be checked, and we only elide checks that are senseless. Concretely, this means things like the deprecation, duplicate attribute, and invalid combination of attributes warnings should be diagnosed on host and device, while things that require (say) knowledge of device characteristics that aren't available when doing a host compilation are only checked when doing a device compile. My reason for this is: if you do a host-only compile and it compiles cleanly, you'd be rather surprised to suddenly hear that the attribute you're using is deprecated or conflicts with another attribute when doing a device compile. So I'm assuming that users sometimes find a need to split host vs device compilations and that's why we give them the option. (Is that a faulty assumption?)
To be honest I don't really know. @erichkeane @premanandrao could you please weigh in?
That said, I have no idea how much of these decisions are driven by the fact that dpcpp compiles things three times and so the diagnostics have the potential to come out in triplicate. (FWIW, I think that's a usability issue that should be solved -- we should probably be collating all of the diagnostics into one list and removing duplicates before displaying the diagnostics. If device vs host is important to understanding how to fix a diagnostic, we could emit whether it was a host, device, or both as part of this collated list. Or, if users don't need to do separate device and host compilations on their own, I suppose the ideal would be to generate the AST once (and emit all the diagnostics once), then use the AST three times for codegen purposes (so only codegen diagnostics would potentially be duplicated, but those are few and far between).)
This makes sense. Off the top of my head I'm not sure how much work is required to do this though. We could potentially discuss this with the architects and work on this for a future release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, so the reason we never checked in 'host' mode was:
1- We don't want these attributes to make it into the AST when they aren't meaningful/don't do anything. This allows us to not bother checking SyclIsDevice later in the process (which is consistent with some other Clang arch bits).
2- We do not want them to diagnose in Host mode of course, since then the 3-compiles ends up warning on something that isn't the case.
3- We always compile 3 times, so by the time we get to host-mode (the 3rd one), any errors/warnings have been handled already.
In general, I would be OK MOVING the SyclIsHost checks to after the diagnostics (though I don't think it is a particularly high priority item), but I don't think I would want it to be added to the AST, so we would have to make sure we only did the addAttr call in device mode.
as far as collating the results, I don't have a good idea how to do that. First, one of the requirements of SYCL is that the 'host' mode compile be able to be ANY compiler, not just a SYCL compiler. That is why there is almost nothing that does host-compiling. Second, They are different cc1 invocations, so different processes. I don't know how we can teach the diagnostics engine to share between different processes.
We at one point considered disabling warnings on the host-compile entirely, but we immediately ran into the case where conditional compilation resulted in missed, useful warnings. The duplicate diagnostics is considered a necessary evil at that point :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation @erichkeane.
Personally I believe it isn't necessary to emit these diagnostics on host since the attributes aren't added anyway. But I am not opposed to it either if @AaronBallman thinks its better we do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1- We don't want these attributes to make it into the AST when they aren't meaningful/don't do anything. This allows us to not bother checking SyclIsDevice later in the process (which is consistent with some other Clang arch bits).
I may regret going down this rabbit hole, but... how does this work with tooling like clang-tidy? e.g., when you run clang-tidy, do you get three compilations there as well? The reason I ask is: that could be a reason why we might want to keep these attributes in the AST -- if clang-tidy only runs one compilation mode and that mode drops the attributes, it'd be hard to write useful SYCL checkers.
as far as collating the results, I don't have a good idea how to do that. First, one of the requirements of SYCL is that the 'host' mode compile be able to be ANY compiler, not just a SYCL compiler. That is why there is almost nothing that does host-compiling. Second, They are different cc1 invocations, so different processes. I don't know how we can teach the diagnostics engine to share between different processes.
Oh, ew. I thought we'd be using the integrated cc1 functionality so that we don't have to pay the price to spawn three separate executables (which is a big perf hit on Windows).
We at one point considered disabling warnings on the host-compile entirely, but we immediately ran into the case where conditional compilation resulted in missed, useful warnings. The duplicate diagnostics is considered a necessary evil at that point :/
So conditional compilation is a thing that our users will do? If that's the case, then I think we want to issue the diagnostics in host mode, even if we don't want to attach the attribute to the AST. Though, that does make for a bit of awkwardness -- community has a general rule of thumb that any time an attribute is ignored there be a diagnostic issued about ignoring it (otherwise users think their attribute is doing something useful when it's not). If users do separate compilations of host vs device, then it means they could do only the host compilation with no device compilation... so the attribute would look like it does something meaningful when it doesn't. On the flip side, it would be outright baffling if the user did all three compilation modes and had an "ignored attribute" warning from one of them because they'd get confused by the attribute being applied in the other two modes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, we haven't really considered clang-tidy, so i actually have no idea! Presumably, it would run in the 'first' compile, which is device mode.
As far as the spawning, we ended up finding that doing all 3 in one process is actually an even worse problem, to the point I think we submitted a community patch to only do that integrated CC1 when there was only 1 process to run. You end up running out of resources if you do them all in the same process (in part, because Clang doesn't clean up after itself well, even with -no-disable-free).
On the flip side, it would be outright baffling if the user did all three compilation modes and had an "ignored attribute" warning from one of them because they'd get confused by the attribute being applied in the other two modes.
This is really what we took into consideration. We opted to suppress the diagnostics for that reason.
That said, I think the ONLY way to split the compilation is to do -cc1 directly, so we are/were less worried about it. I could go either way on diagnosing other errors/warnings in host mode, but feel free to do so if you wish (or file a bug for someone to do).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After some offline discussion with @erichkeane and some thinking on it, I have the impression that the vast majority of users will not split their host and device compilations but instead pass -fsycl
to the compiler and get all three compilations at once. So we should design for that use case.
Based on that, I think my answer here is to drop the attributes from the AST and not do any diagnostic checking for them when in host mode.
I think the best way to do that is from tablegen in Attr.td. We have three LangOpt
subclasses, SYCL
, SYCLIsDevice
, and SYCLIsHost
, where SYCL
and SYCLIsDevice
are confusingly checking the same language option. I think we should make SYCL
mean "device or host" (getLangOpts().SYCL
), and SYCLIs<mode>
mean "SYCL is enabled and compiling for " (getLangOpts().SYCL && getLangOpts().SYCLIs<mode>()
). Then I think we should modify the LangOpt
class to accept a bit saying "don't diagnose the attribute as being unknown if this bit is set" and set that bit for the SYCLIsDevice
and SYCLIsHost
subclasses. The end result is: any attribute that's available in SYCL
is available in both host and device mode or diagnosed as an unknown attribute if not compiling for either host or device mode; any attribute that's available for device or host mode exclusively (rather than both) will be diagnosed in the given mode, but otherwise silently ignored unless not compiling for SYCL at all. I think this work should be done in a follow-up patch.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
think we should make SYCL mean "device or host" (getLangOpts().SYCL), and SYCLIs mean "SYCL is enabled and compiling for " (getLangOpts().SYCL && getLangOpts().SYCLIs()).
To be honest, I thought that was already the case! I think it is the right idea.
Then I think we should modify the LangOpt class to accept a bit saying "don't diagnose the attribute as being unknown if this bit is set" and set that bit for the SYCLIsDevice and SYCLIsHost subclasses.
I'm not sure I see how that fits into LangOpts?
The end result is: any attribute that's available in SYCL is available in both host and device mode or diagnosed as an unknown attribute if not compiling for either host or device mode; any attribute that's available for device or host mode exclusively (rather than both) will be diagnosed in the given mode, but otherwise silently ignored unless not compiling for SYCL at all. I think this work should be done in a follow-up patch.
I like this end-result, and I think it makes a lot of sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, wait... you meant LangOpt
the class in Attr.td, didn't you? Yeah, then I think I agree with everything above.
I can't really make sense of the Jenkins precommit failures. The issue appears to be related to a timeout and from what I can see of the test files, there's no (direct) reliance on any of the attributes modified in this patch. |
This is a flaky failure i believe. I saw this one of my PR. If you restart testing, it will go away. |
Good suggestion, that seemed to solve it for me. Thanks! |
…ates()]] attributes This patch 1. refactors two declaration attributes: [[intel::private_copies()] and [[intel::max_replicates()]] using intel#3224 as a template to better fit for community standards. 2. refactors the way we handle duplicate attributes and mutually exclusive attributes logic when present on a given declaration. 3. handles redeclarations or template instantiations properly. 4. adds test Signed-off-by: Soumi Manna <soumi.manna@intel.com>
This patch 1. refactors two function attributes: [[intel::no_global_work_offset()]] and [[intel::scheduler_target_fmax_mhz()]] using intel#3224 as a template to better fit for community standards. 2. refactors the way we handle duplicate attributes on a given declaration. 3. handles redeclarations or template instantiations properly. 4. adds test Signed-off-by: Soumi Manna <soumi.manna@intel.com>
This patch 1. refactors two function attributes: [[intel::no_global_work_offset()]] and [[intel::scheduler_target_fmax_mhz()]] using #3224 as a template to better fit for community standards. 2. refactors the way we handle duplicate attributes on a given declaration. 3. handles redeclarations or template instantiations properly. 4. adds test Signed-off-by: Soumi Manna <soumi.manna@intel.com>
This patch 1. refactors two function attributes using intel#3224 : [[intel::loop_fuse()]] and [[intel::loop_fuse_independent()]] 2. store expression as ConstantExpr in Semantic Attributes 3. handles template instantiations properly for duplicate attributes on a given declaration. 4. adds test 5. updates codegen codes Signed-off-by: Soumi Manna <soumi.manna@intel.com>
This patch 1. refactors two function attributes using #3224 : [[intel::loop_fuse()]] and [[intel::loop_fuse_independent()]] 2. store expression as ConstantExpr in Semantic Attributes 3. handles template instantiations properly for duplicate attributes on a given declaration. 4. adds test 5. updates codegen codes Signed-off-by: Soumi Manna <soumi.manna@intel.com>
This patch 1. refactors FPGA function attribute using intel#3224: [[intel::max_global_work_dim()]] attribute to better fit for community standards. 2. refactors the way we handle duplicate attributes and mutually exclusive attributes logic when present on a given declaration. 3. handles redeclarations or template instantiations properly. 4. adds tests 5. adds new test cases where the value of 'max_global_work_dim' attribute equals to 0, we shall ensure that if max_work_group_size and reqd_work_group_size attributes exist, they hold equal values (1, 1, 1). Before the new refactoring patch, we silently accepeted this test case: struct TRIFuncObjBad { [[intel::max_work_group_size(8, 8, 8)]] [[cl::reqd_work_group_size(4, 4, 4)]] [[intel::max_global_work_dim(0)]] void operator()() const {} }; Signed-off-by: Soumi Manna <soumi.manna@intel.com>
…ates()]] attributes (#3251) This patch 1. refactors two declaration attributes using #3224: [[intel::private_copies()] and [[intel::max_replicates()]] to better fit for community standards. 2. refactors the way we handle duplicate attributes and mutually exclusive attributes logic when present on a given declaration. 3. handles redeclarations or template instantiations properly. 4. adds test Signed-off-by: Soumi Manna <soumi.manna@intel.com>
The existing code does not handle things like redeclarations or
template instantiations properly. This refactoring changes things to
be a bit more like they would be upstream for other attributes. Namely,
it splits out an Add method that can be used for template instantiation
as well as regular attribute handling, and a Merge method to handle
redeclarations.
This patch only handles [[intel::reqd_sub_group_size]] and
[[intel::num_simd_work_items]] currently. Other attributes will be
handled in follow-up work.