-
Notifications
You must be signed in to change notification settings - Fork 769
[SYCL][DOC] Design document for new mechanism of host -> device objects mapping #5910
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][DOC] Design document for new mechanism of host -> device objects mapping #5910
Conversation
This is an initial sketch of the idea: apparently more changes needed to include this doc into the index page, to update specialization constants and device global design documents to incorporate this new design, etc. Before doing that, I would like to get initial feedback on the idea: any comments are welcome! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @AlexeySachkov,
I think this looks like a good direction. The changes to the device global design should be very minimal since it is essentially the same design. The changes to specialization constants will be a little more, but not much.
I think there will be a slightly higher runtime cost for specialization constants because operations on the host will need to do an extra lookup to find the unique ID string. That doesn't seem very serious to me, though.
When you finish writing up this design, you should add a description for how "shadowed" variable names in anonymous namespaces are handled. I think you can just transfer the information from the device global and specialization constant designs to this one. The impact on the integration footer will be the same as in those other designs. I'm not sure what the impact will be for the custom host compiler, but I'm sure it can be solved.
**TODO**: alternatively, we could completely re-use existing | ||
`[[__sycl_detail__::device_global]]` attribute and introduce another one for | ||
specialization constants, i.e. it is a question of whether or not we want to | ||
generalize unique IDs generation in form of a generic attribute or not. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I have a slight preference to keep [[__sycl_detail__::uniquely_identifiable_object(kind)]]
and [[__sycl_detail__::device_global]]
as separate attributes.
FWIW, I originally proposed a general attribute like [[__sycl_detail__::restrictions("device_global")]]
, where the parameter would tell the CFE which set of restrictions to enforce. However, the CFE team preferred having a separate C++ attribute for each set of restriction. Thus, we have [[__sycl_detail__::device_global]]
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the CFE team preferred having a separate C++ attribute for each set of restriction.
I think [[__sycl_detail__::uniquely_identifiable_object(kind)]]
raises the same concerns as [[__sycl_detail__::restrictions("device_global")]]
explained by @elizabethandrews
here #4686 (comment) . In order to implement such attribute we should hardcode and keep track of possible "kinds" of uniquely identifiable objects in FE, meaning the FE should diagnose each time the passed kind is not valid and it should be updated for each new kind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, having [[__sycl_detail__::uniquely_identifiable_object(kind)]]
for setting LLVM IR attribute and having separate [[__sycl_detail__::device_global]]
for semantic checks is the preferable way, right?
In order to implement such attribute we should hardcode and keep track of possible "kinds" of uniquely identifiable objects in FE, meaning the FE should diagnose each time the passed kind is not valid and it should be updated for each new kind.
Note that for LLVM IR attribute generation purposes I do not suggest that FE perform any correctness check for kind
argument except that it is non empty: FE here just serves as a translation layer and headers are responsible to pass the correct string there to ensure that everything will work fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, having
[[__sycl_detail__::uniquely_identifiable_object(kind)]]
for setting LLVM IR attribute and having separate[[__sycl_detail__::device_global]]
for semantic checks is the preferable way, right?
That sounds good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that for LLVM IR attribute generation purposes I do not suggest that FE perform any correctness check for kind argument except that it is non empty: FE here just serves as a translation layer and headers are responsible to pass the correct string there to ensure that everything will work fine.
Doesn't that introduce the dependency between the headers that will use the attribute and the compiler's layers which will use the "kind" string? I guess somehow it should be made sure that the string is correct, not issuing a diagnostic from FE will make it up to developers.
So, having [[sycl_detail::uniquely_identifiable_object(kind)]] for setting LLVM IR attribute and having separate [[sycl_detail::device_global]] for semantic checks is the preferable way, right?
In my mind it is preferable to leave [[__sycl_detail__::device_global]]
"as is" and introduce another attribute not accepting any arguments for specialization ids that also may enforce some restrictions and will trigger emission of LLVM IR attributes. No magic strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All this said, I do think not having the kind at the time of initialization potentially adds some complexity to the implementation, as accesses may have to check if their entry exists already and may have to initialize if it isn't.
Knowing through the kind during registration means that operations wouldn't have to consider the possibility of an "uninitialized" entry and could simply just cast their pointer (in the hopes that there wasn't a bug somewhere.)
Could you please clarify how kind helps to understand an "uninitialized" entry?
Quick note about this. I think you're right about the pointer collision, but for objects like device_global the string map is the more frequently used one, which has higher chance of collision the bigger the map is, so if they were to share map with others there could be some negative effects on lookup-performance, though exactly how much is hard to tell.
Could you please elaborate more on this? I'm thinking of
std::unordered_map</*Object address*/void *, /*Object name*/char *>
. Which scenario would lead to a collision?
I think both approaches would work, however if we choose to go with the one that passes a kind, I think we should do it through and enum rather than a string. Since CFE passes the value through without caring what it actually is, only the SYCL headers and the SYCL runtime need to agree on this value, so an enum would be cheaper
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please clarify how kind helps to understand an "uninitialized" entry?
The idea would be to have it create an entry with room for the information needed for the given kind, that is the allocations can be made at registration time rather than some time during execution.
Could you please elaborate more on this? I'm thinking of
std::unordered_map</*Object address*/void *, /*Object name*/char *>
. Which scenario would lead to a collision?
So you're thinking the only shared map is the "ptr -> unique ID" which will be set by the registration, and then the object-specific operations will initialize any other maps they may need? If that's the case, I agree that would be a good solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, it seems like we don't have an immediate need for kind
and therefore, to simplify the design, I'm going to remove that argument - please let me know if you have any objections
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please let me know if you have any objections
Can we have a long int argument for future use? We can have on RT level only as a starting point - __register_uniquely_identifiable_object
takes long int 0 as the last argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please let me know if you have any objections
Can we have a long int argument for future use? We can have on RT level only as a starting point -
__register_uniquely_identifiable_object
takes long int 0 as the last argument.
Ok, that works for me. I will then update the doc:
- the runtime registration function has the extra long int (enum) argument
- the new attribute does not have the argument
This way, our FE and middle-end implementations will be simpler, but our runtime will be ready for future potential change making it easier to maintain the same ABI
**TODO**: alternatively, we could completely re-use existing | ||
`[[__sycl_detail__::device_global]]` attribute and introduce another one for | ||
specialization constants, i.e. it is a question of whether or not we want to | ||
generalize unique IDs generation in form of a generic attribute or not. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the CFE team preferred having a separate C++ attribute for each set of restriction.
I think [[__sycl_detail__::uniquely_identifiable_object(kind)]]
raises the same concerns as [[__sycl_detail__::restrictions("device_global")]]
explained by @elizabethandrews
here #4686 (comment) . In order to implement such attribute we should hardcode and keep track of possible "kinds" of uniquely identifiable objects in FE, meaning the FE should diagnose each time the passed kind is not valid and it should be updated for each new kind.
approach we are taking and the decision is made based on whether or not | ||
3rd-party host compiler is in use. | ||
|
||
If `-fsycl-host-compiler` option is present, the compiler driver chooses the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will happen if -fsycl-host-compiler=clang++
is passed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I honestly don't know what is the algorithm of determining an absolute path to a 3rd-party host compiler specified through that argument, but I suggest that we always assume that any compiler passed through the flag is a 3rd-party one: I simply don't think that it worth to implement compiler detection here, because even clang++
can point to the upstream clang, for example and it doesn't have customizations we need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I honestly don't know what is the algorithm of determining an absolute path to a 3rd-party host compiler specified through that argument, but I suggest that we always assume that any compiler passed through the flag is a 3rd-party one
I don't know how this option works either. Maybe It is worth to double check how does it work in corner case I mentioned. Maybe it would be good to mention how this works as well.
That is correct and I also don't think that it will be huge. When RT loads a device image, it could scan both: host -> device mapping and list of spec constants from properties and build direct host address -> properties mapping to still have O(constant) access to them in
My understanding is that at LLVM IR level there will be no issue with variables shadowing: all those variables will be somehow mangled and still annotated with attributes, so we should be able to identify each one of them quite easily. |
I suspect you are correct. It would be good to have a section in the design document about this. |
Copied existing section about shadowed variables and integration footer in 7373409. |
|
||
Another example is device global [implementation][device-global-design], where | ||
in order to communicate a value of `device_global` variable between host and | ||
device we need to map its host address to a symbolic name/identifier and some |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we specify what 'some other info' is. This is bit vague
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added an example in 5b69048, but I don't think that we need to put all the details into this doc: device_global
implementation design is covered by a separate document
not integration footer is enabled in the compiler driver instead of creating | ||
a special macro for differentiating our own host compiler. | ||
|
||
The suggested macro name is `__INTEL_SYCL_HOST_COMPILER__`. It should be defined |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if we can use 'INTEL' in the name if we're proposing this for syclos, especially if we want to upstream this at some point. I'm also not sure if we need senior management input for using INTEL in macro name. I guess its less of a concern if its not a documented macro, but it might be better to just have a macro which indicates footer is included or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just remembered that there is a __SYCL_SINGLE_SOURCE__
macro documented in the spec, which we should use here.
AFAIK, it is not yet supported, but now it seems to be the time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should use __SYCL_SINGLE_SOURCE__
for this purpose. The intent of that macro is to indicate a compiler that produces both host and device code in a single pass (i.e. the CFE runs just once, and then both host and device code are produced from that common IR). This is not the case for DPC++.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intent of that macro is to indicate a compiler that produces both host and device code in a single pass (i.e. the CFE runs just once, and then both host and device code are produced from that common IR).
Well, the spec says literally nothing about the number of CFE runs and based on the spec wording, DPC++ seems to be a single source compiler:
which produces host as well as device binary;
Anyway, I'm perfectly fine with using another macro introduced by ourselves, we just need to be able to detect that we are in integration footer mode, which will imply that we are in 3rd-party host compiler mode, which will mean do not use an attribute to avoid the warning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding of spec matches @AlexeySachkov's. Unless 'single-source compiler' has standardized meaning I am unaware of, it sounds like __SYCL_SINGLE_SOURCE__
works here
SYCL_SINGLE_SOURCE is defined to 1 if the source file is being compiled with a SYCL single-source compiler which produces host as well as device binary;
If we can't use this, WDYT about just having a macro INTEGRATION_FOOTER defined in footer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can't use this, WDYT about just having a macro INTEGRATION_FOOTER defined in footer.
We need this macro available in regular SYCL headers, so I think that we will end up with setting something internal (__MACRO_NAME__
) in the driver
specialization_id<int> spec_const(38); | ||
``` | ||
|
||
After processed by DPC++ compiler, it will result in the following LLVM IR: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean LLVM IR for device?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean for both host and device: there is a paragraph a bit below, which says that. Please let me know how to better rearrange and/or rephrase that to make it more clear
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should mention at the very beginning of attributes description that they trigger emission of new data for both host and device code. This a unique situation among SYCL attributes, so I kind of assumed that it will happen only for device code.
No description provided.