Skip to content

[SYCL][DOC] Design document for new mechanism of host -> device objects mapping #5910

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

AlexeySachkov
Copy link
Contributor

No description provided.

@AlexeySachkov
Copy link
Contributor Author

This is an initial sketch of the idea: apparently more changes needed to include this doc into the index page, to update specialization constants and device global design documents to incorporate this new design, etc. Before doing that, I would like to get initial feedback on the idea: any comments are welcome!

@AlexeySachkov AlexeySachkov requested a review from mlychkov March 28, 2022 20:22
Copy link
Contributor

@gmlueck gmlueck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @AlexeySachkov,

I think this looks like a good direction. The changes to the device global design should be very minimal since it is essentially the same design. The changes to specialization constants will be a little more, but not much.

I think there will be a slightly higher runtime cost for specialization constants because operations on the host will need to do an extra lookup to find the unique ID string. That doesn't seem very serious to me, though.

When you finish writing up this design, you should add a description for how "shadowed" variable names in anonymous namespaces are handled. I think you can just transfer the information from the device global and specialization constant designs to this one. The impact on the integration footer will be the same as in those other designs. I'm not sure what the impact will be for the custom host compiler, but I'm sure it can be solved.

**TODO**: alternatively, we could completely re-use existing
`[[__sycl_detail__::device_global]]` attribute and introduce another one for
specialization constants, i.e. it is a question of whether or not we want to
generalize unique IDs generation in form of a generic attribute or not.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I have a slight preference to keep [[__sycl_detail__::uniquely_identifiable_object(kind)]] and [[__sycl_detail__::device_global]] as separate attributes.

FWIW, I originally proposed a general attribute like [[__sycl_detail__::restrictions("device_global")]], where the parameter would tell the CFE which set of restrictions to enforce. However, the CFE team preferred having a separate C++ attribute for each set of restriction. Thus, we have [[__sycl_detail__::device_global]] instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the CFE team preferred having a separate C++ attribute for each set of restriction.

I think [[__sycl_detail__::uniquely_identifiable_object(kind)]] raises the same concerns as [[__sycl_detail__::restrictions("device_global")]] explained by @elizabethandrews
here #4686 (comment) . In order to implement such attribute we should hardcode and keep track of possible "kinds" of uniquely identifiable objects in FE, meaning the FE should diagnose each time the passed kind is not valid and it should be updated for each new kind.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, having [[__sycl_detail__::uniquely_identifiable_object(kind)]] for setting LLVM IR attribute and having separate [[__sycl_detail__::device_global]] for semantic checks is the preferable way, right?

In order to implement such attribute we should hardcode and keep track of possible "kinds" of uniquely identifiable objects in FE, meaning the FE should diagnose each time the passed kind is not valid and it should be updated for each new kind.

Note that for LLVM IR attribute generation purposes I do not suggest that FE perform any correctness check for kind argument except that it is non empty: FE here just serves as a translation layer and headers are responsible to pass the correct string there to ensure that everything will work fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, having [[__sycl_detail__::uniquely_identifiable_object(kind)]] for setting LLVM IR attribute and having separate [[__sycl_detail__::device_global]] for semantic checks is the preferable way, right?

That sounds good to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that for LLVM IR attribute generation purposes I do not suggest that FE perform any correctness check for kind argument except that it is non empty: FE here just serves as a translation layer and headers are responsible to pass the correct string there to ensure that everything will work fine.

Doesn't that introduce the dependency between the headers that will use the attribute and the compiler's layers which will use the "kind" string? I guess somehow it should be made sure that the string is correct, not issuing a diagnostic from FE will make it up to developers.

So, having [[sycl_detail::uniquely_identifiable_object(kind)]] for setting LLVM IR attribute and having separate [[sycl_detail::device_global]] for semantic checks is the preferable way, right?

In my mind it is preferable to leave [[__sycl_detail__::device_global]] "as is" and introduce another attribute not accepting any arguments for specialization ids that also may enforce some restrictions and will trigger emission of LLVM IR attributes. No magic strings.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All this said, I do think not having the kind at the time of initialization potentially adds some complexity to the implementation, as accesses may have to check if their entry exists already and may have to initialize if it isn't.
Knowing through the kind during registration means that operations wouldn't have to consider the possibility of an "uninitialized" entry and could simply just cast their pointer (in the hopes that there wasn't a bug somewhere.)

Could you please clarify how kind helps to understand an "uninitialized" entry?

Quick note about this. I think you're right about the pointer collision, but for objects like device_global the string map is the more frequently used one, which has higher chance of collision the bigger the map is, so if they were to share map with others there could be some negative effects on lookup-performance, though exactly how much is hard to tell.

Could you please elaborate more on this? I'm thinking of
std::unordered_map</*Object address*/void *, /*Object name*/char *>. Which scenario would lead to a collision?

I think both approaches would work, however if we choose to go with the one that passes a kind, I think we should do it through and enum rather than a string. Since CFE passes the value through without caring what it actually is, only the SYCL headers and the SYCL runtime need to agree on this value, so an enum would be cheaper

+1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please clarify how kind helps to understand an "uninitialized" entry?

The idea would be to have it create an entry with room for the information needed for the given kind, that is the allocations can be made at registration time rather than some time during execution.

Could you please elaborate more on this? I'm thinking of std::unordered_map</*Object address*/void *, /*Object name*/char *>. Which scenario would lead to a collision?

So you're thinking the only shared map is the "ptr -> unique ID" which will be set by the registration, and then the object-specific operations will initialize any other maps they may need? If that's the case, I agree that would be a good solution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, it seems like we don't have an immediate need for kind and therefore, to simplify the design, I'm going to remove that argument - please let me know if you have any objections

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AlexeySachkov

please let me know if you have any objections

Can we have a long int argument for future use? We can have on RT level only as a starting point - __register_uniquely_identifiable_object takes long int 0 as the last argument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please let me know if you have any objections

Can we have a long int argument for future use? We can have on RT level only as a starting point - __register_uniquely_identifiable_object takes long int 0 as the last argument.

Ok, that works for me. I will then update the doc:

  • the runtime registration function has the extra long int (enum) argument
  • the new attribute does not have the argument

This way, our FE and middle-end implementations will be simpler, but our runtime will be ready for future potential change making it easier to maintain the same ABI

**TODO**: alternatively, we could completely re-use existing
`[[__sycl_detail__::device_global]]` attribute and introduce another one for
specialization constants, i.e. it is a question of whether or not we want to
generalize unique IDs generation in form of a generic attribute or not.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the CFE team preferred having a separate C++ attribute for each set of restriction.

I think [[__sycl_detail__::uniquely_identifiable_object(kind)]] raises the same concerns as [[__sycl_detail__::restrictions("device_global")]] explained by @elizabethandrews
here #4686 (comment) . In order to implement such attribute we should hardcode and keep track of possible "kinds" of uniquely identifiable objects in FE, meaning the FE should diagnose each time the passed kind is not valid and it should be updated for each new kind.

approach we are taking and the decision is made based on whether or not
3rd-party host compiler is in use.

If `-fsycl-host-compiler` option is present, the compiler driver chooses the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will happen if -fsycl-host-compiler=clang++ is passed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly don't know what is the algorithm of determining an absolute path to a 3rd-party host compiler specified through that argument, but I suggest that we always assume that any compiler passed through the flag is a 3rd-party one: I simply don't think that it worth to implement compiler detection here, because even clang++ can point to the upstream clang, for example and it doesn't have customizations we need.

Copy link
Contributor

@Fznamznon Fznamznon Apr 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly don't know what is the algorithm of determining an absolute path to a 3rd-party host compiler specified through that argument, but I suggest that we always assume that any compiler passed through the flag is a 3rd-party one

I don't know how this option works either. Maybe It is worth to double check how does it work in corner case I mentioned. Maybe it would be good to mention how this works as well.

@AlexeySachkov
Copy link
Contributor Author

I think there will be a slightly higher runtime cost for specialization constants because operations on the host will need to do an extra lookup to find the unique ID string. That doesn't seem very serious to me, though.

That is correct and I also don't think that it will be huge. When RT loads a device image, it could scan both: host -> device mapping and list of spec constants from properties and build direct host address -> properties mapping to still have O(constant) access to them in set/get APIs.

I'm not sure what the impact will be for the custom host compiler, but I'm sure it can be solved.

My understanding is that at LLVM IR level there will be no issue with variables shadowing: all those variables will be somehow mangled and still annotated with attributes, so we should be able to identify each one of them quite easily.

@gmlueck
Copy link
Contributor

gmlueck commented Mar 29, 2022

My understanding is that at LLVM IR level there will be no issue with variables shadowing: all those variables will be somehow mangled and still annotated with attributes, so we should be able to identify each one of them quite easily.

I suspect you are correct. It would be good to have a section in the design document about this.

@AlexeySachkov
Copy link
Contributor Author

My understanding is that at LLVM IR level there will be no issue with variables shadowing: all those variables will be somehow mangled and still annotated with attributes, so we should be able to identify each one of them quite easily.

I suspect you are correct. It would be good to have a section in the design document about this.

Copied existing section about shadowed variables and integration footer in 7373409.
Added a section about shadowed variables and custom host compiler in 9758b95


Another example is device global [implementation][device-global-design], where
in order to communicate a value of `device_global` variable between host and
device we need to map its host address to a symbolic name/identifier and some
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we specify what 'some other info' is. This is bit vague

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added an example in 5b69048, but I don't think that we need to put all the details into this doc: device_global implementation design is covered by a separate document

not integration footer is enabled in the compiler driver instead of creating
a special macro for differentiating our own host compiler.

The suggested macro name is `__INTEL_SYCL_HOST_COMPILER__`. It should be defined
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if we can use 'INTEL' in the name if we're proposing this for syclos, especially if we want to upstream this at some point. I'm also not sure if we need senior management input for using INTEL in macro name. I guess its less of a concern if its not a documented macro, but it might be better to just have a macro which indicates footer is included or something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just remembered that there is a __SYCL_SINGLE_SOURCE__ macro documented in the spec, which we should use here.

AFAIK, it is not yet supported, but now it seems to be the time

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should use __SYCL_SINGLE_SOURCE__ for this purpose. The intent of that macro is to indicate a compiler that produces both host and device code in a single pass (i.e. the CFE runs just once, and then both host and device code are produced from that common IR). This is not the case for DPC++.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent of that macro is to indicate a compiler that produces both host and device code in a single pass (i.e. the CFE runs just once, and then both host and device code are produced from that common IR).

Well, the spec says literally nothing about the number of CFE runs and based on the spec wording, DPC++ seems to be a single source compiler:

which produces host as well as device binary;

Anyway, I'm perfectly fine with using another macro introduced by ourselves, we just need to be able to detect that we are in integration footer mode, which will imply that we are in 3rd-party host compiler mode, which will mean do not use an attribute to avoid the warning

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding of spec matches @AlexeySachkov's. Unless 'single-source compiler' has standardized meaning I am unaware of, it sounds like __SYCL_SINGLE_SOURCE__ works here

SYCL_SINGLE_SOURCE is defined to 1 if the source file is being compiled with a SYCL single-source compiler which produces host as well as device binary;

If we can't use this, WDYT about just having a macro INTEGRATION_FOOTER defined in footer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can't use this, WDYT about just having a macro INTEGRATION_FOOTER defined in footer.

We need this macro available in regular SYCL headers, so I think that we will end up with setting something internal (__MACRO_NAME__) in the driver

@AlexeySachkov AlexeySachkov requested a review from a team as a code owner April 4, 2022 10:18
specialization_id<int> spec_const(38);
```

After processed by DPC++ compiler, it will result in the following LLVM IR:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean LLVM IR for device?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean for both host and device: there is a paragraph a bit below, which says that. Please let me know how to better rearrange and/or rephrase that to make it more clear

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should mention at the very beginning of attributes description that they trigger emission of new data for both host and device code. This a unique situation among SYCL attributes, so I kind of assumed that it will happen only for device code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants