Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can "item" and "nd_item" be shared across work-items? #532

Open
gmlueck opened this issue Jan 18, 2024 · 6 comments
Open

Can "item" and "nd_item" be shared across work-items? #532

gmlueck opened this issue Jan 18, 2024 · 6 comments

Comments

@gmlueck
Copy link
Contributor

gmlueck commented Jan 18, 2024

This question recently came up from our implementation team. What is the precise semantics of the member functions on item and nd_item? Do they return the index of the calling work-item or do they return the index of the work-item that created the object?

The question is relevant only in the weird case where an item or nd_item object from one work-item is copied into SLM and then referenced by another work-item. This could be done, for example, by using placement-new to copy the item or nd_item object.

Note that id may have a different semantic because it is user-constructible. The semantics of id seem clear. The member functions return the information that was used to construct the id object. Therefore, if work-item A copies its id object to SLM and then another work-item B calls its member functions, I presume work-item B would get the index of work-item A.

@AlexeySachkov
Copy link
Contributor

Linking #451 here. sub_group is probably different enough to be handled differently compared to item and nd_item, but we had the same question about it as well.

@illuhad
Copy link
Contributor

illuhad commented Jan 18, 2024

I don't think these classes are meaningful outside of their own work item, because they generally carry the semantics of "give me the position of this item in the parallel iteration space". Using objects from a different work item should be illegal and/or UB.

If one did this, even just within one implementation the behavior might not be consistent. For example, whether AdaptiveCpp stores the position in the iteration space inside nd_item/item or whether it queries this on demand using builtins changes depending on compilation flow and configuration.

@gmlueck
Copy link
Contributor Author

gmlueck commented Jan 18, 2024

I think this makes sense also. I was wondering why we didn't define all the member functions of nd_item and item as static. That would make it clear that they only return information about the calling work-item, since static member functions don't have access to any state in the object. Did we decide not to define them as static to give implementations the freedom to store information about the calling work-item in the object?

In any case, I think we could add text to the introduction for each of these objects something like:

The implementation constructs an item object for each work-item when a kernel is submitted with a range iteration space. That object (and copies of that object) may be used only by the work-item that "owns" it. Although it is possible to copy the object into memory that is accessible to other work-items or to the host, calls to member functions of the object from other work-items or from the host produce undefined behavior.

I think we could use similar language for sub_group and group also.

@Pennycook
Copy link
Contributor

I think this makes sense also. I was wondering why we didn't define all the member functions of nd_item and item as static. That would make it clear that they only return information about the calling work-item, since static member functions don't have access to any state in the object. Did we decide not to define them as static to give implementations the freedom to store information about the calling work-item in the object?

I agree that freedom of implementation is important. But I think it may also be important to note that if the functions were static, a developer wouldn't need an instance of an nd_item to call them. That would mean that developers could call nd_item member functions outside of kernels, call an nd_item member function using a class with the wrong number of dimensions, etc.

@tomdeakin
Copy link
Contributor

Discussed in WG meeting 01/25/24.
Next steps: introduce a new section similar to https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:reference-semantics to group these restrictions together. Or clarify these directly. Clarification to SYCL 2020.

bader added a commit to bader/SYCL-CTS that referenced this issue Feb 17, 2024
This change allows stateless nd_item implementation as discussed in
KhronosGroup/SYCL-Docs#532.
The list of changes includes removal of checks which compare nd_teams
shared between work-items or on the host.
Now test checks equality only on device within single work-item.
bader added a commit to bader/SYCL-CTS that referenced this issue Feb 17, 2024
This change allows stateless nd_item implementation as discussed in
KhronosGroup/SYCL-Docs#532.
The list of changes includes removal of checks which compare nd_teams
shared between work-items or on the host.
Now test checks equality only on device within single work-item.
bader added a commit to bader/SYCL-CTS that referenced this issue Feb 17, 2024
This change allows stateless nd_item implementation as discussed in
KhronosGroup/SYCL-Docs#532.
The list of changes includes removal of checks which compare nd_teams
shared between work-items or on the host.
Now test checks equality only on device within single work-item.
bader added a commit to bader/SYCL-CTS that referenced this issue Feb 17, 2024
The problem with removed code is that it compares nd_item "states"
obtained by calling methods like get_local_id() on the host. The
discussion in KhronosGroup/SYCL-Docs#532
clarfies that these methods are supposed to return meaningful values
from the kernel code only.

nd_item constructors are tested in device code, so this change do not
reduce feature coverage.
@bader
Copy link
Contributor

bader commented Feb 17, 2024

FYI. I committed stateless implementation of nd_item to DPC++ compiler. This commit exposed SYCL-CTS tests relying on stateful implementation or wrong uses of nd_item methods on the host.

  1. nd_item_equality test validates one of the rules for "by-value semantic" - https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:byval-semantics. The test allocates a buffer to store 2 nd_item objects, run a kernel which stores two nd_items from different work-items and finally uses comparison operators to check nd_items to check following rule:

T must be equality comparable on the host application (in the case where T is available on the host) and within SYCL kernel functions. Equality between two instances of T (i.e. a == b) must be true if the value of all members are equal and non-equality between two instances of T (i.e. a != b) must be true if the value of any members are not equal, unless either instance has become invalidated by a move operation. By extension of the requirements above, equality on T must guarantee to be reflexive (i.e. a == a), symmetric (i.e. a == b implies b == a and a != b implies b != a) and transitive (i.e. a == b && b == c implies c == a).

As it was agreed above sharing and comparing stateless nd_item objects doesn't make sense and I'd like to drop this test.

  1. nd_item_constructors compares different nd_item objects using more reliable technique. It saves the "state" of nd_item object by obtaining and saving ranges/ids via calling methods like get_local_id(). Unfortunately, it does it on the host too!

KhronosGroup/SYCL-CTS#869 updates both tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants