Skip to content

[SYCL] Fix linking of kernel-bundles. #4398

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

cperkinsintel
Copy link
Contributor

when fsycl-device-code-split=per_kernel is used in conjunction with multiple kernels, then explicit sycl::link(sycl::compile(my-kernel-bundle)) will fail, because we link all device images in a kernel together. Any shared top level references will have conflicting symbols. But, the device-images are independent, there is presently no known situation where they need to be linked together. So this PR works around this limitation by linking them separately. This may have to be revisited once sharedlibrary style linking is supported, but that is likely true whether this bug is fixed or not.

Signed-off-by: Chris Perkins chris.perkins@intel.com

when fsycl-device-code-split=per_kernel is used in conjunction with multiple kernels, then explicit sycl::link(sycl::compile(my-kernel-bundle)) will fail, because we link all device images in a kernel together. But, the device-images are independent, there is presently no known situation where they should be linked together. So this PR works around this limitation by linking them seperately. This may have to be revisited once sharedlibrary style linking is supported, but that is likely true whether this bug is fixed or not.

Signed-off-by: Chris Perkins <chris.perkins@intel.com>
@gmlueck
Copy link
Contributor

gmlueck commented Aug 26, 2021

I'd like to understand the test case that caused the original problem. Can you point me to it or send additional information? The description just says that sycl::link(sycl::compile(my-kernel-bundle)) fails. What is my-kernel-bundle in that case?

@gmlueck
Copy link
Contributor

gmlueck commented Aug 27, 2021

Apologies in advance for a long comment. I'd like to explain how to think about the semantics of the kernel bundle linking operation.

The semantics are easiest to describe if you think about an idealized implementation where a kernel bundle always contains just a single device image. In such an idealized case, b1 = get_kernel_bundle<bundle_state::input>(ctxt, {id1, id2, id3}) returns a bundle with a single device image that contains several kernels. Calling b2 = compile(b1) compiles that single device image into object state and returns a bundle with that one object-state device image. Calling b3 = link(b2) doesn't really link anything together because there's just one device image. Instead, it just translates the object-state device image into an executable-state device image.

If the an implementation has more than one device image in a kernel bundle, the semantics of the operations still need to match the idealized case. Consider the case above where b1 contains three device images (one each for kernels id1, id2, and id3). We still consider b1 as containing one logical device image. Therefore, compile() compiles this one logical device image to single logical object-state image. In reality, this may perform three compilation steps in order to compile each physical device image into object state. Likewise, link() translates the one logical device image into executable state. Since there's only one logical device image, there's nothing to link against. In reality, the implementation translates each physical device image into executable state.

At this point, we can see that your proposed fix does make sense. You've changed the code to link each device image separately, rather than linking them all together.

As an aside, it would also be legal for the implementation to combine the physical device images together, either as part of the compile() operation or as part of the link() operation. However, if the implementation elects to do this, it needs to somehow tolerate duplicate copies of the same function in each physical device image, for example, by discarding duplicates and keeping only one copy.

Now let's consider the case where the application links several kernel bundles together:

auto b1 = get_kernel_bundle<bundle_state::input>(ctxt, {id1});
auto b2 = get_kernel_bundle<bundle_state::input>(ctxt, {id2});
auto b1a = compile(b1);
auto b2a = compile(b2);
auto b3 = link({b1a, b2a});

First, consider the idealized case. Bundles b1, b2, b1a, and b2a each contain just one device image. The link() operation combines two device images together, resolves undefined symbol references from b1a to definitions in b2a and vice-versa, and produces a single linked device image.

Now, let's consider the non-idealized case, but make the simplifying assumption that there are no undefined symbols in any device image. Even though b1a and b2a may each contain many physical device images, there's no need to actually link anything together because we know there aren't any undefined symbols to resolve. Therefore, it's still safe to simply translate each device image individually into executable state.

Thus, we can see that your proposed solution works in this case too.

Finally, let's consider the case where device images in b1a and b2a might contain references to undefined symbols. In this case, your proposed solution breaks down. We really do need to link devices images in b1a against device images in b2a in order to resolve these undefined references.

As I said in another forum, there is no way in standard SYCL to create a kernel bundle that has an unresolved reference to a symbol. However, it is possible to get this situation using either OpenCL or Level Zero backend interop APIs. If we expect to support these scenarios, then I think more work needs to be done for the link() implementation beyond what you have here.

…07 which is not yet merged.

Signed-off-by: Chris Perkins <chris.perkins@intel.com>
@cperkinsintel cperkinsintel marked this pull request as ready for review September 1, 2021 16:52
@cperkinsintel cperkinsintel requested a review from a team as a code owner September 1, 2021 16:52
@@ -192,7 +192,6 @@ class kernel_bundle_impl {
// TODO: Unify with c'tor for sycl::comile and sycl::build by calling
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test to sycl/unittests/SYCL2020 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a lit test for the multiple kernel stuff here: intel/llvm-test-suite#440
Is that what you are asking?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I was asking about adding a unit/component test in sycl/unittests/SYCL2020 directory of this repo. This test should be able to validate this behavior without any backend available.
Let's do it in a separate PR?

@gmlueck
Copy link
Contributor

gmlueck commented Sep 3, 2021

I think it would be worth adding a comment explaining the limitations. Something like:

The loop below just links each device image separately, not linking any two device images together. This is correct so long as each device image has no unresolved symbols. That's the case when device images are created from generic SYCL APIs. There's no way in generic SYCL to create a kernel which references an undefined symbol. If we decide in the future to allow a backend interop API to create a "sycl::kernel_bundle" that references undefined symbols, then the logic in this loop will need to be changed.

@bader bader changed the title separate linking of kernel-bundles. [SYCL] Fix linking of kernel-bundles. Sep 3, 2021
@cperkinsintel
Copy link
Contributor Author

The matching test for this is here: intel/llvm-test-suite#440
And here is the run of that and this PR together and passing: http://icl-jenkins.sc.intel.com:8080/job/SYCL_CI/job/intel/job/Lin/job/LLVM_Test_Suite/7433/

Signed-off-by: Chris Perkins <chris.perkins@intel.com>
@cperkinsintel
Copy link
Contributor Author

@gmlueck - thanks. I inserted your comment.

@bader bader requested a review from romanovvlad September 4, 2021 14:02
@romanovvlad romanovvlad merged commit d21082f into intel:sycl Sep 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants