Skip to content

[SYCL][Bindless][1/4] Add experimental implementation of SYCL bindless images extension #9808

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

Seanst98
Copy link
Contributor

@Seanst98 Seanst98 commented Jun 9, 2023

This PR stands as the first PR of four to make code review easier, covering the changes made to libclc.

See the original whole PR.

Co-authored-by: Isaac Ault isaac.ault@codeplay.com
Co-authored-by: Hugh Bird hugh.bird@codeplay.com
Co-authored-by: Duncan Brawley duncan.brawley@codeplay.com
Co-authored-by: Przemek Malon przemek.malon@codeplay.com
Co-authored-by: Chedy Najjar chedy.najjar@codeplay.com
Co-authored-by: Sean Stirling sean.stirling@codeplay.com
Co-authored-by: Peter Zuzek peter@codeplay.com

This is the first part of the implementation of revision 3 of the bindless images extension proposal.

…s images extension

This commit stands as the first commit of four to make code review easier,
covering the changes made to libclc.

Co-authored-by: Isaac Ault <isaac.ault@codeplay.com>
Co-authored-by: Hugh Bird <hugh.bird@codeplay.com>
Co-authored-by: Duncan Brawley <duncan.brawley@codeplay.com>
Co-authored-by: Przemek Malon <przemek.malon@codeplay.com>
Co-authored-by: Chedy Najjar <chedy.najjar@codeplay.com>
Co-authored-by: Sean Stirling <sean.stirling@codeplay.com>
Co-authored-by: Peter Zuzek <peter@codeplay.com>

Implement revision 3 of the bindless images extension proposal:
https://github.com/intel/llvm/blob/8a48106617a9e2365c1935302d20e53bc7174368/sycl/doc/extensions/experimental/sycl_ext_oneapi_bindless_images/sycl_ext_oneapi_bindless_images.asciidoc
@Seanst98 Seanst98 requested review from a team as code owners June 9, 2023 11:12
@Seanst98 Seanst98 requested review from againull and npmiller June 9, 2023 11:12
@Seanst98
Copy link
Contributor Author

Seanst98 commented Jun 9, 2023

Tagging @gmlueck @jbrodman @bashbaug

Copy link
Contributor

@steffenlarsen steffenlarsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I think this looks good, but I would prefer we only use a COMMA macro if absolutely necessary, which it can be in the right cases but here we might as well use variadics.

Also, @npmiller | @jchlanda feel free to have a look. I believe we should be able to merge these without the extension document being in given it doesn't introduce any public interfaces.

imageHandle, coord_parameter, data_input); \
}

#define COMMA ,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think with a bit of reordering and using variadic macros (... argument and __VA_ARGS__) you can remove the need for this COMMA macro.

I.e. for _CLC_DEFINE_IMAGE_BINDLESS_READ_BUILTIN you should be able to do something like:

#define _CLC_DEFINE_IMAGE_BINDLESS_READ_BUILTIN(                               \
    elem_t, dimension, elem_t_mangled, vec_size, coord_mangled, coord_input,   \
    ...)                                                           \
  _CLC_DEF                                                                     \
  elem_t _Z17__spirv_ImageReadI##elem_t_mangled##m##coord_mangled##ET_T0_T1_(  \
      unsigned long imageHandle, coord_input) {                                \
    return __nvvm_suld_##dimension##d_##vec_size##_clamp_s(imageHandle,        \
                                                           __VA_ARGS__);   \
  }

while for _CLC_DEFINE_IMAGE_BINDLESS_WRITE_BUILTIN you could do something similar, but you will need to reorder the arguments so the variadic one is last. Likewise you would have to go through the uses of it and make the appropriate reordering.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted. We will change to use __VA_ARGS__, although for the mipmap macros this might require splitting our current single macro into multiple, at least one for each dimension

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the non-mipmap macros to take __VA_ARGS__ instead of COMMA separated arguments.

For the mipmap macros, however, it is not possible as there are multiple COMMA separated parameters that feed into different function calls within the macro. We could triplicate all the mipmap macros (per each dimension), but then we would have a very large list of parameters to these macros.

I've come up with the following compromise, which uses a comma in a macro, however I think the end result is much more readable.

#define COORD_INPUT_1D float x
#define COORD_INPUT_2D float x, float y
#define COORD_INPUT_3D float x, float y, float z

#define COORD_PARAMS_1D x
#define COORD_PARAMS_2D x, y
#define COORD_PARAMS_3D x, y, z

#define GRAD_INPUT_1D float dX, float dY
#define GRAD_INPUT_2D float dXx, float dXy, float dYx, float dYy
#define GRAD_INPUT_3D float dXx, float dXy, float dXz, float dYx, float dYy, float dYz

_CLC_DEFINE_MIPMAP_BINDLESS_THUNK_READS_BUILTIN(float, 1, f32, v4f32, COORD_INPUT_1D, COORD_PARAMS_1D, GRAD_INPUT_1D, dX, dY)
_CLC_DEFINE_MIPMAP_BINDLESS_THUNK_READS_BUILTIN(float, 2, f32, v4f32, COORD_INPUT_2D, COORD_PARAMS_2D, GRAD_INPUT_2D, dXx, dXy, dYx, dYy)
_CLC_DEFINE_MIPMAP_BINDLESS_THUNK_READS_BUILTIN(float, 3, f32, v4f32, COORD_INPUT_3D, COORD_PARAMS_3D, GRAD_INPUT_3D, dXx, dXy, dXz, dYx, dYy, dYz)

The last parameters are left as __VA_ARGS__.

Let me know what your thoughts are on this. If necessary, we can split the mipmap macros into three (per dimension), but like I said the readability would be much worse in my opinion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it looks good, thanks!

Comment on lines +1053 to +1057
ret = __nvvm_suld_1d_v4i16_clamp_s(imageHandle, coord);
short2 b;
b.x = ret.x;
b.y = ret.y;
return b;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can abstract return value fixup (like here, or (short)(__builtin)[0] into some kind of helper and then lift the mapping from API to the underlying impl with some DSL-like code (either via macros/templates or a standalone code generator). I'm not 100% sure it would work though. Any thoughts on that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it could be done, however it seems to be a common pattern across the codebase (builtins) if we introduced it in here I'd be tempted to say we should somehow generalise it and roll out everywhere?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think generalizing everywhere is beyond what we are trying to achieve with this PR, and as we are already pressed for time to get all our extension PRs in before feature freeze, I think we should leave this as is.


// Generated funcs -- READS
// int -- int4 already defined
int __nvvm_suld_1d_i32_clamp_s(long, int) __asm("llvm.nvvm.suld.1d.i32.clamp");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The naming of the functions here mimics the convention of intrinsics (__nvvm..., there is already a bunch of them defined), is it safe to follow that convention, should the list of supported intrinsics extend we would crash with them?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When writing these functions, we followed the style of the existing intrinsic wrappers, e.g. here

I think it's unlikely that the NVVM intrinsics will be extended with names that have the _s suffix like we do here for the helpers/wrappers. If they do, then other code will break as well, not just ours. I'm leaning towards leaving the names as is, following the naming style set out by others in the same file, but we're open to discussion on it. If we change it, do we then update the functions not created by our PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The addition of the _s suffix is a good point, that I've not noticed before, happy for it to stay as is.

char2 __nvvm_suld_1d_v2i8_clamp_s(long imageHandle, int coord) {
short2 a = __nvvm_suld_1d_v2i8_clamp_s_helper(imageHandle, coord);
char2 ret;
ret.x = (char) a.x;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you be able to fold clang-formatting into your generation script? There is a bunch of things that could do with reformatting here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've formatted all of our non-macro code and added some whitespace where the code was a bit crowded. I think it made some sections of the code slightly less readable though, but if we want to enforce clang-formatting with line length limit then that's how it will turn out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to have clang-format on by default on all the code, with no modifications; there is already a rules file that can be used. In my experience, it is always beneficial in a long run, as in one less problem to think about.

Comment on lines +1053 to +1057
ret = __nvvm_suld_1d_v4i16_clamp_s(imageHandle, coord);
short2 b;
b.x = ret.x;
b.y = ret.y;
return b;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it could be done, however it seems to be a common pattern across the codebase (builtins) if we introduced it in here I'd be tempted to say we should somehow generalise it and roll out everywhere?

Copy link
Contributor

@againull againull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sycl/include/** changes lgtm.

@ldrumm
Copy link
Contributor

ldrumm commented Jun 21, 2023

@intel/llvm-reviewers-runtime Looks like the original authors have no permission to respond to feedback.

"An owner of this repository has limited the ability to comment to users that have contributed to this repository in the past. "

This seems like it might make contribution difficult. Is this expected?

In any case, could you please grant the co-authors permission to comment?

Co-Authors: @Seanst98, @isaacault, @przemektmalon, @DBDuncan, @ProGTX

@alycm
Copy link
Contributor

alycm commented Jun 21, 2023

In any case, could you please grant the co-authors permission to comment?

Co-Authors: @Seanst98, @isaacault, @przemektmalon, @DBDuncan, @ProGTX

I've submitted a request for that permission to be set.

- Formatted non-macro code in image.cl
- The COMMA macro has been mostly removed in favour of using
  __VA_ARGS__
- Mipmap macros have been modified, retaining some commas, for
  better readability
Copy link
Contributor

@steffenlarsen steffenlarsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@przemektmalon przemektmalon temporarily deployed to aws June 22, 2023 13:00 — with GitHub Actions Inactive
@przemektmalon przemektmalon temporarily deployed to aws June 22, 2023 13:49 — with GitHub Actions Inactive
@ProGTX
Copy link
Contributor

ProGTX commented Jun 23, 2023

Please note that this PR is now implementing the proposal at #9842 , which is now in revision 4. Changes from revision 3 to 4 have no impact on this PR, though.

@ProGTX
Copy link
Contributor

ProGTX commented Jun 23, 2023

The CI failures seem unrelated to our changes, what is the process here, do we need to wait and rebase until they're fixed in the sycl branch (or in the CI in general), or can this be merged regardless?

@steffenlarsen
Copy link
Contributor

@przemektmalon | @Seanst98 - Could you please push a merge commit to rerun CI?

@przemektmalon przemektmalon temporarily deployed to aws June 28, 2023 10:13 — with GitHub Actions Inactive
@przemektmalon przemektmalon temporarily deployed to aws June 28, 2023 11:12 — with GitHub Actions Inactive
@przemektmalon
Copy link
Contributor

@przemektmalon | @Seanst98 - Could you please push a merge commit to rerun CI?

@steffenlarsen Merge commit is pushed and CI passes now

@steffenlarsen steffenlarsen merged commit 58a8f20 into intel:sycl Jun 28, 2023
steffenlarsen pushed a commit that referenced this pull request Jul 6, 2023
Replaces #8307

Required for #9808 and
#10112 .

Addressed some of the feedback on revision 3 as revision 4. Any larger
changes will need to move to subsequent revisions.

---------

Co-authored-by: Przemek Malon <przemek.malon@codeplay.com>
Co-authored-by: Isaac Ault <isaac.ault@codeplay.com>
Co-authored-by: Sean Stirling <sean.stirling@codeplay.com>
Co-authored-by: Duncan Brawley <duncan.brawley@codeplay.com>
steffenlarsen pushed a commit that referenced this pull request Aug 2, 2023
…CL bindless images extension (#10500)

# Experimental Implementation of SYCL Bindless Images Extension

This commit stands as the fourth, and final, commit of four to make code
review easier, covering the additional tests for bindless images to the
e2e test suite. Implementing [revision 4 of the bindless images
extension proposal](#9842).

This will not compile or run until
[PR3](#10454) has been merged.
However, it can be reviewed simultaneously with PR3.

## Overview

The bindless images extension provides a new interface for allocating,
creating, and accessing images in SYCL. Image memory allocation is
seperated from image handle creation, and image handles can be passed to
kernels without requesting access through accessors. This approach
provides much more flexibility to the user, as well as enabling programs
to implement features that were impossible to implement using standard
SYCL images, such as a texture atlas. In addition to providing a new
interface for images, this extension also provides initial experimental
support for importing external memory into SYCL.

## Previous PRs
* [1/4] [libclc](#9808)
* [2/4] [PI/UR](#10112)
* [3/4] [SYCL API](#10454)
* [4/4] Tests <--- This one

## Authors
Co-authored-by: Isaac Ault isaac.ault@codeplay.com
Co-authored-by: Hugh Bird hugh.bird@codeplay.com
Co-authored-by: Duncan Brawley duncan.brawley@codeplay.com
Co-authored-by: Przemek Malon przemek.malon@codeplay.com
Co-authored-by: Chedy Najjar chedy.najjar@codeplay.com
Co-authored-by: Sean Stirling sean.stirling@codeplay.com
Co-authored-by: Peter Zuzek peter@codeplay.com
Co-authored-by: SYCL Unbound Team <sycl.unbound@codeplay.com>
mdtoguchi pushed a commit to mdtoguchi/llvm that referenced this pull request Oct 18, 2023
…CL bindless images extension (intel#10500)

# Experimental Implementation of SYCL Bindless Images Extension

This commit stands as the fourth, and final, commit of four to make code
review easier, covering the additional tests for bindless images to the
e2e test suite. Implementing [revision 4 of the bindless images
extension proposal](intel#9842).

This will not compile or run until
[PR3](intel#10454) has been merged.
However, it can be reviewed simultaneously with PR3.

## Overview

The bindless images extension provides a new interface for allocating,
creating, and accessing images in SYCL. Image memory allocation is
seperated from image handle creation, and image handles can be passed to
kernels without requesting access through accessors. This approach
provides much more flexibility to the user, as well as enabling programs
to implement features that were impossible to implement using standard
SYCL images, such as a texture atlas. In addition to providing a new
interface for images, this extension also provides initial experimental
support for importing external memory into SYCL.

## Previous PRs
* [1/4] [libclc](intel#9808)
* [2/4] [PI/UR](intel#10112)
* [3/4] [SYCL API](intel#10454)
* [4/4] Tests <--- This one

## Authors
Co-authored-by: Isaac Ault isaac.ault@codeplay.com
Co-authored-by: Hugh Bird hugh.bird@codeplay.com
Co-authored-by: Duncan Brawley duncan.brawley@codeplay.com
Co-authored-by: Przemek Malon przemek.malon@codeplay.com
Co-authored-by: Chedy Najjar chedy.najjar@codeplay.com
Co-authored-by: Sean Stirling sean.stirling@codeplay.com
Co-authored-by: Peter Zuzek peter@codeplay.com
Co-authored-by: SYCL Unbound Team <sycl.unbound@codeplay.com>
@Seanst98 Seanst98 deleted the codeplay/bindless_images_libclc branch January 31, 2024 12:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants