-
Notifications
You must be signed in to change notification settings - Fork 769
[SYCL][Bindless][1/4] Add experimental implementation of SYCL bindless images extension #9808
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][Bindless][1/4] Add experimental implementation of SYCL bindless images extension #9808
Conversation
…s images extension This commit stands as the first commit of four to make code review easier, covering the changes made to libclc. Co-authored-by: Isaac Ault <isaac.ault@codeplay.com> Co-authored-by: Hugh Bird <hugh.bird@codeplay.com> Co-authored-by: Duncan Brawley <duncan.brawley@codeplay.com> Co-authored-by: Przemek Malon <przemek.malon@codeplay.com> Co-authored-by: Chedy Najjar <chedy.najjar@codeplay.com> Co-authored-by: Sean Stirling <sean.stirling@codeplay.com> Co-authored-by: Peter Zuzek <peter@codeplay.com> Implement revision 3 of the bindless images extension proposal: https://github.com/intel/llvm/blob/8a48106617a9e2365c1935302d20e53bc7174368/sycl/doc/extensions/experimental/sycl_ext_oneapi_bindless_images/sycl_ext_oneapi_bindless_images.asciidoc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I think this looks good, but I would prefer we only use a COMMA
macro if absolutely necessary, which it can be in the right cases but here we might as well use variadics.
Also, @npmiller | @jchlanda feel free to have a look. I believe we should be able to merge these without the extension document being in given it doesn't introduce any public interfaces.
imageHandle, coord_parameter, data_input); \ | ||
} | ||
|
||
#define COMMA , |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think with a bit of reordering and using variadic macros (...
argument and __VA_ARGS__
) you can remove the need for this COMMA
macro.
I.e. for _CLC_DEFINE_IMAGE_BINDLESS_READ_BUILTIN
you should be able to do something like:
#define _CLC_DEFINE_IMAGE_BINDLESS_READ_BUILTIN( \
elem_t, dimension, elem_t_mangled, vec_size, coord_mangled, coord_input, \
...) \
_CLC_DEF \
elem_t _Z17__spirv_ImageReadI##elem_t_mangled##m##coord_mangled##ET_T0_T1_( \
unsigned long imageHandle, coord_input) { \
return __nvvm_suld_##dimension##d_##vec_size##_clamp_s(imageHandle, \
__VA_ARGS__); \
}
while for _CLC_DEFINE_IMAGE_BINDLESS_WRITE_BUILTIN
you could do something similar, but you will need to reorder the arguments so the variadic one is last. Likewise you would have to go through the uses of it and make the appropriate reordering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted. We will change to use __VA_ARGS__
, although for the mipmap macros this might require splitting our current single macro into multiple, at least one for each dimension
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the non-mipmap macros to take __VA_ARGS__
instead of COMMA
separated arguments.
For the mipmap macros, however, it is not possible as there are multiple COMMA
separated parameters that feed into different function calls within the macro. We could triplicate all the mipmap macros (per each dimension), but then we would have a very large list of parameters to these macros.
I've come up with the following compromise, which uses a comma in a macro, however I think the end result is much more readable.
#define COORD_INPUT_1D float x
#define COORD_INPUT_2D float x, float y
#define COORD_INPUT_3D float x, float y, float z
#define COORD_PARAMS_1D x
#define COORD_PARAMS_2D x, y
#define COORD_PARAMS_3D x, y, z
#define GRAD_INPUT_1D float dX, float dY
#define GRAD_INPUT_2D float dXx, float dXy, float dYx, float dYy
#define GRAD_INPUT_3D float dXx, float dXy, float dXz, float dYx, float dYy, float dYz
_CLC_DEFINE_MIPMAP_BINDLESS_THUNK_READS_BUILTIN(float, 1, f32, v4f32, COORD_INPUT_1D, COORD_PARAMS_1D, GRAD_INPUT_1D, dX, dY)
_CLC_DEFINE_MIPMAP_BINDLESS_THUNK_READS_BUILTIN(float, 2, f32, v4f32, COORD_INPUT_2D, COORD_PARAMS_2D, GRAD_INPUT_2D, dXx, dXy, dYx, dYy)
_CLC_DEFINE_MIPMAP_BINDLESS_THUNK_READS_BUILTIN(float, 3, f32, v4f32, COORD_INPUT_3D, COORD_PARAMS_3D, GRAD_INPUT_3D, dXx, dXy, dXz, dYx, dYy, dYz)
The last parameters are left as __VA_ARGS__
.
Let me know what your thoughts are on this. If necessary, we can split the mipmap macros into three (per dimension), but like I said the readability would be much worse in my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it looks good, thanks!
ret = __nvvm_suld_1d_v4i16_clamp_s(imageHandle, coord); | ||
short2 b; | ||
b.x = ret.x; | ||
b.y = ret.y; | ||
return b; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can abstract return value fixup (like here, or (short)(__builtin)[0]
into some kind of helper and then lift the mapping from API to the underlying impl with some DSL-like code (either via macros/templates or a standalone code generator). I'm not 100% sure it would work though. Any thoughts on that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it could be done, however it seems to be a common pattern across the codebase (builtins) if we introduced it in here I'd be tempted to say we should somehow generalise it and roll out everywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think generalizing everywhere is beyond what we are trying to achieve with this PR, and as we are already pressed for time to get all our extension PRs in before feature freeze, I think we should leave this as is.
|
||
// Generated funcs -- READS | ||
// int -- int4 already defined | ||
int __nvvm_suld_1d_i32_clamp_s(long, int) __asm("llvm.nvvm.suld.1d.i32.clamp"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The naming of the functions here mimics the convention of intrinsics (__nvvm...
, there is already a bunch of them defined), is it safe to follow that convention, should the list of supported intrinsics extend we would crash with them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When writing these functions, we followed the style of the existing intrinsic wrappers, e.g. here
I think it's unlikely that the NVVM intrinsics will be extended with names that have the _s
suffix like we do here for the helpers/wrappers. If they do, then other code will break as well, not just ours. I'm leaning towards leaving the names as is, following the naming style set out by others in the same file, but we're open to discussion on it. If we change it, do we then update the functions not created by our PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The addition of the _s
suffix is a good point, that I've not noticed before, happy for it to stay as is.
char2 __nvvm_suld_1d_v2i8_clamp_s(long imageHandle, int coord) { | ||
short2 a = __nvvm_suld_1d_v2i8_clamp_s_helper(imageHandle, coord); | ||
char2 ret; | ||
ret.x = (char) a.x; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you be able to fold clang-formatting into your generation script? There is a bunch of things that could do with reformatting here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've formatted all of our non-macro code and added some whitespace where the code was a bit crowded. I think it made some sections of the code slightly less readable though, but if we want to enforce clang-formatting with line length limit then that's how it will turn out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to have clang-format on by default on all the code, with no modifications; there is already a rules file that can be used. In my experience, it is always beneficial in a long run, as in one less problem to think about.
ret = __nvvm_suld_1d_v4i16_clamp_s(imageHandle, coord); | ||
short2 b; | ||
b.x = ret.x; | ||
b.y = ret.y; | ||
return b; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it could be done, however it seems to be a common pattern across the codebase (builtins) if we introduced it in here I'd be tempted to say we should somehow generalise it and roll out everywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sycl/include/** changes lgtm.
@intel/llvm-reviewers-runtime Looks like the original authors have no permission to respond to feedback.
This seems like it might make contribution difficult. Is this expected? In any case, could you please grant the co-authors permission to comment? Co-Authors: @Seanst98, @isaacault, @przemektmalon, @DBDuncan, @ProGTX |
I've submitted a request for that permission to be set. |
- Formatted non-macro code in image.cl - The COMMA macro has been mostly removed in favour of using __VA_ARGS__ - Mipmap macros have been modified, retaining some commas, for better readability
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Please note that this PR is now implementing the proposal at #9842 , which is now in revision 4. Changes from revision 3 to 4 have no impact on this PR, though. |
The CI failures seem unrelated to our changes, what is the process here, do we need to wait and rebase until they're fixed in the |
@przemektmalon | @Seanst98 - Could you please push a merge commit to rerun CI? |
@steffenlarsen Merge commit is pushed and CI passes now |
Replaces #8307 Required for #9808 and #10112 . Addressed some of the feedback on revision 3 as revision 4. Any larger changes will need to move to subsequent revisions. --------- Co-authored-by: Przemek Malon <przemek.malon@codeplay.com> Co-authored-by: Isaac Ault <isaac.ault@codeplay.com> Co-authored-by: Sean Stirling <sean.stirling@codeplay.com> Co-authored-by: Duncan Brawley <duncan.brawley@codeplay.com>
…CL bindless images extension (#10500) # Experimental Implementation of SYCL Bindless Images Extension This commit stands as the fourth, and final, commit of four to make code review easier, covering the additional tests for bindless images to the e2e test suite. Implementing [revision 4 of the bindless images extension proposal](#9842). This will not compile or run until [PR3](#10454) has been merged. However, it can be reviewed simultaneously with PR3. ## Overview The bindless images extension provides a new interface for allocating, creating, and accessing images in SYCL. Image memory allocation is seperated from image handle creation, and image handles can be passed to kernels without requesting access through accessors. This approach provides much more flexibility to the user, as well as enabling programs to implement features that were impossible to implement using standard SYCL images, such as a texture atlas. In addition to providing a new interface for images, this extension also provides initial experimental support for importing external memory into SYCL. ## Previous PRs * [1/4] [libclc](#9808) * [2/4] [PI/UR](#10112) * [3/4] [SYCL API](#10454) * [4/4] Tests <--- This one ## Authors Co-authored-by: Isaac Ault isaac.ault@codeplay.com Co-authored-by: Hugh Bird hugh.bird@codeplay.com Co-authored-by: Duncan Brawley duncan.brawley@codeplay.com Co-authored-by: Przemek Malon przemek.malon@codeplay.com Co-authored-by: Chedy Najjar chedy.najjar@codeplay.com Co-authored-by: Sean Stirling sean.stirling@codeplay.com Co-authored-by: Peter Zuzek peter@codeplay.com Co-authored-by: SYCL Unbound Team <sycl.unbound@codeplay.com>
…CL bindless images extension (intel#10500) # Experimental Implementation of SYCL Bindless Images Extension This commit stands as the fourth, and final, commit of four to make code review easier, covering the additional tests for bindless images to the e2e test suite. Implementing [revision 4 of the bindless images extension proposal](intel#9842). This will not compile or run until [PR3](intel#10454) has been merged. However, it can be reviewed simultaneously with PR3. ## Overview The bindless images extension provides a new interface for allocating, creating, and accessing images in SYCL. Image memory allocation is seperated from image handle creation, and image handles can be passed to kernels without requesting access through accessors. This approach provides much more flexibility to the user, as well as enabling programs to implement features that were impossible to implement using standard SYCL images, such as a texture atlas. In addition to providing a new interface for images, this extension also provides initial experimental support for importing external memory into SYCL. ## Previous PRs * [1/4] [libclc](intel#9808) * [2/4] [PI/UR](intel#10112) * [3/4] [SYCL API](intel#10454) * [4/4] Tests <--- This one ## Authors Co-authored-by: Isaac Ault isaac.ault@codeplay.com Co-authored-by: Hugh Bird hugh.bird@codeplay.com Co-authored-by: Duncan Brawley duncan.brawley@codeplay.com Co-authored-by: Przemek Malon przemek.malon@codeplay.com Co-authored-by: Chedy Najjar chedy.najjar@codeplay.com Co-authored-by: Sean Stirling sean.stirling@codeplay.com Co-authored-by: Peter Zuzek peter@codeplay.com Co-authored-by: SYCL Unbound Team <sycl.unbound@codeplay.com>
This PR stands as the first PR of four to make code review easier, covering the changes made to libclc.
See the original whole PR.
Co-authored-by: Isaac Ault isaac.ault@codeplay.com
Co-authored-by: Hugh Bird hugh.bird@codeplay.com
Co-authored-by: Duncan Brawley duncan.brawley@codeplay.com
Co-authored-by: Przemek Malon przemek.malon@codeplay.com
Co-authored-by: Chedy Najjar chedy.najjar@codeplay.com
Co-authored-by: Sean Stirling sean.stirling@codeplay.com
Co-authored-by: Peter Zuzek peter@codeplay.com
This is the first part of the implementation of revision 3 of the bindless images extension proposal.