dpctl.tensor.where output preserves memory order of inputs #1342
View rendered docs @ https://intelpython.github.io/dpctl/pulls/1342/index.html
Array API standard conformance tests for dpctl=0.14.6dev2=py310h7bf5fec_5 ran successfully.
- Now, when operands are cast, stride simplification can still be performed on non-C-contiguous inputs
- Implements _empty_like_triple_orderK to allocate the output of where
- Now calls _empty_like_pair_orderK when two arrays are of equal shape and larger than the third
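The idea behind an "order K" allocation is to lay the output out in the same axis order as the inputs. Below is a minimal pure-Python sketch of that idea, not dpctl's _empty_like_triple_orderK: `order_k_permutation` and `strides_for_order_k` are hypothetical helpers, and the sketch assumes all operands are already broadcast to a common shape with strides given in elements.

```python
from typing import Sequence, Tuple


def order_k_permutation(shape: Sequence[int],
                        *strides: Sequence[int]) -> Tuple[int, ...]:
    """Hypothetical helper: axes ordered outermost (largest stride) first.

    Axes are compared by the absolute strides of each operand in turn, so
    the first operand's layout dominates and later operands break ties.
    """
    def key(axis: int):
        # Negate so that larger strides sort first (outermost axes).
        return tuple(-abs(s[axis]) for s in strides)

    # sorted() is stable, so fully tied axes keep their original order.
    return tuple(sorted(range(len(shape)), key=key))


def strides_for_order_k(shape: Sequence[int],
                        perm: Sequence[int]) -> Tuple[int, ...]:
    """Element strides of a dense buffer whose axes are laid out in `perm` order."""
    out = [0] * len(shape)
    step = 1
    # Walk from the innermost to the outermost axis, accumulating the stride.
    for axis in reversed(perm):
        out[axis] = step
        step *= shape[axis]
    return tuple(out)


shape = (2, 3, 4)
f = (1, 2, 6)  # F-contiguous strides for shape (2, 3, 4)
perm = order_k_permutation(shape, f, f, f)
print(perm)                              # (2, 1, 0)
print(strides_for_order_k(shape, perm))  # (1, 2, 6): output is F-contiguous too
```

Passing only two stride tuples loosely mirrors the pair case, where the two equal-shape operands decide the layout.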
Force-pushed from cb138ce to 6be4731
Array API standard conformance tests for dpctl=0.14.6dev3=py310ha25a700_22 ran successfully.
- Dimensions of size 1 are effectively disregarded in sorting
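A size-1 axis has only one index, so its stride never changes which memory is touched, and a view may report an arbitrary stride for it. The sketch below shows one way to keep such axes from influencing the sort. It is a hypothetical pure-Python helper, not dpctl's code; giving size-1 axes a key of 0, so they sort innermost, is an assumption made for illustration.

```python
from typing import Sequence, Tuple


def order_k_permutation_ignoring_unit_dims(
        shape: Sequence[int], *strides: Sequence[int]) -> Tuple[int, ...]:
    """Hypothetical helper: order axes outermost-first by operand strides.

    Axes of size 1 get a key of 0 for every operand, so their (meaningless)
    stride values cannot move them; they sort to the innermost positions.
    """
    def key(axis: int):
        if shape[axis] == 1:
            return tuple(0 for _ in strides)
        # Negate so that larger strides sort first (outermost axes).
        return tuple(-abs(s[axis]) for s in strides)

    return tuple(sorted(range(len(shape)), key=key))


shape = (3, 1, 4)
# Both stride tuples describe the same C-contiguous buffer; only the
# stride of the size-1 axis differs.
print(order_k_permutation_ignoring_unit_dims(shape, (4, 4, 1)))    # (0, 2, 1)
print(order_k_permutation_ignoring_unit_dims(shape, (4, 100, 1)))  # (0, 2, 1)
```

A plain stride sort would instead return (0, 1, 2) for the first tuple and (1, 0, 2) for the second, even though the layouts are equivalent.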
Array API standard conformance tests for dpctl=0.14.6dev3=py310ha25a700_23 ran successfully.
I think this looks good to go in, @ndgrigorian. I had a small nitpick, but it's just a matter of style.
Array API standard conformance tests for dpctl=0.14.6dev3=py310ha25a700_24 ran successfully.
Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞
Array API standard conformance tests for dpctl=0.14.6dev3=py310ha25a700_43 ran successfully.
This PR adjusts the behavior of dpctl.tensor.where so that its output preserves the memory layout of its inputs. This improves the access pattern of the kernel.
Performance of old behavior:
New behavior:
This is an improvement of roughly 4x in certain cases.
_empty_like_triple_orderK is introduced for this purpose, and a test was added to ensure that the output strides are as expected.
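The small pure-Python model below illustrates why a matching output layout helps the kernel's access pattern. It assumes, for illustration only, that F-contiguous inputs were previously paired with a C-contiguous output. `offsets_in_order` and `all_sequential` are hypothetical helpers. Real dpctl kernels also apply stride simplification, which this model ignores.

```python
from itertools import product
from typing import List, Sequence


def offsets_in_order(shape: Sequence[int], strides: Sequence[int],
                     axis_order: Sequence[int]) -> List[int]:
    """Element offsets touched when a kernel walks the index space with
    axes nested in `axis_order` (first = outermost, last = fastest)."""
    ranges = [range(shape[a]) for a in axis_order]
    return [sum(i * strides[a] for i, a in zip(idx, axis_order))
            for idx in product(*ranges)]


def all_sequential(shape: Sequence[int], operands: Sequence[Sequence[int]],
                   axis_order: Sequence[int]) -> bool:
    """True if every operand is accessed at consecutive offsets 0, 1, 2, ..."""
    n = 1
    for extent in shape:
        n *= extent
    expected = list(range(n))
    return all(offsets_in_order(shape, s, axis_order) == expected
               for s in operands)


shape = (2, 3)
f_strides = (1, 2)  # F-contiguous, strides in elements
c_strides = (3, 1)  # C-contiguous, strides in elements

# Strides of (condition, x1, x2, output).
old_layout = [f_strides, f_strides, f_strides, c_strides]  # assumed C-contiguous output
new_layout = [f_strides, f_strides, f_strides, f_strides]  # output follows the inputs

row_major, col_major = (0, 1), (1, 0)
print(any(all_sequential(shape, old_layout, o) for o in (row_major, col_major)))  # False
print(all_sequential(shape, new_layout, col_major))  # True
```

In the mixed case no single nesting order gives consecutive accesses for all four operands, whereas with the output in the inputs' layout, column-major nesting does.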