Development milestone 0.14.6dev4 #1354
This improves performance roughly 8-fold:

```
In [1]: import dpctl.tensor as dpt
In [2]: x = dpt.ones((4096, 4096), dtype="f4")
In [3]: y = dpt.sum(x, axis=0)
In [4]: %time y = dpt.sum(x, axis=0)
CPU times: user 2.64 ms, sys: 4.4 ms, total: 7.04 ms
Wall time: 10 ms
In [5]: %time y = dpt.sum(x, axis=0)
CPU times: user 1.93 ms, sys: 3.22 ms, total: 5.16 ms
Wall time: 4.74 ms
In [6]: %time y = dpt.sum(x, axis=0)
CPU times: user 1.7 ms, sys: 2.83 ms, total: 4.53 ms
Wall time: 4.1 ms
In [7]: %time y = dpt.sum(x, axis=0)
CPU times: user 1.98 ms, sys: 3.3 ms, total: 5.28 ms
Wall time: 4.7 ms
```

The timing before was around 38 ms.
- Adjusted the loop to reduce branching, hopefully improving its vectorization by removing a conditional
1. Removed unused usm_ndarray._clone static C-only method
2. Removed _dispatch* utilities
3. Used direct calls to unary/binary operators in implementation of special methods
Provide a private cabs method implementing abs for complex types, paying attention to the special values mandated by the array API. To work around gh-1279, use std::hypot to compute the value for finite inputs. Compile with -DUSE_STD_ABS_FOR_COMPLEX_TYPES to use std::abs(z) instead of std::hypot(std::real(z), std::imag(z)).
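To make the special-value handling concrete, here is a minimal Python sketch of the same idea. It is an illustrative port, not dpctl's C++ kernel; the name `cabs` is borrowed from the commit message for readability. Infinite components are checked first, because the array API specifies `+inf` even when the other component is NaN; NaN is checked next, and only finite input reaches `math.hypot`.

```python
import math


def cabs(z: complex) -> float:
    """Sketch of abs() for complex input with array-API special values."""
    x, y = z.real, z.imag
    # An infinite component dominates, even when the other one is NaN.
    if math.isinf(x) or math.isinf(y):
        return math.inf
    # Otherwise any NaN component makes the result NaN.
    if math.isnan(x) or math.isnan(y):
        return math.nan
    # Finite input: hypot avoids the overflow/underflow of sqrt(x*x + y*y).
    return math.hypot(x, y)


print(cabs(complex(3.0, 4.0)))            # 5.0
print(cabs(complex(math.inf, math.nan)))  # inf
print(cabs(complex(1e300, 1e300)))        # finite, no intermediate overflow
```

The naive `math.sqrt(x*x + y*y)` would return `inf` for the last input because `1e300 * 1e300` overflows, which is the kind of issue `hypot` sidesteps.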
This change provides a private method csqrt to evaluate the square root for complex types. It handles special values as mandated by the array API. For finite input, it uses its own implementation based on std::hypot and std::sqrt for real types instead of calling std::sqrt on the complex value. Compile with -DUSE_STD_SQRT_FOR_COMPLEX_TYPES to use std::sqrt instead of the custom implementation. A cursory performance study suggests that the custom implementation is at least no worse than the std::sqrt one.
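The finite-input path can be sketched in Python as follows. The exact formulas used in dpctl are not shown in the commit, so this assumes the textbook approach: for x >= 0, sqrt(z) = t + i*y/(2t) with t = sqrt((|z| + x)/2), and for x < 0 the mirrored form, so that |z| + |x| never suffers cancellation. Special values are assumed to be handled before `csqrt_finite` (a hypothetical name) is called.

```python
import cmath
import math


def csqrt_finite(z: complex) -> complex:
    """Principal square root of a finite complex number via real hypot/sqrt."""
    x, y = z.real, z.imag
    r = math.hypot(x, y)  # |z|
    if r == 0.0:
        # z is a signed zero: keep the sign of the imaginary zero.
        return complex(0.0, y)
    if x >= 0.0:
        # t > 0 here, since r + x >= r > 0.
        t = math.sqrt((r + x) / 2.0)
        return complex(t, y / (2.0 * t))
    # x < 0: compute the imaginary magnitude first; r - x = |z| + |x| > 0.
    t = math.sqrt((r - x) / 2.0)
    return complex(abs(y) / (2.0 * t), math.copysign(t, y))


print(csqrt_finite(-3 + 4j), cmath.sqrt(-3 + 4j))  # (1+2j) (1+2j)
```

Using `copysign(t, y)` on the negative half-plane places the branch cut along the negative real axis, so `-4 - 0j` maps to `-2j`, matching the principal branch.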
This utility function uses a symmetric check, unlike numpy.allclose, and verifies that abs(x1 - x2) < atol + rtol * max(abs(x1), abs(x2)). This way allclose is symmetric: allclose(x1, x2) implies allclose(x2, x1).
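A small pure-Python sketch contrasts the symmetric rule above with the asymmetric rule documented for `numpy.allclose` (`absolute(a - b) <= atol + rtol * absolute(b)`). The function names are hypothetical, inputs are assumed to be equal-length sequences of real numbers, and the tolerances are chosen only to expose the asymmetry.

```python
def allclose_symmetric(x1, x2, rtol, atol):
    """Symmetric rule: |a - b| < atol + rtol * max(|a|, |b|) elementwise."""
    return all(
        abs(a - b) < atol + rtol * max(abs(a), abs(b)) for a, b in zip(x1, x2)
    )


def allclose_numpy_style(x1, x2, rtol, atol):
    """NumPy's documented rule: |a - b| <= atol + rtol * |b| (uses b only)."""
    return all(abs(a - b) <= atol + rtol * abs(b) for a, b in zip(x1, x2))


x, y = [1.0], [1.05]
print(allclose_symmetric(x, y, rtol=0.048, atol=0.0),
      allclose_symmetric(y, x, rtol=0.048, atol=0.0))  # True True
print(allclose_numpy_style(x, y, rtol=0.048, atol=0.0),
      allclose_numpy_style(y, x, rtol=0.048, atol=0.0))  # True False
```

Because the bound uses `max(abs(a), abs(b))`, swapping the arguments cannot change it, which is exactly what makes the check symmetric.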
intel/llvm/pull/10551 has been merged, so the build should succeed and produce a working binary. The intel/llvm project has transitioned from sycl-nightly/YYYYMMDD tags to nightly-YYYY-MM-DD tags. The artifact of the intel/llvm nightly build has also changed its name and structure. Adjusting the code accordingly.
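For illustration only, the two tag schemes can be written as date format strings. The helper names and the sample date are hypothetical; the actual CI script, and the new artifact name and layout, are not shown in this PR and are not reproduced here.

```python
from datetime import date


def old_nightly_tag(d: date) -> str:
    """Former intel/llvm tag scheme: sycl-nightly/YYYYMMDD."""
    return f"sycl-nightly/{d:%Y%m%d}"


def new_nightly_tag(d: date) -> str:
    """Current intel/llvm tag scheme: nightly-YYYY-MM-DD."""
    return f"nightly-{d:%Y-%m-%d}"


d = date(2023, 8, 15)  # arbitrary example date
print(old_nightly_tag(d))  # sycl-nightly/20230815
print(new_nightly_tag(d))  # nightly-2023-08-15
```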
…bundle Use latest sycl bundle to build DPCTL
test_sycl_queue.py::test_cython_api requires a compiler to build a native extension.
Adds a simple C extension, compiled with a C compiler, that includes the dpctl_capi header file. This mimics the use of dpctl_capi from numba_dpex.
Correct typo in an exception text
1. Aligned default values with those of np.allclose
2. Replaced less test with less_equal to align with NumPy.
Also added tests for early exits to improve coverage.
Test environment requires compilers
This change builds upon gh-1265 and takes into account the queue of the pre-allocated buffer, if one is provided.
This improves accuracy at the extremes of the supported range. Use the sycl:: namespace ldexp and ilogb to avoid problems with VS 2017 headers.
Fix bad order=K code logic in tensor.asarray
Reworked text based on PR feedback.
Using ilogb would require care to correctly compute the scale of denormal floats, whereas simpler code suffices. Also use the unscaled version in most cases, and the scaled version only for very large inputs.
We do, however, work around issues with these functions when their implementation comes from the VS 2017 headers on Windows.
Update README for wheel installation
Fix gh-1279, implement tensor.allclose
Improvement to performance of tensor.sum
* where result now keeps order of operands
  - Now when operands are cast, stride simplification can still be performed on non-C-contiguous inputs
  - Implements _empty_like_triple_orderK to allocate output of where
* Adds test for correct order="K" behavior in where
* Adjusted logic in _empty_like_triple_orderK
  - Now calls _empty_like_pair_orderK when two arrays are of equal shape and larger than the third
* Changes to order "K" stride sorting
  - Dimensions of size 1 are effectively disregarded in sorting
* Fixed typo in _empty_like_orderK
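The order="K" stride-sorting rule can be illustrated with a simplified Python sketch. It assumes strides are given in elements and that the new array should traverse memory in the same axis order as its input. The function name is hypothetical and this is not dpctl's `_empty_like_orderK`; here size-1 axes are simply placed outermost, one way to keep their irrelevant stride values from influencing the ordering.

```python
def empty_like_orderK_strides(shape, strides):
    """Strides for a new contiguous array laid out like the input (sketch)."""
    nd = len(shape)
    # Outermost first: size-1 axes, then decreasing |stride|.
    # sorted() is stable, so ties keep their original axis order.
    perm = sorted(range(nd), key=lambda i: (shape[i] != 1, -abs(strides[i])))
    out_strides = [0] * nd
    step = 1
    # Assign contiguous strides from the innermost axis outwards.
    for ax in reversed(perm):
        out_strides[ax] = step
        step *= shape[ax]
    return tuple(out_strides)


print(empty_like_orderK_strides((3, 4), (4, 1)))        # (4, 1), C order kept
print(empty_like_orderK_strides((3, 4), (1, 3)))        # (1, 3), F order kept
print(empty_like_orderK_strides((3, 1, 4), (1, 0, 3)))  # (1, 12, 3)
```

In the last call the size-1 axis has stride 0; without special handling it would sort as the innermost axis, even though its stride never affects addressing.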
* Binary elementwise functions can now act on any input in-place
  - A temporary will be allocated as necessary (i.e., when arrays overlap, are not going to be cast, and are not the same logical arrays)
  - Uses dedicated in-place kernels where they are implemented
  - Now called directly by Python operators
  - Removes _inplace method of BinaryElementwiseFunc class
  - Removes _find_inplace_dtype function
* Tests for new out parameter behavior for add
* Broadcasting made conditional in binary functions where memory overlap is possible
  - Broadcasting can change the values of strides without changing array shape
* Changed exception types raised
  Use ExecutionPlacementError for CFD violations. Use ValueError if the types of the inputs are as expected, but their values are not.
* Adding tests to improve coverage
  Removed tests expecting an error to be raised for overlapping inputs. Added tests guided by the coverage report.
* Removed provably unreachable branches in _resolve_weak_types
  Since o1_dtype_kind_num > o2_dtype_kind_num, o1 cannot be a weak boolean type, since that type has the lowest kind number in the hierarchy.
* All in-place operators now use the call operator of BinaryElementwiseFunc
* Removed some redundant and obsolete tests
  - Removed from test_floor_ceil_trunc, test_hyperbolic, test_trigonometric, and test_logaddexp
  - These tests would fail on GPU but never run on CPU, and therefore were not impacting the coverage
  - These tests focused on aspects of the BinaryElementwiseFunc class rather than the behavior of the operator

---------

Co-authored-by: Oleksandr Pavlyk <oleksandr.pavlyk@intel.com>
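The overlap rule in the first bullet can be sketched in pure Python. `View` and `add` are hypothetical stand-ins for `usm_ndarray` and a binary elementwise function: only 1-D contiguous views over a shared list are modeled, casting and broadcasting are ignored, and there are no dedicated in-place kernels. A temporary is used only when `out` overlaps an input without being the same logical array, and `+=` simply forwards to `add` with `out=self`.

```python
class View:
    """Hypothetical 1-D contiguous view into a shared Python list."""

    def __init__(self, buf, offset, size):
        self.buf, self.offset, self.size = buf, offset, size

    def __getitem__(self, i):
        return self.buf[self.offset + i]

    def __setitem__(self, i, v):
        self.buf[self.offset + i] = v

    def tolist(self):
        return self.buf[self.offset:self.offset + self.size]

    def same_as(self, other):
        # Same logical array: same buffer, same start, same length.
        return (self.buf is other.buf and self.offset == other.offset
                and self.size == other.size)

    def overlaps(self, other):
        return (self.buf is other.buf
                and self.offset < other.offset + other.size
                and other.offset < self.offset + self.size)

    def __iadd__(self, other):
        # The in-place operator just calls the binary function with out=self.
        add(self, other, out=self)
        return self


def add(x1, x2, out):
    """Elementwise x1 + x2 written into out, safe under memory overlap."""
    needs_temp = any(out.overlaps(x) and not out.same_as(x) for x in (x1, x2))
    if needs_temp:
        # Partial overlap: compute into a temporary, then copy into out.
        tmp = [x1[i] + x2[i] for i in range(out.size)]
        for i, v in enumerate(tmp):
            out[i] = v
    else:
        # Disjoint or identical arrays: element i is read before it is written
        # and never read again, so writing directly is safe.
        for i in range(out.size):
            out[i] = x1[i] + x2[i]
    return out


buf = [1, 10, 100, 1000, 10000]
a, b = View(buf, 0, 4), View(buf, 1, 4)  # b is shifted by one element
add(a, a, out=b)
print(buf)  # [1, 2, 20, 200, 2000]
```

Writing directly into `b` here would be wrong: after `b[0] = 2`, the next step would read the already-overwritten `a[1]` and store 4 instead of 20.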
Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞
View rendered docs @ https://intelpython.github.io/dpctl/pulls/1354/index.html
Array API standard conformance tests for dpctl=0.14.6dev4=py310ha25a700_33 ran successfully.
This PR is a development milestone, containing the following changes since 0.14.6dev3:
- usm_ndarray in-place arithmetic operators #1352
- dpctl.tensor.where output preserves memory order of inputs #1342