-
Notifications
You must be signed in to change notification settings - Fork 3.9k
GH-33984: [C++][Python] DLPack implementation for Arrow Arrays (producer) #38472
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
cc @rok: this is an initial draft for the dlpack support. Will still need to create a separate method for fixed shape tensor array plus add more tests (inspecting the produced PyCapsule with cffi). |
|
TODO: C++ tests, tests with arrays with offsets |
cpp/src/arrow/dlpack.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if someone hands an array backed by CudaBuffer buffers?
I think we need to validate these are CPU buffers, or better yet support the cases where we have non-CPU buffers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense yes. We planned to support only CPU device at first. But I should not hardcode it like I did but validate they they are in fact CPU buffers and raise if not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a check for CPU device: 6c886fd
It would be easy to add support for other cases in a follow-up I think. Not sure about the tests though =)
|
@pitrou thank you so much for the reviews! I addressed all your comments so far. |
| // Create ManagerCtx with the reference to | ||
| // the data of the array | ||
| std::shared_ptr<ArrayData> array_ref = arr->data(); | ||
| std::unique_ptr<ManagerCtx> ctx(new ManagerCtx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: I think it's best practice to use std::make_unique instead of wrapping new call. Any reason we can't use make_unique here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to change to make_unique. @pitrou any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that we have the explicit delete as well, I personally like seeing the new+delete combo explicitly, even though make_unique is exactly equivalent AFAIU
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless that delete call can then be replaced with a unique_ptr reset() ?
|
@kkraus14 I am double-checking if I am correct in assuming DLpack only supports byte-packed booleans. Or are bit-packed booleans also supported but not really used? I am not sure what came out of the discussion in dmlc/dlpack#75 and I am finding the docs unclear: Will be happy to hear your thoughts on this. |
|
@pitrou would you have some time to run through this PR again? |
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 after a couple minor cleanups. Thank you @AlenkaF !
|
@github-actions crossbow submit -g cpp |
|
@github-actions crossbow submit cuda |
|
Revision: 49a978f Submitted crossbow builds: ursacomputing/crossbow @ actions-bbd691007d |
|
Revision: 49a978f Submitted crossbow builds: ursacomputing/crossbow @ actions-b8525f4884
|
|
Thank you @pitrou for all the suggestions and help! Listing possible follow-up issues: |
|
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 6c326db. There was 1 benchmark result indicating a performance regression:
The full Conbench report has more details. It also includes information about 10 possible false positives for unstable benchmarks that are known to sometimes produce them. |
…(producer) (apache#38472) ### Rationale for this change DLPack is selected for Array API protocol so it is important to have it implemented for Arrow/PyArrow Arrays also. This is possible for primitive type arrays (int, uint and float) with no validity buffer. Device support is not in scope of this PR (CPU only). ### What changes are included in this PR? - `ExportArray` and `ExportDevice` methods on Arrow C++ Arrays - `__dlpack__` method on the base PyArrow Array class exposing `ExportArray` method - `__dlpack_device__` method on the base PyArrow Array class exposing `ExportDevice` method ### Are these changes tested? Yes, tests are added to `dlpack_test.cc` and `test_array.py`. ### Are there any user-facing changes? No. * Closes: apache#33984 Lead-authored-by: AlenkaF <frim.alenka@gmail.com> Co-authored-by: Alenka Frim <AlenkaF@users.noreply.github.com> Co-authored-by: Antoine Pitrou <antoine@python.org> Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
Rationale for this change
DLPack is selected for Array API protocol so it is important to have it implemented for Arrow/PyArrow Arrays also. This is possible for primitive type arrays (int, uint and float) with no validity buffer. Device support is not in scope of this PR (CPU only).
What changes are included in this PR?
ExportArrayandExportDevicemethods on Arrow C++ Arrays__dlpack__method on the base PyArrow Array class exposingExportArraymethod__dlpack_device__method on the base PyArrow Array class exposingExportDevicemethodAre these changes tested?
Yes, tests are added to
dlpack_test.ccandtest_array.py.Are there any user-facing changes?
No.