-
Notifications
You must be signed in to change notification settings - Fork 3.9k
ARROW-3515: [C++] Introduce NumericTensor class #2759
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
8b7241e to
bcab9fd
Compare
Codecov Report
@@ Coverage Diff @@
## master #2759 +/- ##
=========================================
+ Coverage 87.55% 88.56% +1%
=========================================
Files 403 342 -61
Lines 62113 58379 -3734
=========================================
- Hits 54386 51705 -2681
+ Misses 7657 6674 -983
+ Partials 70 0 -70
Continue to review full report at Codecov.
|
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just a couple nits.
cpp/src/arrow/tensor.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strides can be negative here or should you add "non-negative" as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you. I'll also add "non-negative" also in the same comment in Tensor's definition because I copied it.
cpp/src/arrow/tensor.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get why you're special-casing index.size() - 1 here. You could have:
for (size_t i = 0; i < index.size(); ++i) {
offset = index[i] + offset * shape_[i];
}
offset *= element_size;There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for catching my bad. I think it may be a trace of implementation trials.
cpp/src/arrow/tensor.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On some CPU architectures, this assumes the data is naturally aligned. It probably doesn't matter for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One way to tackle this is to make CalculateValueOffset return an offset in units of TYPE instead of bytes. It'll also make it easier to work with non-fixed-bytes in the future (also possibly correcting the problem with BOOL).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this function to return an offset in units of TYPE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please undo the change (sorry). The strides are in bytes units, so the offset has to be calculated in bytes units. When you're dividing the bytes offset by itemsize, you have no guarantee that there is no remainder (though that will be the common case).
In any case, the change didn't fix the issue, as raw_data() could be misaligned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was used for temporary debugging? You should remove it now :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I should. I removed it.
6c7731d to
3df4cb1
Compare
cpp/src/arrow/tensor.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One way to tackle this is to make CalculateValueOffset return an offset in units of TYPE instead of bytes. It'll also make it easier to work with non-fixed-bytes in the future (also possibly correcting the problem with BOOL).
cpp/src/arrow/tensor.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to fail when TYPE = BOOL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. But Tensor class currently doesn't support TYPE = BOOL.
I think it's great if Tensor class supports BOOL element, and I want to try to implement it after SparseTensor.
de070c5 to
5f3f1cb
Compare
cpp/src/arrow/tensor.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The coverage result tells me a Tensor always fills its strides_, so I can remove this else clause.
5f3f1cb to
7afbca5
Compare
This commit defines the new NumericTensor<T> class as a subclass of Tensor class. NumericTensor<T> extends Tensor class by adding a member function to access element values in a tensor.
Tensor's strides_ is always filled.
16baa39 to
37f0bb4
Compare
|
Travis CI was failed but the cause of the failure may be GitHub's authentication error. |
|
@pitrou I think this pull-request was completed. |
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you!
This commit defines the new NumericTensor class as a subclass
of Tensor class. NumericTensor extends Tensor class by adding
a member function to access element values in a tensor.
I want to use this new feature for writing tests of SparseTensor in #2546.