-
Couldn't load subscription status.
- Fork 3.9k
ARROW-4341: [C++] Refactor Primitive builders and BooleanBuilder to use TypedBufferBuilder<T> #3575
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-4341: [C++] Refactor Primitive builders and BooleanBuilder to use TypedBufferBuilder<T> #3575
Conversation
|
Hmmm, this is a performance regression. Before: After: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fishy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also fishy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto
cpp/src/arrow/buffer-builder.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a lot more CPU instructions than what it was doing before AFAICT
0cfb57c to
8e2063a
Compare
|
Rebased. |
8e2063a to
e8cbe8d
Compare
|
BooleanBuilder seems to be more than 4 times slower with this patch after before Based on this evidence the iterator based unrolled bit generator vectorizes poorly. I just pushed changes be8442f reverting some of these changes (but still using If these changes are acceptable I suggest that we merge this patch (once the build is passing) and leave this be for now. |
|
With my changes the perf seems slightly improved from the baseline |
|
It seems that there are symbol visibility problems on Windows caused by the PrimitiveBuilder / NumericBuilder collapse, and valgrind problems additionally. I would say to limit how much additional time (say 2 hours or less) you spend on this patch as there's a lot of feature work and bug fixes in the 0.13 backlog. |
|
I moved the entire |
only zero padding on Finish()
…track falses. Restore faster code from before this patch for appending C arrays and vector<bool> Change-Id: I01d2d8e87db520a5c38f892a58e86d7a645e8877
…ity concerns Change-Id: I5cddcfde0880f5a1af4cc10a3740a186064e85ea
Change-Id: I749eab94c580775a9faf48b6baa0dabf5dc39111
Change-Id: I6fbf60477bd95d828ff14c54a124c0d079065bfb
Change-Id: Ic32f48c3cd5c5d34494a2e515848e5249c1ca56f
3d1ea03 to
daf5244
Compare
|
I reverted some changes and think I fixed the valgrind / Windows / Python failures |
Change-Id: I9f24a526ffb0c602a879c19cd7cd5f45aa32c251
|
Seems there are some correctness issues https://travis-ci.org/apache/arrow/jobs/493616072. @bkietz can you look at the Ruby failure? We should try to reproduce the failure as a C++ unit test |
|
@wesm I'll try to write a unit test which produces the same error |
- resize correctly on Finish() - added C++ test
| Status Finish(std::shared_ptr<Buffer>* out, bool shrink_to_fit = true) { | ||
| // set bytes_builder_.size_ == byte size of data | ||
| bytes_builder_.UnsafeAdvance(BitUtil::BytesForBits(bit_length_) - | ||
| bytes_builder_.length()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these bytes need to be zeroed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TypedBufferBuilder<bool> doesn't utilize bytes_builder_'s append methods, which means the size of bytes_builder_ doesn't change as bits are appended.
I'm thinking it's not worthwhile to use BufferBuilder in this class, I'm going to refactor and instead just manage a buffer directly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, please do that refactoring in a follow up PR and not this one =)
|
I can't reproduce https://travis-ci.org/apache/arrow/jobs/493915377#L2632 locally |
|
OK, I can take it from here then |
|
Fixed the Ruby test failures and added a C++ unit test that failed. @xhochy I removed the https://github.com/apache/arrow/blob/master/cpp/src/arrow/array/builder_primitive.h#L55 If anyone was using this argument, it was not doing anything useful, unless the goal is to append zeroes. I'm not sure if this needs to go through a deprecation cycle. In our codebase it was only being used for internal purposes |
|
+1 |
This reduces code duplication.