Skip to content

Conversation

@bkietz
Copy link
Member

@bkietz bkietz commented Feb 6, 2019

This reduces code duplication.

@bkietz bkietz changed the title Arrow 4341 primitive builders use bufferbuilder ARROW-4341: [C++] primitive builders use bufferbuilder Feb 6, 2019
@bkietz
Copy link
Member Author

bkietz commented Feb 6, 2019

Hmmm, this is a performance regression.

Before:

BM_BuildPrimitiveArrayNoNulls/repeats:2_mean                 365772 us     365779 us          1     2.291GB/s
BM_BuildVectorNoNulls/repeats:2_mean                         555089 us     555097 us          1   922.362MB/s
BM_BuildBooleanArrayNoNulls/repeats:2_mean                   121138 us     121139 us          6   4.12748GB/s
BM_BuildAdaptiveIntNoNulls/repeats:2_mean                     13406 us      13406 us         52   4.66192GB/s
BM_BuildAdaptiveIntNoNullsScalarAppend/repeats:3_mean         28445 us      28445 us         25   2.19714GB/s
BM_BuildAdaptiveUIntNoNulls/repeats:2_mean                    26330 us      26331 us         27   4.74726GB/s
BM_BuildAdaptiveUIntNoNullsScalarAppend/repeats:2_mean        27448 us      27448 us         25   2.27696GB/s
BM_BuildInt64DictionaryArrayRandom/repeats:2_mean               136 us        136 us       5063   561.917MB/s
BM_BuildInt64DictionaryArraySequential/repeats:2_mean           125 us        125 us       5606   611.482MB/s
BM_BuildInt64DictionaryArraySimilar/repeats:2_mean              126 us        126 us       5546   605.348MB/s
BM_BuildStringDictionaryArray/repeats:2_mean                    317 us        317 us       2209   315.468MB/s

After:

BM_BuildPrimitiveArrayNoNulls/repeats:2_mean                 417623 us     417631 us          1   1.75068GB/s
BM_BuildVectorNoNulls/repeats:2_mean                         555594 us     555601 us          1   921.525MB/s
BM_BuildBooleanArrayNoNulls/repeats:2_mean                   730000 us     730006 us          1   701.364MB/s
BM_BuildAdaptiveIntNoNulls/repeats:2_mean                     13394 us      13394 us         52   4.66611GB/s
BM_BuildAdaptiveIntNoNullsScalarAppend/repeats:3_mean         28451 us      28451 us         25    2.1967GB/s
BM_BuildAdaptiveUIntNoNulls/repeats:2_mean                    26312 us      26313 us         27   4.75047GB/s
BM_BuildAdaptiveUIntNoNullsScalarAppend/repeats:2_mean        27457 us      27457 us         25    2.2762GB/s
BM_BuildInt64DictionaryArrayRandom/repeats:2_mean               139 us        139 us       5040   549.399MB/s
BM_BuildInt64DictionaryArraySequential/repeats:2_mean           126 us        126 us       5570   606.613MB/s
BM_BuildInt64DictionaryArraySimilar/repeats:2_mean              129 us        129 us       5406   589.165MB/s
BM_BuildStringDictionaryArray/repeats:2_mean                    319 us        319 us       2193   313.216MB/s

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fishy

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also fishy

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a lot more CPU instructions than what it was doing before AFAICT

@xhochy xhochy force-pushed the ARROW-4341-primitive-builders-use-bufferbuilder branch from 0cfb57c to 8e2063a Compare February 8, 2019 21:33
@xhochy
Copy link
Member

xhochy commented Feb 8, 2019

Rebased.

@bkietz bkietz force-pushed the ARROW-4341-primitive-builders-use-bufferbuilder branch from 8e2063a to e8cbe8d Compare February 11, 2019 21:44
@wesm
Copy link
Member

wesm commented Feb 14, 2019

BooleanBuilder seems to be more than 4 times slower with this patch

after

BM_BuildBooleanArrayNoNulls/repeats:2                        789455 us     789468 us          1   648.538MB/s

before

BM_BuildBooleanArrayNoNulls/repeats:2                        172299 us     172302 us          4   2.90189GB/s

Based on this evidence the iterator based unrolled bit generator vectorizes poorly. I just pushed changes be8442f reverting some of these changes (but still using TypedBufferBuilder<bool>, so better than before), as well as an option to disable tracking the false count, which we don't care about. For the record, disabling the false count only improved the benchmark from 650 MB/s to 1 GB/s, far less than the original 3GB/s. The compiler just isn't generating very good code for the CPU

If these changes are acceptable I suggest that we merge this patch (once the build is passing) and leave this be for now.

@wesm
Copy link
Member

wesm commented Feb 14, 2019

With my changes the perf seems slightly improved from the baseline

BM_BuildBooleanArrayNoNulls/repeats:2                        163077 us     162609 us          4   3.07486GB/s

@wesm
Copy link
Member

wesm commented Feb 14, 2019

It seems that there are symbol visibility problems on Windows caused by the PrimitiveBuilder / NumericBuilder collapse, and valgrind problems additionally. I would say to limit how much additional time (say 2 hours or less) you spend on this patch as there's a lot of feature work and bug fixes in the 0.13 backlog.

@wesm
Copy link
Member

wesm commented Feb 14, 2019

I moved the entire NumericBuilder<T> implementation to the header file. I think this should make the linker problems go away. I don't expect an application to have a ton of instantiations of these templates so I don't know that it's worth stressing about code size for these until it can be demonstrated to be a problem. The size of arrow-array-test in a release build increased by about 3%.

@wesm wesm force-pushed the ARROW-4341-primitive-builders-use-bufferbuilder branch from 3d1ea03 to daf5244 Compare February 15, 2019 06:33
@wesm
Copy link
Member

wesm commented Feb 15, 2019

I reverted some changes and think I fixed the valgrind / Windows / Python failures

Change-Id: I9f24a526ffb0c602a879c19cd7cd5f45aa32c251
@wesm
Copy link
Member

wesm commented Feb 15, 2019

Seems there are some correctness issues https://travis-ci.org/apache/arrow/jobs/493616072. @bkietz can you look at the Ruby failure? We should try to reproduce the failure as a C++ unit test

@bkietz
Copy link
Member Author

bkietz commented Feb 15, 2019

@wesm I'll try to write a unit test which produces the same error

- resize correctly on Finish()
- added C++ test
Status Finish(std::shared_ptr<Buffer>* out, bool shrink_to_fit = true) {
// set bytes_builder_.size_ == byte size of data
bytes_builder_.UnsafeAdvance(BitUtil::BytesForBits(bit_length_) -
bytes_builder_.length());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these bytes need to be zeroed?

Copy link
Member Author

@bkietz bkietz Feb 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TypedBufferBuilder<bool> doesn't utilize bytes_builder_'s append methods, which means the size of bytes_builder_ doesn't change as bits are appended.

I'm thinking it's not worthwhile to use BufferBuilder in this class, I'm going to refactor and instead just manage a buffer directly

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, please do that refactoring in a follow up PR and not this one =)

@bkietz
Copy link
Member Author

bkietz commented Feb 15, 2019

I can't reproduce https://travis-ci.org/apache/arrow/jobs/493915377#L2632 locally

@wesm
Copy link
Member

wesm commented Feb 15, 2019

OK, I can take it from here then

@wesm
Copy link
Member

wesm commented Feb 16, 2019

Fixed the Ruby test failures and added a C++ unit test that failed.

@xhochy I removed the valid_bytes argument from AppendNulls

https://github.com/apache/arrow/blob/master/cpp/src/arrow/array/builder_primitive.h#L55

If anyone was using this argument, it was not doing anything useful, unless the goal is to append zeroes. I'm not sure if this needs to go through a deprecation cycle. In our codebase it was only being used for internal purposes

@wesm wesm changed the title ARROW-4341: [C++] primitive builders use bufferbuilder ARROW-4341: [C++] Refactor Primitive builders and BooleanBuilder to use TypedBufferBuilder<T> Feb 16, 2019
@wesm
Copy link
Member

wesm commented Feb 16, 2019

+1

@wesm wesm closed this in bbca717 Feb 16, 2019
@bkietz bkietz deleted the ARROW-4341-primitive-builders-use-bufferbuilder branch February 22, 2019 16:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants