-
Notifications
You must be signed in to change notification settings - Fork 333
Description
Currently, BlockWriter and BlockSpiller encourage a row wise approach to writing results. These interfaces are often viewed as simpler than there would be columnar equivalents. Even though many of the systems that we've integrated with using this SDK do not themselves support columnar access patterns, there is value in offering a a variant of these mechanisms that provide the skeleton for columnar writing of results.
The current SDK versions take the approach that experts can drop into 'native' Apache Arrow mode and simply not use these abstractions. This approach of making common things easy and still enabling access to a 'power user' mode is one we'd like to stick with but we'd also like to make it easier for customers that can/want a more columnar experience to be able to do so more easily.
Some of the key goals of this new facility would be to alleviate the performance penalty associated with all the field vector lookups and type conversion object overhead that the current row wise convince facades introduce. Depending on the source system being integrated with, these changes can improve cells/second throughput between 20% - 30% in our testing. The improvement is more dramatic when there is limited parallelism / pipelining available to hide this inefficiency.