Skip to content

[FEATURE] Create a columnar variant of BlockWriter/BlockSpiller #1

@avirtuos

Description

@avirtuos

Currently, BlockWriter and BlockSpiller encourage a row wise approach to writing results. These interfaces are often viewed as simpler than there would be columnar equivalents. Even though many of the systems that we've integrated with using this SDK do not themselves support columnar access patterns, there is value in offering a a variant of these mechanisms that provide the skeleton for columnar writing of results.

The current SDK versions take the approach that experts can drop into 'native' Apache Arrow mode and simply not use these abstractions. This approach of making common things easy and still enabling access to a 'power user' mode is one we'd like to stick with but we'd also like to make it easier for customers that can/want a more columnar experience to be able to do so more easily.

Some of the key goals of this new facility would be to alleviate the performance penalty associated with all the field vector lookups and type conversion object overhead that the current row wise convince facades introduce. Depending on the source system being integrated with, these changes can improve cells/second throughput between 20% - 30% in our testing. The improvement is more dramatic when there is limited parallelism / pipelining available to hide this inefficiency.

Metadata

Metadata

Assignees

Labels

enhancementNew feature or requestperformanceRelating to performance

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions