Discussion: relationship / unification of arrow-rs and arrow2 going forward #1176

@alamb

TLDR: please comment on this ticket if you have opinions about if and/or how the community should unite its efforts on a single Rust implementation of Apache Arrow.

Related mailing list thread: https://lists.apache.org/thread/dsyks2ylbonhs8ngnx6529dzfyfdjjzo

There is active discussion, and a PR (apache/datafusion#1556), about switching the DataFusion project to arrow2, the Rust implementation of Arrow from @jorgecarleitao. While this DataFusion PR is not yet ready to merge, if DataFusion were to switch to arrow2, that would leave open the question of what happens to this (arrow-rs) code.

Since many of the PR authors, contributors, and maintainers of this (arrow-rs) crate are part of the DataFusion community, I believe that if DataFusion switches to arrow2, much of the maintenance and extension effort would follow it to arrow2.

arrow2 is largely developed by @jorgecarleitao, who is an Apache Arrow PMC member and committer, but the project itself has not been under the Apache Software Foundation’s governance. Additional background can be found on the mailing list archives and past mailing list threads such as this and this.

It is my opinion that the Rust / Arrow / DataFusion community has general consensus on:

  1. Having one implementation of Arrow in Rust where we can focus would be better than two, which split attention and resources
  2. The technical underpinnings of arrow2 are more ergonomic

It is not clear to me if there is a consensus on:

  1. How important the Apache Governance model is (please lend your opinions here!)
  2. How important the stability of APIs and the specific versioning scheme (0.x vs 1.x or later) are

Possible ideas for a way forward:

  1. Switch DataFusion to arrow2, making no changes to arrow-rs. It could be maintained by anyone who wished to contribute.
  2. Bring arrow2 code into the arrow-rs repo, with appropriate IP clearance, and adopt that as the officially maintained Arrow implementation (*)
  3. Start more actively porting the more ergonomic parts of arrow2 into arrow-rs to reduce the feature gap, as suggested by @tustvold in "Discussion: Switch DataFusion to using arrow2?" (datafusion#1532 (comment))
  4. Others?

Option 2 leaves open the question of “how does arrow2 development move forward”: where would patches be sent, for example? I would hope we can find a way that is compatible with Apache governance, but I don't think we have a specific proposal yet, and it also depends in large part on what @jorgecarleitao is comfortable with.

So, for any users of this crate not also in the DataFusion community, what are your hopes / needs / plans for this crate? How important is Apache governance to you? Please tell us your thoughts!
