Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add examples for arrow crate to readme and library documentation #131

Merged
merged 4 commits into from
Feb 6, 2024

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Feb 5, 2024

First of all, thank you @chmp for this crate I think it is super powerful (and not well understood) how to read/write Rust structs to/from parquet and/or arrow, and this crate offers a neat way to do it.

Rationale

While discussing some changes in the arrow JSON writer apache/arrow-rs#5364 I think one of the usecases may be better served by this crate. I would like to be able to refer people to this crate if they want to convert their structs to / from arrow/json/parquet but in my opinion it not super obvious by the documentation how to use this with arrow as all the examples are in terms of the (now unmaintained) arrow2 crate.

Changes

Add examples to README.md and lib.rs that show how to use the APIs in this crate to write structs to arrow/parquet

serde_arrow/src/lib.rs Outdated Show resolved Hide resolved
Readme.md Show resolved Hide resolved
Readme.md Show resolved Hide resolved
Readme.md Outdated Show resolved Hide resolved
@chmp
Copy link
Owner

chmp commented Feb 5, 2024

@alamb Thanks a lot for the kind words.

As I only have the outside perspective: what is the situation with arrow2 crate? If it is really defunct, maybe it makes sense to complete remove the prominent usage docs in the readme and in the docs index page? In principle everything is documented via the API docs of the respective functions. So I don't feel it's strictly necessary to include top-level docs for the arrow2 crate, if it is essentially deprecated.

Re suggestion usage of serde_arrow: for me one the last remaining obstacles are the non-additive features. This issue makes it quite hard to build other libraries on top of serde_arrow.

@alamb
Copy link
Contributor Author

alamb commented Feb 6, 2024

@alamb Thanks a lot for the kind words.

As I only have the outside perspective: what is the situation with arrow2 crate? If it is really defunct, maybe it makes sense to complete remove the prominent usage docs in the readme and in the docs index page?

To be clear, I am a maintainer of arrow-rs so this is only my perspective:

My assessment is that arrow2 is a great piece of technology but doesn't have anyone (individually or as a community) who is maintaining it so it has effectively gone into stasis and is slowly bit rotting -- you can read lots of backgrould / backstory on jorgecarleitao/arrow2#1429.

In principle everything is documented via the API docs of the respective functions. So I don't feel it's strictly necessary to include top-level docs for the arrow2 crate, if it is essentially deprecated.

I think you need to make that call. Given my involvement as one of the maintainers of arrow-rs I don't feel I can offer an unbiased opinion on what you should do with your crate

Re suggestion usage of serde_arrow: for me one the last remaining obstacles are the non-additive features. This issue makes it quite hard to build other libraries on top of serde_arrow.

I am not quite sure what you mean here but I think one major problem is that we have released many major releases of arrow-rs (mostly to cutdown the required maintenance burden of tracking, scheduling, testing and releasing API changes in detail). Perhaps now that development in that crate slows down we will be able to stop doing breaking API changes so frequently

Copy link
Contributor Author

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the review and consideration @chmp -- I believe I have made the changes you suggested, but if there is more to be done, I am happy to make more changes.

Readme.md Show resolved Hide resolved
serde_arrow/src/lib.rs Outdated Show resolved Hide resolved
@chmp
Copy link
Owner

chmp commented Feb 6, 2024

@alamb Thanks for the changes. Looks great!

I think you need to make that call. Given my involvement as one of the maintainers of arrow-rs I don't feel I can offer an unbiased opinion on what you should do with your crate

TBH, from the issue I kind of assumed, there was a decision to deprecate arrow2, that was just not communicated. If this is not the case, I would say let's keep everything as is and include arrow2 in the top level docs as well. Just as you suggested.

I am not quite sure what you mean here but I think one major problem is that we have released many major releases of arrow-rs (mostly to cutdown the required maintenance burden of tracking, scheduling, testing and releasing API changes in detail). Perhaps now that development in that crate slows down we will be able to stop doing breaking API changes so frequently

My concern are scenarios such as the following: you built a crate on top of serde_arrow using say the arrow48 feature and somebody else built a crate on top of serde_arrow using the arrow49 feature. As soon as someone combines both of these crates, feature unification will select both the arrow48 and arrow49 features. In this case, serde_arrow will select the arrow=49 depedency and your crate will break, in probably hard to diagnose ways. I think for library authors it should be possible to select a fixed arrow version and stick with it, regardless of the larger context of the program.

I really would like to use serde_arrow's major versions to indicate API breakage of serde_arrow while still being compatible with the user's previous choice of arrow. But maybe that's a bit overengineered.

In case you're interested, I'm thinking aloud about the different options here

@chmp chmp merged commit 31a88c7 into chmp:main Feb 6, 2024
1 check passed
@alamb alamb deleted the alamb/update_readme branch February 6, 2024 11:35
@alamb
Copy link
Contributor Author

alamb commented Feb 6, 2024

TBH, from the issue I kind of assumed, there was a decision to deprecate arrow2, that was just not communicated. If this is not the case, I would say let's keep everything as is and include arrow2 in the top level docs as well. Just as you suggested.

From my perspective, the challenge is that it is not at all clear who would make a decision to deprecate arrow2, as it has no maintainers and no one seems willing to put the time in to chart a path forward. I think there were several lessons I learned as part of the process.

I have added a note to this effect on jorgecarleitao/arrow2#1429 (comment)

I really would like to use serde_arrow's major versions to indicate API breakage of serde_arrow while still being compatible with the user's previous choice of arrow. But maybe that's a bit overengineered.

Perhaps this is a solution to a problem that would not be as great if the arrow API was more stable. I filed apache/arrow-rs#5368 to discuss this further

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants