Conversation
alamb
left a comment
There was a problem hiding this comment.
Thank you @kumarlokesh -- this looks really nice
To make reviewing this PR easier, is there any way you can break it into multiple smaller PRs? Specifically,
- Improve the comments
- Support for Map
- Support for Utf8View?
Also it is not clear to me how much of a conflict this PR would have with:
arrow-avro/src/codec.rs
Outdated
There was a problem hiding this comment.
This is really nice -- thank you
arrow-avro/src/codec.rs
Outdated
There was a problem hiding this comment.
this seems to be a new feature (not just string view, but also map support)
Have you had a chance to review this PR?
arrow-avro/src/compression.rs
Outdated
There was a problem hiding this comment.
Thank you for these comments
91a31ce to
78f8c5b
Compare
78f8c5b to
13d05d5
Compare
13d05d5 to
ae4fe0b
Compare
|
@alamb updated this PR to remove In terms of scope of work between between #7434 and #6965, don't see a major overlap between what's intended. Therefore, I think the code conflict between the two should be minimal, and fairly trivial to resolve.
|
alamb
left a comment
There was a problem hiding this comment.
Thank you @kumarlokesh -- I think this looks really nicely documented and tested.
I had a few API questions / suggestions but I don't think they are necessary.
It seems to me that this PR does not change any APIs (only adds new ones) so I think we could merge it before the next major release. What do you think?
Thanks again for pushing this along and breaking it up into smaller chunks
| } | ||
|
|
||
| #[test] | ||
| fn test_string_with_utf8view_enabled() { |
arrow-avro/src/lib.rs
Outdated
| /// | ||
| /// Provides types for parsing and representing Avro schema definitions. | ||
| mod schema; | ||
| pub mod schema; |
There was a problem hiding this comment.
is it needed to make these module public?
arrow-avro/src/reader/record.rs
Outdated
| /// * `use_utf8view` - If true, use StringViewArray instead of StringArray for string data | ||
| pub fn try_new_with_options( | ||
| data_type: &AvroDataType, | ||
| use_utf8view: bool, |
There was a problem hiding this comment.
Shall we use ReadOptions here rather than the single field? That would make updating these easier in the future
arrow-avro/src/reader/mod.rs
Outdated
| /// into Arrow's StringViewArray instead of the standard StringArray. | ||
| /// | ||
| /// Default: false | ||
| pub use_utf8view: bool, |
There was a problem hiding this comment.
I think it would be nice if this field was not pub -- that way we could add new options to ReadOptions in the future without it being a breaking API change
|
Marking as next-major-release as a precaution to avoid merging until we convince ourselves this PR has no breaking changes |
@alamb thanks for the review! Yes, I think this PR could be merged before the next major release. Addressed review comments here cf50b5b |
|
|
||
| let mut file = temp_file.reopen()?; | ||
|
|
||
| file.write_all(b"AVRO")?; |
There was a problem hiding this comment.
I wondered why this benchmark isn't using a avro writer and then I realized it is because there is no writer in avro-arrow 👍
| Binary, | ||
| /// String data represented as UTF-8 encoded bytes, corresponding to Arrow's StringArray | ||
| Utf8, | ||
| /// String data represented as UTF-8 encoded bytes with an optimized view representation, |
There was a problem hiding this comment.
This is not pub it seems: https://docs.rs/arrow-avro/latest/arrow_avro/?search=Codec
Though now I look it seems there is nothing that is pub: https://docs.rs/arrow-avro/latest/arrow_avro/all.html 🤔
|
Let's keep thing moving and merge this PR. @tustvold do you remember what the plan / status of the |
|
Thanks again @kumarlokesh |
Which issue does this PR close?
Closes #7262.
Rationale for this change
What changes are included in this PR?
This PR adds support for
Utf8Viewto the arrow-avro reader, enabling the creation ofStringViewArrayinstead of regularStringArrayfor string data.with_utf8viewmethod for codec conversionDecoderenum with a StringView variant to handle StringViewArray creationuse_utf8viewtoReadOptionsstructBenchmark output:
From the above benchmark results, we can draw following conclusions:
Are there any user-facing changes?