We have some encoders implemented in parquet2, here. arrow2 exposes them for some data types, as the encoding is an argument of the write APIs like the one you pointed to.
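For context, here is a minimal sketch of how an encoding is passed through arrow2's write API. The names follow arrow2's `io::parquet::write` module, but the exact signatures and `WriteOptions` fields vary across arrow2 versions, so treat this as illustrative rather than definitive:

```rust
use arrow2::array::Int32Array;
use arrow2::chunk::Chunk;
use arrow2::datatypes::{DataType, Field, Schema};
use arrow2::error::Result;
use arrow2::io::parquet::write::{
    CompressionOptions, Encoding, FileWriter, RowGroupIterator, Version, WriteOptions,
};

fn main() -> Result<()> {
    let schema = Schema::from(vec![Field::new("a", DataType::Int32, false)]);
    let chunk = Chunk::new(vec![Int32Array::from_slice([1, 2, 3]).boxed()]);

    let options = WriteOptions {
        write_statistics: true,
        compression: CompressionOptions::Uncompressed,
        version: Version::V2,
        data_pagesize_limit: None,
    };

    // The caller picks one encoding per (field, parquet column);
    // only encodings with a working decoder on the read side are exposed.
    let row_groups = RowGroupIterator::try_new(
        vec![Ok(chunk)].into_iter(),
        &schema,
        options,
        vec![vec![Encoding::Plain]],
    )?;

    let file = std::fs::File::create("example.parquet")?;
    let mut writer = FileWriter::try_new(file, schema, options)?;
    for group in row_groups {
        writer.write(group?)?;
    }
    writer.end(None)?;
    Ok(())
}
```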
I have not implemented the remaining ones primarily because it has been difficult for me to find parquet readers that support them, which makes it difficult to prove interoperability. For example:
- pyarrow: does not support DeltaLengthByteArray yet (ARROW-13388)
- pyarrow: reading dictionary-encoded data has been challenging (ARROW-13487)
- spark: only the non-vectorized reader supports DeltaLengthByteArray (see here and this thread on parquet's mailing list)
It is also a bit difficult for me to reproduce parquet's current behavior because the parquet crate has no integration tests against e.g. pyarrow or spark; i.e. we have to trust that it is well implemented and that consumers can read from it.
Since parquet is a storage format and not being able to read stored data is not a pleasant experience, I am defensive and require integration tests against at least one well-known consumer before exposing an encoding for writing.
Do you have a specific combination (parquet version, physical type, encoding) in mind that we should support?
Regarding encodings, I suggest that arrow2 support more of them. When using arrow2 for both writing and reading, additional encodings can yield better compression; the user can choose the encoding themselves.
At present, all data types use plain encoding.
For example, DeltaBitPackEncoder in arrow-rs:
https://github.com/apache/arrow-rs/blob/master/parquet/src/encodings/encoding.rs#L624
arrow2: arrow2/src/io/parquet/write/mod.rs (line 93 in bfb4910)
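As a rough illustration of what DELTA_BINARY_PACKED buys: per the parquet format spec, it stores the first value in the page header plus bit-packed deltas (adjusted by a per-block minimum) in miniblocks. The toy function below shows only the delta step, not the min-delta adjustment or bit-packing:

```rust
/// Toy illustration of the delta step behind DELTA_BINARY_PACKED.
/// Real encoders (e.g. arrow-rs's DeltaBitPackEncoder linked above)
/// additionally subtract a per-block minimum delta and bit-pack the
/// results into miniblocks.
fn deltas(values: &[i32]) -> (Option<i32>, Vec<i32>) {
    // The first value is stored as-is.
    let first = values.first().copied();
    // Successive differences are usually small for sorted or
    // slowly-varying data, so they pack into few bits.
    let diffs = values.windows(2).map(|w| w[1] - w[0]).collect();
    (first, diffs)
}

fn main() {
    let (first, diffs) = deltas(&[100, 101, 103, 106]);
    assert_eq!(first, Some(100));
    assert_eq!(diffs, vec![1, 2, 3]);
}
```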
Is there a plan to implement the complete set of encodings?