-
Notifications
You must be signed in to change notification settings - Fork 941
Support round trip reading / writing Arrow Duration
type to parquet
#7482
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
CI Failure should be fixed by @crepererum 's PR: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @Liyixin95 -- this looks really nice. I poked around and actually found that there were some pre-existing tests ❤️
I also tested this PR manually using the duration.parquet
file from apache/parquet-testing#80 and the following code:
Details
fn main() {
let file = File::open("/Users/andrewlamb/Downloads/duration.parquet").unwrap();
let reader = ParquetRecordBatchReaderBuilder::try_new(file)
.unwrap()
.build()
.unwrap();
for batch in reader {
let batch = batch.unwrap();
println!("Read batch schema: {:?}", batch.schema());
println!("Read batch num rows: {:?}", batch.num_rows());
println!("Read batch pretty:\n{}", pretty_format_batches(&[batch]).unwrap());
}
}
Before this PR:
Field { name: "a", data_type: Int64
:
warning: `parquet_test` (bin "parquet_test") generated 1 warning
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.04s
Running `target/debug/parquet_test`
Read batch schema: Schema { fields: [Field { name: "a", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }
Read batch num rows: 100
Read batch pretty:
+-------------+
| a |
+-------------+
| 86400000000 |
...
After this PR (the schema shows duration):
Field { name: "a", data_type: Duration(Microsecond),
Read batch schema: Schema { fields: [Field { name: "a", data_type: Duration(Microsecond), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }
Read batch num rows: 100
Read batch pretty:
+----------+
| a |
+----------+
| PT86400S |
@@ -2471,25 +2471,21 @@ mod tests { | |||
} | |||
|
|||
#[test] | |||
#[should_panic(expected = "Converting Duration to parquet not supported")] | |||
fn duration_second_single_column() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found there were already existing tests for the feature ❤️ (thanks @carols10cents )
Duration
type to parquet
Duration
type to parquetDuration
type to parquet
Oh, Thanks. I didn't notice there were some pre-existing tests. I just create the pr and go to sleep. |
I didn't realize there were any tests either until I checked out the branch locally and they failed for me 😆 |
continue from previous pr. @alamb