WIP: Upgrade arrow/parquet to 53.1.0 #12724

Draft
wants to merge 8 commits into base: main

alamb
Contributor

@alamb alamb commented Oct 2, 2024

Which issue does this PR close?

Related to apache/arrow-rs#6340

Closes #3174

Rationale for this change

Keep up with dependencies

I am creating this PR ahead of the release so that it also serves as a test of the arrow release candidate.

What changes are included in this PR?

  1. Upgrade arrow/parquet to 53.1.0
  2. Switch to non-deprecated functions

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the functions, sqllogictest (SQL Logic Tests (.slt)), physical-expr (Physical Expressions), and core (Core DataFusion crate) labels Oct 2, 2024
&suffix[metadata_start - footer_start..suffix_len - 8],
)?)
}
ParquetMetaDataReader::new()
Contributor Author

We can finally have nice things -- this code is way simpler now thanks to all @etseidl's work upstream on apache/arrow-rs#6447

[image: so-beautiful]
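For context on what hand-written footer handling of this kind has to do, here is a minimal, standard-library-only sketch of locating the serialized metadata at the end of a plaintext Parquet file. The file ends with a 4-byte little-endian metadata length followed by the magic bytes `PAR1`. The helper name `metadata_slice` and its error strings are hypothetical; this is not the code removed by the PR, nor the `ParquetMetaDataReader` API.

```rust
/// Size of the Parquet footer: 4-byte little-endian metadata length + 4-byte magic.
const FOOTER_SIZE: usize = 8;
/// Magic bytes that terminate a plaintext Parquet file.
const MAGIC: &[u8; 4] = b"PAR1";

/// Hypothetical helper: given the trailing bytes of a file (`suffix`),
/// return the slice that holds the serialized metadata.
fn metadata_slice(suffix: &[u8]) -> Result<&[u8], String> {
    if suffix.len() < FOOTER_SIZE {
        return Err("suffix is shorter than the 8-byte footer".to_string());
    }
    let footer_start = suffix.len() - FOOTER_SIZE;
    let footer = &suffix[footer_start..];
    if footer[4..] != MAGIC[..] {
        return Err("missing PAR1 magic".to_string());
    }
    // The first four footer bytes encode the metadata length.
    let len = u32::from_le_bytes([footer[0], footer[1], footer[2], footer[3]]) as usize;
    if len > footer_start {
        // The caller fetched too few bytes and would need to read further back.
        return Err(format!("need {} more bytes before the suffix", len - footer_start));
    }
    // The metadata sits immediately before the footer.
    Ok(&suffix[footer_start - len..footer_start])
}
```

A caller would typically read a guess-sized tail of the file, call a helper like this, and issue a second read only if the metadata turns out to be longer than the tail. That bookkeeping is the part that benefits from living in one upstream reader.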

@@ -61,7 +61,7 @@ logical_plan
physical_plan
01)CoalesceBatchesExec: target_batch_size=8192
02)--FilterExec: column1@0 != 42
-03)----ParquetExec: file_groups={4 groups: [[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/repartition_scan/parquet_table/2.parquet:0..87], [WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/repartition_scan/parquet_table/2.parquet:87..174], [WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/repartition_scan/parquet_table/2.parquet:174..261], [WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/repartition_scan/parquet_table/2.parquet:261..347]]}, projection=[column1], predicate=column1@0 != 42, pruning_predicate=CASE WHEN column1_null_count@2 = column1_row_count@3 THEN false ELSE column1_min@0 != 42 OR 42 != column1_max@1 END, required_guarantees=[column1 not in (42)]
+03)----ParquetExec: file_groups={4 groups: [[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/repartition_scan/parquet_table/2.parquet:0..88], [WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/repartition_scan/parquet_table/2.parquet:88..176], [WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/repartition_scan/parquet_table/2.parquet:176..264], [WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/repartition_scan/parquet_table/2.parquet:264..351]]}, projection=[column1], predicate=column1@0 != 42, pruning_predicate=CASE WHEN column1_null_count@2 = column1_row_count@3 THEN false ELSE column1_min@0 != 42 OR 42 != column1_max@1 END, required_guarantees=[column1 not in (42)]
Contributor Author

These files are slightly larger due to apache/arrow-rs#6490, which shifts the byte ranges in the expected plan.
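The byte ranges in the plan are consistent with splitting the file into four chunks of `ceil(size / 4)` bytes, so a 4-byte change in file size moves every boundary. The sketch below reproduces the numbers from the diff under that assumption. `split_ranges` is a hypothetical helper written for illustration, not DataFusion's repartitioning code.

```rust
use std::ops::Range;

/// Hypothetical helper: split a file of `size` bytes into at most `n`
/// contiguous ranges of `ceil(size / n)` bytes each; the last may be shorter.
fn split_ranges(size: u64, n: u64) -> Vec<Range<u64>> {
    assert!(n > 0, "need at least one partition");
    // Ceiling division without relying on newer integer helpers.
    let chunk = (size + n - 1) / n;
    let mut ranges = Vec::new();
    let mut start = 0;
    while start < size {
        // Clamp the final range to the end of the file.
        let end = (start + chunk).min(size);
        ranges.push(start..end);
        start = end;
    }
    ranges
}
```

With this rule, a 347-byte file yields `0..87, 87..174, 174..261, 261..347` and a 351-byte file yields `0..88, 88..176, 176..264, 264..351`, matching the old and new expected output.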

// specific language governing permissions and limitations
// under the License.

//! Common utilities for implementing regex functions
Contributor Author

@tlm365 ported this upstream to arrow-rs in apache/arrow-rs#6376 ❤️

@@ -771,7 +771,7 @@ mod tests {
"c7: Int64",
"c8: Int64",
"c9: Int64",
-"c10: Int64",
+"c10: Utf8",
Contributor Author

This is due to apache/arrow-rs#6481 -- the fix for #3174 (see #3174 (comment))
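One way a column can move from `Int64` to `Utf8` is when it contains digit-only values that do not fit in a signed 64-bit integer. Whether that is the exact mechanism here is an assumption; the linked PR and issue are the authority. The sketch below is a deliberately simplified, hypothetical inference rule (`infer_column_type` is not arrow's algorithm) that only illustrates the fallback.

```rust
/// Hypothetical, simplified inference rule (not arrow's actual algorithm):
/// a column is `Int64` only if every value parses as an `i64`;
/// anything else falls back to `Utf8`.
fn infer_column_type(values: &[&str]) -> &'static str {
    if values.iter().all(|v| v.parse::<i64>().is_ok()) {
        "Int64"
    } else {
        // Includes digit-only strings above i64::MAX (9223372036854775807).
        "Utf8"
    }
}
```

Under this rule a value such as `15673001088481131911` (an illustrative number larger than `i64::MAX`) forces the whole column to `Utf8`, instead of being labeled `Int64` and failing later when the value is parsed.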

@@ -103,65 +99,6 @@ impl PhysicalExpr for IsNullExpr {
}
}

/// workaround <https://github.com/apache/arrow-rs/issues/6017>,
Contributor Author

Use the implementation added in apache/arrow-rs#6303 from @gstvg

Labels
core (Core DataFusion crate), functions, physical-expr (Physical Expressions), sqllogictest (SQL Logic Tests (.slt))
Development

Successfully merging this pull request may close these issues.

Bug with csv type inference
1 participant