Conversation

@andygrove (Member)

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

How are these changes tested?

}
DataType::Timestamp(TimeUnit::Millisecond, None) => {
    let array = as_primitive_array::<TimestampMillisecondType>(&array);
    let tz: Tz = tz.parse()?;
Contributor

Is this called frequently (per row)? Timezone parsing is somewhat expensive (and does not change for a session).

Member Author

This is once per array, but I think the parsing could happen once during planning rather than per batch/array.

Contributor

Makes sense, we can defer this for the moment.
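
For reference, a minimal sketch of the deferred optimization: parse the timezone string once (e.g. at plan time) and reuse it for every batch. The struct and field names below are illustrative, not from this PR; the example assumes arrow-rs's Tz parser, which the diff above already uses.

use arrow::array::timezone::Tz;
use arrow::error::ArrowError;

// Hypothetical operator state: the timezone is parsed once when the
// plan is built, not once per array.
struct TimestampCast {
    tz: Tz,
}

impl TimestampCast {
    fn try_new(tz: &str) -> Result<Self, ArrowError> {
        Ok(Self { tz: tz.parse()? })
    }
}

fn main() -> Result<(), ArrowError> {
    let cast = TimestampCast::try_new("+08:00")?;
    let _ = &cast.tz; // each batch reads this instead of re-parsing
    Ok(())
}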

/// The schema of the table. This is the expected schema after conversion
/// and it should match the schema of the query result.
projected_table_schema: SchemaRef,
required_schema: SchemaRef,
Contributor

+1 :)
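
For context, a hedged sketch of what the doc comment describes: the projected table schema is the table schema restricted to the queried columns, and the converted output should match it. This uses plain Arrow APIs only; nothing below is taken from this PR.

use std::sync::Arc;
use arrow::datatypes::{DataType, Field, Schema, SchemaRef};

fn main() {
    // Illustrative table with columns (a, b); the query projects only b.
    let table_schema = Schema::new(vec![
        Field::new("a", DataType::Int32, true),
        Field::new("b", DataType::Utf8, true),
    ]);
    // Schema the query result should match (column b only).
    let projected_table_schema: SchemaRef =
        Arc::new(table_schema.project(&[1]).unwrap());
    assert_eq!(projected_table_schema.field(0).name(), "b");
}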

@parthchandra (Contributor) left a comment:

I know this is still a draft, but we can commit this whenever it's ready.

allow_incompat,
)?),
_ if is_datafusion_spark_compatible(from_type, to_type, allow_incompat) => {
_ if ugly_hack_for_poc
Contributor

What are the cases (that we know of) where this gets invoked? (If we know, we can perhaps replace this flag with an explicit check for those cases.)

Member Author

I believe we need to implement specific logic for adapting Parquet schemas rather than reusing our Spark-compatible cast. There is likely some overlap, so we can refactor the common code out. For example, regular Spark casts do not need to support unsigned integers, but we need this when adapting Parquet schemas.
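
A hedged illustration of that last point (not this PR's code): when adapting a Parquet schema, an unsigned physical type may need to widen to the signed type Spark expects, something a regular Spark cast never has to handle.

use std::sync::Arc;
use arrow::array::{ArrayRef, UInt16Array};
use arrow::compute::cast;
use arrow::datatypes::DataType;
use arrow::error::ArrowError;

fn main() -> Result<(), ArrowError> {
    // A Parquet column with an unsigned physical type.
    let parquet_col: ArrayRef = Arc::new(UInt16Array::from(vec![0u16, 65_535]));
    // Widen to the signed type Spark expects; lossless for UInt16 -> Int32.
    let adapted = cast(&parquet_col, &DataType::Int32)?;
    assert_eq!(adapted.data_type(), &DataType::Int32);
    Ok(())
}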

@andygrove marked this pull request as ready for review December 6, 2024 15:10
@andygrove merged commit bd797f5 into apache:comet-parquet-exec December 6, 2024 (10 of 70 checks passed)
@andygrove deleted the schema-adapter-fixes branch December 6, 2024 15:11
@viirya (Member) left a comment:

Hmm, I found that the description is empty, so I'm not sure what this tries to fix. Could you add some more details there? Thanks.
