Skip to content

Conversation

@andygrove
Copy link
Member

GitHub won't let me merge Parth's PR, so trying this instead.

@andygrove andygrove merged commit ab09337 into apache:comet-parquet-exec Dec 4, 2024
6 of 77 checks passed
@andygrove andygrove deleted the comet-parquet-exec-2 branch December 4, 2024 18:51
@viirya
Copy link
Member

viirya commented Dec 4, 2024

Is this the second POC? If so, shouldn't it be another feature branch?

@parthchandra
Copy link
Contributor

This is the second POC. We are converging towards using the same schema adapter code and eventually having just one implementation, and it seemed a good idea to have just one branch.

@viirya
Copy link
Member

viirya commented Dec 5, 2024

Hmm, I thought that the 2 different approaches are conflicting?

@parthchandra
Copy link
Contributor

Not really. They are pretty much independent. The difference is that POC 1 uses DF ParquetExec operator directly. POC 2 uses arrow reader to replace the existing native column readers with arrow column readers and also replaces the jvm file reader with the arrow native file reader. Iceberg integration uses the column readers directly which is why we are doing it this way. At the moment the code paths are completely independent and have no code overlap.
I'm looking at using DF under the covers instead of arrow directly so we can use the schema adapter code written in POC 1 and we'll be able to reuse some of the work in POC1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants