@mbutrovich commented Dec 18, 2024

@andygrove suggested it might be helpful to see how the comet-parquet-exec branch, with main merged into it (see #1183), compares against upstream/main, to check whether the diff looks reasonable. Please do not merge!

mbutrovich and others added 30 commits November 8, 2024 13:54
add partial support for multiple parquet files
"filter with string" test now passes
* wip - CometNativeScan

* fix and make config internal
…e debug logging (apache#1080)

* update tests, remove some debug logging

* update tests, remove some debug logging

* update tests, remove some debug logging

* remove unused import
…che#1081)

* I think serde works. Gonna try removing the old stuff.

* Fixes after merging in upstream.

* Remove previous file_config logic. Clippy.

* Temporary assertion for testing.

* Remove old path proto value.

* Selectively generate projection vector.
…stead of FileScanRDD (apache#1088)

* DataSourceRDD handling (seems to be related to prefetching, so maybe not relevant for our ParquetExec).

* Refactor to reduce duplicate code.
…pache#1106)

* init

* more

* more

* fix clippy

* Use Spark and Arrow types for partition schema
* fix: Use RDD partition index (apache#1112)

* fix: Use RDD partition index

* fix

* fix

* fix

* fix style
…e#1138)

* WIP: (POC2) A Parquet reader that uses the arrow-rs Parquet reader directly

* Change default config

---------

Co-authored-by: Parth Chandra <parthc@apache.org>
…rquet (apache#1075)

* implement basic native code for casting struct to struct

* add another test

* rustdoc

* add scala side

* code cleanup

* clippy

* clippy

* add scala test

* improve test

* simple struct case passes

* save progress

* copy schema adapter code from DataFusion

* more tests pass

* save progress

* remove debug println

* remove debug println
…e#1142)

* Serialize original data schema and required schema, generate projection vector on the Java side.

* Sending over more schema info like column names and nullability.

* Using the new stuff in the proto. About to take the old out.

* Remove old logic.

* remove errant print.

* Serialize original data schema and required schema, generate projection vector on the Java side.

* Sending over more schema info like column names and nullability.

* Using the new stuff in the proto. About to take the old out.

* Remove old logic.

* remove errant print.

* Remove commented print. format.

* Remove commented print. format.

* Fix projection_vector to include partition_schema cols correctly.

* Rename variable.
parthchandra and others added 15 commits December 5, 2024 15:37
* support more timestamp conversions

* improve error handling

* rename projected_table_schema to required_schema

* Save

* save

* save

* code cleanup
…implementation (apache#1170)

* fix: CometScanExec was created for unsupported cases if only COMET_NATIVE_SCAN is enabled

* fix: Another try to fix '  test("Comet native metrics: BroadcastHashJoin")

* fix: some tests are valid only when full native scan is enabled

* Merge pull request #1 from andygrove/fix-tests-spark-cast-options
…or use in iceberg reads (apache#1174)

* wip. Use DF's ParquetExec for Iceberg API

* wip - await??

* wip

* wip -

* fix shading issue

* fix shading issue

* fixes

* refactor to remove arrow based reader

* rename config

* Fix config defaults

---------

Co-authored-by: Andy Grove <agrove@apache.org>
# Conflicts:
#	native/Cargo.lock
#	native/Cargo.toml
#	native/core/src/execution/jni_api.rs
#	native/core/src/execution/planner.rs
#	native/core/src/execution/schema_adapter.rs
#	native/spark-expr/src/cast.rs
#	native/spark-expr/src/lib.rs
#	native/spark-expr/src/test_common/mod.rs
#	native/spark-expr/src/utils.rs
#	spark/src/main/scala/org/apache/comet/CometExecIterator.scala
#	spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala
#	spark/src/main/scala/org/apache/comet/Native.scala
#	spark/src/main/scala/org/apache/spark/sql/comet/operators.scala
#	spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala
#	spark/src/test/scala/org/apache/comet/exec/CometExecSuite.scala
@mbutrovich closed this Jan 2, 2025
@mbutrovich deleted the merge_upstream_main branch January 2, 2025 21:37