update expected plans for Spark 3.4
andygrove committed Oct 9, 2024
1 parent fd87412 commit bec5856
Showing 50 changed files with 5,152 additions and 5,825 deletions.
6 changes: 2 additions & 4 deletions common/src/main/scala/org/apache/comet/CometConf.scala
@@ -265,11 +265,9 @@ object CometConf extends ShimCometConf {

   val COMET_REPLACE_SMJ: ConfigEntry[Boolean] =
     conf(s"$COMET_EXEC_CONFIG_PREFIX.replaceSortMergeJoin")
-      .doc("Whether to replace SortMergeJoin with ShuffledHashJoin for improved " +
-        "performance (experimental).")
-      .internal()
+      .doc("Whether to replace SortMergeJoin with ShuffledHashJoin for improved performance.")
       .booleanConf
-      .createWithDefault(false)
+      .createWithDefault(true)

   val COMET_EXEC_SHUFFLE_CODEC: ConfigEntry[String] = conf(
     s"$COMET_EXEC_CONFIG_PREFIX.shuffle.codec")
1 change: 1 addition & 0 deletions docs/source/user-guide/configs.md
@@ -50,6 +50,7 @@ Comet provides the following configuration settings.
 | spark.comet.exec.localLimit.enabled | Whether to enable localLimit by default. | true |
 | spark.comet.exec.memoryFraction | The fraction of memory from Comet memory overhead that the native memory manager can use for execution. The purpose of this config is to set aside memory for untracked data structures, as well as imprecise size estimation during memory acquisition. Default value is 0.7. | 0.7 |
 | spark.comet.exec.project.enabled | Whether to enable project by default. | true |
+| spark.comet.exec.replaceSortMergeJoin | Whether to replace SortMergeJoin with ShuffledHashJoin for improved performance. | true |
 | spark.comet.exec.shuffle.codec | The codec of Comet native shuffle used to compress shuffle data. Only zstd is supported. | zstd |
 | spark.comet.exec.shuffle.enabled | Whether to enable Comet native shuffle. Note that this requires setting 'spark.shuffle.manager' to 'org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager'. 'spark.shuffle.manager' must be set before starting the Spark application and cannot be changed during the application. | true |
 | spark.comet.exec.sort.enabled | Whether to enable sort by default. | true |
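
Because this commit changes the default of `spark.comet.exec.replaceSortMergeJoin` from false to true, a deployment that wants to keep SortMergeJoin would presumably have to opt out explicitly. A minimal sketch of a `spark-defaults.conf` fragment, assuming the key is read like the other `spark.comet.exec.*` settings listed in this table; using `spark-defaults.conf` here is an illustration, not something this commit prescribes:

```properties
# Assumption: opt out of the default introduced by this commit and keep
# Spark's SortMergeJoin instead of rewriting it to ShuffledHashJoin.
spark.comet.exec.replaceSortMergeJoin  false
```

The same key could equally be passed with `--conf` on `spark-submit`.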

Large diffs are not rendered by default.

@@ -1,44 +1,59 @@
-WholeStageCodegen (2)
+WholeStageCodegen (7)
 HashAggregate [sum,sum,count] [sum(UnscaledValue(cs_ext_ship_cost)),sum(UnscaledValue(cs_net_profit)),count(cs_order_number),order count ,total shipping cost ,total net profit ,sum,sum,count]
 InputAdapter
 Exchange #1
-WholeStageCodegen (1)
+WholeStageCodegen (6)
 HashAggregate [cs_order_number] [sum(UnscaledValue(cs_ext_ship_cost)),sum(UnscaledValue(cs_net_profit)),count(cs_order_number),sum,sum,count,sum,sum,count]
 HashAggregate [cs_order_number] [sum(UnscaledValue(cs_ext_ship_cost)),sum(UnscaledValue(cs_net_profit)),sum,sum,sum,sum]
-ColumnarToRow
-InputAdapter
-CometHashAggregate [cs_order_number,sum,sum,cs_ext_ship_cost,cs_net_profit]
-CometProject [cs_order_number,cs_ext_ship_cost,cs_net_profit]
-CometBroadcastHashJoin [cs_call_center_sk,cs_order_number,cs_ext_ship_cost,cs_net_profit,cc_call_center_sk]
-CometProject [cs_call_center_sk,cs_order_number,cs_ext_ship_cost,cs_net_profit]
-CometBroadcastHashJoin [cs_ship_addr_sk,cs_call_center_sk,cs_order_number,cs_ext_ship_cost,cs_net_profit,ca_address_sk]
-CometProject [cs_ship_addr_sk,cs_call_center_sk,cs_order_number,cs_ext_ship_cost,cs_net_profit]
-CometBroadcastHashJoin [cs_ship_date_sk,cs_ship_addr_sk,cs_call_center_sk,cs_order_number,cs_ext_ship_cost,cs_net_profit,d_date_sk]
-CometSortMergeJoin [cs_ship_date_sk,cs_ship_addr_sk,cs_call_center_sk,cs_order_number,cs_ext_ship_cost,cs_net_profit,cr_order_number]
-CometProject [cs_ship_date_sk,cs_ship_addr_sk,cs_call_center_sk,cs_order_number,cs_ext_ship_cost,cs_net_profit]
-CometSortMergeJoin [cs_ship_date_sk,cs_ship_addr_sk,cs_call_center_sk,cs_warehouse_sk,cs_order_number,cs_ext_ship_cost,cs_net_profit,cs_order_number,cs_warehouse_sk]
-CometSort [cs_ship_date_sk,cs_ship_addr_sk,cs_call_center_sk,cs_warehouse_sk,cs_order_number,cs_ext_ship_cost,cs_net_profit]
-CometExchange [cs_order_number] #2
-CometProject [cs_ship_date_sk,cs_ship_addr_sk,cs_call_center_sk,cs_warehouse_sk,cs_order_number,cs_ext_ship_cost,cs_net_profit]
-CometFilter [cs_ship_date_sk,cs_ship_addr_sk,cs_call_center_sk,cs_warehouse_sk,cs_order_number,cs_ext_ship_cost,cs_net_profit,cs_sold_date_sk]
-CometScan parquet spark_catalog.default.catalog_sales [cs_ship_date_sk,cs_ship_addr_sk,cs_call_center_sk,cs_warehouse_sk,cs_order_number,cs_ext_ship_cost,cs_net_profit,cs_sold_date_sk]
-CometSort [cs_warehouse_sk,cs_order_number]
-CometExchange [cs_order_number] #3
-CometProject [cs_warehouse_sk,cs_order_number]
-CometScan parquet spark_catalog.default.catalog_sales [cs_warehouse_sk,cs_order_number,cs_sold_date_sk]
-CometSort [cr_order_number]
-CometExchange [cr_order_number] #4
-CometProject [cr_order_number]
-CometScan parquet spark_catalog.default.catalog_returns [cr_order_number,cr_returned_date_sk]
-CometBroadcastExchange [d_date_sk] #5
-CometProject [d_date_sk]
-CometFilter [d_date_sk,d_date]
-CometScan parquet spark_catalog.default.date_dim [d_date_sk,d_date]
-CometBroadcastExchange [ca_address_sk] #6
-CometProject [ca_address_sk]
-CometFilter [ca_address_sk,ca_state]
-CometScan parquet spark_catalog.default.customer_address [ca_address_sk,ca_state]
-CometBroadcastExchange [cc_call_center_sk] #7
-CometProject [cc_call_center_sk]
-CometFilter [cc_call_center_sk,cc_county]
-CometScan parquet spark_catalog.default.call_center [cc_call_center_sk,cc_county]
+HashAggregate [cs_order_number,cs_ext_ship_cost,cs_net_profit] [sum(UnscaledValue(cs_ext_ship_cost)),sum(UnscaledValue(cs_net_profit)),sum,sum,sum,sum]
+Project [cs_order_number,cs_ext_ship_cost,cs_net_profit]
+BroadcastHashJoin [cs_call_center_sk,cc_call_center_sk]
+Project [cs_call_center_sk,cs_order_number,cs_ext_ship_cost,cs_net_profit]
+BroadcastHashJoin [cs_ship_addr_sk,ca_address_sk]
+Project [cs_ship_addr_sk,cs_call_center_sk,cs_order_number,cs_ext_ship_cost,cs_net_profit]
+BroadcastHashJoin [cs_ship_date_sk,d_date_sk]
+ShuffledHashJoin [cs_order_number,cr_order_number]
+InputAdapter
+WholeStageCodegen (1)
+ColumnarToRow
+InputAdapter
+CometProject [cs_ship_date_sk,cs_ship_addr_sk,cs_call_center_sk,cs_order_number,cs_ext_ship_cost,cs_net_profit]
+CometHashJoin [cs_ship_date_sk,cs_ship_addr_sk,cs_call_center_sk,cs_warehouse_sk,cs_order_number,cs_ext_ship_cost,cs_net_profit,cs_order_number,cs_warehouse_sk]
+CometExchange [cs_order_number] #2
+CometProject [cs_ship_date_sk,cs_ship_addr_sk,cs_call_center_sk,cs_warehouse_sk,cs_order_number,cs_ext_ship_cost,cs_net_profit]
+CometFilter [cs_ship_date_sk,cs_ship_addr_sk,cs_call_center_sk,cs_warehouse_sk,cs_order_number,cs_ext_ship_cost,cs_net_profit,cs_sold_date_sk]
+CometScan parquet spark_catalog.default.catalog_sales [cs_ship_date_sk,cs_ship_addr_sk,cs_call_center_sk,cs_warehouse_sk,cs_order_number,cs_ext_ship_cost,cs_net_profit,cs_sold_date_sk]
+CometExchange [cs_order_number] #3
+CometProject [cs_warehouse_sk,cs_order_number]
+CometScan parquet spark_catalog.default.catalog_sales [cs_warehouse_sk,cs_order_number,cs_sold_date_sk]
+InputAdapter
+WholeStageCodegen (2)
+ColumnarToRow
+InputAdapter
+CometExchange [cr_order_number] #4
+CometProject [cr_order_number]
+CometScan parquet spark_catalog.default.catalog_returns [cr_order_number,cr_returned_date_sk]
+InputAdapter
+BroadcastExchange #5
+WholeStageCodegen (3)
+ColumnarToRow
+InputAdapter
+CometProject [d_date_sk]
+CometFilter [d_date_sk,d_date]
+CometScan parquet spark_catalog.default.date_dim [d_date_sk,d_date]
+InputAdapter
+BroadcastExchange #6
+WholeStageCodegen (4)
+ColumnarToRow
+InputAdapter
+CometProject [ca_address_sk]
+CometFilter [ca_address_sk,ca_state]
+CometScan parquet spark_catalog.default.customer_address [ca_address_sk,ca_state]
+InputAdapter
+BroadcastExchange #7
+WholeStageCodegen (5)
+ColumnarToRow
+InputAdapter
+CometProject [cc_call_center_sk]
+CometFilter [cc_call_center_sk,cc_county]
+CometScan parquet spark_catalog.default.call_center [cc_call_center_sk,cc_county]