
Fix Spark 4.0 sql tests #551

Open · kazuyukitanimura opened this issue Jun 10, 2024 · 4 comments

Labels: bug (Something isn't working)

kazuyukitanimura (Contributor) commented Jun 10, 2024

Describe the bug

Regarding #537, there are 103 Spark 4.0 sql tests failing.

  • sql-1: 91 tests failing
  • sql-2: 12 tests failing

Fix the Comet shims for the Spark 4.0 profile and remove `IgnoreComet` from those tests. Some of the tests may share the same root causes; the `parquet widening conversion` group, for instance, likely fails in one common read path (see the sketch after the sql-1 list).

sql-1

Failing tests (each tracked with WIP / PR posted / Done checkboxes):
SPARK-43402: FileSourceScanExec supports push down data filter with scalar subquery
[SPARK-43226] extra constant metadata fields with extractors
parquet widening conversion ShortType -> IntegerType
parquet widening conversion IntegerType -> ShortType
parquet widening conversion IntegerType -> LongType
parquet widening conversion ShortType -> DoubleType
parquet widening conversion IntegerType -> DoubleType
parquet widening conversion DateType -> TimestampNTZType
parquet widening conversion ByteType -> DecimalType(10,0)
parquet widening conversion ByteType -> DecimalType(20,0)
parquet widening conversion ShortType -> DecimalType(10,0)
parquet widening conversion ShortType -> DecimalType(20,0)
parquet widening conversion ShortType -> DecimalType(38,0)
parquet widening conversion IntegerType -> DecimalType(10,0)
parquet widening conversion IntegerType -> DecimalType(20,0)
parquet widening conversion IntegerType -> DecimalType(38,0)
parquet widening conversion LongType -> DecimalType(20,0)
parquet widening conversion LongType -> DecimalType(38,0)
parquet widening conversion ByteType -> DecimalType(11,1)
parquet widening conversion ShortType -> DecimalType(11,1)
parquet widening conversion IntegerType -> DecimalType(11,1)
parquet widening conversion LongType -> DecimalType(21,1)
unsupported parquet conversion ByteType -> DecimalType(1,0)
unsupported parquet conversion ByteType -> DecimalType(3,0)
unsupported parquet conversion ShortType -> DecimalType(3,0)
unsupported parquet conversion ShortType -> DecimalType(5,0)
unsupported parquet conversion IntegerType -> DecimalType(5,0)
unsupported parquet conversion ByteType -> DecimalType(4,1)
unsupported parquet conversion ShortType -> DecimalType(6,1)
unsupported parquet conversion LongType -> DecimalType(10,0)
unsupported parquet conversion ByteType -> DecimalType(2,0)
unsupported parquet conversion ShortType -> DecimalType(4,0)
unsupported parquet conversion IntegerType -> DecimalType(9,0)
unsupported parquet conversion LongType -> DecimalType(19,0)
unsupported parquet conversion ByteType -> DecimalType(3,1)
unsupported parquet conversion ShortType -> DecimalType(5,1)
unsupported parquet conversion IntegerType -> DecimalType(10,1)
unsupported parquet conversion LongType -> DecimalType(20,1)
unsupported parquet timestamp conversion TimestampType (TIMESTAMP_MICROS) -> DateType
unsupported parquet timestamp conversion TimestampType (TIMESTAMP_MILLIS) -> DateType
unsupported parquet timestamp conversion TimestampNTZType (INT96) -> DateType
unsupported parquet timestamp conversion TimestampNTZType (TIMESTAMP_MICROS) -> DateType
unsupported parquet timestamp conversion TimestampNTZType (TIMESTAMP_MILLIS) -> DateType
parquet decimal precision change Decimal(5, 2) -> Decimal(7, 2)
parquet decimal precision change Decimal(5, 2) -> Decimal(10, 2)
parquet decimal precision change Decimal(5, 2) -> Decimal(20, 2)
parquet decimal precision change Decimal(10, 2) -> Decimal(12, 2)
parquet decimal precision change Decimal(10, 2) -> Decimal(20, 2)
parquet decimal precision change Decimal(20, 2) -> Decimal(22, 2)
parquet decimal precision change Decimal(7, 2) -> Decimal(5, 2)
parquet decimal precision change Decimal(10, 2) -> Decimal(5, 2)
parquet decimal precision change Decimal(20, 2) -> Decimal(5, 2)
parquet decimal precision change Decimal(12, 2) -> Decimal(10, 2)
parquet decimal precision change Decimal(20, 2) -> Decimal(10, 2)
parquet decimal precision change Decimal(22, 2) -> Decimal(20, 2)
parquet decimal precision and scale change Decimal(5, 2) -> Decimal(7, 4)
parquet decimal precision and scale change Decimal(5, 2) -> Decimal(10, 7)
parquet decimal precision and scale change Decimal(5, 2) -> Decimal(20, 17)
parquet decimal precision and scale change Decimal(10, 2) -> Decimal(12, 4)
parquet decimal precision and scale change Decimal(10, 2) -> Decimal(20, 12)
parquet decimal precision and scale change Decimal(20, 2) -> Decimal(22, 4)
parquet decimal precision and scale change Decimal(7, 4) -> Decimal(5, 2)
parquet decimal precision and scale change Decimal(10, 7) -> Decimal(5, 2)
parquet decimal precision and scale change Decimal(20, 17) -> Decimal(5, 2)
parquet decimal precision and scale change Decimal(12, 4) -> Decimal(10, 2)
parquet decimal precision and scale change Decimal(20, 17) -> Decimal(10, 2)
parquet decimal precision and scale change Decimal(22, 4) -> Decimal(20, 2)
parquet decimal precision and scale change Decimal(10, 6) -> Decimal(12, 4)
parquet decimal precision and scale change Decimal(20, 7) -> Decimal(22, 5)
parquet decimal precision and scale change Decimal(12, 4) -> Decimal(10, 6)
parquet decimal precision and scale change Decimal(22, 5) -> Decimal(20, 7)
parquet decimal precision and scale change Decimal(5, 2) -> Decimal(6, 4)
parquet decimal precision and scale change Decimal(10, 4) -> Decimal(12, 7)
parquet decimal precision and scale change Decimal(20, 5) -> Decimal(22, 8)
parquet decimal type change Decimal(5, 2) -> Decimal(3, 2) overflows with parquet-mr
partition pruning in broadcast hash joins with aliases
partition pruning in broadcast hash joins
SPARK-32817: DPP throws error when the broadcast side is empty
SPARK-36444: Remove OptimizeSubqueries from batch of PartitionPruning
SPARK-38674: Remove useless deduplicate in SubqueryBroadcastExec
SPARK-39338: Remove dynamic pruning subquery if pruningKey's references is empty
SPARK-39217: Makes DPP support the pruning side has Union
partition pruning in broadcast hash joins with aliases
partition pruning in broadcast hash joins
different broadcast subqueries with identical children
SPARK-32817: DPP throws error when the broadcast side is empty
SPARK-36444: Remove OptimizeSubqueries from batch of PartitionPruning
SPARK-38674: Remove useless deduplicate in SubqueryBroadcastExec
SPARK-39338: Remove dynamic pruning subquery if pruningKey's references is empty
SPARK-39217: Makes DPP support the pruning side has Union
join with ordering requirement
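
Many of the `parquet widening conversion` failures above exercise the same pattern: data is written with a narrow Parquet type, then read back with a wider requested schema. A minimal sketch of that pattern, assuming a `spark` session; the output path and column name are placeholders, not taken from the actual Spark suite:

```scala
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// Write a column with a narrow Parquet type (SHORT).
spark.range(10)
  .selectExpr("CAST(id AS SHORT) AS c")
  .write.mode("overwrite").parquet("/tmp/widening_demo")

// Read it back requesting a wider type (ShortType -> IntegerType).
// If the widening conversion is supported by the scan, this succeeds;
// otherwise it fails with a schema conversion error.
val widened = spark.read
  .schema(StructType(Seq(StructField("c", IntegerType))))
  .parquet("/tmp/widening_demo")
widened.show()
```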

sql-2

Failing tests (each tracked with WIP / PR posted / Done checkboxes; the query-context tests are sketched after this list):
collations.sql
SPARK-39166: Query context of binary arithmetic should be serialized to executors when WSCG is off
SPARK-39175: Query context of Cast should be serialized to executors when WSCG is off
SPARK-39190,SPARK-39208,SPARK-39210: Query context of decimal overflow error should be serialized to executors when WSCG is off
SPARK-40389: Don't eliminate a cast which can cause overflow
postgreSQL/float8.sql
postgreSQL/groupingsets.sql
postgreSQL/int4.sql
SPARK-47120: subquery literal filter pushdown
SPARK-47120: subquery literal filter pushdown
view-schema-binding-config.sql
view-schema-compensation.sql
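
Several of the failures above (SPARK-39166, SPARK-39175, SPARK-39190 and related) check that the query context attached to ANSI runtime errors is preserved when whole-stage codegen (WSCG) is off. A minimal sketch of that setup, assuming a `spark` session; this is not the actual suite code:

```scala
// Turn off whole-stage codegen so the error is raised from the
// interpreted evaluation path, which must serialize the query context
// to executors.
spark.conf.set("spark.sql.codegen.wholeStage", "false")
spark.conf.set("spark.sql.ansi.enabled", "true")

try {
  // Integer overflow under ANSI mode raises SparkArithmeticException.
  spark.sql("SELECT 2147483647 + 1").collect()
} catch {
  // The tests assert that the error message still carries the query
  // context (the originating SQL fragment) even with WSCG off.
  case e: Exception => println(e.getMessage)
}
```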

Steps to reproduce

No response

Expected behavior

No response

Additional context

No response

kazuyukitanimura added the `bug` label Jun 10, 2024
kazuyukitanimura (Contributor, Author) commented:
Search for https://github.com/apache/datafusion-comet/issues/551 in dev/diffs/4.0.0-preview1.diff to find the ignored tests.

viirya (Member) commented Jun 11, 2024

Hmm, are these failed tests additional to Spark 3.4? I.e., do they pass in Spark 3.4 + Comet but fail in Spark 4.0?

kazuyukitanimura (Contributor, Author) commented Jun 11, 2024

> Hmm, are these failed tests additional to Spark 3.4? I.e., do they pass in Spark 3.4 + Comet but fail in Spark 4.0?

It could be both new failures and regressions. It could be due to ANSI mode. This ticket is for getting help from the community after #537 is merged.
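
For context: Spark 4.0 enables ANSI mode (`spark.sql.ansi.enabled`) by default, so operations that silently wrapped around in 3.4 now raise runtime errors. A minimal illustration, not taken from the failing tests:

```scala
spark.conf.set("spark.sql.ansi.enabled", "true") // the Spark 4.0 default

// Under ANSI mode an overflowing cast raises SparkArithmeticException
// instead of silently returning a wrapped value (-1 here without ANSI).
spark.sql("SELECT CAST(9223372036854775807 AS INT)").collect()
```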

kazuyukitanimura added a commit that referenced this issue Jun 19, 2024
## Rationale for this change

To be ready for Spark 4.0

## What changes are included in this PR?

This PR enables the Spark 4.0 tests with Comet enabled, except for the ones listed in #551.

## How are these changes tested?

ANSI mode is enabled for Spark 4.0.
kazuyukitanimura (Contributor, Author) commented:
@parthchandra is looking into:

  • SPARK-43402: FileSourceScanExec supports push down data filter with scalar subquery
  • [SPARK-43226] extra constant metadata fields with extractors

kazuyukitanimura pushed a commit that referenced this issue Jul 19, 2024
## Which issue does this PR close?
Part of #372  and #551 

## Rationale for this change
With Spark 4.0, the `SubquerySuite` in Spark fails because the Comet scan did not support the scalar subquery feature.

## What changes are included in this PR?
Adds support for scalar subquery pushdown into the Comet scan.
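
A minimal sketch of the query shape this targets, assuming a `spark` session; the path, table, and column names are placeholders:

```scala
// Placeholder data; the filter's bound comes from a scalar subquery,
// so the scan can only push the data filter down after the subquery
// has been evaluated to a literal.
spark.range(100).selectExpr("id", "id % 10 AS bucket")
  .write.mode("overwrite").parquet("/tmp/scalar_subquery_demo")

spark.read.parquet("/tmp/scalar_subquery_demo").createOrReplaceTempView("t")
spark.sql("SELECT * FROM t WHERE id > (SELECT avg(id) FROM t)").show()
```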

## How are these changes tested?
Existing Spark/sql unit tests in `SubquerySuite`
kazuyukitanimura added a commit that referenced this issue Jul 20, 2024
## Which issue does this PR close?

Part of #372 and #551

## Rationale for this change

To be ready for Spark 4.0

## What changes are included in this PR?

This PR fixes the test that expects to see a SparkArithmeticException.

## How are these changes tested?

Enabled `SPARK-40389: Don't eliminate a cast which can cause overflow`
himadripal pushed commits with the same messages to himadripal/datafusion-comet, referencing this issue, Sep 7, 2024.