
feat: enable decimal to decimal cast of different precision and scale #1086

Open · wants to merge 4 commits into main

Conversation

@himadripal (Contributor) commented Nov 14, 2024

Which issue does this PR close?

Part of #375

Rationale for this change

What changes are included in this PR?

How are these changes tested?

@himadripal (Contributor Author):

@andygrove @viirya

@andygrove andygrove changed the title enable decimal to decimal cast of different precision and scale feat: enable decimal to decimal cast of different precision and scale Nov 14, 2024
@@ -895,6 +895,18 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlanHelper {
     }
   }

+  test("cast between decimals with different precision and scale") {
Member:

It's great to see that some basic tests now pass. I assume there must have been improvements in DataFusion since we started this project.

I'd like to see the tests cover more scenarios, such as:

  • Casting from a smaller scale to a larger scale, e.g. (10, 1) to (10, 4)
  • Tests for edge cases such as negative scale and zero scale (see the sketch below)
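
A possible shape for those scenarios, as a hedged sketch reusing the suite's withNulls/castTest helpers from the diff above (helper names and implicits assumed to be in scope):

import org.apache.spark.sql.types.DataTypes

// Sketch only: withNulls, castTest and spark.implicits._ come from the
// surrounding suite; toDF infers a wider source type, so an extra
// .select($"a".cast("decimal(10,1)")) would pin the exact (10, 1) source.
val values = Seq(BigDecimal("123456.7"), BigDecimal("-9.5"), BigDecimal("0.0"))
val df = withNulls(values).toDF("a")

// smaller scale to larger scale, e.g. (10, 1) to (10, 4)
castTest(df, DataTypes.createDecimalType(10, 4))

// zero scale edge case
castTest(df, DataTypes.createDecimalType(10, 0))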

Contributor:

+1 for more scenarios

  • edge case inputs like null
  • negative scale may not be allowed in Spark IIRC

Contributor Author:

I'm adding more cases; negative scale throws a scale exception in Spark as well.
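
For reference, a minimal illustration of that behavior (assuming default configs; the exact error message varies by Spark version):

// rejected at analysis time unless the legacy config
// spark.sql.legacy.allowNegativeScaleOfDecimal is set to true
spark.sql("select cast(1.23 as decimal(10,-4))")
// throws AnalysisException: negative scale is not allowed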

// cast from the default Decimal(38, 18) to Decimal(7, 2)
val values = Seq(BigDecimal("12345.6789"), BigDecimal("9876.5432"), BigDecimal("123.4567"))
val df = withNulls(values).toDF("a")
castTest(df, DataTypes.createDecimalType(7, 2))
Contributor:

I think the original test case in the issue was casting to Decimal(6, 2) instead of Decimal(7, 2)?

@himadripal (Contributor Author) commented Nov 16, 2024:

This is the result of the test with (6, 2). On the right (the Spark answer), the converted (precision, scale) is (7, 2) for the last row, and the left side matches the value from the issue description. Should the left side produce 12345.68 instead of null?

== Results ==
!== Correct Answer - 4 ==                  == Spark Answer - 4 ==
 struct<a:decimal(38,18),a:decimal(6,2)>   struct<a:decimal(38,18),a:decimal(6,2)>
 [null,null]                               [null,null]
 [123.456700000000000000,123.46]           [123.456700000000000000,123.46]
 [9876.543200000000000000,9876.54]         [9876.543200000000000000,9876.54]
![12345.678900000000000000,null]           [12345.678900000000000000,12345.68]

Member:

The output is confusing here, but the left side is Spark and the right side is Comet. Spark is producing null but Comet is producing 12345.68. We need to have Comet use the same logic as Spark here.
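
For context, a hedged sketch of the Spark semantics to match: in non-ANSI mode, a CAST that overflows the target precision yields null; org.apache.spark.sql.types.Decimal.changePrecision reports whether the rounded value fits:

import org.apache.spark.sql.types.Decimal

val d = Decimal(BigDecimal("12345.678900000000000000"))
// changePrecision rounds in place and returns false when the result needs
// more digits than the target precision allows
val fits = d.changePrecision(6, 2) // 12345.68 needs precision 7, so false
println(if (fits) d.toString else "null") // prints "null", like Spark's CAST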

-      case (_: DecimalType, _: DecimalType) =>
-        // https://github.com/apache/datafusion-comet/issues/375
-        Incompatible()
+      case (from: DecimalType, to: DecimalType) =>
Contributor Author:

Higher-to-lower precision conversion in DataFusion changes the integer part, hence I marked it as Incompatible().
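
A hedged sketch (not necessarily the exact PR code) of how that check might look, assuming Comet's Compatible/Incompatible SupportLevel types and a hypothetical helper name:

import org.apache.spark.sql.types.DecimalType

// Hypothetical helper mirroring the shape of the cast-support check
def decimalCastSupport(from: DecimalType, to: DecimalType): SupportLevel =
  if (to.precision < from.precision) {
    // DataFusion may truncate integer digits where Spark would return null,
    // so flag the narrowing cast as incompatible for now
    Incompatible(Some("narrowing decimal cast may change the integer part"))
  } else {
    Compatible()
  }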

Contributor:

I will change the PR description to say "part of #375" instead of closing it...

@@ -231,11 +231,9 @@ abstract class CometTestBase
       df: => DataFrame): (Option[Throwable], Option[Throwable]) = {
     var expected: Option[Throwable] = None
     withSQLConf(CometConf.COMET_ENABLED.key -> "false") {
-      val dfSpark = Dataset.ofRows(spark, df.logicalPlan)
-      expected = Try(dfSpark.collect()).failed.toOption
+      expected = Try(Dataset.ofRows(spark, df.logicalPlan).collect()).failed.toOption
Contributor Author:

If the plan parsing encounters a problem, like negative precision, this will catch it.

Contributor:

So the change here is just formatting, changing from 2 lines to 1?

Contributor Author:

Earlier, only the df.collect() part was inside the Try. I also included df.logicalPlan, which captures any exception thrown during plan parsing; in this example, it was throwing a parsing error for this:

select a, cast(a as DECIMAL(10,-4)) from t order by a
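
In other words, a minimal sketch of why the wrapping matters (df is a by-name parameter; Dataset.ofRows is private[sql], so this assumes the suite's package placement):

import scala.util.Try
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Forcing df.logicalPlan is the first point where parse/analysis errors can
// surface, so it must happen inside the Try to be recorded as "expected"
def expectedFailure(spark: SparkSession, df: => DataFrame): Option[Throwable] =
  Try(Dataset.ofRows(spark, df.logicalPlan).collect()).failed.toOption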
