Sum should not panic on overflow #2455

@yjshen

Describe the bug
A simple sum over an Int8 column panics with "attempt to add with overflow".

To Reproduce
#[tokio::test]
async fn csv_query_array_agg_simple() -> Result<()> {
    let ctx = SessionContext::new();
    register_aggregate_csv(&ctx).await?;
    let sql =
        "select c2, sum(c3) sum_c3";
    let actual = execute_to_batches(&ctx, sql).await;
    let expected = vec![
        "+--------+",
        "| sum_c3 |",
        "+--------+",
        "| TBD    |",
        "+--------+",
    ];
    assert_batches_eq!(expected, &actual);
    Ok(())
}

Expected behavior
On overflow, sum should either return null or wrap around at the boundary instead of panicking.

Additional context
Apache Spark provides two versions of sum: one named Sum that wraps around the boundary, and TrySum that returns null on overflow.
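
To make the two behaviors concrete, here is a minimal standalone sketch in plain Rust over a slice of i8 values. It is not DataFusion or Arrow code. The function names `wrapping_sum` and `try_sum` are hypothetical and chosen only to echo the Spark names above.

```rust
/// Wraps around at the i8 boundary instead of panicking
/// (hypothetical helper, similar in spirit to Spark's `Sum`).
fn wrapping_sum(values: &[i8]) -> i8 {
    values.iter().fold(0i8, |acc, &v| acc.wrapping_add(v))
}

/// Returns `None` as soon as an addition overflows
/// (hypothetical helper, similar in spirit to Spark's `TrySum`).
fn try_sum(values: &[i8]) -> Option<i8> {
    values.iter().try_fold(0i8, |acc, &v| acc.checked_add(v))
}
```

With the input `[100, 100]`, `wrapping_sum` gives -56 and `try_sum` gives `None`. Note that the checked version reports overflow on any intermediate step, so `[100, 100, -100]` also yields `None` even though the final total would fit in an i8.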

https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Sum.scala#L159-L255

Also, Spark uses several common types as sum result types:

  protected lazy val resultType = child.dataType match {
    case DecimalType.Fixed(precision, scale) =>
      DecimalType.bounded(precision + 10, scale)
    case _: IntegralType => LongType
    case it: YearMonthIntervalType => it
    case it: DayTimeIntervalType => it
    case _ => DoubleType
  }
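
The integral case above (`case _: IntegralType => LongType`) could be sketched in Rust roughly as follows: accumulate the narrow input type into a 64-bit value, so small columns no longer overflow. This is only an illustration under that assumption, not a proposed DataFusion implementation, and `sum_widened` is a hypothetical name.

```rust
/// Sums i8 values into an i64 accumulator (hypothetical helper).
/// Overflow of the i64 accumulator itself is still reported as `None`.
fn sum_widened(values: &[i8]) -> Option<i64> {
    values
        .iter()
        .try_fold(0i64, |acc, &v| acc.checked_add(i64::from(v)))
}
```

For example, `sum_widened(&[100, 100, 100])` gives `Some(300)`, a value that does not fit in an i8.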
