Panic on arithmetic overflow when executing some UDAFs #3385

MichaelScofield · 2024-02-26T07:49:21Z

What type of bug is this?

Unexpected error

What subsystems are affected?

Query Engine

Minimal reproduce step

Start a standalone instance, create a table like this:

create table foo(ts timestamp time index, i int64)

Then insert some data:

insert into foo values(1, -1), (2, 9223372036854775807), (3, 1)

"9223372036854775807" is i64::MAX; it is inserted to deliberately trigger an arithmetic overflow.
Now execute select polyval(i, 2) from foo, and you will see the panic in greptimedb:

2024-02-26T07:33:51.671361Z ERROR on_query: common_telemetry::panic_hook: panicked at src/common/function/src/scalars/aggregate/polyval.rs:205:32:
attempt to multiply with overflow backtrace=   0: backtrace::backtrace::libunwind::trace
...
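
For illustration, here is a minimal sketch of how the multiplication can overflow and how a checked variant would report it instead of panicking. It assumes polyval evaluates a polynomial whose coefficients are the column values (highest degree first) at the given point. That semantics, the Horner-style loop, and the name polyval_checked are assumptions for this example, not the code in polyval.rs.

```rust
/// Hypothetical stand-in for the aggregate (not the code in polyval.rs):
/// evaluates the polynomial with coefficients `coeffs` (highest degree first)
/// at `x` using Horner's rule. Returns None instead of panicking when an
/// intermediate multiplication or addition overflows i64.
fn polyval_checked(coeffs: &[i64], x: i64) -> Option<i64> {
    let mut acc: i64 = 0;
    for &c in coeffs {
        // `?` stops at the first step that overflows.
        acc = acc.checked_mul(x)?.checked_add(c)?;
    }
    Some(acc)
}

fn main() {
    // Same values as the rows inserted into `foo`.
    let column = [-1, i64::MAX, 1];
    match polyval_checked(&column, 2) {
        Some(v) => println!("polyval = {v}"),
        None => println!("arithmetic overflow"),
    }
}
```

With the three rows above, the last step multiplies i64::MAX - 2 by 2, so the checked version takes the overflow branch where the unchecked `*` panics.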

To see another panic with the same cause (arithmetic overflow), execute select diff(i) from foo through the HTTP endpoint:

curl -i -X POST -d 'sql=select diff(i) from foo' http://127.0.0.1:4000/v1/sql

(We have to invoke the "diff" function this way because it produces a "list" datatype, which is not writable for MySQL.)

What did you expect to see?

A correct result, or an error (instead of a panic) on arithmetic overflow.

What did you see instead?

panic

What operating system did you use?

all

What version of GreptimeDB did you use?

main

Relevant log output and stack trace

No response


MichaelScofield · 2024-02-26T08:24:30Z

I see four possible fixes for this:

1. Always be able to calculate (no overflow): make the "largest type" of i64 or u64 a float or a string. Needless to say, this is the least elegant way to go.
2. Refactor the current code to return an error on overflow. This might require some subtle type refactoring. For example, to check for subtraction overflow, we have to make the input datatype implement the CheckedSub trait. But then we need to take special care of float datatypes, which is a little annoying to implement.
3. Wrap on overflow (DataFusion's "sum" implementation does this).
4. Combine 2 and 3: handle overflow the same way DataFusion does (by using Arrow's ArrowNativeTypeOp), then return an error if an overflow happens. ArrowNativeTypeOp has some overflow-checking methods we can use.
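
To make the difference between options 2 to 4 concrete, here is a rough sketch. The OverflowOps trait below is a hypothetical local stand-in that only mimics the idea of having both a checked and a wrapping operation on every native type; it is not Arrow's ArrowNativeTypeOp, and the diff helpers are illustrative rather than the existing UDAF code.

```rust
/// Hypothetical trait for this sketch only; it is NOT the Arrow API.
trait OverflowOps: Copy {
    /// Subtraction that reports overflow as an error.
    fn sub_checked(self, rhs: Self) -> Result<Self, String>;
    /// Subtraction that wraps around on overflow.
    fn sub_wrapping(self, rhs: Self) -> Self;
}

impl OverflowOps for i64 {
    fn sub_checked(self, rhs: Self) -> Result<Self, String> {
        self.checked_sub(rhs)
            .ok_or_else(|| format!("arithmetic overflow: {} - {}", self, rhs))
    }
    fn sub_wrapping(self, rhs: Self) -> Self {
        self.wrapping_sub(rhs)
    }
}

impl OverflowOps for f64 {
    // Floats never panic on overflow (IEEE 754 yields +/-inf), so both
    // variants are plain subtraction. This is the "special care" case.
    fn sub_checked(self, rhs: Self) -> Result<Self, String> {
        Ok(self - rhs)
    }
    fn sub_wrapping(self, rhs: Self) -> Self {
        self - rhs
    }
}

/// Adjacent differences that return an error on overflow (option 2 / 4).
fn diff<T: OverflowOps>(values: &[T]) -> Result<Vec<T>, String> {
    values.windows(2).map(|w| w[1].sub_checked(w[0])).collect()
}

/// Adjacent differences that wrap on overflow (option 3).
fn diff_wrapping<T: OverflowOps>(values: &[T]) -> Vec<T> {
    values.windows(2).map(|w| w[1].sub_wrapping(w[0])).collect()
}

fn main() {
    let ints = [-1_i64, i64::MAX, 1];
    println!("{:?}", diff(&ints)); // an Err describing the overflow
    println!("{:?}", diff_wrapping(&ints)); // silently wrapped values
    println!("{:?}", diff(&[1.5_f64, 4.0]));
}
```

Keeping both operations behind one trait means the aggregate can stay generic over integer and float inputs, and choosing between "error" and "wrap" is a one-line decision at the call site.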

waynexia · 2024-02-28T04:02:38Z

Offloading the op and related check to arrow/datafusion makes sense to me.

This pattern occurs in other UDF implementations in our codebase. Maybe we should consider switching to ArrowNativeTypeOp in other places as well?

MichaelScofield added the C-bug Category Bugs label Feb 26, 2024

MichaelScofield mentioned this issue Feb 26, 2024

aggregate_diff value sub #199

Closed

killme2008 assigned MichaelScofield May 24, 2024
