Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Panic on arithmetic overflow when executing some UDAFs #3385

Open
MichaelScofield opened this issue Feb 26, 2024 · 2 comments
Open

Panic on arithmetic overflow when executing some UDAFs #3385

MichaelScofield opened this issue Feb 26, 2024 · 2 comments
Assignees
Labels
C-bug Category Bugs

Comments

@MichaelScofield
Copy link
Collaborator

What type of bug is this?

Unexpected error

What subsystems are affected?

Query Engine

Minimal reproduce step

Start a standalone instance, create a table like this:

create table foo(ts timestamp time index, i int64)

Then insert some data:

insert into foo values(1, -1), (2, 9223372036854775807), (3, 1)

"9223372036854775807" is i64::MAX, to explicitly produce some arithmetic overflow.
Now executeselect polyval(i, 2) from foo, you will see the panic in greptimedb:

2024-02-26T07:33:51.671361Z ERROR on_query: common_telemetry::panic_hook: panicked at src/common/function/src/scalars/aggregate/polyval.rs:205:32:
attempt to multiply with overflow backtrace=   0: backtrace::backtrace::libunwind::trace
...

To see another panic with the same cause of arithmetic overflow, try executing select diff(i) from foo through http endpoint:

curl -i -X POST -d 'sql=select diff(i) from foo' http://127.0.0.1:4000/v1/sql

(The reason we have to invoke "diff" func like this is because it produces “list" datatype, which is not writable for mysql.)

What did you expect to see?

Correct result, or error(instead of panic) on "arithmetic overflow".

What did you see instead?

panic

What operating system did you use?

all

What version of GreptimeDB did you use?

main

Relevant log output and stack trace

No response

@MichaelScofield
Copy link
Collaborator Author

I see there may have 4 possible fix for this:

  1. Always able to calculate (no overflow): make the "largest type" of i64 or u64 to be float, or string. No need to say this is the least elegant way to go.
  2. Refactor current codes to error on overflow. This might require some subtle type refactor. For example, to check substraction overflow, we have make the input datatype impl CheckedSub trait. But then we need to take special care for float datatypes, which is a little annoying to impl.
  3. Wrapping on overflow (Datafusion's "sum" impl does this).
  4. Combine 2 and 3, use the same way how Datafusion deal with overflow (by taking Arrow's ArrowNativeTypeOp), then error on it if overflow happens. There are some overflow checking methods in ArrowNativeTypeOp we can use.

@waynexia
Copy link
Member

Offloading the op and related check to arrow/datafusion makes sense to me.

This pattern occurs in other UDF impl in our codebase. Maybe we should consider switching to ArrowNativeTypeOp in other places as well?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category Bugs
Projects
None yet
Development

No branches or pull requests

2 participants