Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix](decimal) fix undefined behaviour of divide by zero when cast string to decimal #26822

Merged
merged 2 commits into from
Nov 13, 2023

Conversation

jacktengg
Copy link
Contributor

Proposed changes

Issue Number: close #xxx

For sql select cast('0.00164999999999998' as decimalv3(9,0)); be will coredump:

INFO: java_cmd /mnt/disk2/tengjianping/local/jdk1.8.0_131/bin/java
INFO: jdk_version 8
*** Query id: 1140fbeb086a499c-85d388ee84060c3f ***
*** tablet id: 0 ***                                                                                                                                                                                                                                                                         
*** Aborted at 1699611007 (unix time) try "date -d @1699611007" if you are using GNU date ***
*** Current BE git commitID: ad4bb1671b ***
*** SIGFPE integer divide by zero (@0x55d7f737ee73) received by PID 88254 (TID 88575 OR 0x7e9440102700) from PID 18446744073562222195; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /mnt/disk2/tengjianping/doris-38/be/src/common/signal_handler.h:417
 1# 0x00007FC6FB405400 in /lib64/libc.so.6
 2# int doris::StringParser::string_to_decimal<(doris::PrimitiveType)28, int>(char const*, int, int, int, doris::StringParser::ParseResult*) at /mnt/disk2/tengjianping/doris-38/be/src/util/string_parser.hpp:807
 3# doris::Status doris::vectorized::ConvertThroughParsing<doris::vectorized::DataTypeString, doris::vectorized::DataTypeDecimal<doris::vectorized::Decimal<int> >, doris::vectorized::NameCast>::execute<doris::vectorized::PrecisionScaleArg>(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long, unsigned long, bool, doris::vectorized::PrecisionScaleArg) at /mnt/disk2/tengjianping/doris-38/be/src/vec/functions/function_cast.h:1396
 4# bool doris::vectorized::FunctionCast::create_decimal_wrapper<doris::vectorized::Decimal<int> >(std::shared_ptr<doris::vectorized::IDataType const> const&, doris::vectorized::DataTypeDecimal<doris::vectorized::Decimal<int> > const*) const::{lambda(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long, unsigned long)#1}::operator()(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long, unsigned long) const::{lambda(auto:1 const&)#1}::operator()<doris::vectorized::TypePair<doris::vectorized::DataTypeString, doris::vectorized::DataTypeDecimal<doris::vectorized::Decimal<int> > > >(doris::vectorized::TypePair<doris::vectorized::DataTypeString, doris::vectorized::DataTypeDecimal<doris::vectorized::Decimal<int> > > const&) const at /mnt/disk2/tengjianping/doris-38/be/src/vec/functions/function_cast.h:1639
 5# bool doris::vectorized::call_on_index_and_data_type<doris::vectorized::DataTypeDecimal<doris::vectorized::Decimal<int> >, doris::vectorized::FunctionCast::create_decimal_wrapper<doris::vectorized::Decimal<int> >(std::shared_ptr<doris::vectorized::IDataType const> const&, doris::vectorized::DataTypeDecimal<doris::vectorized::Decimal<int> > const*) const::{lambda(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long, unsigned long)#1}::operator()(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long, unsigned long) const::{lambda(auto:1 const&)#1}>(doris::vectorized::TypeIndex, doris::vectorized::FunctionCast::create_decimal_wrapper<doris::vectorized::Decimal<int> >(std::shared_ptr<doris::vectorized::IDataType const> const&, doris::vectorized::DataTypeDecimal<doris::vectorized::Decimal<int> > const*) const::{lambda(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long, unsigned long)#1}::operator()(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long, unsigned long) const::{lambda(auto:1 const&)#1}&&) in /mnt/disk2/tengjianping/doris-38/output/be/lib/doris_be
 6# doris::vectorized::FunctionCast::create_decimal_wrapper<doris::vectorized::Decimal<int> >(std::shared_ptr<doris::vectorized::IDataType const> const&, doris::vectorized::DataTypeDecimal<doris::vectorized::Decimal<int> > const*) const::{lambda(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long, unsigned long)#1}::operator()(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long, unsigned long) const at /mnt/disk2/tengjianping/doris-38/be/src/vec/functions/function_cast.h:1650
 7# std::_Function_handler<doris::Status (doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long, unsigned long), doris::vectorized::FunctionCast::create_decimal_wrapper<doris::vectorized::Decimal<int> >(std::shared_ptr<doris::vectorized::IDataType const> const&, doris::vectorized::DataTypeDecimal<doris::vectorized::Decimal<int> > const*) const::{lambda(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long, unsigned long)#1}>::_M_invoke(std::_Any_data const&, doris::FunctionContext*&&, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long&&, unsigned long&&) at /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291
 8# doris::vectorized::FunctionCast::prepare_remove_nullable(doris::FunctionContext*, std::shared_ptr<doris::vectorized::IDataType const> const&, std::shared_ptr<doris::vectorized::IDataType const> const&, bool) const::{lambda(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long, unsigned long)#1}::operator()(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long, unsigned long) const at /mnt/disk2/tengjianping/doris-38/be/src/vec/functions/function_cast.h:2051
 9# std::_Function_handler<doris::Status (doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long, unsigned long), doris::vectorized::FunctionCast::prepare_remove_nullable(doris::FunctionContext*, std::shared_ptr<doris::vectorized::IDataType const> const&, std::shared_ptr<doris::vectorized::IDataType const> const&, bool) const::{lambda(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long, unsigned long)#1}>::_M_invoke(std::_Any_data const&, doris::FunctionContext*&&, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long&&, unsigned long&&) at /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291
10# doris::vectorized::PreparedFunctionCast::execute_impl(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long, unsigned long) at /mnt/disk2/tengjianping/doris-38/be/src/vec/functions/function_cast.h:1315
11# doris::vectorized::PreparedFunctionImpl::default_implementation_for_constant_arguments(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long, unsigned long, bool, bool*) at /mnt/disk2/tengjianping/doris-38/be/src/vec/functions/function.cpp:201
12# doris::vectorized::PreparedFunctionImpl::execute_without_low_cardinality_columns(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long, unsigned long, bool) in /mnt/disk2/tengjianping/doris-38/output/be/lib/doris_be
13# doris::vectorized::PreparedFunctionImpl::execute(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long, unsigned long, bool) at /mnt/disk2/tengjianping/doris-38/be/src/vec/functions/function.cpp:268
14# doris::vectorized::IFunctionBase::execute(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long, unsigned long, bool) at /mnt/disk2/tengjianping/doris-38/be/src/vec/functions/function.h:177
15# doris::vectorized::VCastExpr::execute(doris::vectorized::VExprContext*, doris::vectorized::Block*, int*) at /mnt/disk2/tengjianping/doris-38/be/src/vec/exprs/vcast_expr.cpp:110
16# doris::vectorized::VExprContext::execute(doris::vectorized::Block*, int*) at /mnt/disk2/tengjianping/doris-38/be/src/vec/exprs/vexpr_context.cpp:60
17# doris::vectorized::VUnionNode::get_next_const(doris::RuntimeState*, doris::vectorized::Block*) at /mnt/disk2/tengjianping/doris-38/be/src/vec/exec/vunion_node.cpp:226
18# doris::vectorized::VUnionNode::get_next(doris::RuntimeState*, doris::vectorized::Block*, bool*) in /mnt/disk2/tengjianping/doris-38/output/be/lib/doris_be
19# doris::ExecNode::pull(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /mnt/disk2/tengjianping/doris-38/be/src/exec/exec_node.h:127
20# std::_Function_handler<doris::Status (doris::RuntimeState*, doris::vectorized::Block*, bool*), std::_Bind<doris::Status (doris::ExecNode::*(doris::vectorized::VUnionNode*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>))(doris::RuntimeState*, doris::vectorized::Block*, bool*)> >::_M_invoke(std::_Any_data const&, doris::RuntimeState*&&, doris::vectorized::Block*&&, bool*&&) at /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291
21# doris::ExecNode::get_next_after_projects(doris::RuntimeState*, doris::vectorized::Block*, bool*, std::function<doris::Status (doris::RuntimeState*, doris::vectorized::Block*, bool*)> const&, bool) at /mnt/disk2/tengjianping/doris-38/be/src/exec/exec_node.cpp:596
22# doris::pipeline::SourceOperator<doris::pipeline::ConstValueOperatorBuilder>::get_block(doris::RuntimeState*, doris::vectorized::Block*, doris::pipeline::SourceState&) at /mnt/disk2/tengjianping/doris-38/be/src/pipeline/exec/operator.h:413
23# doris::pipeline::PipelineTask::execute(bool*) at /mnt/disk2/tengjianping/doris-38/be/src/pipeline/pipeline_task.cpp:259
24# doris::pipeline::TaskScheduler::_do_work(unsigned long) at /mnt/disk2/tengjianping/doris-38/be/src/pipeline/task_scheduler.cpp:266

The reason is that function common::exp10_i32 called by get_scale_multiplier(11) will read out-of-range of the exp_table array, which result in Undefined Behaviour, and the if (LIKELY(divisor > 0)) statement if false positive.

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@jacktengg
Copy link
Contributor Author

run buildall

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

inline constexpr std::int32_t exp10_i32(int x) {
return exp_details::get_exp<std::int32_t, 10, 10>(x);
constexpr inline int exp10_i32(int x) {
if (x < 0) return 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: statement should be inside braces [readability-braces-around-statements]

Suggested change
if (x < 0) return 0;
if (x < 0) { return 0;
}

if (x < 0) return 0;
if (x > 9) return std::numeric_limits<int>::max();

constexpr int values[] = {1, 10, 100, 1000, 10000,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: do not declare C-style arrays, use std::array<> instead [modernize-avoid-c-arrays]

    constexpr int values[] = {1,      10,      100,      1000,      10000,
              ^

inline constexpr std::int64_t exp10_i64(int x) {
return exp_details::get_exp<std::int64_t, 10, 19>(x);
constexpr inline int64_t exp10_i64(int x) {
if (x < 0) return 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: statement should be inside braces [readability-braces-around-statements]

Suggested change
if (x < 0) return 0;
if (x < 0) { return 0;
}

if (x < 0) return 0;
if (x > 18) return std::numeric_limits<int64_t>::max();

constexpr int64_t values[] = {1LL,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: do not declare C-style arrays, use std::array<> instead [modernize-avoid-c-arrays]

    constexpr int64_t values[] = {1LL,
              ^

inline constexpr __int128 exp10_i128(int x) {
return exp_details::get_exp<__int128, 10, 39>(x);
constexpr inline __int128 exp10_i128(int x) {
if (x < 0) return 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: statement should be inside braces [readability-braces-around-statements]

Suggested change
if (x < 0) return 0;
if (x < 0) { return 0;
}

if (x < 0) return 0;
if (x > 38) return std::numeric_limits<__int128>::max();

constexpr __int128 values[] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: do not declare C-style arrays, use std::array<> instead [modernize-avoid-c-arrays]

    constexpr __int128 values[] = {
              ^

@jacktengg
Copy link
Contributor Author

run buildall

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

if (x < 0) {
return 0;
}
if (x > 19) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: 19 is a magic number; consider replacing it with a named constant [readability-magic-numbers]

    if (x > 19) {
            ^

if (x < 0) {
return 0;
}
if (x > 9) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: 9 is a magic number; consider replacing it with a named constant [readability-magic-numbers]

    if (x > 9) {
            ^

if (x < 0) {
return 0;
}
if (x > 18) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: 18 is a magic number; consider replacing it with a named constant [readability-magic-numbers]

    if (x > 18) {
            ^

if (x < 0) {
return 0;
}
if (x > 38) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: 38 is a magic number; consider replacing it with a named constant [readability-magic-numbers]

    if (x > 38) {
            ^

if (x < 0) {
return 0;
}
if (x > 76) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: 76 is a magic number; consider replacing it with a named constant [readability-magic-numbers]

    if (x > 76) {
            ^

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 36.79% (8407/22852)
Line Coverage: 29.31% (68232/232774)
Region Coverage: 27.91% (35227/126209)
Branch Coverage: 24.73% (18015/72840)
Coverage Report: http://coverage.selectdb-in.cc/coverage/4f75cf035edbf41d34eb324a3643dff4f4b3aa33_4f75cf035edbf41d34eb324a3643dff4f4b3aa33/report/index.html

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 36.79% (8407/22852)
Line Coverage: 29.31% (68232/232794)
Region Coverage: 27.91% (35229/126209)
Branch Coverage: 24.73% (18016/72840)
Coverage Report: http://coverage.selectdb-in.cc/coverage/a81aaac58748588c90ac156057a4c52feb949ad0_a81aaac58748588c90ac156057a4c52feb949ad0/report/index.html

@doris-robot
Copy link

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.42 seconds
stream load tsv: 572 seconds loaded 74807831229 Bytes, about 124 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.2 seconds inserted 10000000 Rows, about 342K ops/s
storage size: 17162615493 Bytes

@doris-robot
Copy link

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit a81aaac58748588c90ac156057a4c52feb949ad0, data reload: false

run tpch-sf100 query with default conf and session variables
q1	5276	5085	5049	5049
q2	362	192	198	192
q3	2078	2020	2012	2012
q4	1506	1440	1426	1426
q5	4112	4167	4146	4146
q6	254	138	137	137
q7	2122	1608	1599	1599
q8	2780	2737	2780	2737
q9	10392	10216	10245	10216
q10	3494	3582	3556	3556
q11	385	269	259	259
q12	461	316	313	313
q13	4562	4120	4064	4064
q14	326	303	296	296
q15	626	564	589	564
q16	703	623	605	605
q17	1147	1094	1070	1070
q18	7846	7279	7420	7279
q19	1706	1691	1698	1691
q20	595	373	346	346
q21	4956	4560	4590	4560
q22	537	429	437	429
Total cold run time: 56226 ms
Total hot run time: 52546 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	5012	5002	4961	4961
q2	371	239	248	239
q3	4033	3999	3964	3964
q4	2813	2776	2754	2754
q5	6450	6472	6501	6472
q6	246	131	131	131
q7	3142	2739	2622	2622
q8	4852	4766	4769	4766
q9	17730	17703	17572	17572
q10	4050	4170	4135	4135
q11	824	653	674	653
q12	981	841	812	812
q13	4296	3938	3895	3895
q14	386	368	349	349
q15	632	587	576	576
q16	780	705	706	705
q17	3898	3980	3919	3919
q18	9445	9154	9269	9154
q19	1900	1777	1773	1773
q20	2393	2043	2044	2043
q21	8771	8748	8709	8709
q22	883	870	883	870
Total cold run time: 83888 ms
Total hot run time: 81074 ms

@doris-robot
Copy link

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.82 seconds
stream load tsv: 571 seconds loaded 74807831229 Bytes, about 124 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.0 seconds inserted 10000000 Rows, about 344K ops/s
storage size: 17162603571 Bytes

@jacktengg
Copy link
Contributor Author

run p0

Copy link
Member

@mrhhsg mrhhsg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

PR approved by anyone and no changes requested.

Copy link
Contributor

@yiguolei yiguolei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Nov 13, 2023
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@yiguolei yiguolei merged commit 7332b1b into apache:master Nov 13, 2023
seawinde pushed a commit to seawinde/doris that referenced this pull request Nov 13, 2023
…ring to decimal (apache#26822)

* [fix](decimal) fix undefined behaviour of divide by zero when cast string to decimal

* fix format
seawinde pushed a commit to seawinde/doris that referenced this pull request Nov 14, 2023
…ring to decimal (apache#26822)

* [fix](decimal) fix undefined behaviour of divide by zero when cast string to decimal

* fix format
XuJianxu pushed a commit to XuJianxu/doris that referenced this pull request Dec 14, 2023
…ring to decimal (apache#26822)

* [fix](decimal) fix undefined behaviour of divide by zero when cast string to decimal

* fix format
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by one committer. dev/2.0.3-merged reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants