Failure in kqp_planner.cpp under load #8512

Closed
@GrigoriyPA

Description

Reproduces on version:

YQL_ENSURE(it != PendingComputeActors.end());

Crash stack trace:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f1985b2e859 in __GI_abort () at abort.c:79
#2  0x0000558cba3edf82 in bt_terminate_handler ()
    at /home/grigoriypisar/ydbwork/ydb/contrib/libs/cxxsupp/libcxxrt/exception.cc:347
#3  0x0000558cba3ec787 in std::terminate ()
    at /home/grigoriypisar/ydbwork/ydb/contrib/libs/cxxsupp/libcxxrt/exception.cc:1636
#4  report_failure (err=<optimized out>, thrown_exception=0x4510d740910)
    at /home/grigoriypisar/ydbwork/ydb/contrib/libs/cxxsupp/libcxxrt/exception.cc:813
#5  0x0000558cbbe50f9f in NYql::NDetail::YqlPanic (file=..., line=line@entry=554, function=<optimized out>,
    condition=..., message=...) at /home/grigoriypisar/ydbwork/ydb/ydb/library/yql/utils/yql_panic.cpp:14
#6  0x0000558cc9a96dc1 in NKikimr::NKqp::TKqpPlanner::CompletedCA (this=0x4517c75fe00, taskId=<optimized out>,
    computeActor=...) at /home/grigoriypisar/ydbwork/ydb/ydb/core/kqp/executer_actor/kqp_planner.cpp:554
#7  0x0000558cc9a89d4f in NKikimr::NKqp::(anonymous namespace)::TKqpDataExecuter::HandleShutdown (this=0x45119fb1e00,
    ev=...) at /home/grigoriypisar/ydbwork/ydb/ydb/core/kqp/executer_actor/kqp_data_executer.cpp:2652
#8  NKikimr::NKqp::(anonymous namespace)::TKqpDataExecuter::WaitShutdownState (this=0x45119fb1e00, ev=...)
    at /home/grigoriypisar/ydbwork/ydb/ydb/core/kqp/executer_actor/kqp_data_executer.cpp:2637
#9  0x0000558cbb15b802 in NActors::TGenericExecutorThread::Execute<NActors::TMailboxTable::THTSwapMailbox> (
    this=this@entry=0x450fe104500, mailbox=0x450fa4f3040, hint=hint@entry=7361, isTailExecution=<optimized out>)
    at /home/grigoriypisar/ydbwork/ydb/ydb/library/actors/core/executor_thread.cpp:251
#10 0x0000558cbb152e5b in NActors::TGenericExecutorThread::ProcessExecutorPool(NActors::IExecutorPool*)::$_0::operator()(unsigned int, bool) const (this=this@entry=0x7f19700b2f68, activation=activation@entry=7361, isTailExecution=false)
    at /home/grigoriypisar/ydbwork/ydb/ydb/library/actors/core/executor_thread.cpp:440
#11 0x0000558cbb1527c0 in NActors::TGenericExecutorThread::ProcessExecutorPool (this=this@entry=0x450fe104500,
    pool=<optimized out>) at /home/grigoriypisar/ydbwork/ydb/ydb/library/actors/core/executor_thread.cpp:493
#12 0x0000558cbb1536b7 in NActors::TExecutorThread::ThreadProc (this=0x450fe104500)
    at /home/grigoriypisar/ydbwork/ydb/ydb/library/actors/core/executor_thread.cpp:524
#13 0x0000558cba4a6aca in (anonymous namespace)::TPosixThread::ThreadProxy (arg=0x450fe3b05a0)
    at /home/grigoriypisar/ydbwork/ydb/util/system/thread.cpp:244
#14 0x00007f1985d0b609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#15 0x00007f1985c2b353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

The crash is presumably caused by a temporary node disconnect during shutdown of the data executer (the disconnect happens because the IC input buffer overflows). On disconnect, this method is called for every CA that has not yet managed to send TEvDqCompute::TEvState:

Planner->TaskNotStarted(task.Id);

As a result, that CA is absent from the PendingComputeActors list, so when the IC pool frees up and TEvDqCompute::TEvState arrives from the problematic node, the crash happens here:

YQL_ENSURE(it != PendingComputeActors.end());
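
The sequence described above can be modeled with a small standalone sketch. This is not the actual TKqpPlanner code: the class PlannerModel, its methods StartTask, CompletedStrict, CompletedTolerant and PendingCount, and the use of plain integer ids as keys are all hypothetical names introduced for illustration. It also assumes, as the description implies, that TaskNotStarted removes the task from the pending set. The tolerant variant is only one possible mitigation, not necessarily the fix that was applied.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <unordered_map>

// Hypothetical, simplified stand-in for the planner's bookkeeping.
// It is NOT the real TKqpPlanner; ids are plain integers for illustration.
class PlannerModel {
public:
    // Register a compute actor as pending for the given task.
    void StartTask(std::uint64_t taskId, std::uint64_t computeActorId) {
        pending_[taskId] = computeActorId;
    }

    // Models the disconnect path: the task is dropped from the pending set
    // (assumption based on the issue description).
    void TaskNotStarted(std::uint64_t taskId) {
        pending_.erase(taskId);
    }

    // Strict variant: behaves like an ensure-style check and throws
    // when the state event arrives for a task that is no longer pending.
    void CompletedStrict(std::uint64_t taskId) {
        auto it = pending_.find(taskId);
        if (it == pending_.end()) {
            throw std::logic_error("state received for a task that is not pending");
        }
        pending_.erase(it);
    }

    // Tolerant variant: a late state event for an already-dropped task
    // is reported to the caller instead of being treated as fatal.
    bool CompletedTolerant(std::uint64_t taskId) {
        return pending_.erase(taskId) > 0;
    }

    std::size_t PendingCount() const {
        return pending_.size();
    }

private:
    std::unordered_map<std::uint64_t, std::uint64_t> pending_;
};
```

In this model, calling StartTask, then TaskNotStarted, then CompletedStrict for the same task id throws, which corresponds to the late TEvState hitting the ensure check. CompletedTolerant returns false in the same situation and lets the caller decide how to handle the stale event.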
