Description
Describe the bug
The program sometimes crashes in AckGroupingTrackerEnabled#scheduleTimer. #8519 tried to solve the problem by extending the lifetime of AckGroupingTrackerEnabled so that the callback would not access an outdated this. However, the segmentation fault still happens.
A typical stack trace is:
#6 <signal handler called>
#7 0x00007f5aad920b60 in ?? ()
#8 0x00007f6e9ee7d1bb in boost::asio::detail::wait_handler<pulsar::AckGroupingTrackerEnabled::scheduleTimer()::{lambda(boost::system::error_code const&)#1}>::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long) ()
from /opt/vertica/verticadb/v_verticadb_node0003_catalog/Libraries/0213a49612da8c2ad8b19aa3bd77ddec00a000000002090c/PulsarSourceLib_0213a49612da8c2ad8b19aa3bd77ddec00a000000002090c.so
#9 0x00007f6e9edd78d3 in boost::asio::detail::scheduler::run(boost::system::error_code&) ()
from /opt/vertica/verticadb/v_verticadb_node0003_catalog/Libraries/0213a49612da8c2ad8b19aa3bd77ddec00a000000002090c/PulsarSourceLib_0213a49612da8c2ad8b19aa3bd77ddec00a000000002090c.so
#10 0x00007f6e9edd4aa6 in pulsar::ExecutorService::startWorker(std::shared_ptr<boost::asio::io_context>) ()
from /opt/vertica/verticadb/v_verticadb_node0003_catalog/Libraries/0213a49612da8c2ad8b19aa3bd77ddec00a000000002090c/PulsarSourceLib_0213a49612da8c2ad8b19aa3bd77ddec00a000000002090c.so
#11 0x00007f6e9edd9c82 in std::thread::_Impl<std::_Bind_simple<std::_Bind<std::_Mem_fn<void (pulsar::ExecutorService::*)(std::shared_ptr<boost::asio::io_context>)> (pulsar::ExecutorService*, std::shared_ptr<boost::asio::io_context>)> ()> >::_M_run() ()
from /opt/vertica/verticadb/v_verticadb_node0003_catalog/Libraries/0213a49612da8c2ad8b19aa3bd77ddec00a000000002090c/PulsarSourceLib_0213a49612da8c2ad8b19aa3bd77ddec00a000000002090c.so
#12 0x00007f6fcb5d2070 in ?? () from /lib64/libstdc++.so.6
#13 0x00007f6fcb006dd5 in start_thread () from /lib64/libpthread.so.0
#14 0x00007f6fca923ead in clone () from /lib64/libc.so.6
To Reproduce
It cannot be reproduced easily. In the environment where it occurs, a single Client is long-lived, and many Readers are periodically created and used to read some messages.
Expected behavior
The segmentation fault should not happen.
Additional context
One solution that may work is to refactor the timer design. Currently, the deadline timer is recreated each time in the callback, and there is no state check like the one used with PartitionedConsumerImpl::partitionsUpdateTimer_:
void PartitionedConsumerImpl::runPartitionUpdateTask() {
    partitionsUpdateTimer_->expires_from_now(partitionsUpdateInterval_);
    partitionsUpdateTimer_->async_wait(
        std::bind(&PartitionedConsumerImpl::getPartitionMetadata, shared_from_this()));
}

void PartitionedConsumerImpl::getPartitionMetadata() {
    using namespace std::placeholders;
    lookupServicePtr_->getPartitionMetadataAsync(topicName_)
        .addListener(std::bind(&PartitionedConsumerImpl::handleGetPartitions, shared_from_this(), _1, _2));
}

void PartitionedConsumerImpl::handleGetPartitions(Result result,
                                                  const LookupDataResultPtr& lookupDataResult) {
    Lock stateLock(mutex_);
    if (state_ != Ready) {
        // NOTE: when consumer is not ready, the runPartitionUpdateTask won't be scheduled
        return;
    }
    /* do the real work... */
    runPartitionUpdateTask();
}

However, we still need to give a detailed explanation for the stack trace mentioned above.
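To make the proposed timer design concrete, here is a minimal standard-library sketch of a callback that checks the object's state before rescheduling itself. It is only an illustration, not the actual Pulsar code: `FakeExecutor`, `Tracker`, and `runScenario` are hypothetical names, and the queue-based `FakeExecutor` stands in for the `io_context` and deadline timer. Capturing a `std::weak_ptr` in the callback is an assumption of this sketch; `PartitionedConsumerImpl` binds `shared_from_this()` instead.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical stand-in for io_context + deadline timer: callbacks are queued
// and run one at a time, as timer expirations would be on the worker thread.
class FakeExecutor {
   public:
    void post(std::function<void()> task) { tasks_.push(std::move(task)); }

    // Runs the oldest pending callback; returns false if none is pending.
    bool runOne() {
        if (tasks_.empty()) return false;
        auto task = std::move(tasks_.front());
        tasks_.pop();
        task();
        return true;
    }

   private:
    std::queue<std::function<void()>> tasks_;
};

// Hypothetical tracker (not the real AckGroupingTrackerEnabled).
class Tracker : public std::enable_shared_from_this<Tracker> {
   public:
    Tracker(FakeExecutor& executor, int& flushCount)
        : executor_(executor), flushCount_(flushCount) {}

    void start() { scheduleTimer(); }

    void close() {
        std::lock_guard<std::mutex> lock(mutex_);
        state_ = Closed;
    }

   private:
    enum State { Ready, Closed };

    void scheduleTimer() {
        // The callback holds only a weak reference, so it never touches a
        // destroyed object and does not keep the tracker alive by itself.
        std::weak_ptr<Tracker> weakSelf = shared_from_this();
        executor_.post([weakSelf] {
            auto self = weakSelf.lock();
            if (!self) return;                  // tracker already destroyed
            if (!self->flushIfReady()) return;  // closed: do not reschedule
            self->scheduleTimer();
        });
    }

    // Does the periodic work only while the tracker is still Ready.
    bool flushIfReady() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (state_ != Ready) return false;
        ++flushCount_;
        return true;
    }

    FakeExecutor& executor_;
    int& flushCount_;
    std::mutex mutex_;
    State state_ = Ready;
};

// Fires the timer twice, then closes or destroys the tracker and fires the
// pending callback once more. Returns the number of flushes performed.
int runScenario(bool destroyInsteadOfClose) {
    FakeExecutor executor;
    int flushCount = 0;
    auto tracker = std::make_shared<Tracker>(executor, flushCount);
    tracker->start();
    executor.runOne();  // first expiry: flushes and reschedules
    executor.runOne();  // second expiry: flushes and reschedules
    if (destroyInsteadOfClose) {
        tracker.reset();
    } else {
        tracker->close();
    }
    executor.runOne();                // pending callback becomes a no-op
    bool pending = executor.runOne(); // nothing was rescheduled
    assert(!pending);
    return flushCount;
}
```

Calling `runScenario(false)` or `runScenario(true)` exercises both shutdown paths: in each case the callback that was already queued runs after shutdown, does no work, and does not reschedule itself. In real code the same check would sit in the handler passed to the timer's `async_wait`.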