Thread pool synchronization problem? [JIRA: RIAK-1912] #141
Description
In `submit` (`eleveldb_thread_pool::submit`):
```cpp
else if (!FindWaitingThread(item))
{
    // no waiting threads, put on backlog queue
    lock();
    eleveldb::inc_and_fetch(&work_queue_atomic);
    work_queue.push_back(item);
    unlock();
    // to address race condition, thread might be waiting now
    FindWaitingThread(NULL);
    perf()->Inc(leveldb::ePerfElevelQueued);
    ret_flag=true;
}   // if
```
It first checks whether there is a waiting thread it can hand the item to directly (`FindWaitingThread` calls `pthread_cond_broadcast` to wake it); otherwise it adds the item to `work_queue`.
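To make the hand-off concrete, here is a minimal single-threaded model of the `else` branch of `submit` quoted above. The body of `FindWaitingThread` is not shown in this issue, so the model assumes it claims a worker by atomically flipping `m_Available` from 1 to 0 and then stores the item in `m_DirectWork`. `Task`, `Worker`, `Pool` and `queue_mutex` are hypothetical stand-ins, and the condition-variable broadcast is left as a comment.

```cpp
#include <atomic>
#include <cassert>
#include <deque>
#include <mutex>
#include <vector>

// Hypothetical stand-ins; only m_Available and m_DirectWork mirror the issue.
struct Task { int id; };

struct Worker {
    std::atomic<int> m_Available{0};   // 1 while the worker is (about to be) waiting
    Task* m_DirectWork = nullptr;      // direct hand-off slot
    std::mutex m_Mutex;
};

struct Pool {
    std::vector<Worker*> threads;
    std::deque<Task*> work_queue;
    std::atomic<int> work_queue_atomic{0};
    std::mutex queue_mutex;            // stands in for lock()/unlock()

    // Assumption: a worker is claimed by a compare-and-swap of m_Available 1 -> 0.
    bool FindWaitingThread(Task* item) {
        for (Worker* w : threads) {
            int expected = 1;
            if (w->m_Available.compare_exchange_strong(expected, 0)) {
                std::lock_guard<std::mutex> g(w->m_Mutex);
                w->m_DirectWork = item;   // nullptr only nudges the worker
                // the real code broadcasts w's condition variable here
                return true;
            }
        }
        return false;
    }

    // Mirrors the else-branch quoted above: direct hand-off, else backlog.
    void submit(Task* item) {
        if (!FindWaitingThread(item)) {
            {
                std::lock_guard<std::mutex> g(queue_mutex);
                ++work_queue_atomic;
                work_queue.push_back(item);
            }
            FindWaitingThread(nullptr);   // retry for a thread that just became available
        }
    }
};
```

With one available worker, the first submission is handed off directly and the second lands on the backlog, because the first hand-off cleared `m_Available`.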
On the other side, in `eleveldb_write_thread_worker` (threading.cc), when a worker thread cannot find a work item, it waits on its condition variable:
```cpp
pthread_mutex_lock(&tdata.m_Mutex);
tdata.m_DirectWork=NULL; // safety
// only wait if we are really sure no work pending
if (0==h.work_queue_atomic)
{
    // yes, thread going to wait. set available now.
    tdata.m_Available=1;
    pthread_cond_wait(&tdata.m_Condition, &tdata.m_Mutex);
}   // if
tdata.m_Available=0;    // safety
submission=(eleveldb::WorkTask *)tdata.m_DirectWork; // NULL is valid
tdata.m_DirectWork=NULL;// safety
pthread_mutex_unlock(&tdata.m_Mutex);
```
After checking that the work queue is empty (`work_queue_atomic==0`), the worker sets `m_Available` to 1 and then waits.
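The check `if (0==h.work_queue_atomic)` and `tdata.m_Available=1` are two separate steps, and `submit` appears not to take `tdata.m_Mutex` while it pushes onto the queue. The sketch below is a hypothetical single-threaded model (not eleveldb code) in which each function is one step, so a chosen interleaving can be replayed deterministically. `State`, `worker_check`, `worker_sleep`, `find_waiting_thread`, `submit`, `replay_lost_wakeup` and `replay_ok` are illustration names; blocking in `pthread_cond_wait` is reduced to a `worker_sleeping` flag, and the hand-off is assumed to succeed only when `m_Available==1`.

```cpp
#include <cassert>
#include <deque>

// Hypothetical single-threaded model; each function is one step, and calling
// them in a chosen order replays one schedule deterministically.
struct State {
    int work_queue_atomic = 0;
    std::deque<int> work_queue;
    int m_Available = 0;
    int direct_work = -1;          // -1 plays the role of NULL
    bool worker_sleeping = false;  // stands in for "blocked in pthread_cond_wait"
    bool saw_empty = false;        // result of the worker's queue check
};

// Worker step 1: if (0==h.work_queue_atomic)
void worker_check(State& s) { s.saw_empty = (s.work_queue_atomic == 0); }

// Worker step 2: tdata.m_Available=1; pthread_cond_wait(...)
void worker_sleep(State& s) {
    if (s.saw_empty) {
        s.m_Available = 1;
        s.worker_sleeping = true;
    }
}

// Producer: FindWaitingThread, assumed to succeed only if m_Available==1
bool find_waiting_thread(State& s, int item) {
    if (s.m_Available != 1) return false;
    s.m_Available = 0;
    s.direct_work = item;
    s.worker_sleeping = false;     // the broadcast wakes the worker
    return true;
}

// Producer: the else-branch of submit
void submit(State& s, int item) {
    if (!find_waiting_thread(s, item)) {
        ++s.work_queue_atomic;
        s.work_queue.push_back(item);
        find_waiting_thread(s, -1);  // the FindWaitingThread(NULL) retry
    }
}

// submit runs between the worker's check and its wait
State replay_lost_wakeup() {
    State s;
    worker_check(s);
    submit(s, 7);
    worker_sleep(s);
    return s;
}

// submit runs after the worker is already waiting
State replay_ok() {
    State s;
    worker_check(s);
    worker_sleep(s);
    submit(s, 7);
    return s;
}
```

In `replay_lost_wakeup` the worker ends up asleep with item 7 still on the backlog, and in this model nothing wakes it unless another `submit` arrives and claims it. Running the same `submit` after `worker_sleep` (`replay_ok`) hands the item off directly.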
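The problem is that a worker can check that the queue is empty and then, before it sets `m_Available`, `submit` can fail to find a waiting thread and push the item onto the queue. The worker then sets `m_Available` and starts waiting, and it will wait forever while the item sits on the queue. The second `FindWaitingThread(NULL)` call does not help, because it can also run inside this same window, while `m_Available` is still 0.

A similar case: the worker has set `m_Available` but has not yet entered `pthread_cond_wait`, and in that moment `submit` sets its `m_DirectWork` and broadcasts the condition variable; the worker then starts waiting after the broadcast, so it would also wait forever. Whether this second case can really happen depends on whether `FindWaitingThread` takes `tdata.m_Mutex` before broadcasting, since the worker holds that mutex from the queue check until `pthread_cond_wait` atomically releases it.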
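One common way to close this kind of window is to test the predicate and block under the same mutex the producer holds when it changes that predicate; `pthread_cond_wait` (like `std::condition_variable::wait`) releases the mutex atomically with blocking, so a push can never fall between the check and the wait. Below is a minimal sketch with C++11 `std::condition_variable`, not a patch for eleveldb: `WorkQueue` and `run_once` are hypothetical names, and a real fix would also have to account for the per-thread `m_DirectWork` hand-off.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>

// Hypothetical minimal work queue, not eleveldb code. The predicate
// ("queue is non-empty") is tested and changed only while holding m_.
class WorkQueue {
public:
    void push(int item) {
        {
            std::lock_guard<std::mutex> g(m_);
            q_.push_back(item);
        }
        // Notifying after unlock is fine: a consumer either will see the item
        // when it tests the predicate under m_, or was already blocked in wait().
        cv_.notify_one();
    }

    int pop() {
        std::unique_lock<std::mutex> lk(m_);
        // wait() re-tests the predicate under m_ and releases m_ atomically
        // with blocking, so no push can slip between the test and the block.
        cv_.wait(lk, [this] { return !q_.empty(); });
        int item = q_.front();
        q_.pop_front();
        return item;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<int> q_;
};

// The consumer may block before the producer pushes; it must still wake up.
int run_once() {
    WorkQueue wq;
    int got = -1;
    std::thread consumer([&] { got = wq.pop(); });
    wq.push(42);
    consumer.join();
    return got;
}
```

Here the consumer's emptiness check and the producer's `push_back` both happen under `m_`. The check against `work_queue_atomic` does not seem to have that property, since it is made under `tdata.m_Mutex` while the push happens under `lock()`.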