Skip to content

Commit

Permalink
Implement TReentrantRWLock::Reset/Restore
Browse files Browse the repository at this point in the history
  • Loading branch information
pcanal committed Nov 6, 2017
1 parent 5db13c6 commit 4deb81d
Showing 1 changed file with 77 additions and 5 deletions.
82 changes: 77 additions & 5 deletions core/thread/src/TReentrantRWLock.cxx
Original file line number Diff line number Diff line change
Expand Up @@ -184,24 +184,96 @@ void TReentrantRWLock<MutexT, RecurseCountsT>::WriteUnLock()
namespace {
template <typename MutexT, typename RecurseCountsT>
struct TReentrantRWLockState: public TVirtualMutex::State {
RecurseCountsT fRecurseCounts;
int fReadersCount = 0;
size_t fWriteRecurse = 0;
bool fIsWriter = false;
};
}

template <typename MutexT, typename RecurseCountsT>
std::unique_ptr<TVirtualMutex::State> TReentrantRWLock<MutexT, RecurseCountsT>::Reset()
{
std::unique_ptr<TReentrantRWLockState<MutexT, RecurseCountsT>> pState;
using State_t = TReentrantRWLockState<MutexT, RecurseCountsT>;

std::unique_ptr<State_t> pState(new State_t);
auto local = fRecurseCounts.GetLocal();

size_t readerCount;
{
std::unique_lock<MutexT> lock(fMutex);
readerCount = fRecurseCounts.GetLocalReadersCount(local);
}

pState->fReadersCount = readerCount;

if (fWriter && !fRecurseCounts.IsNotCurrentWriter(local)) {

// We are holding the write lock.
pState->fIsWriter = true;
pState->fWriteRecurse = fRecurseCounts.fWriteRecurse;

// Now set the lock (and potential read locks) for immediate release.
fReaders -= readerCount;
fRecurseCounts.fWriteRecurse = 1;
// insertion in the reader count can only happen during a ReadLock
// which can not execute until we release the write lock, so no
// need to take the local mutex here.
fRecurseCounts.ResetReadCount(local, 0);

// Release this thread's write lock
WriteUnLock();
} else if (readerCount) {
// Now set the lock for release.
{
std::unique_lock<MutexT> lock(fMutex);
fReaders -= (readerCount-1);
fRecurseCounts.ResetReadCount(local, 1);
}

// Release this thread's reader lock(s)
ReadUnLock();
}

// Do something.
::Fatal("Reset()", "Not implemented, contact pcanal@fnal.gov");
//::Fatal("Reset()", "Not implemented, contact pcanal@fnal.gov");
return std::move(pState);
}

template <typename MutexT, typename RecurseCountsT>
void TReentrantRWLock<MutexT, RecurseCountsT>::Restore(std::unique_ptr<TVirtualMutex::State> &&)
void TReentrantRWLock<MutexT, RecurseCountsT>::Restore(std::unique_ptr<TVirtualMutex::State> &&state)
{
TReentrantRWLockState<MutexT, RecurseCountsT> *pState = dynamic_cast<TReentrantRWLockState<MutexT, RecurseCountsT> *>(state.get());
if (!pState) {
if (state) {
SysError("Restore", "LOGIC ERROR - invalid state object!");
return;
}
// No state, do nothing.
return;
}

if (pState->fIsWriter) {
WriteLock();
// Now that we go the lock, fix up the recursion count.
std::unique_lock<MutexT> lock(fMutex);
fRecurseCounts.fWriteRecurse = pState->fWriteRecurse;
auto local = fRecurseCounts.GetLocal();
fRecurseCounts.ResetReadCount(local, pState->fReadersCount);
fReaders += pState->fReadersCount;
} else {
ReadLock();
// Now that we go the read lock, fix up the local recursion count.
assert( pState->fReadersCount >= 1 );

This comment has been minimized.

Copy link
@linev

linev Jan 4, 2018

Member

@pcanal

Are you sure about this condition?

I have a code which fails at this place when I invoking gROOT->ProcessLine().
It is qt5-based application and this call happens in TTimer handler - means when I call gSystem->ProcessEvents().
If I simply comment out this line - everything works for me.
May be I need to change some global configuration to make some global mute reentrant?

This comment has been minimized.

Copy link
@pcanal

pcanal Jan 8, 2018

Author Member

I think you are right. Moreover, this assert should actually by an if clause added to the else statement above. I will provide a fix shortly.

This comment has been minimized.

Copy link
@pcanal

pcanal Jan 10, 2018

Author Member

Hi Sergey,

Eventhough the code is obviously wrong, I am also failing to find (by reading the code) how we can get there with pState->fIsWriter false. Can you give me a reproducer or at least a completely stack trace where the assert is triggered?

Thanks,
Philippe.

This comment has been minimized.

Copy link
@linev

linev Jan 10, 2018

Member

Problem, that it is large framework with highly experimental code.
As I mentioned, it happens for me when i call ProcessLine in TTimer handler.
I will try to write standalone reproducer. My stack looks like:

#0 0x7f385f26b746
#1 0x7f385de9988a
#2 0x7f385f26babc
#3 0x7f385a77a270
#4 0x7f38598560d0 __GI_raise
#5 0x7f38598576b1 __GI_abort
#6 0x7f385984e6fa __assert_fail_base
#7 0x7f385984e772 __GI___assert_fail
#8 0x7f38670cc943 ROOT::TReentrantRWLock<>::Restore()
#9 0x7f38670c5d58 ROOT::TRWMutexImp<>::Restore()
#10 0x7f38380ac60c TCling__RestoreInterpreterMutex
#11 0x7f383819b34c TClingCallbacks::ReturnedFromUserCode()
#12 0x7f38381d356b cling::MultiplexInterpreterCallbacks::ReturnedFromUserCode()
#13 0x7f38381d7b7f cling::Interpreter::RunFunction()
#14 0x7f38381d7f5d cling::Interpreter::EvaluateInternal()
#15 0x7f38381d80eb cling::Interpreter::process()
#16 0x7f383828447d cling::MetaProcessor::process()
#17 0x7f38380b421e HandleInterpreterException()
#18 0x7f38380b4de9 TCling::ProcessLine()
#19 0x7f386af12391 TApplication::ProcessLine()
#20 0x7f386ae7ccb3 TROOT::ProcessLine()
#21 0x7f3864afd2da TWebCanvas::ProcessData()
#22 0x7f3864afc4a3 ZZN10TWebCanvas15CreateWebWindowEiENKUljRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEE_clEjS7
#23 0x7f3864afe00f ZNSt17_Function_handlerIFvjRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEZN10TWebCanvas15CreateWebWindowEiEUljS7_E_E9_M_invokeERKSt9_Any_dataOjS7
#24 0x7f3856c903ae std::function<>::operator()()
#25 0x7f3856c8e32b ROOT::Experimental::TWebWindow::ProcessWS()
#26 0x7f3856c8fa94 ROOT::Experimental::TWebWindowWSHandler::ProcessWS()
#27 0x7f386486d8d4 THttpWSHandler::HandleWS()
#28 0x7f386486a868 THttpServer::ProcessRequest()
#29 0x7f38648690e0 THttpServer::ProcessRequests()
#30 0x7f386486c6d4 THttpTimer::Timeout()
#31 0x7f386af96ae8 TTimer::Notify()
#32 0x7f386af96ab4 TTimer::CheckTimer()
#33 0x7f386b083539 TUnixSystem::DispatchTimers()
#34 0x7f386b07ea29 TUnixSystem::DispatchOneEvent()
#35 0x7f386af8327a TSystem::ProcessEvents()

This comment has been minimized.

Copy link
@linev

linev Jan 10, 2018

Member

I tried with simple macro - of course, it is working :(

This comment has been minimized.

Copy link
@linev

linev Jan 11, 2018

Member

I finally found reason - we are disable interpreter locking completely in our program - with the call:

gInterpreter->SetProcessLineLock(kFALSE);

But why it should complain about it - with assert?
And I also found reproducer code:

#include "TH1.h"
#include "TThread.h"
#include "TInterpreter.h"

#include <stdio.h>
#include <locale.h>

int main(int argc, char **argv) 
{
   printf("[%ld] Running\n", TThread::SelfId());
    
   gInterpreter->SetProcessLineLock(kFALSE);

   TH1I* h1 = new TH1I("histo1","histo title", 100, -10., 10.);
   h1->SetDirectory(0);
   h1->FillRandom("gaus",10000);

   printf("[%ld] Done\n", TThread::SelfId());

   return 0;
}

To build it, use simple makefile:

FLAGS=$(shell root-config --cflags)
LIBS=$(shell root-config --libs)
app: app.cxx
	g++ app.cxx -o app $(FLAGS) $(LIBS) 

Critical point is first printf - without it code works.
If thread id shown, it crashed inside TH1::FillRandom

This comment has been minimized.

Copy link
@linev

linev Jan 11, 2018

Member

Here I attach reproducer program:

app.tar.gz

This comment has been minimized.

Copy link
@pcanal

pcanal Jan 11, 2018

Author Member

But why it should complain about it - with assert?

As mentioned earlier, the code around assert is wrong and will be fixed.

I finally found reason - we are disable interpreter locking completely in our program - with the call:

Okay, that is what I was missing. and now the behavior makes sense (i.e. is self consistent) and I can fix it properly.

That said and on a different note. Why did you decide to disable interpreter locking? Do you have an external way of making sure the interpreter is not tripping over itself?

[One possible guess is that you disabled it because until recently having it enabled lead to dead locks].

Cheers,
Philippe.

This comment has been minimized.

Copy link
@linev

linev Jan 11, 2018

Member

Why did you decide to disable interpreter locking?
Do you have an external way of making sure the interpreter is not tripping over itself?

It is very old code - from times of ROOT v3. We disable locking, while otherwise we had deadlocks when doing objects streaming in multithreaded environment. But seems to be, now it is no longer necessary.

This comment has been minimized.

Copy link
@pcanal

pcanal Feb 12, 2018

Author Member

Indeed, it was necessary and is not longer needed thanks this this reset/restore update :)

auto readerCount = pState->fReadersCount;

std::unique_lock<MutexT> lock(fMutex);
auto local = fRecurseCounts.GetLocal();
fRecurseCounts.ResetReadCount(local, readerCount);
fReaders += readerCount - 1;
}

// Do something.
::Fatal("Restore()", "Not implemented, contact pcanal@fnal.gov");
//::Fatal("Restore()", "Not implemented, contact pcanal@fnal.gov");
}

namespace ROOT {
Expand Down

0 comments on commit 4deb81d

Please sign in to comment.