Conversation

@bbockelm (Contributor)

Now, we create TBB tasks for compression whenever TTree::Fill is called and a basket must be compressed. In CMS, we saw a significant speedup on KNL and high-core-count Xeons from doing this, compared to the existing basic write IMT (likely because we have some branches that are flushed to disk much more frequently than targeted by the auto-flush routines).
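For context, here is a minimal sketch of the idea, assuming ROOT and TBB headers are available. The helper function is hypothetical; the actual patch routes this through TBranch::WriteBasketImpl and a TBranchIMTHelper, as discussed in the review comments below.

#include "TBasket.h"
#include "TError.h"
#include "tbb/task_group.h"

// Hypothetical helper: queue the compression + write of a finished basket
// as a TBB task instead of doing it inline during TTree::Fill.
void ScheduleBasketWrite(tbb::task_group &group, TBasket *basket) {
   // Capture the pointer by value: the task may run after this function returns.
   group.run([basket]() {
      Int_t nout = basket->WriteBuffer(); // compresses and writes the basket buffer
      if (nout < 0) Error("ScheduleBasketWrite", "basket's WriteBuffer failed.");
   });
}
// The caller must group.wait() before relying on the basket being on disk.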

reusebasket = basket;
reusebasket->Reset();
Member

Could you remove the tab characters, if there are any?

Contributor Author

No tab characters here; that's just how GitHub renders the added spaces, unfortunately.

if (nout>0) {
// The Basket was written so we can now safely reuse it.
fBaskets[where] = 0;
auto do_updates = [=]() {
Member

I have a feeling that the 'copy' capture ('=') is confusing, as it sort of implies that the lambda does not modify the outer state (which is obviously not the case).

Contributor Author

OK. I will add a comment pointing this out and explaining the motivation; that's easier than explicitly listing all the things we copy.

Member

What about using 'reference' capture?

Contributor Author

This lambda is added to the task_group, and WriteBasket then returns; hence any reference would become invalid. You must actually copy the pointer (the pointed-to object's lifetime is guaranteed to exceed the lambda's), not take a reference to the pointer itself.
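To illustrate the point with a toy example (not the actual patch): capturing by value copies the pointer, while a by-reference capture would refer to a local variable that is gone by the time the task runs.

#include "tbb/task_group.h"

// 'data' points to an object whose lifetime is longer than the task's.
void Enqueue(tbb::task_group &group, int *data) {
   // group.run([&]() { ++*data; });  // dangerous: captures the local variable
   //                                 // 'data' by reference; it dangles once
   //                                 // Enqueue returns
   group.run([data]() { ++*data; });  // safe: the pointer value itself is copied
}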

Member

Fair enough. A comment will have to do :)

fBaskets[where] = 0;
auto do_updates = [=]() {
Int_t nout = basket->WriteBuffer(); // Write buffer
if (nout < 0) {Error("TBranch::WriteBasketImpl", "basket's WriteBuffer failed.\n");}
@pcanal (Member), Jan 10, 2017

coding convention (here and other places):

if (nout < 0) {
   Error("TBranch::WriteBasketImpl", "basket's WriteBuffer failed.\n");
}

or

if (nout < 0) Error("TBranch::WriteBasketImpl", "basket's WriteBuffer failed.\n");

fBranchRef->Clear();
}

std::unique_ptr<TBranchIMTHelper> imt_helper;
Member

why a unique_ptr rather than an object on the stack?

Contributor Author

To avoid any overhead when IMT is disabled - or when R__USE_IMT is not defined - the FillImpl function expects a nullptr.

Hence, this avoids doing this repeatedly:

branch->FillImpl(doImt ? &imt_helper : nullptr);

Hm - I suppose we could use the uglier syntax and avoid a heap allocation, though. What's your preference?
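For comparison, a sketch of the two alternatives being weighed, using stand-in names (the real FillImpl signature and helper class are assumed from the thread, not quoted from the patch):

#include <memory>

struct ImtHelper { /* stand-in for TBranchIMTHelper */ };

// Stand-in for TBranch::FillImpl: a null helper means "no IMT".
void FillImpl(ImtHelper * /*helper*/) {}

void FillSketch(bool doImt) {
   // (a) heap helper: call sites can pass the pointer unconditionally.
   std::unique_ptr<ImtHelper> imt_helper;
   if (doImt) imt_helper.reset(new ImtHelper());
   FillImpl(imt_helper.get());              // nullptr when IMT is off

   // (b) stack helper: no heap allocation, but each call needs the ternary.
   ImtHelper helper;
   FillImpl(doImt ? &helper : nullptr);
}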

Member

Humm ... this made me take another look at the code and ... I am getting concerned about the sheer amount of #ifdef IMT that we are ending up with ... For this particular case, we could also have an empty TBranchIMTHelper when IMT is disabled and add a bool on/off flag to test against (rather than a nullptr).
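A rough sketch of that alternative, with made-up member names (this is not the code that was eventually merged): the helper always exists, and call sites test a bool rather than a nullptr.

#include <utility>
#include "tbb/task_group.h"

class TBranchIMTHelper {
public:
   explicit TBranchIMTHelper(bool enabled) : fEnabled(enabled) {}
   bool IsEnabled() const { return fEnabled; }

   template <typename Func>
   void Run(Func &&func) {
      if (fEnabled)
         fGroup.run(std::forward<Func>(func)); // IMT on: spawn a task
      else
         func();                               // IMT off: run inline
   }

   void Wait() { if (fEnabled) fGroup.wait(); }

private:
   bool fEnabled;
   tbb::task_group fGroup;
};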

#endif
}

void wait() {
Member

Use camel case for the function name if you can.

@pcanal self-assigned this Jan 12, 2017
private:
std::atomic<Long64_t> fBytes{0}; // Total number of bytes written by this helper.
std::atomic<Int_t> fNerrors{0}; // Total error count of all tasks done by this helper.
std::unique_ptr<tbb::task_group> fGroup;
Member

This should lead to a compilation failure when IMT is disabled, shouldn't it?
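One way to keep the header compiling when TBB is unavailable would be to guard both the include and the member with the same #ifdef; this is a sketch of the concern, not the fix that was actually applied.

#ifdef R__USE_IMT
#include "tbb/task_group.h"
#endif
#include "RtypesCore.h"
#include <atomic>
#include <memory>

class TBranchIMTHelper {
   // ... public interface ...
private:
   std::atomic<Long64_t> fBytes{0};   // Total number of bytes written by this helper.
   std::atomic<Int_t>    fNerrors{0}; // Total error count of all tasks done by this helper.
#ifdef R__USE_IMT
   std::unique_ptr<tbb::task_group> fGroup;
#endif
};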

Int_t nb = lb->GetEntriesFast();

#ifdef R__USE_IMT
if (ROOT::IsImplicitMTEnabled() && fIMTEnabled) {
Member

why remove the if (ROOT::IsImplicitMTEnabled()) ?

Contributor Author

One should imply the other - no way to set fIMTEnabled if IsImplicitMTEnabled is false - right?

Member

Well, I remembered wrong, and Enric corrected me:

The issue here is that, in order to decide whether to create tasks in TTree::GetEntry, Brian only checks the flag that is local to the tree (fIMTEnabled). Before his change, there was an extra check of the global flag (ROOT::IsImplicitMTEnabled). This hierarchy of checks, which we discussed in the parallelisation meetings and which is tested by ttree_read_imt, allows the creation of parallel tasks to be disabled just by setting the global flag to false.

@bbockelm (Contributor Author)

@pcanal - I believe all the review comments have been addressed. Please take another look.

@bbockelm (Contributor Author)

Oh, I should note. With this patch, I see the following:

$ ./eventexe 1000 9 99 1 600 1
   ...
1000 events and 101713298 bytes processed.
RealTime=16.591808 seconds, CpuTime=39.970000 seconds
compression level=9, split=99, arg4=1, IMT=1
You write 6.130332 Mbytes/Realtime seconds
You write 2.544741 Mbytes/Cputime seconds

This is a 4-core host; we see a 2.4x speedup. No, it's not perfect - but that probably reflects the fact that we can only parallelize the compression, not the serialization.
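(For reference, the 2.4x presumably comes from CpuTime over RealTime: 39.97 s / 16.59 s ≈ 2.4.)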

@Dr15Jones (Collaborator)

@pcanal any further items for this?

@Dr15Jones (Collaborator)

Looks like this was actually merged into master on Jan 10th. It is not in the 6.08 branch.

@pcanal (Member), Feb 6, 2017

This has not yet been merged. Working on it.

@pcanal (Member), Feb 6, 2017

This (+ a bug fix) has been merged into master.
