Conversation

@bbockelm (Contributor)

Now, we create TBB tasks for compression whenever TTree::Fill is called and a basket must be compressed. In CMS, we saw a significant speedup on KNL and high-core-count Xeons from doing this, compared to the existing basic write IMT (likely because we have some branches that are flushed to disk much more frequently than targeted by the auto-flush routines).
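For context, here is a minimal sketch of the idea, assuming ROOT and TBB headers are available. The helper function is hypothetical; the actual patch routes this through TBranch::WriteBasketImpl and a TBranchIMTHelper, as discussed in the review comments below.

#include "TBasket.h"
#include "TError.h"
#include "tbb/task_group.h"

// Hypothetical helper: queue the compression + write of a finished basket
// as a TBB task instead of doing it inline during TTree::Fill.
void ScheduleBasketWrite(tbb::task_group &group, TBasket *basket) {
   // Capture the pointer by value: the task may run after this function returns.
   group.run([basket]() {
      Int_t nout = basket->WriteBuffer(); // compresses and writes the basket buffer
      if (nout < 0) Error("ScheduleBasketWrite", "basket's WriteBuffer failed.");
   });
}
// The caller must group.wait() before relying on the basket being on disk.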

reusebasket = basket;
reusebasket->Reset();
Member

Could you remove the tab characters, if there are any?

Contributor Author

No tab characters here; that's just how GitHub renders the added spaces, unfortunately.

if (nout>0) {
// The Basket was written so we can now safely reuse it.
fBaskets[where] = 0;
auto do_updates = [=]() {
Member

I have a feeling that the 'copy' capture ('=') is confusing, as it sort of implies that the lambda does not modify the outer state (which is obviously not the case).

Contributor Author

OK. I will add a comment pointing this out and explaining the motivation; that's easier than explicitly listing all the things we copy.

Member

What about using 'reference' capture?

Contributor Author

This lambda is added to the task_group, and WriteBasket then returns; hence any reference would become invalid. You must actually copy the pointer (the pointed-to object's lifetime is guaranteed to exceed the lambda's), not take a reference to the pointer itself.
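To illustrate the point with a toy example (not the actual patch): capturing by value copies the pointer, while a by-reference capture would refer to a local variable that is gone by the time the task runs.

#include "tbb/task_group.h"

// 'data' points to an object whose lifetime is longer than the task's.
void Enqueue(tbb::task_group &group, int *data) {
   // group.run([&]() { ++*data; });  // dangerous: captures the local variable
   //                                 // 'data' by reference; it dangles once
   //                                 // Enqueue returns
   group.run([data]() { ++*data; });  // safe: the pointer value itself is copied
}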

Member

Fair enough. A comment will have to do :)

fBaskets[where] = 0;
auto do_updates = [=]() {
Int_t nout = basket->WriteBuffer(); // Write buffer
if (nout < 0) {Error("TBranch::WriteBasketImpl", "basket's WriteBuffer failed.\n");}
@pcanal (Member), Jan 10, 2017

coding convention (here and other places):

if (nout < 0) {
   Error("TBranch::WriteBasketImpl", "basket's WriteBuffer failed.\n");
}

or

if (nout < 0) Error("TBranch::WriteBasketImpl", "basket's WriteBuffer failed.\n");

fBranchRef->Clear();
}

std::unique_ptr<TBranchIMTHelper> imt_helper;
Member

why a unique_ptr rather than an object on the stack?

Contributor Author

To avoid any overhead when IMT is disabled - or when R__USE_IMT is not defined - the FillImpl function expects a nullptr.

Hence, this avoids doing this repeatedly:

branch->FillImpl(doImt ? &imt_helper : nullptr);

Hm - I suppose we could use the uglier syntax and avoid a heap allocation, though. What's your preference?
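For comparison, a sketch of the two alternatives being weighed, using stand-in names (the real FillImpl signature and helper class are assumed from the thread, not quoted from the patch):

#include <memory>

struct ImtHelper { /* stand-in for TBranchIMTHelper */ };

// Stand-in for TBranch::FillImpl: a null helper means "no IMT".
void FillImpl(ImtHelper * /*helper*/) {}

void FillSketch(bool doImt) {
   // (a) heap helper: call sites can pass the pointer unconditionally.
   std::unique_ptr<ImtHelper> imt_helper;
   if (doImt) imt_helper.reset(new ImtHelper());
   FillImpl(imt_helper.get());              // nullptr when IMT is off

   // (b) stack helper: no heap allocation, but each call needs the ternary.
   ImtHelper helper;
   FillImpl(doImt ? &helper : nullptr);
}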

Member

Humm ... this made me take another look at the code and ... I am getting concerned about the sheer amount of #ifdef IMT that we are ending up with ... For this particular case, we could also have an empty TBranchIMTHelper when IMT is disabled and add a bool on/off flag to test against (rather than a nullptr).
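A rough sketch of that alternative, with made-up member names (this is not the code that was eventually merged): the helper always exists, and call sites test a bool rather than a nullptr.

#include <utility>
#include "tbb/task_group.h"

class TBranchIMTHelper {
public:
   explicit TBranchIMTHelper(bool enabled) : fEnabled(enabled) {}
   bool IsEnabled() const { return fEnabled; }

   template <typename Func>
   void Run(Func &&func) {
      if (fEnabled)
         fGroup.run(std::forward<Func>(func)); // IMT on: spawn a task
      else
         func();                               // IMT off: run inline
   }

   void Wait() { if (fEnabled) fGroup.wait(); }

private:
   bool fEnabled;
   tbb::task_group fGroup;
};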

#endif
}

void wait() {
Member

Use camel case for the function name if you can.

@pcanal self-assigned this Jan 12, 2017
private:
std::atomic<Long64_t> fBytes{0}; // Total number of bytes written by this helper.
std::atomic<Int_t> fNerrors{0}; // Total error count of all tasks done by this helper.
std::unique_ptr<tbb::task_group> fGroup;
Member

This should lead to a compilation failure when IMT is disabled, shouldn't it?
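One way to keep the header compiling when TBB is unavailable would be to guard both the include and the member with the same #ifdef; this is a sketch of the concern, not the fix that was actually applied.

#ifdef R__USE_IMT
#include "tbb/task_group.h"
#endif
#include "RtypesCore.h"
#include <atomic>
#include <memory>

class TBranchIMTHelper {
   // ... public interface ...
private:
   std::atomic<Long64_t> fBytes{0};   // Total number of bytes written by this helper.
   std::atomic<Int_t>    fNerrors{0}; // Total error count of all tasks done by this helper.
#ifdef R__USE_IMT
   std::unique_ptr<tbb::task_group> fGroup;
#endif
};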

Int_t nb = lb->GetEntriesFast();

#ifdef R__USE_IMT
if (ROOT::IsImplicitMTEnabled() && fIMTEnabled) {
Member

why remove the if (ROOT::IsImplicitMTEnabled()) ?

Contributor Author

One should imply the other - no way to set fIMTEnabled if IsImplicitMTEnabled is false - right?

Member

Well, I remembered wrong, and Enric corrected me:

The issue here is that, in order to decide whether to create tasks in TTree::GetEntry, Brian only checks the flag that is local to the tree (fIMTEnabled). Before his change, there was an extra check of the global flag (ROOT::IsImplicitMTEnabled). This hierarchy of checks, which we discussed in the parallelisation meetings and which is tested by ttree_read_imt, allows the creation of parallel tasks to be disabled just by setting the global flag to false.

@bbockelm (Contributor Author)

@pcanal - I believe all the review comments have been addressed. Please take another look.

@bbockelm (Contributor Author)

Oh, I should note. With this patch, I see the following:

$ ./eventexe 1000 9 99 1 600 1
   ...
1000 events and 101713298 bytes processed.
RealTime=16.591808 seconds, CpuTime=39.970000 seconds
compression level=9, split=99, arg4=1, IMT=1
You write 6.130332 Mbytes/Realtime seconds
You write 2.544741 Mbytes/Cputime seconds

This is a 4-core host; we see a 2.4x speedup. No, it's not perfect - but that probably reflects the fact that we can only parallelize the compression, not the serialization.
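(For reference, the 2.4x presumably comes from CpuTime over RealTime: 39.97 s / 16.59 s ≈ 2.4.)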

@Dr15Jones (Collaborator)

@pcanal any further items for this?

@Dr15Jones (Collaborator)

Looks like this was actually merged into master on Jan 10th. It is not in the 6.08 branch.

@pcanal (Member), Feb 6, 2017

This has not yet been merged. Working on it.

@pcanal (Member), Feb 6, 2017

This (+ a bug fix) has been merged into master.
