-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Improve parallelism in TTree::Fill #321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Now, we create TBB tasks for compression whenever TTree::Fill is called and a basket must be compressed.
| reusebasket = basket; | ||
| reusebasket->Reset(); | ||
| reusebasket = basket; | ||
| reusebasket->Reset(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could remove tab characters if any?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No tab characters here; that's just how github renders the added spaces, unfortunately.
| if (nout>0) { | ||
| // The Basket was written so we can now safely reuse it. | ||
| fBaskets[where] = 0; | ||
| auto do_updates = [=]() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a feeling that the 'copy' capture ('=') is confusing as it sorta implies that the lamba does not modify the outer content (which obviously is not the case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I will add a comment pointing this out and providing the motivation; easier to do that than list all the things we will copy explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about using 'reference' capture?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This lambda is added to the task_group and WriteBasket returns. Hence, the reference becomes invalid. You must actually copy the pointer (as the pointed-to object's lifetime is guaranteed to be longer than the lambda), but not reference to the pointer itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. A comment will have to do :)
tree/tree/src/TBranch.cxx
Outdated
| fBaskets[where] = 0; | ||
| auto do_updates = [=]() { | ||
| Int_t nout = basket->WriteBuffer(); // Write buffer | ||
| if (nout < 0) {Error("TBranch::WriteBasketImpl", "basket's WriteBuffer failed.\n");} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
coding convention (here and other places):
if (nout < 0) {
Error("TBranch::WriteBasketImpl", "basket's WriteBuffer failed.\n");
}
or
if (nout < 0) Error("TBranch::WriteBasketImpl", "basket's WriteBuffer failed.\n");
tree/tree/src/TTree.cxx
Outdated
| fBranchRef->Clear(); | ||
| } | ||
|
|
||
| std::unique_ptr<TBranchIMTHelper> imt_helper; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why a unique_ptr rather than an object on the stack?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid any overheads when IMT is disabled - or the R__USE_IMT is not defined - the FillImpl function expects a nullptr.
Hence, this avoids doing this repeatedly:
branch->FillImpl(doImt ? &imt_helper : nullptr);
Hm - I suppose we could use the uglier syntax and avoid a heap allocation though. What's your preference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
humm ... this made me take another look at the code and ... I am getting concerned on the sheer amount of #ifdef IMT that we are ending with ... [For this particular case, we could also have an empty TBranchIMTHelper when IMT is disable and add a bool on/off to test against (rather than nullptr).
tree/tree/inc/TBranch.h
Outdated
| #endif | ||
| } | ||
|
|
||
| void wait() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use camel case for function name if you can.
tree/tree/inc/TBranch.h
Outdated
| private: | ||
| std::atomic<Long64_t> fBytes{0}; // Total number of bytes written by this helper. | ||
| std::atomic<Int_t> fNerrors{0}; // Total error count of all tasks done by this helper. | ||
| std::unique_ptr<tbb::task_group> fGroup; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should lead to a compilation failure when IMT is disable, isn't it?
| Int_t nb = lb->GetEntriesFast(); | ||
|
|
||
| #ifdef R__USE_IMT | ||
| if (ROOT::IsImplicitMTEnabled() && fIMTEnabled) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why remove the if (ROOT::IsImplicitMTEnabled()) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One should imply the other - no way to set fIMTEnabled if IsImplicitMTEnabled is false - right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well I remembered wrong and Enric corrected me:
The issue here is that, in order to decide whether to create tasks in
TTree::GetEntry, Brian only checks the flag that is local to the tree
(fIMTEnabled). Before his change, there was an extra check of the global
flag (ROOT::IsImplicitMTEnabled). This hierarchy of checks, which we
discussed in the parallelisation meetings and is tested by
ttree_read_imt, allows to disable the creation of parallel tasks only by
setting the global flag to false.
|
@pcanal - I believe all the review comments have been addressed. Please take another look. |
|
Oh, I should note. With this patch, I see the following: This is a 4-core host; we see a 2.4x speedup. No, it's not perfect - but it probably reflects the fact we can only parallelize the compression, not the serialization. |
|
@pcanal any futher items for this? |
|
Looks like this was actually merged into the master on Jan 10th. It is not in the 6.08 branch. |
|
This has not yet merge. Working on it. |
|
This (+ a bug fix) has been merged in the master. |
Now, we create TBB tasks for compression whenever
TTree::Fillis called and a basket must be compressed. In CMS, we saw significant speedup on KNL and high-core-count Xeons by doing this over the existing basic write IMT (likely because we have some branches that are flushed to disk much more frequently than targeted by the auto-flush routines).