-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[cxxmodules] Do not generate rdict files if we have a module. #3012
[cxxmodules] Do not generate rdict files if we have a module. #3012
Conversation
Starting build on |
Continuation of PR #2912. |
Build failed on windows10/default. |
@vgvassilev What scenario did we test it with to come to this conclusion? What was the performance gains in the old system on those scenarios and what is the performance difference in the new one? (Memory and run-time)? |
@yamaguchi1024 and @oshadura can you provide some numbers here? |
Build failed on mac1014/cxx17. |
Build failed on ROOT-centos7/noimt. Errors:
|
@pcanal |
4f105ce
to
bfa955a
Compare
Starting build on |
Build failed on mac1014/cxx17. |
Build failed on ROOT-centos7/noimt. Errors:
|
Build failed on ROOT-ubuntu16/rtcxxmod. Errors:
|
bfa955a
to
6ee456b
Compare
Starting build on |
Build failed on ROOT-ubuntu16/rtcxxmod. |
Build failed on ROOT-centos7/noimt. |
Build failed on ROOT-fedora29/python3. |
Build failed on windows10/default. |
6ee456b
to
776c214
Compare
Starting build on |
Build failed on ROOT-ubuntu16/rtcxxmod. Failing tests: |
@vgvassilev yes I have lists of tests from @pcanal, I will check numbers.. |
Build failed on ROOT-fedora29/python3. Warnings:
And 65 more Failing tests:
And 9 more |
Build failed on ROOT-centos7/noimt. Failing tests: |
The getNameAsString interface causes a lot of temporary allocations. The analysis if a decl is a cling-style wrapper can work only on a simple declarations on the global scope. This patch filters out complex declarations (eg in namespaces) and checks only the identifier content. The patch reduces the memory footprint difference shown in #3012.
The getNameAsString interface causes a lot of temporary allocations. The analysis if a decl is a cling-style wrapper can work only on a simple declarations on the global scope. This patch filters out complex declarations (eg in namespaces) and checks only the identifier content. The patch reduces the memory footprint difference shown in root-project/root#3012.
@pcanal, I'd like to get back to this. Rather than not generating the rdict file we could use it only in the cases when it will improve things. What is the best way to detect if we are in |
Build failed on ROOT-ubuntu16/nortcxxmod. Errors:
|
Build failed on ROOT-debian10-i386/cxx14. Errors:
|
Build failed on ROOT-fedora29/python3. Errors:
|
What are the cases where it doesn't?
What is special about the "MakeProject mode"? I don't think we can (really) detect it. What MakeProject does is generate a set of header files, a linkdef file and a build script. Except for the fact that there no function besides the constructor and destructor, it is distinguishable from regular user code. The struct are not even guaranteed to be POD because of inheritance. In addition, the user could take the initial generation and start developing a full library based on it. So even heuristic bases on the way MakeProject generates the code would detect that you are "really" in MakeProject mode. |
See the performance data. It doesn't improve things basically for entire roottest -- on the contrary it shows 2% slowdown.
I was hoping you to be able to tell. You crafted an example on top of roottest's MakeProject which was able to demonstrate the usefulness of rdicts. None of that so far is obvious to me.
Sigh... Are you saying that the 'patched' version of MakeProject had many more headers thus read data beyond some threshold which makes rdicts significantly improve the situation? |
I see. MakeProject was here just a short-cut to producing a large enough library size. Technically I could have send you any large experimental framework instead. |
I think that if we want to harvest this 2% (by skipping the root-pcm sometimes) we are going to have to understand better what in the "user" code (i.e. the set of headers and functionality) and what in our code makes the difference. For example, if in the large example, the gain from the root-pcm is thanks to avoiding a large number of data-member offset calculations and large number of AST nodes loading from the module to get the information, and (still for example), if the cost in the other cases (the 2%) is due to loading the TProtoClass but not using them because the example does not do any I/O, then the conclusion would be that the "cost of doing faster I/O is the 2% slowdown and that their would be 2 options (speed-up the module case even more than now or speed-up the root-pcm case when no I/O is done). [Of course in either we would have the challenge of doing so without costing (more?) for the other user cases] |
It seems that rdict files help when we have significant amount of headers needed for I/O. Can we heuristically determine the threshold size? I think that roottest suffers from performance degradation because the rdict files are almost empty and it does not pay off loading and using them. |
Not everything will be covered by modules, for a long time. Thus we will have to keep the rdict file feature alive for years to come. What's the benefit of optimizing this then - is it only to accelerate roottest? (I'm trying to understand whether it's important to invest dev hours here vs in other areas, taking as main criteria, as always, relevance to our users.) |
To clarify, we are talking about the case where a library has a module. Users do not use make project often and roottest claims to cover a good amount of their workflows. As for the dev hours: they are already invested and we know that only under very specific conditions we need rdicts -- it seems that it depends on the amount of header files we parse. If the number is very high, as in Philippe's example, we should load the rdict -- it slows things by 2%... |
I think it might be hard to come up with a proper heuristic. For example is the threshold "per dictionary" (in which case it might be achiveable) or is it an accumulative cost, i.e. per whole set of library (in which case it is impossible) and is it lightly or highly dependent of the fraction of the generated dicitonary entries and/or header that are actually used?
MakeProject is irrelevant here. It is "only" just to quickly produce an emulation of the "Experiment Framework" type of problem size. The relevant part is (as you noted) the amount of classes needed for I/O. One part that example does not emulate is the (high) number of libraries and dictionary. So yes this type of problem is extremely relevant and no it is actually not directly represented in roottest. (and it is unknown which fraction of the roottests are exercising this code path (i.e. not only loading but also using the rootpcms). Also I suppose there are ways to reduce the up-front cost of the rootpcms (for example delaying their loading until needed) |
In principle the new implementation due to modules should allow us to easily turn off rootpcms and test in the cmssw context. Then I guess the results should be more reliable.
That's what I am after. Do you have something in particular in mind? |
Currently, the rootpcms file is loaded when the library is loaded. Instead it could be open whenever there is a call to TClassTable::GetProtoClass for one of its content. This would require: (a) have a (persistent) data structure that list of the available content of the rootpcms. When opening the rootpcms one should load all the TProtoClasses since opening the file takes time. |
This pull request has a lot of useful information. The current infrastructure is flexible and can disable rootpcms if required for further performance studies. Let's close this now and come back to it if necessary. |
Overview
RDICT files store some useful information (in particular about class offsets) in ROOT files to avoid the potentially expensive call to the interpreter if the information is not the PCH. For example, ROOT's
libGeom
and other third-party code. This is done to circumvent the costly call toShowMembers
which will require parsing.CxxModules diminish that benefit as providing a finer grained control over the information deserialization. We see no observable benefits in generating RDICT files next to our pcm files.
Performance results
We have run several times all tests in
roottest/io
androottest/meta/MakeProject
folders and compared the results with and without generating RDICT (namely with ROOT master with-Druntime_cxxmodules=On
with and without this PR). They are available here and here.Methodology
We have a forwarding
root.exe
which essentially calls/usr/bin/time -v root.exe $@
. We have processed and stored this information in csv files. We have run in:The values of each column in (no)dict* are the sum of each value produced after running each test under roottest/io. The average values are computed in the columns under
dict
andnodict
. The difference is shown indiff
.There are two known test failures wrt to modules whose values are ignored.
The machine specification is
Intel(R) Core(TM) i3-7100 CPU @ 3.90GHz
dual core, 8GB RAM and a HDD, run on Ubuntu 18.04.Results Interpretation
RDICT mostly intend to reduce the memory footprint during IO. We see no direct benefits in the memory footprint. The average of the max RSS of the sum of the max RSS of every test is higher. There is also a tendency towards faster execution.