Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[cxxmodules] Do not generate rdict files if we have a module. #3012

Conversation

vgvassilev
Copy link
Member

@vgvassilev vgvassilev commented Nov 20, 2018

Overview

RDICT files store some useful information (in particular about class offsets) in ROOT files to avoid the potentially expensive call to the interpreter if the information is not the PCH. For example, ROOT's libGeom and other third-party code. This is done to circumvent the costly call to ShowMembers which will require parsing.

CxxModules diminish that benefit as providing a finer grained control over the information deserialization. We see no observable benefits in generating RDICT files next to our pcm files.

Performance results

We have run several times all tests in roottest/io and roottest/meta/MakeProject folders and compared the results with and without generating RDICT (namely with ROOT master with -Druntime_cxxmodules=On with and without this PR). They are available here and here.

Methodology

We have a forwarding root.exe which essentially calls /usr/bin/time -v root.exe $@. We have processed and stored this information in csv files. We have run in:

  • (no)dict1-cold -- Tests are run immediately after restarting the physical test machine.
  • (no)dict1a-cold -- Tests are run immediately after restarting the physical test machine. Second time.
  • (no)dict2 -- Tests are run once again after running (no)dict1-cold;
  • (no)dict3 -- Tests are run once again after running (no)dict2;
  • (no)dict4 -- Tests are run once again after running (no)dict3;
  • (no)dict5 -- Tests are run once again after running (no)dict4.

The values of each column in (no)dict* are the sum of each value produced after running each test under roottest/io. The average values are computed in the columns under dict and nodict. The difference is shown in diff.

There are two known test failures wrt to modules whose values are ignored.

The machine specification is Intel(R) Core(TM) i3-7100 CPU @ 3.90GHz dual core, 8GB RAM and a HDD, run on Ubuntu 18.04.

Results Interpretation

RDICT mostly intend to reduce the memory footprint during IO. We see no direct benefits in the memory footprint. The average of the max RSS of the sum of the max RSS of every test is higher. There is also a tendency towards faster execution.

@phsft-bot
Copy link
Collaborator

Starting build on ROOT-centos7/noimt, ROOT-fedora29/python3, ROOT-ubuntu16/rtcxxmod, mac1014/cxx17, windows10/default with flags -Dccache=ON
How to customize builds

@vgvassilev
Copy link
Member Author

Continuation of PR #2912.

@phsft-bot
Copy link
Collaborator

Build failed on windows10/default.
See console output.

@pcanal
Copy link
Member

pcanal commented Nov 20, 2018

The rdict file does not show performance benefits anymore and we are as good as having only a C++ module.

@vgvassilev What scenario did we test it with to come to this conclusion? What was the performance gains in the old system on those scenarios and what is the performance difference in the new one? (Memory and run-time)?

@vgvassilev
Copy link
Member Author

@yamaguchi1024 and @oshadura can you provide some numbers here?

@phsft-bot
Copy link
Collaborator

Build failed on mac1014/cxx17.
See console output.

@phsft-bot
Copy link
Collaborator

Build failed on ROOT-centos7/noimt.
See console output.

Errors:

  • collect2: error: ld returned 1 exit status
  • error: unable to read PCH file /mnt/build/jenkins/workspace/root-pullrequests-build/build/etc/allDict.cxx.pch: 'No such file or directory'

@yamaguchi1024
Copy link
Contributor

@pcanal
Could you tell us which io test you have in mind which can benefit ROOT from having rdict?

@phsft-bot
Copy link
Collaborator

Starting build on ROOT-centos7/noimt, ROOT-fedora29/python3, ROOT-ubuntu16/rtcxxmod, mac1014/cxx17, windows10/default with flags -Dccache=ON
How to customize builds

@phsft-bot
Copy link
Collaborator

Build failed on mac1014/cxx17.
See console output.

@phsft-bot
Copy link
Collaborator

Build failed on ROOT-centos7/noimt.
See console output.

Errors:

  • collect2: error: ld returned 1 exit status
  • error: unable to read PCH file /mnt/build/jenkins/workspace/root-pullrequests-build/build/etc/allDict.cxx.pch: 'No such file or directory'

@phsft-bot
Copy link
Collaborator

Build failed on ROOT-ubuntu16/rtcxxmod.
See console output.

Errors:

  • collect2: error: ld returned 1 exit status

@vgvassilev vgvassilev force-pushed the cxxmodules-do-not-generate-rdicts branch from bfa955a to 6ee456b Compare November 21, 2018 15:48
@phsft-bot
Copy link
Collaborator

Starting build on ROOT-centos7/noimt, ROOT-fedora29/python3, ROOT-ubuntu16/rtcxxmod, mac1014/cxx17, windows10/default with flags -Dccache=ON
How to customize builds

@phsft-bot
Copy link
Collaborator

Build failed on ROOT-ubuntu16/rtcxxmod.
See console output.

@phsft-bot
Copy link
Collaborator

Build failed on ROOT-centos7/noimt.
See console output.

@phsft-bot
Copy link
Collaborator

Build failed on ROOT-fedora29/python3.
See console output.

@phsft-bot
Copy link
Collaborator

Build failed on windows10/default.
See console output.

@phsft-bot
Copy link
Collaborator

Build failed on mac1014/cxx17.
See console output.

Warnings:

  • /build/jenkins/workspace/root-pullrequests-build/root/core/newdelete/src/NewDelete.cxx:217:4: warning: 'operator new' should not return a null pointer unless it is declared 'throw()' or 'noexcept' [-Wnew-returns-null]
  • /build/jenkins/workspace/root-pullrequests-build/root/core/newdelete/src/NewDelete.cxx:344:4: warning: 'operator new[]' should not return a null pointer unless it is declared 'throw()' or 'noexcept' [-Wnew-returns-null]

Failing tests:

@phsft-bot
Copy link
Collaborator

Starting build on ROOT-centos7/noimt, ROOT-fedora29/python3, ROOT-ubuntu16/rtcxxmod, mac1014/cxx17, windows10/default with flags -Dccache=ON
How to customize builds

@oshadura
Copy link
Contributor

@vgvassilev yes I have lists of tests from @pcanal, I will check numbers..

@phsft-bot
Copy link
Collaborator

Build failed on ROOT-fedora29/python3.
See console output.

Warnings:

  • /mnt/build/workspace/root-pullrequests-build/root/interpreter/llvm/src/include/llvm/ADT/SmallVector.h:309:11: warning: ‘void* memcpy(void*, const void*, size_t)’ writing to an object of type ‘struct std::pair<void*, long unsigned int>’ with no trivial copy-assignment; use copy-assignment or copy-initialization instead [-Wclass-memaccess]
  • /mnt/build/workspace/root-pullrequests-build/root/interpreter/llvm/src/include/llvm/ADT/SmallVector.h:296:13: warning: ‘void* memcpy(void*, const void*, size_t)’ writing to an object of type ‘struct std::pair<clang::CXXRecordDecl*, clang::SourceLocation>’ with no trivial copy-assignment; use copy-assignment or copy-initialization instead [-Wclass-memaccess]
  • /mnt/build/workspace/root-pullrequests-build/root/interpreter/llvm/src/include/llvm/ADT/SmallVector.h:296:13: warning: ‘void* memcpy(void*, const void*, size_t)’ writing to an object of type ‘struct std::pair<const clang::CXXMethodDecl*, const clang::CXXMethodDecl*>’ with no trivial copy-assignment; use copy-assignment or copy-initialization instead [-Wclass-memaccess]
  • /mnt/build/workspace/root-pullrequests-build/root/interpreter/llvm/src/include/llvm/ADT/SmallVector.h:296:13: warning: ‘void* memcpy(void*, const void*, size_t)’ writing to an object of type ‘struct std::pair<clang::CXXMethodDecl*, const clang::FunctionProtoType*>’ with no trivial copy-assignment; use copy-assignment or copy-initialization instead [-Wclass-memaccess]
  • /mnt/build/workspace/root-pullrequests-build/root/interpreter/llvm/src/include/llvm/ADT/SmallVector.h:309:11: warning: ‘void* memcpy(void*, const void*, size_t)’ writing to an object of type ‘struct std::pair<void*, long unsigned int>’ with no trivial copy-assignment; use copy-assignment or copy-initialization instead [-Wclass-memaccess]
  • /mnt/build/workspace/root-pullrequests-build/root/interpreter/llvm/src/include/llvm/ADT/SmallVector.h:309:11: warning: ‘void* memcpy(void*, const void*, size_t)’ writing to an object of type ‘struct std::pair<void*, long unsigned int>’ with no trivial copy-assignment; use copy-assignment or copy-initialization instead [-Wclass-memaccess]
  • /mnt/build/workspace/root-pullrequests-build/root/interpreter/llvm/src/include/llvm/ADT/SmallVector.h:309:11: warning: ‘void* memcpy(void*, const void*, size_t)’ writing to an object of type ‘struct std::pair<void*, long unsigned int>’ with no trivial copy-assignment; use copy-assignment or copy-initialization instead [-Wclass-memaccess]
  • /mnt/build/workspace/root-pullrequests-build/root/interpreter/llvm/src/include/llvm/ADT/SmallVector.h:296:13: warning: ‘void* memcpy(void*, const void*, size_t)’ writing to an object of type ‘struct std::pair<clang::CXXRecordDecl*, clang::SourceLocation>’ with no trivial copy-assignment; use copy-assignment or copy-initialization instead [-Wclass-memaccess]
  • /mnt/build/workspace/root-pullrequests-build/root/interpreter/llvm/src/include/llvm/ADT/SmallVector.h:296:13: warning: ‘void* memcpy(void*, const void*, size_t)’ writing to an object of type ‘struct std::pair<const clang::CXXMethodDecl*, const clang::CXXMethodDecl*>’ with no trivial copy-assignment; use copy-assignment or copy-initialization instead [-Wclass-memaccess]
  • /mnt/build/workspace/root-pullrequests-build/root/interpreter/llvm/src/include/llvm/ADT/SmallVector.h:296:13: warning: ‘void* memcpy(void*, const void*, size_t)’ writing to an object of type ‘struct std::pair<clang::CXXMethodDecl*, const clang::FunctionProtoType*>’ with no trivial copy-assignment; use copy-assignment or copy-initialization instead [-Wclass-memaccess]

And 65 more

Failing tests:

And 9 more

@phsft-bot
Copy link
Collaborator

Build failed on ROOT-centos7/noimt.
See console output.

Failing tests:

vgvassilev added a commit that referenced this pull request Mar 25, 2019
The getNameAsString interface causes a lot of temporary allocations.
The analysis if a decl is a cling-style wrapper can work only on a
simple declarations on the global scope.

This patch filters out complex declarations (eg in namespaces) and
checks only the identifier content.

The patch reduces the memory footprint difference shown in #3012.
FonsRademakers pushed a commit to root-project/cling that referenced this pull request Mar 25, 2019
The getNameAsString interface causes a lot of temporary allocations.
The analysis if a decl is a cling-style wrapper can work only on a
simple declarations on the global scope.

This patch filters out complex declarations (eg in namespaces) and
checks only the identifier content.

The patch reduces the memory footprint difference shown in root-project/root#3012.
@eguiraud eguiraud added the stale label Mar 10, 2020
@vgvassilev
Copy link
Member Author

PS. In run-time, I see an average of 1.20s for module+rdict vs an average of 1.3s for module-rdict (for a 'root.exe -b -l -q' average of .2s) ; and expected this is just an 'initialization' cost when increasing the number of entries read the difference stays around .1s

@pcanal, I'd like to get back to this. Rather than not generating the rdict file we could use it only in the cases when it will improve things. What is the best way to detect if we are in MakeProject mode?

@root-project root-project deleted a comment from phsft-bot May 22, 2020
@root-project root-project deleted a comment from phsft-bot May 22, 2020
@root-project root-project deleted a comment from phsft-bot May 22, 2020
@root-project root-project deleted a comment from phsft-bot May 22, 2020
@root-project root-project deleted a comment from phsft-bot May 22, 2020
@root-project root-project deleted a comment from phsft-bot May 22, 2020
@phsft-bot
Copy link
Collaborator

Build failed on ROOT-ubuntu16/nortcxxmod.
Running on sft-ubuntu-1604-4.cern.ch:/build/workspace/root-pullrequests-build
See cdash.
See console output.

Errors:

  • [2020-05-22T07:46:44.452Z] CMake Error at /mnt/build/workspace/root-pullrequests-build/rootspi/jenkins/root-build.cmake:937 (message):

@phsft-bot
Copy link
Collaborator

Build failed on ROOT-debian10-i386/cxx14.
Running on pcepsft10.dyndns.cern.ch:/build/workspace/root-pullrequests-build
See cdash.
See console output.

Errors:

  • [2020-05-22T07:53:18.489Z] CMake Error at /home/sftnight/build/workspace/root-pullrequests-build/rootspi/jenkins/root-build.cmake:937 (message):

@phsft-bot
Copy link
Collaborator

Build failed on ROOT-fedora29/python3.
Running on root-fedora29-2.cern.ch:/build/workspace/root-pullrequests-build
See cdash.
See console output.

Errors:

  • [2020-05-22T12:10:14.485Z] CMake Error at /mnt/build/workspace/root-pullrequests-build/rootspi/jenkins/root-build.cmake:937 (message):

@pcanal
Copy link
Member

pcanal commented Jun 25, 2020

Rather than not generating the rdict file we could use it only in the cases when it will improve things.

What are the cases where it doesn't?

What is the best way to detect if we are in MakeProject mode?

What is special about the "MakeProject mode"?

I don't think we can (really) detect it. What MakeProject does is generate a set of header files, a linkdef file and a build script. Except for the fact that there no function besides the constructor and destructor, it is distinguishable from regular user code. The struct are not even guaranteed to be POD because of inheritance. In addition, the user could take the initial generation and start developing a full library based on it. So even heuristic bases on the way MakeProject generates the code would detect that you are "really" in MakeProject mode.

@vgvassilev
Copy link
Member Author

Rather than not generating the rdict file we could use it only in the cases when it will improve things.

What are the cases where it doesn't?

See the performance data. It doesn't improve things basically for entire roottest -- on the contrary it shows 2% slowdown.

What is the best way to detect if we are in MakeProject mode?

What is special about the "MakeProject mode"?

I was hoping you to be able to tell. You crafted an example on top of roottest's MakeProject which was able to demonstrate the usefulness of rdicts. None of that so far is obvious to me.

I don't think we can (really) detect it. What MakeProject does is generate a set of header files, a linkdef file and a build script. Except for the fact that there no function besides the constructor and destructor, it is distinguishable from regular user code. The struct are not even guaranteed to be POD because of inheritance. In addition, the user could take the initial generation and start developing a full library based on it. So even heuristic bases on the way MakeProject generates the code would detect that you are "really" in MakeProject mode.

Sigh... Are you saying that the 'patched' version of MakeProject had many more headers thus read data beyond some threshold which makes rdicts significantly improve the situation?

@pcanal
Copy link
Member

pcanal commented Jun 25, 2020

I was hoping you to be able to tell. You crafted an example on top of roottest's MakeProject which was able to demonstrate the usefulness of rdicts. None of that so far is obvious to me.

I see. MakeProject was here just a short-cut to producing a large enough library size. Technically I could have send you any large experimental framework instead.

@pcanal
Copy link
Member

pcanal commented Jun 25, 2020

It doesn't improve things basically for entire roottest -- on the contrary it shows 2% slowdown.

I think that if we want to harvest this 2% (by skipping the root-pcm sometimes) we are going to have to understand better what in the "user" code (i.e. the set of headers and functionality) and what in our code makes the difference.

For example, if in the large example, the gain from the root-pcm is thanks to avoiding a large number of data-member offset calculations and large number of AST nodes loading from the module to get the information, and (still for example), if the cost in the other cases (the 2%) is due to loading the TProtoClass but not using them because the example does not do any I/O, then the conclusion would be that the "cost of doing faster I/O is the 2% slowdown and that their would be 2 options (speed-up the module case even more than now or speed-up the root-pcm case when no I/O is done). [Of course in either we would have the challenge of doing so without costing (more?) for the other user cases]

@vgvassilev
Copy link
Member Author

It seems that rdict files help when we have significant amount of headers needed for I/O. Can we heuristically determine the threshold size? I think that roottest suffers from performance degradation because the rdict files are almost empty and it does not pay off loading and using them.

@Axel-Naumann
Copy link
Member

Not everything will be covered by modules, for a long time. Thus we will have to keep the rdict file feature alive for years to come. What's the benefit of optimizing this then - is it only to accelerate roottest? (I'm trying to understand whether it's important to invest dev hours here vs in other areas, taking as main criteria, as always, relevance to our users.)

@vgvassilev
Copy link
Member Author

To clarify, we are talking about the case where a library has a module. Users do not use make project often and roottest claims to cover a good amount of their workflows.

As for the dev hours: they are already invested and we know that only under very specific conditions we need rdicts -- it seems that it depends on the amount of header files we parse. If the number is very high, as in Philippe's example, we should load the rdict -- it slows things by 2%...

@pcanal
Copy link
Member

pcanal commented Jul 15, 2020

I think it might be hard to come up with a proper heuristic. For example is the threshold "per dictionary" (in which case it might be achiveable) or is it an accumulative cost, i.e. per whole set of library (in which case it is impossible) and is it lightly or highly dependent of the fraction of the generated dicitonary entries and/or header that are actually used?

Users do not use make project

MakeProject is irrelevant here. It is "only" just to quickly produce an emulation of the "Experiment Framework" type of problem size. The relevant part is (as you noted) the amount of classes needed for I/O. One part that example does not emulate is the (high) number of libraries and dictionary. So yes this type of problem is extremely relevant and no it is actually not directly represented in roottest. (and it is unknown which fraction of the roottests are exercising this code path (i.e. not only loading but also using the rootpcms).

Also I suppose there are ways to reduce the up-front cost of the rootpcms (for example delaying their loading until needed)

@vgvassilev
Copy link
Member Author

I think it might be hard to come up with a proper heuristic. For example is the threshold "per dictionary" (in which case it might be achiveable) or is it an accumulative cost, i.e. per whole set of library (in which case it is impossible) and is it lightly or highly dependent of the fraction of the generated dicitonary entries and/or header that are actually used?

Users do not use make project

MakeProject is irrelevant here. It is "only" just to quickly produce an emulation of the "Experiment Framework" type of problem size. The relevant part is (as you noted) the amount of classes needed for I/O. One part that example does not emulate is the (high) number of libraries and dictionary. So yes this type of problem is extremely relevant and no it is actually not directly represented in roottest. (and it is unknown which fraction of the roottests are exercising this code path (i.e. not only loading but also using the rootpcms).

In principle the new implementation due to modules should allow us to easily turn off rootpcms and test in the cmssw context. Then I guess the results should be more reliable.

Also I suppose there are ways to reduce the up-front cost of the rootpcms (for example delaying their loading until needed)

That's what I am after. Do you have something in particular in mind?

@pcanal
Copy link
Member

pcanal commented Jul 16, 2020

Also I suppose there are ways to reduce the up-front cost of the rootpcms (for example delaying their loading until needed)
That's what I am after. Do you have something in particular in mind

Currently, the rootpcms file is loaded when the library is loaded. Instead it could be open whenever there is a call to TClassTable::GetProtoClass for one of its content. This would require:

(a) have a (persistent) data structure that list of the available content of the rootpcms.
(b) extend the TClassTable data structure to record that information
(c) in TClassTable::GetProtoClass if the TProtoClass is no there but there is rootpcms listed, then open the rootpcms and load.

When opening the rootpcms one should load all the TProtoClasses since opening the file takes time.

@amadio amadio removed their request for review November 6, 2020 12:31
@vgvassilev
Copy link
Member Author

This pull request has a lot of useful information. The current infrastructure is flexible and can disable rootpcms if required for further performance studies. Let's close this now and come back to it if necessary.

@vgvassilev vgvassilev closed this May 31, 2021
@vgvassilev vgvassilev deleted the cxxmodules-do-not-generate-rdicts branch May 31, 2021 18:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants