-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Greatly improve performance #1002
Conversation
filecontents as checksum. This avoids expensive tokenizing and computing the checksum over the content is really fast
(http://insanecoding.blogspot.de/2011/11/how-to-read-in-file-in-c.html) to support older compilers
What are the numbers for the performance improvement? |
Running in main
checkDiffOnly
There might be a bug in how |
I investigated the behaviour and created PR #1005 |
You can use the linux kernel source to test the performance of cppcheck. linux 4.14.1 has 61258 files and 23050598 lines of code. |
Do you think it would make any difference? Feel free to check the performance against any codebase you want. Do you have values to compare against? |
Somehow you did not run old cppcheck correctly. We have incremental analysis already so the second run should be much faster than the first run. |
numbers I get for "main": first run: 3m29,520s |
You are right, when scanning the
I get
Running the same on branch checkDiffOnly:
So obviously for the performance improvement (not the incremental issue) a bigger codebase is needed. Running it against linux kernel showed weird behaviour (the checking was done but the process never returned, bug?) so i ran it against the LLVM source (2309 files) which resulted in (measured 3x, wiped the caches in between) main
checkDiffOnly
To check whether the penalty for the first run (reading file twice) is gone when not using a builddir i ran the same command without
For the incremental issue, i will investigate further, i suspect |
After some more investigation it is clear that the call to If i find time, i will try to create a MWE. After writing this, i noticed: if the headers are not part of the condition but did change, cppcheck would probably have to rescan the file, right? This would not be the case when only looking at the hash of the file and not incorporate the included files somehow. @danmar what do you think? |
yes if a header is changed then it is preferable to rescan the source files that include it. |
Then this PR is invalid. I see two more approaches to solving the problem:
Both approaches are somewhat more work and will need some research and time. |
For future investigations, this MWE contains a stripped down Google Test (to reduce the size, only headers and sources are included), a Property Sheet which references Google Test (adds two include paths) and three minimal files (to resemble a valid Google Test project). There are four configurations (Release/Debug, Win32/x64) and the project is scanned via (using Git Bash shell (cppcheck main built for Release|x64):
First run with empty build dir and filesystem caches:
Second run with build dir filled but filesystem caches wiped:
Third run without wiping filesystem caches:
One can deduct that tokenizing (including 39 headers from GTest) takes around |
by not checking file tokens but instead using the filecontent directly to compute the checksum.
Tokenizing is expensive so by checking if the file itself changed (alongside the toolinfo data) it can be quickly evaluated if the file needs rechecking.
This of course only works if
--cppcheck-build-dir
is used and a previous run generated ananalyzerinfo
file.This basically implements incremental/differential file checking, so in codebases with few files changing, cppcheck will be incredibly fast.
Caveats i see are: