Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add incremental mode to analysis command. #719

Merged
merged 1 commit into from
Jul 24, 2017

Conversation

csordasmarton
Copy link
Contributor

This commit resolves #702 issue.

@dkrupp
Copy link
Member

dkrupp commented Jul 17, 2017

Check why test_analyze_and_parse.py test cases in travis fail.

Copy link
Member

@dkrupp dkrupp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Fix failing regression test cases.
  • Add new test cases for the incremental analysis into tests/functional/analyze_and_parse

buildaction.analyzer_type = analyzer_types.CLANG_SA
elif os.path.basename(f).startswith("clang-tidy_"):
buildaction.analyzer_type = analyzer_types.CLANG_TIDY
buildaction.analyzer_type = analyzer_types.CLANG_SA
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do we use the buildaction.analyzer_type field in the strore? If not used, remove.


out_file_name = str(self.buildaction.analyzer_type) + \
'_' + analyzed_file_name + '_' + uid + '.plist'
out_file_name = hashlib.md5(file_name).hexdigest() + '.plist'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides the hash, keep the source file name also in the plist file name to help the users to identify which plist file belongs to which source file.
So use something like: out_file_name = analyzed_file_name + hashlib.md5(file_name).hexdigest() + '.plist'

@@ -137,6 +137,16 @@ def add_arguments_to_parser(parser):
help="Annotate the ran analysis with a custom name in "
"the created metadata file.")

parser.add_argument('-f', '--force',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to change the parameter name to 'clean' from 'force'. We usually use the force parameter to enforce some potentially dangerous action like (rm -rf) which otherwise would trigger interactive questions from users. If this parameter is omitted, then there is no such question, but the analysis runs in incremental mode.

So the clean mode expresses the fact that we clean the previous results and perform a new analysis.

default=False,
help="Delete analysis reports stored in the output "
"directory. (By default, CodeChecker would keep "
"reports and overwrites only those plist files "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the type of files is a command-line argument, I would not use the word plist here.

Also, let's move this command between --type and --name. The order of arguments matter, and this is more logical to follow the "output" arguments.

Copy link
Contributor

@whisperity whisperity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing semantics:

  • If a file failed to analyze earlier but now we can analyze it, the failure file should be deleted (as now exists).
  • If a file is changed in the project, and a new log file is created (which contains only one build command for the modified file), and analyze is called, analyze says the compilation command is already analyzed. Thus, a changed file can not be reanalyzed without having to reanalyze the full project.
    • If this newer analysis call succeeds, the plist should be rewritten.
    • If it fails, the plist should be deleted, and a failure file should be written.

@@ -139,6 +143,12 @@ def check(check_data):
context.severity_map,
skip_handler)

rh.analyzed_source_file = source
if os.path.exists(rh.analyzer_result_file):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The culprit the disability on reanalyzing changed files from newer build.jsons.

@@ -75,7 +78,7 @@ def worker_result_handler(results, metadata, output_path):
source_map[f[:-7]] = sfile.read().strip()
os.remove(f)

metadata['result_source_files'] = source_map
metadata['result_source_files'].update(source_map)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Be careful here about source files that fail to analyze but successfully analyzed earlier on. They should be removed from the source map.

@@ -6,7 +6,7 @@

from abc import ABCMeta
import os
import uuid
import hashlib
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imports are alphabetically sorted.

help="Delete analysis reports stored in the output "
"directory. (By default, CodeChecker would keep "
"reports and overwrites only those files that "
"were update by the current build command.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Closing ) missing.

dest="clean",
required=False,
action='store_true',
default=False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default is explain the help. We use default=argparse.SUPPRESS so the help formatter doesn't create a (default: False) to the help output.

NOTE! If default is suppress, you can't use if args.clean: as it will be an error if --clean is not given, you need to use if 'clean' in args instead!

@csordasmarton csordasmarton force-pushed the incremental_analyze branch 2 times, most recently from cf0176f to 04b18a9 Compare July 21, 2017 14:43
@whisperity whisperity self-requested a review July 21, 2017 15:03
Copy link
Contributor

@whisperity whisperity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @csordasmarton. It works nicely! 🙂

Copy link
Contributor

@Xazax-hun Xazax-hun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General directions look good, I have some questions though.

@@ -32,17 +32,18 @@ def worker_result_handler(results, metadata, output_path):
successful_analysis = defaultdict(int)
failed_analysis = defaultdict(int)
skipped_num = 0
existed_num = 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not really like the name existed. Maybe reanalyzed is better.

@@ -100,6 +109,7 @@ def check(check_data):
output_dir, skip_handler = check_data

skipped = False
existed = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

out_file_name = str(self.buildaction.analyzer_type) + \
'_' + analyzed_file_name + '_' + uid + '.plist'
out_file_name = analyzed_file_name + '_' + \
hashlib.md5(file_name).hexdigest() + '.plist'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it sufficient to only include the filename? What about files that compiled multiple times with different compilation commands? Maybe it is okay to commit it like this for now, but I would include a FIXME note here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was puzzled for a bit as I specifically tested by hand if multiple files are analyzed in with different compilation setups in the buildlog and I properly got the plists duplicated with content appropriate for the build environment.

If you see right above in the previous line, the file_name variable is created from the original build command. This variable should be renamed, though, it seems to confuse people who don't read diff with enough context. 🙂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, maybe it is worth to add a FIXME that in the future we might want to filter the commands. E.g. do not consider separate warning settings as separate compilation commands.

@@ -289,11 +302,18 @@ def main(args):
with open(args.skipfile, 'r') as skipfile:
metadata['skip_data'] = [l.strip() for l in skipfile.readlines()]

# Update metadata dictionary with old values.
metadata_file = os.path.join(args.output_path, 'metadata.json')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the current state of this metadata files? The requirement of metadata makes us incompatible with scan-build. Do we need it? Should we make it optional?

Copy link
Contributor

@whisperity whisperity Jul 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no "incompatibility", the store command works without metadata.json if we define works as puts the bugs found in the folder into the database.

The pre-5.8 plist command, as update mode was not supported, badly worked. It was expected from the user to always store a directory of plists into an empty run.

Metadata JSON is needed to connect plist files to source files and source build actions, so that the current storage system (on master) knows when to eliminate bugs linked to a (sourcefile, buildaction) pair – because the new bugs will be stored completely replacing the old ones. With the onset of #709, #724 and the possible upcoming schema changes, the proper requirement on having a metadata.json will be revised.

@@ -21,5 +21,5 @@ tidy_check.cpp:5:5: found assert() with side effect [misc-assert-side-effect]
assert(++i);
^

clang-tidy found 1 defect(s) while analyzing tidy_check.cpp
Found 1 defect(s) while analyzing tidy_check.cpp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would be great to have a test case where both sa and tidy finds defects to see that the numbers are added up correctly.

Copy link
Contributor

@whisperity whisperity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Xazax-hun Mentioned a comment about a file_name variable, and I agree with making a test on the number adding up.

@csordasmarton csordasmarton force-pushed the incremental_analyze branch 2 times, most recently from 4b95930 to 81739a5 Compare July 24, 2017 14:29
@csordasmarton
Copy link
Contributor Author

csordasmarton commented Jul 24, 2017

I don't think it's a good idea to sum up the defects because the parse processing the plist files one-by-one.

Copy link
Contributor

@whisperity whisperity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Let's roll.

@whisperity whisperity merged commit 760a304 into Ericsson:version6 Jul 24, 2017
@csordasmarton csordasmarton deleted the incremental_analyze branch July 31, 2017 07:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants