CodeChecker diff works inconsistently in local-local, local-remote, remote-remote cases, especially with tags #3884

Szelethus · 2023-04-17T11:48:19Z

GUI problems

CodeChecker diff is a more complicated feature than it seems to be. The greatest problem with it is that diffing is not done on sets of reports, but rather sets of outstanding reports. Looking at this (cropped) screenshot, this is not evident at all:

Nowhere in the report page (after pressing diff) can you see that the result is the set of outstanding reports:

The icons for "Diff type" make this worse, since nothing explicitly states that we mean "Only outstanding reports in Baseline":

Meaning that it is easy to misinterpret the GUI and mistakenly believe that these are set operations on the set of all reports. The command line interface does a little better as one of the following 3 flags are manditory to CodeChecker cmd diff: --new, --resolved, --unresolved.

Issues with diff

Run diffs (without tags)

Suppose we have the following two files, test1.cpp and test2.cpp.

test1.cpp

Analysis results on test1.cpp are stored in the folder result1, and remotely as RunA.

Analysis:

$ CodeChecker check -b "g++ -c test1.cpp" -o result1 --analyzers clangsa -c

Store:

CodeChecker store result1/ -n RunA --url 0.0.0.0/Default

// test1.cpp
#include "stdlib.h"

void a() {
  int i = 0;
  (void)(10 / i); // division by zero
}

void b() {
  int *i = 0;
  *i = 5; // nullptr dereference
}

void c() {
  int i = 0; // deadstore, this value is never read
  //
  i = 5;
}

// void d() {
//   int *k = new int;
//   (void)k;
// } // memory leak

void e() {
  int *k = new int;
  // codechecker_suppress [all] SUPPRESS ALL
  free(k); // mismatched deallocator, should be delete k;
}

void f() {
  int *k = new int;
  delete k;
  *k = 5; // use after free
}

Analysis results:

Report ID	Checker name	Fixed at date & reason
A	core.DivideZero	-
B	core.NullDereference	-
C	deadcode.DeadStores	-
E	unix.MismatchedDeallocator	fix1, in source
F	cplusplus.NewDelete	fix1, on GUI *

*When stored as RunA, on the GUI, a review status rule is set to false positive.

test2.cpp

Analysis results on test2.cpp are stored in the folder result2, and remotely as RunB.

Analysis:

$ CodeChecker check -b "g++ -c test2.cpp" -o result2 --analyzers clangsa -c

Store:

CodeChecker store result2/ -n RunB --url 0.0.0.0/Default

// test2.cpp
#include "stdlib.h"

void a() {
  int i = 0;
  (void)(10 / i); // division by zero
}

// void b() {
//   int *i = 0;
//   *i = 5; // nullptr dereference
// }

void c() {
  int i = 0; // deadstore, this value is never read
  // codechecker_suppress [all] SUPPRESS ALL
  i = 5;
}

void d() {
  int *k = new int;
  (void)k;
} // memory leak

void e() {
  int *k = new int;
  //
  free(k); // mismatched deallocator, should be delete k;
}

void f() {
  int *k = new int;
  delete k;
  *k = 5; // use after free
}

Analysis results:

Report ID	Checker name	Fixed at date & reason
A	core.DivideZero	-
C	deadcode.DeadStores	fix2, in source
D	cplusplus.NewDeleteLeaks	-
E	unix.MismatchedDeallocator	-
F	cplusplus.NewDelete	-

Summary

Report ID	Checker name	In test1.cpp?	In test2.cpp?
A	core.DivideZero	✔️	✔️
B	core.NullDereference	✔️	❌
C	deadcode.DeadStores	✔️	✔️, source code suppressed
D	cplusplus.NewDeleteLeaks	❌	✔️
E	unix.MismatchedDeallocator	✔️, source code suppressed	✔️
F	cplusplus.NewDelete	✔️, GUI suppressed	✔️, GUI suppressed

Preface to the results

Note that each report has its own instance. Two reports can be identical (to the point of having the same bug hash), but if they are in two different runs, they are not the same, and have their own row in the reports table. As a result, they have their own fixed_at date!

Diffing in theory is done with set operations on the set of outstanding reports. In practice, a report is considered outstanding (or open) if all of the following is true:

Its detection status is new, reopened, or unresolved
Its review status is unreviewed or confirmed

A report is closed (not outstanding) if any of the following is true:

Its review status is false positive or intentional
Its detection status is resolved or off

However, in practice, we don't check these these properties, but whether they have their fixed_at property set. The idea is that outstanding reports do not have this property set, and closed reports do. Any deviation from this (unfortunately unwritten) rule is a bug, so whether fixed_at is set should perfectly reflect whether a report is closed.

Since diffing is done on the set of outstanding reports, filters on the diff page (seen in an image above) don't matter much, reports with false positive or intentional review statuses should never appear, nor reports with resolved or off detection status.

Local-Local results

Commands used for the results:

CodeChecker cmd diff -b result1/ -n result2/ --new --review-status --detection-status
CodeChecker cmd diff -b result1/ -n result2/ --resolved --review-status --detection-status
CodeChecker cmd diff -b result1/ -n result2/ --unresolved --review-status --detection-status

Note that the last two options are there to make sure that all review statuses and detection statuses are included.

Run set 1 (-b)	Run set 2 (-n)	Diff type	Actual result	Expected result
result1/	result2/	NEW (only in Run set 2)	D	D, E
result1/	result2/	RESOLVED (only in Run set 1)	B, C	B, C
result1/	result2/	UNRESOLVED (Run set 1 and Run set 2)	A, E, F	A, F*
result1/	result2/	neither	-	-

*TODO: Consider adding a feature where we query the review status rule even in the local-local case: if we did that, F should be in the neither set.

Local-Remote / Remote-Local results

Commands used for the results:

CodeChecker cmd diff -b RunA -n result2/ --new --review-status --detection-status
CodeChecker cmd diff -b RunA -n result2/ --resolved --review-status --detection-status
CodeChecker cmd diff -b RunA -n result2/ --unresolved --review-status --detection-status

Run set 1 (-b)	Run set 2 (-n)	Diff type	Actual result	Expected result
RunA	result2/	NEW (only in Run set 2)	D, E *	D, E
RunA	result2/	RESOLVED (only in Run set 1)	B, C, E *, F	B, C
RunA	result2/	UNRESOLVED (Run set 1 and Run set 2)	A	A
RunA	result2/	neither	-	F

*FIXME: Its a reasonable suspicion that we don't precalculate the set of outstanding reports AND THEN do set calculations, which could be why E is in two disjointed sets.

CodeChecker cmd diff -b result1/ -n RunB --new --review-status --detection-status
CodeChecker cmd diff -b result1/ -n RunB --resolved --review-status --detection-status
CodeChecker cmd diff -b result1/ -n RunB --unresolved --review-status --detection-status

Run set 1 (-b)	Run set 2 (-n)	Diff type	Actual result	Expected result
result1/	RunB	NEW (only in Run set 2)	C *, D, E, F	D, E
result1/	RunB	RESOLVED (only in Run set 1)	B, C *	B, C
result1/	RunB	UNRESOLVED (Run set 1 and Run set 2)	A	A
result1/	RunB	neither	-	F

*FIXME: Its a reasonable suspicion that we don't precalculate the set of outstanding reports AND THEN do set calculations, which could be why C is in two disjointed sets.

Remote-Remote results:

No filters were applied in the GUI as a result (besides the default, which is no filters at all). I also checked the results on the coammnd line, and got the same thing.

Commands for the cmd:

CodeChecker cmd diff -b RunA -n RunB --new --review-status --detection-status
CodeChecker cmd diff -b RunA -n RunB --resolved --review-status --detection-status
CodeChecker cmd diff -b RunA -n RunB --unresolved --review-status --detection-status

Run set 1 (-b)	Run set 2 (-n)	Diff type	Actual result	Expected result
RunA	RunB	NEW (only in Run set 2)	D, E	D, E
RunA	RunB	RESOLVED (only in Run set 1)	B, C	B, C
RunA	RunB	UNRESOLVED (Run set 1 and Run set 2)	A	A
RunA	RunB	neither	F	F

Discusssion

First if all, all 3 of these should be identical, except for report F (discussed below). The fact that they aren't is already a sign of a bug (likely several bugs).

Here are some docs regarding review status rules. Review status rules are created when a report's review status is categorized in the GUI.

This bit is mostly about what is the difference in between them and source code suppression.

When it comes to false positive (or intentional) review status rules, our current thinking is that if a report was categorized as such, every other report with the same bug hash is a false positive as well. (and as a result, a closed report). Mind that it doesn't matter whether a report's detection date is before or after the rule is set -- every already stored report, and those that will be stored in the future will have its review status set (and its fixed_at date as well). This to implies that in some sense, review status rules are a timeless property.

Source code suppressions, when stored on the server, won't create a review status rule. Reports marked as a FP in the source code only affect the report instance in that particular analysis (and are only closed in that run). This means that if the source code comment itself is removed from run to another, we regard it as a new outstanding report. This also implies that source code suppressions are NOT a timeless property -- reports E and C are not false positives, ONLY in the context of the runs that suppressed them.

We can definitely say that our logic regarding source code suppression doesn't work right at the moment. Aside from the remote-remote case, reports E and C are almost always misplaced.

In short, source code suppressions and review status rule suppressions are handled very differently for diffs, but look identical. The first row is suppressed with a comment, the second one with a review status rule.

This is not necessarily a problem, just something to be aware of.

Tag diffs

Preface to the results

When storing under the same run name but with different tags, the statement above on different report instances no longer hold. A report in tag1 and tag2 don't just look equivalent, they refer to the same bug report instance. For this reason, the fixed_at date can be changed by a subsequent tag!

The following example (among other shortcomings) showcases that the result of diff(tag1, tag2) before storing a new tag onto the run, than after it.

Store commands:

CodeChecker store result1/ -n "SingleRun" --tag tag1 --url 0.0.0.0/5569
CodeChecker store result2/ -n "SingleRun" --tag tag2 --url 0.0.0.0/5569

FIXME: Tags don't seem to work at all. I created a separate issue for this: #3889

Later, the results in result1/ will be stored again under the tag tag3.

CodeChecker store result1/ -n "SingleRun" --tag tag3 --url 0.0.0.0/5569

Summary

Report ID	Checker name	In test1.cpp? (tag1)	In test2.cpp? (tag2)	In test1.cpp? (tag3)
A	core.DivideZero	✔️	✔️	✔️
B	core.NullDereference	✔️	❌	✔️
C	deadcode.DeadStores	✔️	✔️, src suppressed	✔️
D	cplusplus.NewDeleteLeaks	❌	✔️	❌
E	unix.MismatchedDeallocator	✔️, src suppressed	✔️	✔️, src suppressed
F	cplusplus.NewDelete	✔️, GUI suppressed	✔️, GUI suppressed	✔️, GUI suppressed

Local-Remote

CodeChecker cmd diff -b result1/ -n SingleRun --tag tag2 --new --review-status --detection-status
CodeChecker cmd diff -b result1/ -n SingleRun --tag tag2 --resolved --review-status --detection-status
CodeChecker cmd diff -b result1/ -n SingleRun --tag tag2 --unresolved --review-status --detection-status

(before tag3):

Run set 1 (-b)	Run set 2 (-n)	Diff type	Actual result	Expected result
result1/	SingleRun tag2	NEW (only in Run set 2)	D, E	D, E
result1/	SingleRun tag2	RESOLVED (only in Run set 1)	B, C	B, C
result1/	SingleRun tag2	UNRESOLVED (Run set 1 and Run set 2)	A	A
result1/	SingleRun tag2	neither	F	F

CodeChecker cmd diff -b SingleRun --tag tag1 -n result2/ --new --review-status --detection-status
CodeChecker cmd diff -b SingleRun --tag tag1 -n result2/ --resolved --review-status --detection-status
CodeChecker cmd diff -b SingleRun --tag tag1 -n result2/ --unresolved --review-status --detection-status

(before tag3):

Run set 1 (-b)	Run set 2 (-n)	Diff type	Actual result	Expected result
SingleRun tag1	result2/	NEW (only in Run set 2)	-	D, E
SingleRun tag1	result2/	RESOLVED (only in Run set 1)	B, C	B, C
SingleRun tag1	result2/	UNRESOLVED (Run set 1 and Run set 2)	A, D, E*	A
SingleRun tag1	result2/	neither	F	F

*FIXME: How did this get here???
**Report E is no longer suppressed in the command line from tag1 to tag2, so this is a case where the fixed_at property was removed, but should not have been. For architectural reasons, it is impossible for previously stored tags to act as if no subsequent tags were stored on top of it (because the fixed_at date is irrecoverably overwritten). In the reverse case (report C) we could work around this: detection dates never chance (theoretically speaking), and we may be able to express that that the fixed_at date is precisely the detect_at date in tag2, and as such, is still outstanding in tag1.

Remote-Remote

No filters were applied in the GUI as a result (besides the default, which is no filters at all). TODO: Check cmd

(before tag3):

Run set 1 (-b)	Run set 2 (-n)	Diff type	Actual result	Expected result
SingleRun tag1	SingleRun tag2	NEW (only in Run set 2)	D	D, E
SingleRun tag1	SingleRun tag2	RESOLVED (only in Run set 1)	-	B, C
SingleRun tag1	SingleRun tag2	UNRESOLVED (Run set 1 and Run set 2)	A, E	A
SingleRun tag1	SingleRun tag2	neither	B, C, F	F

This should be identical to the non-tag remote-remote results.

(after tag3):

Run set 1 (-b)	Run set 2 (-n)	Diff type	Actual result	Expected result
SingleRun tag1	SingleRun tag2	NEW (only in Run set 2)	-	D, E
SingleRun tag1	SingleRun tag2	RESOLVED (only in Run set 1)	-	B, C
SingleRun tag1	SingleRun tag2	UNRESOLVED (Run set 1 and Run set 2)	D, A, E	A
SingleRun tag1	SingleRun tag2	neither	B, C, F	F

This should be identical to the results where tag3 hasn't been added yet.

Discussion

The problem here is that diffs are done on the latest status of the reports, but should instead be considered in the only in the context of the diff. If tag3 doesn't participate in the diff, we should act as if it never existed (surely when calculating the set of outstanding reports).

The text was updated successfully, but these errors were encountered:

We usually test CodeChecker diff by making API calls that one can find in report_server.py. However, `CodeChecker cmd diff` doesn't invoke these plainly, but rather wraps it in various functions, for which we never had proper tests. Until now! I also made sure to include (almost all) cases discussed here: Ericsson#3884

Source code suppressions seem to work funny (meaning: really buggy) in the context of diffs. An earlier PR, Ericsson#3944 fixed a similar issue in the local-local diff case: we forgot to take source code suppressions into account for the baseline set. The issue is somewhat worse for local-remote and remote-local diffs, as discussed in issue Ericsson#3884. As a feature, this should be comically simple: calculate the set of outstanding reports from the "baseline" and the "new" sets. Suppose these sets are called O(baseline) and O(new). Then, calculating diffs should be this easy: * NEW: O(new) - O(baseline) * RESOLVED: O(baseline) - O(new) * UNRESOLVED: The intersection of O(baseline) and O(new) But, for historical and performance reasons, the way we do this is far, far more convoluted. We use two API calls: `getDiffResultsHash` and `getRunResults`. We use the former call to send the list of bug reports (and more specifically, only the bug hashes of those reports) we have locally to the server, and it returns a set of hashes found on the server, ideally those that would be present in the diff set. We constantly change and adjust these sets of hashes with review status rules, source code suppressions, and after some juggling, we ask for the actual report objects (as opposed to simple hashes) via `getRunResults`. Looking at `get_diff_local_dir_remote_run` in `cmd_line_client.py`, the implementation is super complicated, and its hard to judge whether we truly need this much complexity for something that theoretically so simple. Fixing the specific bug of not handling source code suppressions well could be done, on a high level from two approaches: fixing up the API calls on the server side to not return the the hashes that shouldn't be present in the final result, or patch the client to filter them out. This patch presents the client-only approach. Here is my argument for it. We already branch out for older client versions within (Ericsson#3746), so it would be a chore to fix this for both client versions. You can argue whether fixing this only for later client versions is okay (thats a fair argument), but I think its fair to ask users to upgrade their client if its starts misbehaving. After all, the tool doesn't break, it just work a little funny (meaning: really buggy), which it already does for the local-local case that had to be fixed client side. ANYWAYS. The fix is simple: filter for unreviewed and confirmed reports before calling `getRunResults`. I added new tests for tags just to be sure that it doesn't break anything unrelated, but their behaviour didn't change as a result of this patch.

My previous commit that added cmdline tests targeted local-local, remote-local and local-remote diffs, and remote-remote diffs to a lesser extent. It also added 2 cases related to tags. This PR adds cases from this issue: Ericsson#3884, specifically demonstrating the table found in section "Tag diffs" -> "Remote-Remote" _before_ the addition of a third tag.

Szelethus added bug 🐛 CLI 💻 Related to the command-line interface, such as the cmd, store, etc. commands server 🖥️ labels Apr 17, 2023

Szelethus added this to the release 6.22.1 milestone Apr 17, 2023

Szelethus self-assigned this Apr 17, 2023

Szelethus added the WIP 💣 Work In Progress label Apr 17, 2023

Szelethus changed the title ~~Storing new tag changes diff on unrelated tags~~ CodeChecker diff works inconsistently in local-local, local-remote, remote-remote cases, especially with tags Apr 18, 2023

Szelethus removed the WIP 💣 Work In Progress label Apr 18, 2023

vodorok modified the milestones: release 6.22.1, release 6.22.2 Apr 20, 2023

Szelethus mentioned this issue Apr 27, 2023

[NFC] Factor args out of the diff logic for unit tests #3863

Merged

Szelethus mentioned this issue Jun 27, 2023

[test] Add missing tests for cmdline diffing, and display a bug for tag diffs #3868

Merged

bruntib modified the milestones: release 6.22.2, release 6.23.0 Jul 4, 2023

Szelethus mentioned this issue Jul 13, 2023

[cmd] Fix FP annotations in the case of local-remote diffs #3956

Merged

Szelethus mentioned this issue Aug 15, 2023

[test] Add several more cmdline tests for tag diffs #3979

Merged

Szelethus linked a pull request Sep 8, 2023 that will close this issue

[diff] Fix a CLI remote-remote bug where we checked the detection status #3996

Merged

bruntib closed this as completed in #3996 Sep 21, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CodeChecker diff works inconsistently in local-local, local-remote, remote-remote cases, especially with tags #3884

CodeChecker diff works inconsistently in local-local, local-remote, remote-remote cases, especially with tags #3884

Szelethus commented Apr 17, 2023 •

edited

Loading

CodeChecker diff works inconsistently in local-local, local-remote, remote-remote cases, especially with tags #3884

CodeChecker diff works inconsistently in local-local, local-remote, remote-remote cases, especially with tags #3884

Comments

Szelethus commented Apr 17, 2023 • edited Loading

GUI problems

Issues with diff

Run diffs (without tags)

test1.cpp

test2.cpp

Summary

Preface to the results

Local-Local results

Local-Remote / Remote-Local results

Remote-Remote results:

Discusssion

Tag diffs

Preface to the results

Summary

Local-Remote

Remote-Remote

Discussion

Szelethus commented Apr 17, 2023 •

edited

Loading