Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CodeChecker diff works inconsistently in local-local, local-remote, remote-remote cases, especially with tags #3884

Closed
Szelethus opened this issue Apr 17, 2023 · 0 comments · Fixed by #3996
Assignees
Labels
bug 🐛 CLI 💻 Related to the command-line interface, such as the cmd, store, etc. commands server 🖥️

Comments

@Szelethus
Copy link
Collaborator

Szelethus commented Apr 17, 2023

GUI problems

CodeChecker diff is a more complicated feature than it seems to be. The greatest problem with it is that diffing is not done on sets of reports, but rather sets of outstanding reports. Looking at this (cropped) screenshot, this is not evident at all:
image

Nowhere in the report page (after pressing diff) can you see that the result is the set of outstanding reports:

image

The icons for "Diff type" make this worse, since nothing explicitly states that we mean "Only outstanding reports in Baseline":

image

Meaning that it is easy to misinterpret the GUI and mistakenly believe that these are set operations on the set of all reports. The command line interface does a little better as one of the following 3 flags are manditory to CodeChecker cmd diff: --new, --resolved, --unresolved.

Issues with diff

Run diffs (without tags)

Suppose we have the following two files, test1.cpp and test2.cpp.

test1.cpp

Analysis results on test1.cpp are stored in the folder result1, and remotely as RunA.

Analysis:

$ CodeChecker check -b "g++ -c test1.cpp" -o result1 --analyzers clangsa -c

Store:

CodeChecker store result1/ -n RunA --url 0.0.0.0/Default
// test1.cpp
#include "stdlib.h"

void a() {
  int i = 0;
  (void)(10 / i); // division by zero
}

void b() {
  int *i = 0;
  *i = 5; // nullptr dereference
}

void c() {
  int i = 0; // deadstore, this value is never read
  //
  i = 5;
}

// void d() {
//   int *k = new int;
//   (void)k;
// } // memory leak

void e() {
  int *k = new int;
  // codechecker_suppress [all] SUPPRESS ALL
  free(k); // mismatched deallocator, should be delete k;
}

void f() {
  int *k = new int;
  delete k;
  *k = 5; // use after free
}

Analysis results:

Report ID Checker name Fixed at date & reason
A core.DivideZero -
B core.NullDereference -
C deadcode.DeadStores -
E unix.MismatchedDeallocator fix1, in source
F cplusplus.NewDelete fix1, on GUI *

*When stored as RunA, on the GUI, a review status rule is set to false positive.

test2.cpp

Analysis results on test2.cpp are stored in the folder result2, and remotely as RunB.

Analysis:

$ CodeChecker check -b "g++ -c test2.cpp" -o result2 --analyzers clangsa -c

Store:

CodeChecker store result2/ -n RunB --url 0.0.0.0/Default
// test2.cpp
#include "stdlib.h"

void a() {
  int i = 0;
  (void)(10 / i); // division by zero
}

// void b() {
//   int *i = 0;
//   *i = 5; // nullptr dereference
// }

void c() {
  int i = 0; // deadstore, this value is never read
  // codechecker_suppress [all] SUPPRESS ALL
  i = 5;
}

void d() {
  int *k = new int;
  (void)k;
} // memory leak

void e() {
  int *k = new int;
  //
  free(k); // mismatched deallocator, should be delete k;
}

void f() {
  int *k = new int;
  delete k;
  *k = 5; // use after free
}

Analysis results:

Report ID Checker name Fixed at date & reason
A core.DivideZero -
C deadcode.DeadStores fix2, in source
D cplusplus.NewDeleteLeaks -
E unix.MismatchedDeallocator -
F cplusplus.NewDelete -

Summary

Report ID Checker name In test1.cpp? In test2.cpp?
A core.DivideZero ✔️ ✔️
B core.NullDereference ✔️
C deadcode.DeadStores ✔️ ✔️, source code suppressed
D cplusplus.NewDeleteLeaks ✔️
E unix.MismatchedDeallocator ✔️, source code suppressed ✔️
F cplusplus.NewDelete ✔️, GUI suppressed ✔️, GUI suppressed

Preface to the results

Note that each report has its own instance. Two reports can be identical (to the point of having the same bug hash), but if they are in two different runs, they are not the same, and have their own row in the reports table. As a result, they have their own fixed_at date!

Diffing in theory is done with set operations on the set of outstanding reports. In practice, a report is considered outstanding (or open) if all of the following is true:

  • Its detection status is new, reopened, or unresolved
  • Its review status is unreviewed or confirmed

A report is closed (not outstanding) if any of the following is true:

  • Its review status is false positive or intentional
  • Its detection status is resolved or off

However, in practice, we don't check these these properties, but whether they have their fixed_at property set. The idea is that outstanding reports do not have this property set, and closed reports do. Any deviation from this (unfortunately unwritten) rule is a bug, so whether fixed_at is set should perfectly reflect whether a report is closed.

Since diffing is done on the set of outstanding reports, filters on the diff page (seen in an image above) don't matter much, reports with false positive or intentional review statuses should never appear, nor reports with resolved or off detection status.

Local-Local results

Commands used for the results:

CodeChecker cmd diff -b result1/ -n result2/ --new --review-status --detection-status
CodeChecker cmd diff -b result1/ -n result2/ --resolved --review-status --detection-status
CodeChecker cmd diff -b result1/ -n result2/ --unresolved --review-status --detection-status

Note that the last two options are there to make sure that all review statuses and detection statuses are included.

Run set 1 (-b) Run set 2 (-n) Diff type Actual result Expected result
result1/ result2/ NEW (only in Run set 2) D D, E
result1/ result2/ RESOLVED (only in Run set 1) B, C B, C
result1/ result2/ UNRESOLVED (Run set 1 and Run set 2) A, E, F A, F*
result1/ result2/ neither - -

*TODO: Consider adding a feature where we query the review status rule even in the local-local case: if we did that, F should be in the neither set.

Local-Remote / Remote-Local results

Commands used for the results:

CodeChecker cmd diff -b RunA -n result2/ --new --review-status --detection-status
CodeChecker cmd diff -b RunA -n result2/ --resolved --review-status --detection-status
CodeChecker cmd diff -b RunA -n result2/ --unresolved --review-status --detection-status
Run set 1 (-b) Run set 2 (-n) Diff type Actual result Expected result
RunA result2/ NEW (only in Run set 2) D, E * D, E
RunA result2/ RESOLVED (only in Run set 1) B, C, E *, F B, C
RunA result2/ UNRESOLVED (Run set 1 and Run set 2) A A
RunA result2/ neither - F

*FIXME: Its a reasonable suspicion that we don't precalculate the set of outstanding reports AND THEN do set calculations, which could be why E is in two disjointed sets.

CodeChecker cmd diff -b result1/ -n RunB --new --review-status --detection-status
CodeChecker cmd diff -b result1/ -n RunB --resolved --review-status --detection-status
CodeChecker cmd diff -b result1/ -n RunB --unresolved --review-status --detection-status
Run set 1 (-b) Run set 2 (-n) Diff type Actual result Expected result
result1/ RunB NEW (only in Run set 2) C *, D, E, F D, E
result1/ RunB RESOLVED (only in Run set 1) B, C * B, C
result1/ RunB UNRESOLVED (Run set 1 and Run set 2) A A
result1/ RunB neither - F

*FIXME: Its a reasonable suspicion that we don't precalculate the set of outstanding reports AND THEN do set calculations, which could be why C is in two disjointed sets.

Remote-Remote results:

No filters were applied in the GUI as a result (besides the default, which is no filters at all). I also checked the results on the coammnd line, and got the same thing.

Commands for the cmd:

CodeChecker cmd diff -b RunA -n RunB --new --review-status --detection-status
CodeChecker cmd diff -b RunA -n RunB --resolved --review-status --detection-status
CodeChecker cmd diff -b RunA -n RunB --unresolved --review-status --detection-status
Run set 1 (-b) Run set 2 (-n) Diff type Actual result Expected result
RunA RunB NEW (only in Run set 2) D, E D, E
RunA RunB RESOLVED (only in Run set 1) B, C B, C
RunA RunB UNRESOLVED (Run set 1 and Run set 2) A A
RunA RunB neither F F

Discusssion

First if all, all 3 of these should be identical, except for report F (discussed below). The fact that they aren't is already a sign of a bug (likely several bugs).

Here are some docs regarding review status rules. Review status rules are created when a report's review status is categorized in the GUI.

This bit is mostly about what is the difference in between them and source code suppression.

When it comes to false positive (or intentional) review status rules, our current thinking is that if a report was categorized as such, every other report with the same bug hash is a false positive as well. (and as a result, a closed report). Mind that it doesn't matter whether a report's detection date is before or after the rule is set -- every already stored report, and those that will be stored in the future will have its review status set (and its fixed_at date as well). This to implies that in some sense, review status rules are a timeless property.

Source code suppressions, when stored on the server, won't create a review status rule. Reports marked as a FP in the source code only affect the report instance in that particular analysis (and are only closed in that run). This means that if the source code comment itself is removed from run to another, we regard it as a new outstanding report. This also implies that source code suppressions are NOT a timeless property -- reports E and C are not false positives, ONLY in the context of the runs that suppressed them.

We can definitely say that our logic regarding source code suppression doesn't work right at the moment. Aside from the remote-remote case, reports E and C are almost always misplaced.

In short, source code suppressions and review status rule suppressions are handled very differently for diffs, but look identical. The first row is suppressed with a comment, the second one with a review status rule.
image
This is not necessarily a problem, just something to be aware of.

Tag diffs

Preface to the results

When storing under the same run name but with different tags, the statement above on different report instances no longer hold. A report in tag1 and tag2 don't just look equivalent, they refer to the same bug report instance. For this reason, the fixed_at date can be changed by a subsequent tag!

The following example (among other shortcomings) showcases that the result of diff(tag1, tag2) before storing a new tag onto the run, than after it.

Store commands:

CodeChecker store result1/ -n "SingleRun" --tag tag1 --url 0.0.0.0/5569
CodeChecker store result2/ -n "SingleRun" --tag tag2 --url 0.0.0.0/5569

FIXME: Tags don't seem to work at all. I created a separate issue for this: #3889

Later, the results in result1/ will be stored again under the tag tag3.

CodeChecker store result1/ -n "SingleRun" --tag tag3 --url 0.0.0.0/5569

Summary

Report ID Checker name In test1.cpp? (tag1) In test2.cpp? (tag2) In test1.cpp? (tag3)
A core.DivideZero ✔️ ✔️ ✔️
B core.NullDereference ✔️ ✔️
C deadcode.DeadStores ✔️ ✔️, src suppressed ✔️
D cplusplus.NewDeleteLeaks ✔️
E unix.MismatchedDeallocator ✔️, src suppressed ✔️ ✔️, src suppressed
F cplusplus.NewDelete ✔️, GUI suppressed ✔️, GUI suppressed ✔️, GUI suppressed

Local-Remote

CodeChecker cmd diff -b result1/ -n SingleRun --tag tag2 --new --review-status --detection-status
CodeChecker cmd diff -b result1/ -n SingleRun --tag tag2 --resolved --review-status --detection-status
CodeChecker cmd diff -b result1/ -n SingleRun --tag tag2 --unresolved --review-status --detection-status

(before tag3):

Run set 1 (-b) Run set 2 (-n) Diff type Actual result Expected result
result1/ SingleRun tag2 NEW (only in Run set 2) D, E D, E
result1/ SingleRun tag2 RESOLVED (only in Run set 1) B, C B, C
result1/ SingleRun tag2 UNRESOLVED (Run set 1 and Run set 2) A A
result1/ SingleRun tag2 neither F F
CodeChecker cmd diff -b SingleRun --tag tag1 -n result2/ --new --review-status --detection-status
CodeChecker cmd diff -b SingleRun --tag tag1 -n result2/ --resolved --review-status --detection-status
CodeChecker cmd diff -b SingleRun --tag tag1 -n result2/ --unresolved --review-status --detection-status

(before tag3):

Run set 1 (-b) Run set 2 (-n) Diff type Actual result Expected result
SingleRun tag1 result2/ NEW (only in Run set 2) - D, E
SingleRun tag1 result2/ RESOLVED (only in Run set 1) B, C B, C
SingleRun tag1 result2/ UNRESOLVED (Run set 1 and Run set 2) A, D*, E** A
SingleRun tag1 result2/ neither F F

*FIXME: How did this get here???
**Report E is no longer suppressed in the command line from tag1 to tag2, so this is a case where the fixed_at property was removed, but should not have been. For architectural reasons, it is impossible for previously stored tags to act as if no subsequent tags were stored on top of it (because the fixed_at date is irrecoverably overwritten). In the reverse case (report C) we could work around this: detection dates never chance (theoretically speaking), and we may be able to express that that the fixed_at date is precisely the detect_at date in tag2, and as such, is still outstanding in tag1.

Remote-Remote

No filters were applied in the GUI as a result (besides the default, which is no filters at all). TODO: Check cmd

(before tag3):

Run set 1 (-b) Run set 2 (-n) Diff type Actual result Expected result
SingleRun tag1 SingleRun tag2 NEW (only in Run set 2) D D, E
SingleRun tag1 SingleRun tag2 RESOLVED (only in Run set 1) - B, C
SingleRun tag1 SingleRun tag2 UNRESOLVED (Run set 1 and Run set 2) A, E A
SingleRun tag1 SingleRun tag2 neither B, C, F F

This should be identical to the non-tag remote-remote results.

(after tag3):

Run set 1 (-b) Run set 2 (-n) Diff type Actual result Expected result
SingleRun tag1 SingleRun tag2 NEW (only in Run set 2) - D, E
SingleRun tag1 SingleRun tag2 RESOLVED (only in Run set 1) - B, C
SingleRun tag1 SingleRun tag2 UNRESOLVED (Run set 1 and Run set 2) D, A, E A
SingleRun tag1 SingleRun tag2 neither B, C, F F

This should be identical to the results where tag3 hasn't been added yet.

Discussion

The problem here is that diffs are done on the latest status of the reports, but should instead be considered in the only in the context of the diff. If tag3 doesn't participate in the diff, we should act as if it never existed (surely when calculating the set of outstanding reports).

@Szelethus Szelethus added bug 🐛 CLI 💻 Related to the command-line interface, such as the cmd, store, etc. commands server 🖥️ labels Apr 17, 2023
@Szelethus Szelethus added this to the release 6.22.1 milestone Apr 17, 2023
@Szelethus Szelethus self-assigned this Apr 17, 2023
@Szelethus Szelethus added the WIP 💣 Work In Progress label Apr 17, 2023
@Szelethus Szelethus changed the title Storing new tag changes diff on unrelated tags CodeChecker diff works inconsistently in local-local, local-remote, remote-remote cases, especially with tags Apr 18, 2023
@Szelethus Szelethus removed the WIP 💣 Work In Progress label Apr 18, 2023
Szelethus added a commit to Szelethus/codechecker that referenced this issue Jun 26, 2023
We usually test CodeChecker diff by making API calls that one can find
in report_server.py. However, `CodeChecker cmd diff` doesn't invoke
these plainly, but rather wraps it in various functions, for which we
never had proper tests. Until now!

I also made sure to include (almost all) cases discussed here:
Ericsson#3884
Szelethus added a commit to Szelethus/codechecker that referenced this issue Jun 26, 2023
We usually test CodeChecker diff by making API calls that one can find
in report_server.py. However, `CodeChecker cmd diff` doesn't invoke
these plainly, but rather wraps it in various functions, for which we
never had proper tests. Until now!

I also made sure to include (almost all) cases discussed here:
Ericsson#3884
Szelethus added a commit to Szelethus/codechecker that referenced this issue Jun 26, 2023
We usually test CodeChecker diff by making API calls that one can find
in report_server.py. However, `CodeChecker cmd diff` doesn't invoke
these plainly, but rather wraps it in various functions, for which we
never had proper tests. Until now!

I also made sure to include (almost all) cases discussed here:
Ericsson#3884
Szelethus added a commit to Szelethus/codechecker that referenced this issue Jun 29, 2023
We usually test CodeChecker diff by making API calls that one can find
in report_server.py. However, `CodeChecker cmd diff` doesn't invoke
these plainly, but rather wraps it in various functions, for which we
never had proper tests. Until now!

I also made sure to include (almost all) cases discussed here:
Ericsson#3884
@bruntib bruntib modified the milestones: release 6.22.2, release 6.23.0 Jul 4, 2023
Szelethus added a commit to Szelethus/codechecker that referenced this issue Jul 7, 2023
We usually test CodeChecker diff by making API calls that one can find
in report_server.py. However, `CodeChecker cmd diff` doesn't invoke
these plainly, but rather wraps it in various functions, for which we
never had proper tests. Until now!

I also made sure to include (almost all) cases discussed here:
Ericsson#3884
bruntib pushed a commit to bruntib/codechecker that referenced this issue Jul 12, 2023
We usually test CodeChecker diff by making API calls that one can find
in report_server.py. However, `CodeChecker cmd diff` doesn't invoke
these plainly, but rather wraps it in various functions, for which we
never had proper tests. Until now!

I also made sure to include (almost all) cases discussed here:
Ericsson#3884
Szelethus added a commit to Szelethus/codechecker that referenced this issue Jul 13, 2023
Source code suppressions seem to work funny (meaning: really buggy) in
the context of diffs. An earlier PR, Ericsson#3944 fixed a similar issue in the
local-local diff case: we forgot to take source code suppressions into
account for the baseline set.

The issue is somewhat worse for local-remote and remote-local diffs, as
discussed in issue Ericsson#3884.

As a feature, this should be comically simple: calculate the set of
outstanding reports from the "baseline" and the "new" sets. Suppose
these sets are called O(baseline) and O(new). Then, calculating diffs
should be this easy:

* NEW: O(new) - O(baseline)
* RESOLVED: O(baseline) - O(new)
* UNRESOLVED: The intersection of O(baseline) and O(new)

But, for historical and performance reasons, the way we do this is far,
far more convoluted. We use two API calls: `getDiffResultsHash` and
`getRunResults`.

We use the former call to send the list of bug reports (and more
specifically, only the bug hashes of those reports) we have locally to
the server, and it returns a set of hashes found on the server, ideally
those that would be present in the diff set. We constantly change and
adjust these sets of hashes with review status rules, source code
suppressions, and after some juggling, we ask for the actual report
objects (as opposed to simple hashes) via `getRunResults`.

Looking at `get_diff_local_dir_remote_run` in `cmd_line_client.py`, the
implementation is super complicated, and its hard to judge whether we
truly need this much complexity for something that theoretically so
simple.

Fixing the specific bug of not handling source code suppressions well
could be done, on a high level from two approaches: fixing up the API
calls on the server side to not return the the hashes that shouldn't be
present in the final result, or patch the client to filter them out.

This patch presents the client-only approach. Here is my argument for
it.

We already branch out for older client versions within (Ericsson#3746), so it
would be a chore to fix this for both client versions. You can argue
whether fixing this only for later client versions is okay (thats a fair
argument), but I think its fair to ask users to upgrade their client if
its starts misbehaving. After all, the tool doesn't break, it just work
a little funny (meaning: really buggy), which it already does for the
local-local case that had to be fixed client side.

ANYWAYS.

The fix is simple: filter for unreviewed and confirmed reports before
calling `getRunResults`. I added new tests for tags just to be sure that
it doesn't break anything unrelated, but their behaviour didn't change
as a result of this patch.
Szelethus added a commit to Szelethus/codechecker that referenced this issue Aug 15, 2023
My previous commit that added cmdline tests targeted local-local,
remote-local and local-remote diffs, and remote-remote diffs to a lesser
extent. It also added 2 cases related to tags.

This PR adds cases from this issue: Ericsson#3884, specifically demonstrating
the table found in section "Tag diffs" -> "Remote-Remote" _before_ the
addition of a third tag.
Szelethus added a commit to Szelethus/codechecker that referenced this issue Aug 15, 2023
My previous commit that added cmdline tests targeted local-local,
remote-local and local-remote diffs, and remote-remote diffs to a lesser
extent. It also added 2 cases related to tags.

This PR adds cases from this issue: Ericsson#3884, specifically demonstrating
the table found in section "Tag diffs" -> "Remote-Remote" _before_ the
addition of a third tag.
Szelethus added a commit to Szelethus/codechecker that referenced this issue Aug 15, 2023
My previous commit that added cmdline tests targeted local-local,
remote-local and local-remote diffs, and remote-remote diffs to a lesser
extent. It also added 2 cases related to tags.

This PR adds cases from this issue: Ericsson#3884, specifically demonstrating
the table found in section "Tag diffs" -> "Remote-Remote" _before_ the
addition of a third tag.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug 🐛 CLI 💻 Related to the command-line interface, such as the cmd, store, etc. commands server 🖥️
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants