-
Notifications
You must be signed in to change notification settings - Fork 379
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CodeChecker diff works inconsistently in local-local, local-remote, remote-remote cases, especially with tags #3884
Labels
Milestone
Comments
Szelethus
added
bug 🐛
CLI 💻
Related to the command-line interface, such as the cmd, store, etc. commands
server 🖥️
labels
Apr 17, 2023
Szelethus
changed the title
Storing new tag changes diff on unrelated tags
CodeChecker diff works inconsistently in local-local, local-remote, remote-remote cases, especially with tags
Apr 18, 2023
Szelethus
added a commit
to Szelethus/codechecker
that referenced
this issue
Jun 26, 2023
We usually test CodeChecker diff by making API calls that one can find in report_server.py. However, `CodeChecker cmd diff` doesn't invoke these plainly, but rather wraps it in various functions, for which we never had proper tests. Until now! I also made sure to include (almost all) cases discussed here: Ericsson#3884
Szelethus
added a commit
to Szelethus/codechecker
that referenced
this issue
Jun 26, 2023
We usually test CodeChecker diff by making API calls that one can find in report_server.py. However, `CodeChecker cmd diff` doesn't invoke these plainly, but rather wraps it in various functions, for which we never had proper tests. Until now! I also made sure to include (almost all) cases discussed here: Ericsson#3884
Szelethus
added a commit
to Szelethus/codechecker
that referenced
this issue
Jun 26, 2023
We usually test CodeChecker diff by making API calls that one can find in report_server.py. However, `CodeChecker cmd diff` doesn't invoke these plainly, but rather wraps it in various functions, for which we never had proper tests. Until now! I also made sure to include (almost all) cases discussed here: Ericsson#3884
Szelethus
added a commit
to Szelethus/codechecker
that referenced
this issue
Jun 29, 2023
We usually test CodeChecker diff by making API calls that one can find in report_server.py. However, `CodeChecker cmd diff` doesn't invoke these plainly, but rather wraps it in various functions, for which we never had proper tests. Until now! I also made sure to include (almost all) cases discussed here: Ericsson#3884
Szelethus
added a commit
to Szelethus/codechecker
that referenced
this issue
Jul 7, 2023
We usually test CodeChecker diff by making API calls that one can find in report_server.py. However, `CodeChecker cmd diff` doesn't invoke these plainly, but rather wraps it in various functions, for which we never had proper tests. Until now! I also made sure to include (almost all) cases discussed here: Ericsson#3884
bruntib
pushed a commit
to bruntib/codechecker
that referenced
this issue
Jul 12, 2023
We usually test CodeChecker diff by making API calls that one can find in report_server.py. However, `CodeChecker cmd diff` doesn't invoke these plainly, but rather wraps it in various functions, for which we never had proper tests. Until now! I also made sure to include (almost all) cases discussed here: Ericsson#3884
Szelethus
added a commit
to Szelethus/codechecker
that referenced
this issue
Jul 13, 2023
Source code suppressions seem to work funny (meaning: really buggy) in the context of diffs. An earlier PR, Ericsson#3944 fixed a similar issue in the local-local diff case: we forgot to take source code suppressions into account for the baseline set. The issue is somewhat worse for local-remote and remote-local diffs, as discussed in issue Ericsson#3884. As a feature, this should be comically simple: calculate the set of outstanding reports from the "baseline" and the "new" sets. Suppose these sets are called O(baseline) and O(new). Then, calculating diffs should be this easy: * NEW: O(new) - O(baseline) * RESOLVED: O(baseline) - O(new) * UNRESOLVED: The intersection of O(baseline) and O(new) But, for historical and performance reasons, the way we do this is far, far more convoluted. We use two API calls: `getDiffResultsHash` and `getRunResults`. We use the former call to send the list of bug reports (and more specifically, only the bug hashes of those reports) we have locally to the server, and it returns a set of hashes found on the server, ideally those that would be present in the diff set. We constantly change and adjust these sets of hashes with review status rules, source code suppressions, and after some juggling, we ask for the actual report objects (as opposed to simple hashes) via `getRunResults`. Looking at `get_diff_local_dir_remote_run` in `cmd_line_client.py`, the implementation is super complicated, and its hard to judge whether we truly need this much complexity for something that theoretically so simple. Fixing the specific bug of not handling source code suppressions well could be done, on a high level from two approaches: fixing up the API calls on the server side to not return the the hashes that shouldn't be present in the final result, or patch the client to filter them out. This patch presents the client-only approach. Here is my argument for it. We already branch out for older client versions within (Ericsson#3746), so it would be a chore to fix this for both client versions. You can argue whether fixing this only for later client versions is okay (thats a fair argument), but I think its fair to ask users to upgrade their client if its starts misbehaving. After all, the tool doesn't break, it just work a little funny (meaning: really buggy), which it already does for the local-local case that had to be fixed client side. ANYWAYS. The fix is simple: filter for unreviewed and confirmed reports before calling `getRunResults`. I added new tests for tags just to be sure that it doesn't break anything unrelated, but their behaviour didn't change as a result of this patch.
Szelethus
added a commit
to Szelethus/codechecker
that referenced
this issue
Aug 15, 2023
My previous commit that added cmdline tests targeted local-local, remote-local and local-remote diffs, and remote-remote diffs to a lesser extent. It also added 2 cases related to tags. This PR adds cases from this issue: Ericsson#3884, specifically demonstrating the table found in section "Tag diffs" -> "Remote-Remote" _before_ the addition of a third tag.
Szelethus
added a commit
to Szelethus/codechecker
that referenced
this issue
Aug 15, 2023
My previous commit that added cmdline tests targeted local-local, remote-local and local-remote diffs, and remote-remote diffs to a lesser extent. It also added 2 cases related to tags. This PR adds cases from this issue: Ericsson#3884, specifically demonstrating the table found in section "Tag diffs" -> "Remote-Remote" _before_ the addition of a third tag.
Szelethus
added a commit
to Szelethus/codechecker
that referenced
this issue
Aug 15, 2023
My previous commit that added cmdline tests targeted local-local, remote-local and local-remote diffs, and remote-remote diffs to a lesser extent. It also added 2 cases related to tags. This PR adds cases from this issue: Ericsson#3884, specifically demonstrating the table found in section "Tag diffs" -> "Remote-Remote" _before_ the addition of a third tag.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
GUI problems
CodeChecker diff is a more complicated feature than it seems to be. The greatest problem with it is that diffing is not done on sets of reports, but rather sets of outstanding reports. Looking at this (cropped) screenshot, this is not evident at all:
Nowhere in the report page (after pressing diff) can you see that the result is the set of outstanding reports:
The icons for "Diff type" make this worse, since nothing explicitly states that we mean "Only outstanding reports in Baseline":
Meaning that it is easy to misinterpret the GUI and mistakenly believe that these are set operations on the set of all reports. The command line interface does a little better as one of the following 3 flags are manditory to
CodeChecker cmd diff
:--new
,--resolved
,--unresolved
.Issues with diff
Run diffs (without tags)
Suppose we have the following two files, test1.cpp and test2.cpp.
test1.cpp
Analysis results on test1.cpp are stored in the folder
result1
, and remotely asRunA
.Analysis:
Store:
Analysis results:
*When stored as RunA, on the GUI, a review status rule is set to false positive.
test2.cpp
Analysis results on test2.cpp are stored in the folder
result2
, and remotely asRunB
.Analysis:
Store:
Analysis results:
Summary
Preface to the results
Note that each report has its own instance. Two reports can be identical (to the point of having the same bug hash), but if they are in two different runs, they are not the same, and have their own row in the reports table. As a result, they have their own fixed_at date!
Diffing in theory is done with set operations on the set of outstanding reports. In practice, a report is considered outstanding (or open) if all of the following is true:
A report is closed (not outstanding) if any of the following is true:
However, in practice, we don't check these these properties, but whether they have their fixed_at property set. The idea is that outstanding reports do not have this property set, and closed reports do. Any deviation from this (unfortunately unwritten) rule is a bug, so whether fixed_at is set should perfectly reflect whether a report is closed.
Since diffing is done on the set of outstanding reports, filters on the diff page (seen in an image above) don't matter much, reports with false positive or intentional review statuses should never appear, nor reports with resolved or off detection status.
Local-Local results
Commands used for the results:
Note that the last two options are there to make sure that all review statuses and detection statuses are included.
*TODO: Consider adding a feature where we query the review status rule even in the local-local case: if we did that, F should be in the neither set.
Local-Remote / Remote-Local results
Commands used for the results:
*FIXME: Its a reasonable suspicion that we don't precalculate the set of outstanding reports AND THEN do set calculations, which could be why E is in two disjointed sets.
*FIXME: Its a reasonable suspicion that we don't precalculate the set of outstanding reports AND THEN do set calculations, which could be why C is in two disjointed sets.
Remote-Remote results:
No filters were applied in the GUI as a result (besides the default, which is no filters at all). I also checked the results on the coammnd line, and got the same thing.
Commands for the cmd:
Discusssion
First if all, all 3 of these should be identical, except for report F (discussed below). The fact that they aren't is already a sign of a bug (likely several bugs).
Here are some docs regarding review status rules. Review status rules are created when a report's review status is categorized in the GUI.
This bit is mostly about what is the difference in between them and source code suppression.
When it comes to false positive (or intentional) review status rules, our current thinking is that if a report was categorized as such, every other report with the same bug hash is a false positive as well. (and as a result, a closed report). Mind that it doesn't matter whether a report's detection date is before or after the rule is set -- every already stored report, and those that will be stored in the future will have its review status set (and its fixed_at date as well). This to implies that in some sense, review status rules are a timeless property.
Source code suppressions, when stored on the server, won't create a review status rule. Reports marked as a FP in the source code only affect the report instance in that particular analysis (and are only closed in that run). This means that if the source code comment itself is removed from run to another, we regard it as a new outstanding report. This also implies that source code suppressions are NOT a timeless property -- reports E and C are not false positives, ONLY in the context of the runs that suppressed them.
We can definitely say that our logic regarding source code suppression doesn't work right at the moment. Aside from the remote-remote case, reports E and C are almost always misplaced.
In short, source code suppressions and review status rule suppressions are handled very differently for diffs, but look identical. The first row is suppressed with a comment, the second one with a review status rule.
This is not necessarily a problem, just something to be aware of.
Tag diffs
Preface to the results
When storing under the same run name but with different tags, the statement above on different report instances no longer hold. A report in tag1 and tag2 don't just look equivalent, they refer to the same bug report instance. For this reason, the fixed_at date can be changed by a subsequent tag!
The following example (among other shortcomings) showcases that the result of diff(tag1, tag2) before storing a new tag onto the run, than after it.
Store commands:
FIXME: Tags don't seem to work at all. I created a separate issue for this: #3889
Later, the results in
result1/
will be stored again under the tagtag3
.Summary
Local-Remote
(before tag3):
(before tag3):
*FIXME: How did this get here???
**Report E is no longer suppressed in the command line from tag1 to tag2, so this is a case where the fixed_at property was removed, but should not have been. For architectural reasons, it is impossible for previously stored tags to act as if no subsequent tags were stored on top of it (because the fixed_at date is irrecoverably overwritten). In the reverse case (report C) we could work around this: detection dates never chance (theoretically speaking), and we may be able to express that that the fixed_at date is precisely the detect_at date in tag2, and as such, is still outstanding in tag1.
Remote-Remote
No filters were applied in the GUI as a result (besides the default, which is no filters at all). TODO: Check cmd
(before tag3):
This should be identical to the non-tag remote-remote results.
(after tag3):
This should be identical to the results where tag3 hasn't been added yet.
Discussion
The problem here is that diffs are done on the latest status of the reports, but should instead be considered in the only in the context of the diff. If
tag3
doesn't participate in the diff, we should act as if it never existed (surely when calculating the set of outstanding reports).The text was updated successfully, but these errors were encountered: