Tune python duplication remediation points #78
Conversation
```ruby
def calculate_points(issue)
  BASE_POINTS + (overage(issue) * POINTS_PER_OVERAGE)
end
```
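For context, a minimal self-contained sketch of how this formula plays out. The constant values are the ones proposed later in this PR (base 1_500_000, per-overage cost 30_000, threshold 32); the `issue` hash shape and the `overage` helper are illustrative assumptions, not the engine's exact code.

```ruby
# Hedged sketch of the points formula under discussion.
# Constants taken from the numbers proposed in this PR; the issue
# representation is invented for illustration.
BASE_POINTS = 1_500_000
POINTS_PER_OVERAGE = 30_000
THRESHOLD = 32

# overage = score - threshold (per the discussion below)
def overage(issue)
  issue[:score] - THRESHOLD
end

def calculate_points(issue)
  BASE_POINTS + (overage(issue) * POINTS_PER_OVERAGE)
end
```

With these numbers, an issue whose score is 82 would earn 1_500_000 + 50 * 30_000 = 3_000_000 points, which matches the updated spec expectation further down.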
Note: I thought grades would get worse with this change, but overall they seem to be about the same.
Updating these stats
Re the comparisons of grades between classic & platform: it looks like these are overall grades across all categories? If we're trying to tune the engines, I think we should be trying to compare this engine specifically to its classic equivalent (duplication category smells). Otherwise I think we're comparing at too high a level to have confidence in a given engine's tuning. This is related to what was discussed in IPM today. I'd love to see the sum of remediation points for duplication issues between classic & platform.
@wfleming I think that's a good point, and so the table actually includes a column comparing only duplication and complexity between Platform and Classic, and omitting style.
The complexity issues have already been tuned, so I expect them to have roughly the same impact. It could be interesting to see precisely how many points per category each repo gets, but I'm not sure it's that much more helpful than comparing issues and overall GPA?
If we think that'd be useful to see though, I can work on some reverse engineering of remediation points and add a table of those.
After updating the stats, it looks like the grades are significantly worse but no incredibly dissimilar from Classic.
> it looks like the grades are significantly worse but no incredibly dissimilar from Classic.
…What? I'm not sure what you mean here. "Significantly worse" & "not incredibly dissimilar" sound contradictory.
> If we think that'd be useful to see though, I can work on some reverse engineering of remediation points and add a table of those.
I don't think you need to reverse engineer anything: you can query for duplication issues in the `smells` mongo table for the respective snapshots you're looking at, and compare the remediation points for the smell records that seem to correlate (same file/location) between classic & platform. The number of duplication issues I saw in that gist seemed low enough that correlating by hand doesn't seem arduous.
There is a direct relationship between remediation points & repo grade. So if we want to feel confident that duplication is going to result in similar grades on platform as it did on classic, I think the best way to know that is by comparing the solid numbers we get from duplication issues between the two architectures, without all the other categories potentially muddying the waters.
> it looks like the grades are significantly worse but no incredibly dissimilar from Classic.
it looks like the grades are significantly worse than currently on Platform, but not dissimilar from Classic.
> There is a direct relationship between remediation points & repo grade.
True. It's important to note here though that we're not only comparing total remediation points, but also the variety of duplication issues found.
I can certainly query the database and make some stats if we're interested in seeing how many duplication-based remediation points are reported for a given repo or given issue on Classic and Platform.
My gut from the work so far is that the complexity-with-duplication grades are reflective of the duplication point differences.
> it looks like the grades are significantly worse than currently on Platform, but not dissimilar from Classic.
Ah, thanks, that makes sense.
Force-pushed 8426373 to d759a4a
```diff
@@ -27,7 +27,7 @@
   "path" => "foo.py",
   "lines" => { "begin" => 1, "end" => 1 },
 })
-expect(json["remediation_points"]).to eq(54000)
+expect(json["remediation_points"]).to eq(3000000)
```
💄 3_000_000
👍
Force-pushed d759a4a to 7b4721e
@codeclimate/review @wfleming I made a script to collect stats that compare python duplication remediation points on classic, current platform, and platform with proposed updates: https://gist.github.com/ABaldwinHunter/da407f6657ef30a05069

The proposed new points formula brings our analysis closer to Classic, but in some cases assigns significantly steeper penalties than on Classic for the same issue. I think classic may add additional remediation cost per occurrence when multiple instances of code are found (they show up as individual smells, but share a fingerprint) - but am currently double-checking.

To reduce the impact of threshold differences, I might consider reducing the … I've also only run these stats on Flask. I could run them on a few additional repositories.
Nice to see that new points values are the same number of significant digits as classic! The difference between new platform/classic looks to be a ratio of ~2.17 on average. The remediation points -> GPA algorithm is an exponential step-off (i.e. twice as many points means one letter grade drop), so in terms of resultant grades at least, it seems like more …
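The "exponential step-off" described here can be illustrated with a toy grade mapper. This is a sketch of the mapping's shape only, assuming each doubling of points relative to a baseline costs one letter grade; the baseline value, grade list, and clamping are invented for illustration and are not Code Climate's actual algorithm.

```ruby
# Toy illustration of an exponential points -> grade step-off:
# doubling the points relative to BASELINE drops one letter grade.
# BASELINE and the grade ladder are invented for this sketch.
GRADES = %w[A B C D F].freeze
BASELINE = 1_000_000

def grade(points)
  # Number of doublings above the baseline, clamped to the grade ladder.
  steps = points <= BASELINE ? 0 : Math.log2(points.to_f / BASELINE).ceil
  GRADES[[steps, GRADES.length - 1].min]
end
```

Under this sketch a ~2.17x points ratio (log2(2.17) ≈ 1.12) costs roughly one letter grade, which is the intuition the comment above is appealing to.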
Force-pushed ef7d7e1 to d78e903
@codeclimate/review Ready for re-review. Summary:

cc @brynary

Note: this PR presents a few changes. I like keeping them together because they're finished and feel related, but can see the argument for splitting the PR if that'd be cleaner.
```diff
@@ -9,7 +9,7 @@ When you violate DRY, bugs and maintenance problems are sure to follow. Duplicat
 ## Issue Mass

 Duplicated code has a calculated mass, which can be thought of as a measure of how much logic has been duplicated.
-This issue has a mass of `<%= issue.mass %>`: if you would like to change the minimum mass that will be reported as an issue, please see the details in [`codeclimate-duplication`'s documentation](https://github.com/codeclimate/codeclimate-duplication).
+This issue has a mass of `<%= mass %>`: if you would like to change the minimum mass that will be reported as an issue, please see the details in [`codeclimate-duplication`'s documentation](https://github.com/codeclimate/codeclimate-duplication).
```
If we're putting this in the description, I think we can drop it from the read-up. I thought that got done?
Oops, I realized #77 hasn't shipped yet.
I have not reviewed this PR or read the context. I just saw one comment @pbrisbin made and it set off a flag... The UI does not and should not attempt to "collapse" multiple Issues of duplication within the same file caused by multiple occurrences of the same code structure within a single file. If a duplication is within a single file, and there are e.g. 2 occurrences, we should produce and render 2 Issues on the page. The count should show "2" in the sidebar also (not 1). This is an important change we made over a year ago. :)

/c @pbrisbin @ABaldwinHunter
@brynary Perfect, thanks for weighing in. It seems I'm remembering when such a feature was discussed or implemented but wasn't around, or don't remember, when it was reverted or rejected. It's certainly easier to not have that UI logic, so that's good.

Can you speak generally to duplication fingerprinting? Does each instance get a unique fingerprint (it sounds like yes)? Would you consider https://github.com/codeclimate/app/blob/master/app/models/smells_counter.rb#L50 a bug / hold-over from when some sort of duplication collapsing was present in the UI? It also sounds like this is a yes, but the answer could depend on if there are any other cases where duplications might be, well, duplicated and need …
@pbrisbin I'm also leaning toward us saying that each duplication instance gets a unique fingerprint, and then any new instances or removals appear as … But curious what B thinks.
Seems pretty well answered to me:
This discussion is also not meant to decide on a new Right Thing -- we're trying to get to the bottom of how Classic works so we can preserve it.
@pbrisbin I believe that on Classic each occurrence does get the same fingerprint, but that was mostly incidental relative to the context. I suspect either way would be acceptable for now, but probably safest to use what Classic does to avoid any surprises.
Force-pushed 3648fb6 to 6ff56f3
@wfleming Ready for re-review! I think I addressed the code concerns you mentioned.

Thing I didn't do and am trying to squeeze by: break into separate PRs and commits. Included a clear bulleted summary in the commit. I don't think at this point reverse engineering the commit history is worthwhile, especially because the changes are intertwined, but can go ahead and take the time to do it if we prefer it?

cc @codeclimate/review @brynary
```ruby
    "fingerprint": fingerprint,
  }
end
# rubocop:enable Metrics/MethodLength
```
This code is largely a clean transplant from Violation. Added comments to disable the method length check here because it didn't seem useful.
Force-pushed 3722c97 to e3b46e6
1. FIX: Use node size as mass, instead of flay score
2. FIX: Report issue for each instance of duplicated code, not just first sexp.
3. UPDATE: Tune Python remediation points
   - Reduce AST threshold from 40 to 32 (classic is 28)
   - Update points formula to match classic computation
   - Don't penalize extra for identical duplication

Change from `remediation_points = x * score` to `remediation_points = x + (score - threshold) * y`.

Since remediation points are a function of effort required to fix an issue, we're making a behavioral change to not penalize extra for identical duplication.
Force-pushed e3b46e6 to fcb1a6d
Closing because this branch has been harvested and starfished into others.
Reduce the AST threshold to 32 (classic is 28).

Change from

`remediation_points = n * score`

to

`remediation_points = x + (score - threshold) * y`

This change increases parity with classic and overall increases the number of duplication issues reported.
link to grade comparisons
Note: @gordondiggs and I are exploring exactly why there are parser differences between Classic and Platform - there are a number of relevant factors, including likely differences between versions of Python.
Note on mass difference:
The mass of a node corresponds to its size. Specifying a minimum threshold tells Code Climate to ignore duplication in nodes below a certain size (e.g. one-liners).
The issue's Flay score is the result of its mass * number of occurrences (or number of occurrences ^ 2, if the code is identical).
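The score relationship just described can be sketched directly. This is a hedged restatement of the formula in the note above, not flay's actual implementation (flay's internals differ in detail, e.g. how it weights node types):

```ruby
# Sketch of the score relationship described above:
#   score = mass * occurrences        (structurally similar code)
#   score = mass * occurrences ** 2   (byte-identical code)
def flay_style_score(mass, occurrences, identical: false)
  identical ? mass * occurrences**2 : mass * occurrences
end
```

So two identical occurrences of a mass-32 node would score 128, while two merely similar ones would score 64; this squaring is the "extra penalty for identical duplication" that the proposed formula stops amplifying.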
Comparing issue mass between parsers in Platform and Classic: I've estimated the factor of mass difference to be ~1.15. Since the default Python duplication mass threshold on Classic was 28, and 28 * 1.15 ≈ 32.2, I've lowered our current default threshold for Python on Platform from 40 to 32.

On Classic, Python duplication issues were penalized in terms of remediation points as follows:
`1_500_000 + overage * 50_000`

where `overage = score - threshold` (and `score = f(mass)`).

I've kept the base points but lowered the per_cost to 30_000 to account for the difference in mass parsing (which gets amplified in the points calculation).
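To make the comparison concrete, here is a hedged side-by-side sketch of the two formulas using the numbers stated in this PR (Classic: threshold 28, per-overage cost 50_000; proposed Platform: threshold 32, per-overage cost 30_000). The sample score in the note below is invented, and real scores also differ between architectures because of the mass-parsing factor discussed above.

```ruby
# Side-by-side sketch of the Classic formula vs the proposed Platform
# formula, per the numbers in this PR. overage = score - threshold.
BASE = 1_500_000

def classic_points(score)
  BASE + (score - 28) * 50_000
end

def proposed_platform_points(score)
  BASE + (score - 32) * 30_000
end
```

For an invented score of 60, Classic would assign 3_100_000 points while the proposed Platform formula assigns 2_340_000, showing how the lower per-cost softens the penalty for the same nominal score.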
@codeclimate/review