
Tune python duplication remediation points #78


Closed
wants to merge 1 commit

Conversation

ABaldwinHunter (Contributor):

  • Reduce AST threshold from 40 to 32 (classic is 28)
  • update point formula to match classic computation

Change from
`remediation_points = n * score`
to
`remediation_points = x + (score - threshold) * y`

This change increases parity with classic and overall increases the number of duplication issues reported.

link to grade comparisons

Note: @gordondiggs and I are exploring exactly why there are parser differences between Classic and Platform - there are a number of relevant factors, including likely differences between versions of Python.

Note on mass difference:

The mass of a node corresponds to its size. Specifying a minimum threshold tells Code Climate to ignore duplication in nodes below a certain size (e.g. one-liners).

The issue's Flay score is the result of its mass * number of occurrences (or mass * occurrences ^ 2, if the code is identical).
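That scoring rule can be sketched as follows. This is my reading of the description above, not the engine's actual code, and the exact identical-code multiplier is an assumption:

```ruby
# Sketch of the Flay scoring rule described above: similar code scores
# mass * occurrences; identical code is penalized by a further factor of
# occurrences. Names and the identical-case multiplier are assumptions.
def flay_score(mass, occurrences, identical: false)
  multiplier = identical ? occurrences**2 : occurrences
  mass * multiplier
end

flay_score(32, 2)                  # => 64  (similar code)
flay_score(32, 2, identical: true) # => 128 (identical code scores higher)
```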

Comparing issue mass between the parsers on Platform and Classic:

| Platform | Classic | Platform / Classic |
| -------- | ------- | ------------------ |
| 42       | 39      | 1.07               |
| 45       | 40      | 1.125              |
| 66       | 57      | 1.15789            |
| 123      | 109     | 1.1284             |
| 126      | 93      | 1.3548             |
| 246      | 218     | 1.1284             |

I've estimated the factor of mass difference to be ~1.15.
Since the default Python duplication mass threshold on Classic was 28, and 28 * 1.15 = 32.2, I've lowered our current default threshold for Python on Platform from 40 to 32.
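The arithmetic behind that choice can be checked directly against the ratios in the table above (the ~1.15 factor is the estimate used in this PR):

```ruby
# Mass values from the Platform/Classic comparison table above.
platform = [42, 45, 66, 123, 126, 246]
classic  = [39, 40, 57, 109, 93, 218]

# Per-issue ratios and their mean; the mean comes out around 1.16,
# close to the ~1.15 estimate used for tuning.
ratios  = platform.zip(classic).map { |p, c| p.to_f / c }
average = ratios.sum / ratios.size

# Scaling Classic's default threshold of 28 by the estimated factor:
scaled = 28 * 1.15        # ~32.2 (floating point)
scaled.round              # => 32, the new Platform default
```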

On Classic, Python duplication issues were penalized in terms of remediation points as follows:

```
remediation_points = 1_500_000 + overage * 50_000
overage = score - threshold
score   = f(mass)
```

I've kept the base points but lowered the per_cost to 30_000 to account for the difference in mass parsing (which gets amplified in the points calculation).

@codeclimate/review


```ruby
def calculate_points(issue)
  BASE_POINTS + (overage(issue) * POINTS_PER_OVERAGE)
end
```
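Filling in the values under discussion (base kept at 1_500_000, per-overage cost lowered to 30_000, threshold 32), the formula works out like this. The constant names and the zero-floor on overage are assumptions for illustration, not the engine's actual definitions:

```ruby
# Hypothetical constants reflecting the values discussed in this PR;
# the real engine defines its own names and may differ.
BASE_POINTS        = 1_500_000
POINTS_PER_OVERAGE = 30_000
THRESHOLD          = 32

# overage = score - threshold, floored at zero for scores at the threshold
def calculate_points(score)
  overage = [score - THRESHOLD, 0].max
  BASE_POINTS + overage * POINTS_PER_OVERAGE
end

calculate_points(32) # => 1_500_000 (no overage: base points only)
calculate_points(82) # => 3_000_000 (overage of 50)
```

At the threshold an issue earns exactly the base points; each point of Flay score over the threshold adds another 30_000.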

ABaldwinHunter (Contributor Author):

Note: I thought grades would get worse with this change, but overall they seem to be about the same.

https://gist.github.com/ABaldwinHunter/e280310b4987766efd8c

ABaldwinHunter (Contributor Author):

Updating these stats

wfleming (Contributor):

Re the comparisons of grades between classic & platform: it looks like these are overall grades across all categories? If we're trying to tune the engines, I think we should be trying to compare this engine specifically to its classic equivalent (duplication category smells). Otherwise I think we're comparing at too high a level to have confidence in a given engine's tuning. This is related to what was discussed in IPM today. I'd love to see the sum of remediation points for duplication issues between classic & platform.

ABaldwinHunter (Contributor Author):

@wfleming I think that's a good point, and so the table actually includes a column comparing only duplication and complexity between Platform and Classic, and omitting style.

The complexity issues have already been tuned, so I expect them to have roughly the same impact. It could be interesting to see precisely how many points per category each repo gets, but I'm not sure it's that much more helpful than comparing issues and overall GPA?

If we think that'd be useful to see though, I can work on some reverse engineering of remediation points and add a table of those.

After updating the stats, it looks like the grades are significantly worse but no incredibly dissimilar from Classic.

wfleming (Contributor):

> it looks like the grades are significantly worse but no incredibly dissimilar from Classic.

…What? I'm not sure what you mean here. "Significantly worse" & "not incredibly dissimilar" sound contradictory.

> If we think that'd be useful to see though, I can work on some reverse engineering of remediation points and add a table of those.

I don't think you need to reverse engineer anything: you can query for duplication issues in the smells mongo table for the respective snapshots you're looking at, and compare the remediation points for the smell records that seem to correlate (same file/location) between classic & platform. The number of duplication issues I saw in that gist seemed low enough that correlating by hand doesn't seem arduous.

There is a direct relationship between remediation points & repo grade. So if we want to feel confident that duplication is going to result in similar grades on platform as it did on classic, I think the best way to know that is by comparing the solid numbers we get from duplication issues between the two architectures, without all the other categories potentially muddying the waters.

ABaldwinHunter (Contributor Author):

@wfleming

> it looks like the grades are significantly worse but no incredibly dissimilar from Classic.

it looks like the grades are significantly worse than currently on Platform, but not dissimilar from Classic.

> There is a direct relationship between remediation points & repo grade.

True. It's important to note here though that we're not only comparing total remediation points, but also the variety of duplication issues found.

I can certainly query the database and make some stats if we're interested in seeing how many duplication-based remediation points are reported for a given repo or given issue on Classic and Platform.

My gut from the work so far is that the complexity-with-duplication grades are reflective of the duplication point differences.

wfleming (Contributor):

> it looks like the grades are significantly worse than currently on Platform, but not dissimilar from Classic.

Ah, thanks, that makes sense.

```diff
@@ -27,7 +27,7 @@
   "path" => "foo.py",
   "lines" => { "begin" => 1, "end" => 1 },
 })
-expect(json["remediation_points"]).to eq(54000)
+expect(json["remediation_points"]).to eq(3000000)
```
Contributor:

💄 3_000_000

ABaldwinHunter (Contributor Author):

👍

ABaldwinHunter (Contributor Author):

@codeclimate/review @wfleming

I made a script to collect stats that compare python duplication remediation points on classic, current platform, and platform with proposed updates: https://gist.github.com/ABaldwinHunter/da407f6657ef30a05069

The proposed new points formula brings our analysis closer to Classic, but in some cases assigns significantly steeper penalties than on Classic for the same issue. I think classic may add additional remediation cost per occurrence when multiple instances of code are found (they show up as individual smells, but share a fingerprint) - but am currently double-checking.

To reduce the impact of threshold differences, I might consider reducing the per_cost variable further (already reduced from 50_000 to 30_000).

I've also only run these stats on Flask. I could run them on a few additional repositories.

wfleming (Contributor):

Nice to see that the new points values have the same number of significant digits as classic! The difference between new platform/classic looks to be a ratio of ~2.17 on average. The remediation points -> GPA algorithm is an exponential step-off (i.e. twice as many points means one letter grade drop), so in terms of resultant grades at least, it seems like more per_cost tweaking might be valuable.
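A toy model of that step-off, illustrative only (the real points -> GPA algorithm is not shown in this thread): if each doubling of points costs one letter grade, the number of grades dropped relative to a baseline grows logarithmically in the points ratio.

```ruby
# Toy model: each doubling of remediation points costs one letter grade.
# Not the real mapping; it only illustrates the "exponential step-off".
def letter_grades_dropped(points, baseline)
  return 0.0 if points <= baseline
  Math.log2(points.to_f / baseline)
end

letter_grades_dropped(2_000_000, 1_000_000) # => 1.0 (double the points, one grade)
letter_grades_dropped(2_170_000, 1_000_000) # ~1.12 (the ~2.17x ratio above costs about one grade)
```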

@ABaldwinHunter ABaldwinHunter force-pushed the abh-python-tuning branch 5 times, most recently from ef7d7e1 to d78e903 Compare January 22, 2016 23:50
ABaldwinHunter (Contributor Author):

@codeclimate/review

Ready for re-review

Summary:

  • Update remediation points calculation (matches Classic, with one difference: don't penalize extra for identical code)
  • Bug fix: Report issue for each file with duplication, instead of only first location
  • Use mass as node size, not Flay score

ABaldwinHunter (Contributor Author):

cc @brynary

ABaldwinHunter (Contributor Author):

Note: this PR presents a few changes. I like keeping them together because they're finished and feel related, but can see the argument for splitting the PR if that'd be cleaner.

```diff
@@ -9,7 +9,7 @@ When you violate DRY, bugs and maintenance problems are sure to follow. Duplicat
 ## Issue Mass
 
 Duplicated code has a calculated mass, which can be thought of as a measure of how much logic has been duplicated.
-This issue has a mass of `<%= issue.mass %>`: if you would like to change the minimum mass that will be reported as an issue, please see the details in [`codeclimate-duplication`'s documentation](https://github.com/codeclimate/codeclimate-duplication).
+This issue has a mass of `<%= mass %>`: if you would like to change the minimum mass that will be reported as an issue, please see the details in [`codeclimate-duplication`'s documentation](https://github.com/codeclimate/codeclimate-duplication).
```
Contributor:

If we're putting this in the description, I think we can drop it from the read up. I thought that got done?

Contributor:

Oops, I realized #77 hasn't shipped yet.

brynary (Member) commented Jan 25, 2016:

I have not reviewed this PR or read the context. I just saw one comment @pbrisbin made and it set off a flag...

The UI does not and should not attempt to "collapse" multiple Issues of duplication within the same file caused by multiple occurrences of the same code structure within a single file.

If a duplication is within a single file, and there are e.g. 2 occurrences, we should produce and render 2 Issues on the page. The count should show "2" in the sidebar also (not 1).

This is an important change we made over a year ago. :) /c @pbrisbin @ABaldwinHunter

pbrisbin (Contributor):

@brynary Perfect, thanks for weighing in. It seems I'm remembering when such a feature was discussed or implemented, but wasn't around (or don't remember) when it was reverted or rejected. It's certainly easier to not have that UI logic, so that's good.

Can you speak generally to duplication fingerprinting? Does each instance get a unique fingerprint? (It sounds like yes.) Would you consider https://github.com/codeclimate/app/blob/master/app/models/smells_counter.rb#L50 a bug / hold-over from when some sort of duplication collapsing was present in the UI? It also sounds like this is a yes, but the answer could depend on whether there are any other cases where duplications might be, well, duplicated and need uniqing.

ABaldwinHunter (Contributor Author):

@pbrisbin I'm also leaning toward us saying that each duplication instance gets a unique fingerprint, and then any new instances or removals appear as new and fixed issues, rather than one improved situation.

But curious what B thinks.

pbrisbin (Contributor):

Seems pretty well answered to me:

> If a duplication is within a single file, and there are e.g. 2 occurrences, we should produce and render 2 Issues on the page. The count should show "2" in the sidebar also (not 1).

This discussion is also not meant to decide on a new Right Thing -- we're trying to get to the bottom of how Classic works so we can preserve it.

brynary (Member) commented Jan 25, 2016:

@pbrisbin I believe that on Classic each occurrence does get the same fingerprint, but that was mostly incidental relative to the context. I suspect either way would be acceptable for now, but probably safest to use what Classic does to avoid any surprises.

@ABaldwinHunter ABaldwinHunter force-pushed the abh-python-tuning branch 5 times, most recently from 3648fb6 to 6ff56f3 Compare January 25, 2016 19:30
ABaldwinHunter (Contributor Author):

@wfleming Ready for re-review!

I think I addressed the code concerns you mentioned -

  1. reorganize violation handling of current_ and other_ sexps using an Issue class
  2. use sexp.mass instead of flay score reverse engineering
  3. don't use other_sexps.count in fingerprint to avoid false positives
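To illustrate point 3 — why folding the occurrence count into the fingerprint invites false positives — here is a hypothetical fingerprint function. It is not the engine's actual implementation; the names, inputs, and hashing scheme are all assumptions:

```ruby
require "digest"

# Hypothetical fingerprint: a hash of the issue's location plus an
# optional occurrence count. Not the engine's real implementation.
def fingerprint(path, node_id, occurrence_count = nil)
  parts = [path, node_id, occurrence_count].compact
  Digest::MD5.hexdigest(parts.join("|"))
end

# With the count baked in, adding a third copy of the same duplication
# changes the fingerprint, so the existing issue looks "fixed" and a
# spurious "new" one appears:
fingerprint("foo.py", "def_block", 2) == fingerprint("foo.py", "def_block", 3) # => false

# Without the count, the fingerprint stays stable as copies come and go:
fingerprint("foo.py", "def_block") == fingerprint("foo.py", "def_block") # => true
```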

Thing I didn't do and am trying to squeeze by: break into separate PRs and commits.

Included a clear bulleted summary in the commit. I don't think reverse engineering the commit history is worthwhile at this point, especially because the changes are intertwined, but I can go ahead and take the time to do it if we prefer.

cc @codeclimate/review @brynary

"fingerprint": fingerprint,
}
end # rubocop:enable Metrics/MethodLength

ABaldwinHunter (Contributor Author):

This code is largely a clean transplant from Violation. Added comments to disable the method length check here because it didn't seem useful.

@ABaldwinHunter ABaldwinHunter force-pushed the abh-python-tuning branch 2 times, most recently from 3722c97 to e3b46e6 Compare January 25, 2016 20:04
1. FIX - Use node size as mass, instead of flay score
2. FIX - Report issue for each instance of duplicated code, not just
first sexp.

3. UPDATE: Tune Python Remediation Points

- Reduce AST threshold from 40 to 32 (classic is 28)
- update point formula to match classic computation
- don't penalize extra for identical duplication

Change from
     remediation_points = x * score
to
     remediation_points = x + (score-threshold) * y

Since remediation points are a function of effort required to fix an
issue, we're making a behavioral change to not penalize extra for
identical duplication.
ABaldwinHunter (Contributor Author):

Closing because this branch has been harvested and starfished into others.

@ABaldwinHunter ABaldwinHunter deleted the abh-python-tuning branch January 27, 2016 23:20
4 participants