Commit 7b4721e

Authored and committed by ABaldwinHunter
Tune python duplication remediation points
- Reduce the Python AST mass threshold from 40 to 32 (Classic's is 28).
- Update the point formula to match the Classic computation: change from `remediation_points = x * score` to `remediation_points = x + (score - threshold) * y`.

This change increases parity with Classic and overall increases both the number of duplication issues reported for Python and the penalties assigned to them.

Note on mass difference: the mass of a node corresponds to its size. Specifying a minimum threshold tells Code Climate to ignore duplication in nodes below a certain size (e.g. one-liners). An issue's Flay score is the result of its **mass** * **number of occurrences** (or number of occurrences ^ 2, if the code is identical).

Comparing issue **mass** between the parsers in Platform and Classic:

| Platform | Classic | Platform / Classic |
| -------- | ------- | ------------------ |
| 42       | 39      | 1.07               |
| 45       | 40      | 1.125              |
| 66       | 57      | 1.15789            |
| 123      | 109     | 1.1284             |
| 126      | 93      | 1.3548             |
| 246      | 218     | 1.1284             |

I've estimated the factor of mass difference to be ~1.15. Since the default Python duplication mass threshold on Classic was 28, and `28 * 1.15 = 32.19999`, I've lowered our current default threshold for Python on Platform from 40 to 32.

On Classic, Python duplication issues were penalized in terms of remediation points as follows: `1_500_000 + overage * 50_000`, where overage = **score** - **threshold** (and score = f(mass)). I've kept the base points but lowered the per-overage cost to 30_000 to account for the difference in mass parsing (which gets amplified in the points calculation).
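The threshold and points arithmetic described above can be sketched in Ruby. This is a back-of-envelope check, not the engine's actual code; the example mass of 82 is hypothetical:

```ruby
# Deriving the new Platform defaults from the Classic parity measurements.
CLASSIC_THRESHOLD = 28
MASS_RATIO = 1.15 # estimated Platform mass / Classic mass

# 28 * 1.15 = 32.19999..., which rounds to the new default threshold of 32.
platform_threshold = (CLASSIC_THRESHOLD * MASS_RATIO).round

BASE_POINTS = 1_500_000
POINTS_PER_OVERAGE = 30_000 # lowered from Classic's 50_000 to offset larger masses

# New formula: base + (score - threshold) * per-overage cost.
def calculate_points(mass, threshold)
  BASE_POINTS + (mass - threshold) * POINTS_PER_OVERAGE
end

puts platform_threshold                       # => 32
puts calculate_points(82, platform_threshold) # => 3000000 (hypothetical mass of 82)
```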
1 parent 77f00d6 commit 7b4721e

File tree

2 files changed: +13 −4 lines changed


lib/cc/engine/analyzers/python/main.rb

Lines changed: 11 additions & 2 deletions
```diff
@@ -12,11 +12,20 @@ module Python
       class Main < CC::Engine::Analyzers::Base
         LANGUAGE = "python"
         DEFAULT_PATHS = ["**/*.py"]
-        DEFAULT_MASS_THRESHOLD = 40
-        BASE_POINTS = 1000
+        DEFAULT_MASS_THRESHOLD = 32
+        BASE_POINTS = 1_500_000
+        POINTS_PER_OVERAGE = 30_000
+
+        def calculate_points(issue)
+          BASE_POINTS + (overage(issue) * POINTS_PER_OVERAGE)
+        end

         private

+        def overage(issue)
+          issue.mass - mass_threshold
+        end
+
         def process_file(path)
           Node.new(::CC::Engine::Analyzers::Python::Parser.new(File.binread(path), path).parse.syntax_tree, path).format
         end
```

spec/cc/engine/analyzers/python/main_spec.rb

Lines changed: 2 additions & 2 deletions
```diff
@@ -27,7 +27,7 @@
           "path" => "foo.py",
           "lines" => { "begin" => 1, "end" => 1 },
         })
-        expect(json["remediation_points"]).to eq(54000)
+        expect(json["remediation_points"]).to eq(3_000_000)
         expect(json["other_locations"]).to eq([
           {"path" => "foo.py", "lines" => { "begin" => 2, "end" => 2} },
           {"path" => "foo.py", "lines" => { "begin" => 3, "end" => 3} }
@@ -54,7 +54,7 @@
           "path" => "foo.py",
           "lines" => { "begin" => 1, "end" => 1 },
         })
-        expect(json["remediation_points"]).to eq(18000)
+        expect(json["remediation_points"]).to eq(1_920_000)
         expect(json["other_locations"]).to eq([
           {"path" => "foo.py", "lines" => { "begin" => 2, "end" => 2} },
           {"path" => "foo.py", "lines" => { "begin" => 3, "end" => 3} }
```
