New PFI API #5934

michaelgsharp · 2021-09-13T16:54:27Z

Based on the feedback we received (thanks @houghj16 for all your work on that study!), our PFI API was too complex (and lets be honest, getting past the initial cast stops everyone...).

This PR fixes #5625 by adding a new PFI API that is much more user friendly in both how to call it, and how the results are returned. The added tests also make sure the output from the original API and the new API are identical if you match up the features correctly.

@JakeRadMSFT @briacht

codecov · 2021-09-13T18:58:48Z

Codecov Report

Merging #5934 (be2ad61) into main (eba938f) will increase coverage by 0.03%.
The diff coverage is 93.44%.

@@            Coverage Diff             @@
##             main    #5934      +/-   ##
==========================================
+ Coverage   68.20%   68.23%   +0.03%     
==========================================
  Files        1142     1142              
  Lines      242534   242769     +235     
  Branches    25378    25385       +7     
==========================================
+ Hits       165416   165663     +247     
+ Misses      70418    70412       -6     
+ Partials     6700     6694       -6

Flag	Coverage Δ
Debug	`68.23% <93.44%> (+0.03%)`	⬆️
production	`62.89% <88.48%> (+0.03%)`	⬆️
test	`88.61% <100.00%> (+0.02%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
.../Microsoft.ML.Data/Prediction/CalibratorCatalog.cs	`93.33% <0.00%> (-0.49%)`	⬇️
...Microsoft.ML.Data/Scorers/PredictionTransformer.cs	`91.00% <ø> (ø)`
...ansforms/PermutationFeatureImportanceExtensions.cs	`92.30% <89.13%> (-3.57%)`	⬇️
...soft.ML.Tests/PermutationFeatureImportanceTests.cs	`100.00% <100.00%> (ø)`
src/Microsoft.ML.AutoML/Sweepers/Parameters.cs	`85.59% <0.00%> (+0.84%)`	⬆️
src/Microsoft.ML.Maml/MAML.cs	`26.90% <0.00%> (+1.52%)`	⬆️
src/Microsoft.ML.Transforms/MetricStatistics.cs	`100.00% <0.00%> (+1.85%)`	⬆️
...rosoft.ML.AutoML/ColumnInference/TextFileSample.cs	`62.25% <0.00%> (+2.64%)`	⬆️
....ML.AutoML/PipelineSuggesters/PipelineSuggester.cs	`85.03% <0.00%> (+6.29%)`	⬆️
...c/Microsoft.ML.FastTree/Utils/ThreadTaskManager.cs	`100.00% <0.00%> (+20.51%)`	⬆️

test/Microsoft.ML.TestFramework/BaseTestBaseline.cs

eerhardt · 2021-09-14T15:46:04Z

test/Microsoft.ML.Tests/PermutationFeatureImportanceTests.cs

+                var ml2 = new MLContext(42);
+
                pfi = ML.Regression.PermutationFeatureImportance(model, data);
+                pfiDict = ml2.Regression.PermutationFeatureImportance((ITransformer)model, data);


Why is this cast needed? Shouldn't model already be something that implements ITransformer?

Yes. The issue is that the PFI method (of the same name) takes a ISingleFeaturePredictionTransformer<TModel>, and so the compiler defaults to that method instead of the method that just takes an ITransformer. model in this case is just a single estimator so its the actual type instead of ITransformer. Is there another way to work around that?

eerhardt · 2021-09-14T15:48:18Z

test/Microsoft.ML.Tests/PermutationFeatureImportanceTests.cs

            // X4Rand: 3

+            // Make sure that PFI from the array and the dictionary both have the same value for each feature.
+            Assert.Equal(JsonConvert.SerializeObject(pfi[0]), JsonConvert.SerializeObject(pfiDict["X1"]));


One concern I have with testing like this is that if RegressionMetricsStatistics wasn't convertible to JSON, and just serialized to {} (an empty object), this assert would pass and not really verify anything.

Is there a way we can ensure that these MetricsStatistics classes serialize to JSON correctly if we are going to compare them this way?

Another option would be to write a simple "AssertStatisticsEquals" method that compared all the properties of 2 instances.

Would that be better? I thought about doing that, but each type of statistics, Regression/Binary/Multiclass/Ranking, is a littler different so I would need different methods for each one. Thats why I ended up using this approach.

I'm not sure of a good way to know if they serialize correctly. I have pretty much always assumed Newtonsoft.json would just work, but I'm sure there are cases where it doesn't. In this case it does as I manually checked...

Would just verifying its not an empty string or just empty {} be enough? Or would we need better verification. If its not, I can always just make an equals method for these objects. None of the stats ones have it so I could just make it for them all.

eerhardt · 2021-09-14T15:50:18Z

test/Microsoft.ML.Tests/PermutationFeatureImportanceTests.cs

            Assert.Equal(3, MaxDeltaIndex(pfi, m => m.NegativePrecision.Mean));
            Assert.Equal(1, MinDeltaIndex(pfi, m => m.NegativePrecision.Mean));
-            Assert.Equal(3, MaxDeltaIndex(pfi, m => m.NegativeRecall.Mean));
+            Assert.Equal(0, MaxDeltaIndex(pfi, m => m.NegativeRecall.Mean));


?? Is this change intentional? Your changes shouldn't have affected these numbers.

This may be a good one to sync on, but the problem is that PFI changes the random state. So to make sure that both methods of PFI return the same results I reset the seed to 42 each time. Since that reset didn't happen before it slightly changed the results. Its possible I could find a seed though that would have the same results. Do you think that would be better:?

src/Microsoft.ML.Transforms/PermutationFeatureImportanceExtensions.cs

src/Microsoft.ML.Data/Prediction/IPredictionTransformer.cs

src/Microsoft.ML.Transforms/PermutationFeatureImportanceExtensions.cs

src/Microsoft.ML.Data/Prediction/CalibratorCatalog.cs

src/Microsoft.ML.Transforms/PermutationFeatureImportanceExtensions.cs

eerhardt

This looks good. Should really help out our customers who want to use this API.

Nice work.

michaelgsharp requested a review from a team September 13, 2021 16:54

eerhardt reviewed Sep 14, 2021

View reviewed changes

eerhardt reviewed Sep 21, 2021

View reviewed changes

src/Microsoft.ML.Data/Prediction/CalibratorCatalog.cs Outdated Show resolved Hide resolved

eerhardt reviewed Sep 21, 2021

View reviewed changes

src/Microsoft.ML.Transforms/PermutationFeatureImportanceExtensions.cs Outdated Show resolved Hide resolved

eerhardt reviewed Sep 21, 2021

View reviewed changes

src/Microsoft.ML.Transforms/PermutationFeatureImportanceExtensions.cs Outdated Show resolved Hide resolved

eerhardt reviewed Sep 27, 2021

View reviewed changes

src/Microsoft.ML.Transforms/PermutationFeatureImportanceExtensions.cs Outdated Show resolved Hide resolved

eerhardt approved these changes Sep 27, 2021

View reviewed changes

michaelgsharp added 8 commits September 30, 2021 16:39

pfi

251f7bf

new PFI API

002fa22

new PFI API

d4c8bf1

pfi tests working

25faff5

Updates from PR comments.

657cde4

Changes from PR comments.

9c7556a

Changes from PR comments.

86d301f

Added type name to exception message.

be2ad61

michaelgsharp force-pushed the pfi-test branch from 69300eb to be2ad61 Compare September 30, 2021 23:39

michaelgsharp merged commit 2da3488 into dotnet:main Oct 1, 2021

michaelgsharp deleted the pfi-test branch October 1, 2021 17:32

torronen mentioned this pull request Oct 3, 2021

What is "Slot" (PFI documentation suggestions) #5954

Open

michaelgsharp mentioned this pull request Oct 27, 2021

PFI (Permutation Feature Importance) API needs to be simpler to use #4216

Closed

ericjohannsen mentioned this pull request Feb 14, 2022

Fix Recursive Parsing for PFI #6084

Open

ghost locked as resolved and limited conversation to collaborators Mar 17, 2022

New PFI API #5934

New PFI API #5934

Uh oh!

Conversation

michaelgsharp commented Sep 13, 2021

Uh oh!

codecov bot commented Sep 13, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

eerhardt Sep 14, 2021

Choose a reason for hiding this comment

Uh oh!

michaelgsharp Sep 16, 2021

Choose a reason for hiding this comment

Uh oh!

eerhardt Sep 14, 2021

Choose a reason for hiding this comment

Uh oh!

michaelgsharp Sep 16, 2021

Choose a reason for hiding this comment

Uh oh!

michaelgsharp Sep 16, 2021

Choose a reason for hiding this comment

Uh oh!

eerhardt Sep 14, 2021

Choose a reason for hiding this comment

Uh oh!

michaelgsharp Sep 16, 2021

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

eerhardt left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codecov bot commented Sep 13, 2021 •

edited

Loading