Update Feature Contribution Calculation Samples #3241

Merged
merged 5 commits into from
Apr 11, 2019

rogancarr
Contributor

This PR cleans up the samples for FCC and creates a new one specifically for calibrated learners.

Fixes #3233

@codecov

codecov bot commented Apr 8, 2019

Codecov Report

Merging #3241 into master will decrease coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3241      +/-   ##
==========================================
- Coverage   72.63%   72.62%   -0.01%     
==========================================
  Files         807      805       -2     
  Lines      145129   145091      -38     
  Branches    16220    16220              
==========================================
- Hits       105415   105376      -39     
  Misses      35297    35297              
- Partials     4417     4418       +1
| Flag | Coverage Δ |
|---|---|
| #Debug | 72.62% <ø> (-0.01%) ⬇️ |
| #production | 68.16% <ø> (-0.02%) ⬇️ |
| #test | 88.94% <ø> (ø) ⬆️ |

| Impacted Files | Coverage Δ |
|---|---|
| ...rosoft.ML.Data/Transforms/ExplainabilityCatalog.cs | 100% <ø> (ø) ⬆️ |
| src/Microsoft.ML.Transforms/Text/LdaTransform.cs | 89.26% <0%> (-0.63%) ⬇️ |
| ...ML.Transforms/Text/StopWordsRemovingTransformer.cs | 86.1% <0%> (-0.16%) ⬇️ |
| ...OnnxTransformer.StaticPipe/OnnxStaticExtensions.cs | |
| ...r.StaticPipe/DnnImageFeaturizerStaticExtensions.cs | |
| ...soft.ML.Data/DataLoadSave/Text/TextLoaderCursor.cs | 84.9% <0%> (+0.2%) ⬆️ |
| src/Microsoft.ML.Maml/MAML.cs | 26.21% <0%> (+1.45%) ⬆️ |

var transformedData = transformer.Transform(data);

// Define a linear trainer.
var linearTrainer = mlContext.Regression.Trainers.Ols();
Contributor

@zeahmed zeahmed Apr 8, 2019

mlContext.Regression.Trainers.Ols()

Could this be combined with the pipeline above? Is there a specific reason for keeping it separate? #Resolved

Contributor Author

I'd like to compute the feature contributions on the transformed data, so I separate this out into two steps.
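A minimal sketch of the two-step arrangement described in this reply, assuming the Microsoft.ML and Microsoft.ML.Mkl.Components NuGet packages are referenced. The `Row` class, the synthetic data, and the class name `TwoStepSketch` are hypothetical and not taken from the PR; only the calls already quoted in this thread (`LoadFromEnumerable`, `Ols`, `Transform`) come from the sample itself.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;

public static class TwoStepSketch
{
    // Hypothetical row type for illustration; not the sample's actual class.
    private class Row
    {
        public float Label { get; set; }
        public float Feature1 { get; set; }
        public float Feature2 { get; set; }
    }

    public static void Main()
    {
        var mlContext = new MLContext(seed: 1);

        // Synthetic rows (an assumption): Label is a noisy linear function of the features.
        var rng = new Random(1);
        var rows = new List<Row>();
        for (int i = 0; i < 100; i++)
        {
            float f1 = (float)rng.NextDouble();
            float f2 = (float)rng.NextDouble();
            float noise = 0.01f * (float)rng.NextDouble();
            rows.Add(new Row { Feature1 = f1, Feature2 = f2, Label = 2f * f1 - f2 + noise });
        }
        var data = mlContext.Data.LoadFromEnumerable(rows);

        // Step 1: fit and apply the featurization on its own, so the
        // featurized IDataView is available as a separate object.
        var pipeline = mlContext.Transforms
            .Concatenate("Features", nameof(Row.Feature1), nameof(Row.Feature2))
            .Append(mlContext.Transforms.NormalizeMeanVariance("Features"));
        var transformer = pipeline.Fit(data);
        var transformedData = transformer.Transform(data);

        // Step 2: train the linear model on the already-featurized data.
        var linearTrainer = mlContext.Regression.Trainers.Ols();
        var linearModel = linearTrainer.Fit(transformedData);

        // The "Features" column exists in transformedData, which is what a
        // feature contribution calculator would later read.
        Console.WriteLine(transformedData.Schema["Features"].Type);
    }
}
```

Keeping `transformedData` and `linearModel` as separate objects means the later scoring and contribution steps can be applied directly to the featurized rows.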



Contributor

@zeahmed zeahmed left a comment

:shipit:

@rogancarr rogancarr requested a review from Ivanidzo4ka April 9, 2019 20:39
// Convert training data to IDataView.
var data = mlContext.Data.LoadFromEnumerable(samples);

// Create a pipeline to concatenate the features into a feature vector and normalize it.
Member

This doesn't look like part of FCC. Could we start with raw data?

Contributor Author

I wanted to show that FCC works on featurized data, not on the original columns, so I wanted to make it explicit.
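To illustrate why the featurized values matter here: for a linear model with unnormalized contributions, each feature's contribution is the product of its weight and its value in the feature vector, so the numbers depend on the normalized slots rather than the raw columns. The following is a self-contained sketch of that arithmetic only; the weights and feature values are hypothetical, and it does not call ML.NET.

```csharp
using System;

public static class LinearContributionSketch
{
    // For score = bias + sum_i weights[i] * features[i], the contribution
    // of feature i is weights[i] * features[i].
    public static float[] Contributions(float[] weights, float[] features)
    {
        if (weights.Length != features.Length)
            throw new ArgumentException("weights and features must have the same length.");

        var result = new float[weights.Length];
        for (int i = 0; i < weights.Length; i++)
            result[i] = weights[i] * features[i];
        return result;
    }

    public static void Main()
    {
        // Hypothetical model weights and an already-normalized feature vector.
        var weights = new[] { 2.0f, -1.0f, 0.5f };
        var features = new[] { 0.5f, 1.5f, -2.0f };

        // The products are 1.0, -1.5 and -1.0 respectively.
        var contributions = Contributions(weights, features);
        Console.WriteLine(string.Join(", ", contributions));
    }
}
```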

// Define a feature contribution calculator for all the features, and don't normalize the contributions.
// These are "trivial estimators" and they don't need to fit to the data, so we can feed a subset.
var simpleScoredDataset = linearModel.Transform(mlContext.Data.TakeRows(transformedData, 1));
var linearFeatureContributionCalculator = mlContext.Transforms.CalculateFeatureContribution(linearModel, normalize: false).Fit(simpleScoredDataset);
Member

Is it possible to do Fit(null)? Having simpleScoredDataset is a bit confusing.

Contributor Author

It is confusing, but we cannot use null. We need to pass in something with a schema.
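A sketch of how the one-row dataset might be used purely as a schema carrier, assuming the Microsoft.ML and Microsoft.ML.Mkl.Components packages and a model trained with `Ols()` as in the sample. The helper `PrintContributions`, the class `FccFitSketch`, and the `ScoredRow` type are hypothetical names introduced for illustration; the output column name `FeatureContributions` is assumed to be the transformer's default.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;

public static class FccFitSketch
{
    // Hypothetical output row type. "Score" is produced by the model and
    // "FeatureContributions" is assumed to be the calculator's output column.
    private class ScoredRow
    {
        public float Score { get; set; }
        public float[] FeatureContributions { get; set; }
    }

    public static void PrintContributions(
        MLContext mlContext,
        RegressionPredictionTransformer<OlsModelParameters> linearModel,
        IDataView transformedData)
    {
        // Fit needs an IDataView with the scored schema, not the data itself,
        // so a single scored row is enough.
        var schemaOnlyData = linearModel.Transform(mlContext.Data.TakeRows(transformedData, 1));
        var calculator = mlContext.Transforms
            .CalculateFeatureContribution(linearModel, normalize: false)
            .Fit(schemaOnlyData);

        // The fitted calculator is then applied to the full scored dataset.
        var scoredData = linearModel.Transform(transformedData);
        var withContributions = calculator.Transform(scoredData);

        var rows = mlContext.Data.CreateEnumerable<ScoredRow>(withContributions, reuseRowObject: false);
        foreach (var row in rows)
            Console.WriteLine($"{row.Score}: [{string.Join(", ", row.FeatureContributions)}]");
    }
}
```

Naming the one-row view after its purpose (here `schemaOnlyData`) is one way to reduce the confusion raised in the review, since it signals that its contents do not affect the result.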

private class Data
{
    public float Label { get; set; }
    public float Feature1 { get; set; }
Member

The other sample below puts spaces between its public fields; please keep the two samples consistent.

yield return data;
}
}
private static double Sigmoid(double x)
Member

A one-line function should use an expression body: => 1.0 / (1.0 + Exp(-x)).
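For reference, the expression-bodied form suggested here could look like the sketch below. It calls `Math.Exp` explicitly; the bare `Exp` in the comment would presumably need a `using static System.Math;` directive, which is an assumption about the sample file. The class name `SigmoidSketch` is hypothetical.

```csharp
using System;

public static class SigmoidSketch
{
    // Expression-bodied member: the whole method is a single expression.
    public static double Sigmoid(double x) => 1.0 / (1.0 + Math.Exp(-x));

    public static void Main()
    {
        // Sigmoid(0.0) is 1 / (1 + 1), which is exactly 0.5.
        Console.WriteLine(Sigmoid(0.0));
    }
}
```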

Member

@wschin wschin left a comment

Overall LGTM. Please address my comments if they make sense to you.


@shmoradims shmoradims left a comment

:shipit:

@rogancarr rogancarr merged commit 6f576de into dotnet:master Apr 11, 2019
@rogancarr rogancarr deleted the 3233_fcc_docs branch April 11, 2019 04:22
rogancarr added a commit to rogancarr/machinelearning that referenced this pull request Apr 11, 2019
* Updating samples for FCC

(cherry picked from commit 6f576de)
@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2022
4 participants