
Use SweepablePipeline #6285

Merged


@LittleLittleCloud (Contributor) commented Aug 17, 2022


Check out the spec for SweepablePipeline in #6218

SweepablePipeline combines MultiModelPipeline and SweepableEstimatorPipeline: it supports tree-like pipeline structures and estimator-level search spaces through nested search spaces.

In other words, SweepablePipeline puts estimator candidates into its search space, which makes them transparent to the tuner. This decouples tuners from the implementation details of pipelines and trainers, replacing those details with Parameter and SearchSpace. With SweepablePipeline, the hyper-parameter optimization process can be simplified to the following three steps:

  • ITuner samples a parameter from the search space.
  • ITrialRunner trains a model with that parameter and calculates its score.
  • ITuner is updated with the parameter and its score.
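The three steps above can be sketched as a plain loop. This is a hypothetical, self-contained illustration: Propose, RunTrial and Update are local stand-ins for the roles of ITuner and ITrialRunner, not their ML.NET signatures; the search space is a bare dictionary rather than Microsoft.ML.SearchSpace; and a toy objective replaces real training.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified search space: each hyper-parameter maps to a few candidate values.
// This is not the Microsoft.ML.SearchSpace API; it only illustrates the shape of the loop.
var searchSpace = new Dictionary<string, double[]>
{
    ["learningRate"] = new[] { 0.01, 0.1, 0.5 },
    ["numberOfLeaves"] = new[] { 4.0, 16.0, 64.0 },
};

var rng = new Random(0);
var history = new List<(Dictionary<string, double> Parameter, double Score)>();

// Step 1 (tuner): sample a parameter from the search space; plain random search here.
Dictionary<string, double> Propose() =>
    searchSpace.ToDictionary(kv => kv.Key, kv => kv.Value[rng.Next(kv.Value.Length)]);

// Step 2 (trial runner): "train" with the parameter and return a score.
// A toy objective stands in for real training and evaluation; higher is better.
double RunTrial(Dictionary<string, double> p) =>
    -Math.Abs(p["learningRate"] - 0.1) - Math.Abs(Math.Log2(p["numberOfLeaves"]) - 4);

// Step 3 (tuner): record the parameter together with its score.
// A smarter tuner would use this history to bias later proposals.
void Update(Dictionary<string, double> p, double observed) => history.Add((p, observed));

for (var trial = 0; trial < 20; trial++)
{
    var parameter = Propose();
    var score = RunTrial(parameter);
    Update(parameter, score);
}

var best = history.OrderByDescending(h => h.Score).First();
Console.WriteLine($"best score {best.Score}: learningRate={best.Parameter["learningRate"]}, numberOfLeaves={best.Parameter["numberOfLeaves"]}");
```

Random search only keeps the sketch short. The loop exchanges nothing but parameters and scores, so any tuner that can propose parameters and learn from (parameter, score) pairs could be swapped in without touching the trial runner.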

It also provides a uniform way to create a pipeline that includes multiple estimator candidates together with their search spaces.

With this PR, the classes that make up the AutoML.NET Sweepable API are simplified to:

  • ISweepable
    • SweepableEstimator: estimator with a search space
    • SweepablePipeline: pipeline with a search space
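To make the tree-like, nested search space concrete, here is a hypothetical sketch in which the choice of estimator in each stage is itself a parameter and each candidate carries its own nested options. Stage, Space, Sample and SearchSpaceSize are illustrative names only. The candidate names FastTree and Sdca echo trainers mentioned elsewhere in this PR, but the option names and values used here are assumptions, not ML.NET trainer options.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Space = System.Collections.Generic.Dictionary<string, double[]>;
using Stage = System.Collections.Generic.Dictionary<string, System.Collections.Generic.Dictionary<string, double[]>>;

// Hypothetical pipeline: an ordered list of stages. Each stage maps an estimator
// candidate name to that candidate's own nested search space (option -> candidate values).
// The names are illustrative only; they are not ML.NET types or trainer option names.
var pipeline = new List<Stage>
{
    new Stage { ["Featurizer"] = new Space() }, // one candidate, nothing to tune
    new Stage
    {
        ["FastTree"] = new Space { ["numberOfLeaves"] = new[] { 4.0, 16.0, 64.0 } },
        ["Sdca"] = new Space { ["l2Regularization"] = new[] { 0.01, 0.1 } },
    },
};

var rng = new Random(42);

// Sampling treats "which estimator" as just another parameter: per stage, pick a
// candidate, then sample only that candidate's nested options.
List<(string Estimator, Dictionary<string, double> Options)> Sample() =>
    pipeline.Select(stage =>
    {
        var names = stage.Keys.ToArray();
        var chosen = names[rng.Next(names.Length)];
        var options = stage[chosen].ToDictionary(kv => kv.Key, kv => kv.Value[rng.Next(kv.Value.Length)]);
        return (Estimator: chosen, Options: options);
    }).ToList();

// Size of the tree-shaped search space: within a stage, sum each candidate's number of
// option combinations; across stages, multiply.
long SearchSpaceSize() =>
    pipeline.Aggregate(1L, (acc, stage) =>
        acc * stage.Values.Sum(space => space.Values.Aggregate(1L, (prod, values) => prod * values.Length)));

var sample = Sample();
Console.WriteLine(string.Join(" -> ", sample.Select(s => s.Estimator)));
Console.WriteLine($"configurations: {SearchSpaceSize()}");
```

Summing within a stage and multiplying across stages is what distinguishes this from a flat grid: only the chosen candidate's options are sampled, so candidates that were not picked add no dimensions to a sampled parameter.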

@LittleLittleCloud LittleLittleCloud changed the title [wip] Use SweepablePipeline Use SweepablePipeline Aug 18, 2022
@LittleLittleCloud LittleLittleCloud changed the title Use SweepablePipeline [wip] - Use SweepablePipeline Aug 18, 2022
@codecov bot commented Aug 18, 2022

Codecov Report

Merging #6285 (4281116) into main (8589d25) will increase coverage by 0.05%.
The diff coverage is 99.65%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6285      +/-   ##
==========================================
+ Coverage   68.52%   68.58%   +0.05%     
==========================================
  Files        1170     1170              
  Lines      246931   247158     +227     
  Branches    25669    25675       +6     
==========================================
+ Hits       169220   169512     +292     
+ Misses      70961    70905      -56     
+ Partials     6750     6741       -9     
Flag Coverage Δ
Debug 68.58% <99.65%> (+0.05%) ⬆️
production 63.01% <ø> (+0.01%) ⬆️
test 89.09% <99.65%> (+0.11%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
test/Microsoft.ML.AutoML.Tests/AutoFitTests.cs 83.04% <98.64%> (+6.72%) ⬆️
...t/Microsoft.ML.AutoML.Tests/AutoFeaturizerTests.cs 92.45% <100.00%> (+0.96%) ⬆️
...Microsoft.ML.AutoML.Tests/AutoMLExperimentTests.cs 100.00% <100.00%> (ø)
test/Microsoft.ML.AutoML.Tests/DatasetUtil.cs 97.84% <100.00%> (+16.78%) ⬆️
...icrosoft.ML.AutoML.Tests/SweepableExtensionTest.cs 96.00% <100.00%> (+1.26%) ⬆️
...ML.Transforms/Text/StopWordsRemovingTransformer.cs 86.38% <0.00%> (-0.15%) ⬇️
src/Microsoft.ML.Core/Data/ProgressReporter.cs 77.94% <0.00%> (ø)
src/Microsoft.ML.Data/Data/Conversion.cs 79.98% <0.00%> (+0.09%) ⬆️
src/Microsoft.ML.SearchSpace/SearchSpace.cs 72.01% <0.00%> (+0.45%) ⬆️
... and 6 more

@LittleLittleCloud LittleLittleCloud changed the title [wip] - Use SweepablePipeline Use SweepablePipeline Aug 22, 2022

public static AutoMLExperiment SetBinaryClassificationMetric(this AutoMLExperiment experiment, BinaryClassificationMetric metric, string labelColumn = "label", string predictedColumn = "PredictedLabel")
{
var metricManager = new BinaryMetricManager(metric, predictedColumn, labelColumn);
Member commented:

predictedColumn, labelColumn

Should we flip the order of these parameters to be consistent with the rest of the APIs?

Contributor Author replied:

labelColumn should come before predictedColumn to be consistent with the context.Binary.Evaluation API. I'll update BinaryMetricManager, though.

Contributor Author replied:

Resolved


pipeline = pipeline.Append(Context.Auto().Featurizer(trainData, columnInformation, Features));
return pipeline.Append(Context.Auto().BinaryClassification(label, useSdca: useSdca, useFastTree: useFastTree, useLgbm: useLgbm, useLbfgs: uselbfgs, useFastForest: useFastForest, featureColumnName: Features));
throw new ArgumentException("IMetricManager must be BinaryMetricManager and IDatasetManager must be either TrainTestSplitDatasetManager or CrossValidationDatasetManager");
Member commented:

"IMetricManager must be BinaryMetricManager and IDatasetManager must be either TrainTestSplitDatasetManager or CrossValidationDatasetManager"

nit: I think this message will not be clear if I see it thrown. Maybe you can modify it a little to say something like:

$"The runner metric manager is of type {_metricManager.GetType()} but is expected to be of type BinaryMetricManager"

Contributor Author replied:

Resolved

@@ -136,7 +141,8 @@ internal MulticlassClassificationExperiment(MLContext context, MulticlassExperim
public override ExperimentResult<MulticlassClassificationMetrics> Execute(IDataView trainData, ColumnInformation columnInformation, IEstimator<ITransformer> preFeaturizer = null, IProgress<RunDetail<MulticlassClassificationMetrics>> progressHandler = null)
{
var label = columnInformation.LabelColumnName;
_experiment.SetEvaluateMetric(Settings.OptimizingMetric, label);
TrialResultMonitor<MulticlassClassificationMetrics> monitor = null;
@tarekgh (Member) commented Aug 23, 2022:

TrialResultMonitor<MulticlassClassificationMetrics> monitor = null;

nit: maybe it's better to move this line down, just before the _experiment.SetMonitor line?
This comment applies to similar places.

Contributor Author replied:

Resolved

}
}

throw new ArgumentException("IMetricManager must be MultiMetricManager and IDatasetManager must be either TrainTestSplitDatasetManager or CrossValidationDatasetManager");
Member commented:

"IMetricManager must be MultiMetricManager and IDatasetManager must be either TrainTestSplitDatasetManager or CrossValidationDatasetManager"

ditto.

Contributor Author replied:

Resolved

@tarekgh (Member) commented Aug 23, 2022

        else

nit: you don't need the else here.


Refers to: src/Microsoft.ML.AutoML/AutoMLExperiment/AutoMLExperiment.cs:256 in 6de7519.

throw;
}
else
Member commented:

else

The else is not needed here.

Contributor Author replied:

That else is hit when a training error occurs but there is already a successful trial result (so _bestTrialResult is not null). In that case, the current best result is returned instead.

This avoids losing all available trial results when a non-fatal error, such as OOM, occurs. A more reliable approach would, of course, be to detect whether the exception from a trial is fatal and continue training if it is not. But that requires covering every non-fatal case, which is almost impossible and unnecessary. So, as a fallback that avoids losing the current training results, AutoMLExperiment simply 1) prints the exception and 2) returns _currentBestTrial, if there is one, whenever it encounters an exception. Only when there is no completed trial does AutoMLExperiment throw an exception.

Member commented:

The if block throws anyway, so there is no need for an explicit else.

Contributor Author replied:

OIC
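Putting the two points of this thread together (the author's fallback and the reviewer's note that the if branch always throws), the control flow might be sketched as follows. RunExperiment, the Func<double> trials and the scores are hypothetical stand-ins; AutoMLExperiment itself tracks trial results (_bestTrialResult), not bare scores.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the fallback discussed above, not AutoMLExperiment's code:
// each "trial" is a Func<double> that either returns a score or throws.
double? RunExperiment(IReadOnlyList<Func<double>> trials)
{
    double? bestScore = null;
    try
    {
        foreach (var trial in trials)
        {
            var score = trial();
            if (bestScore is null || score > bestScore)
            {
                bestScore = score;
            }
        }
    }
    catch (Exception ex)
    {
        if (bestScore is null)
        {
            // No completed trial: nothing to fall back to, so surface the error.
            throw;
        }

        // No else needed: the branch above always leaves by throwing.
        Console.WriteLine($"Trial failed ({ex.Message}); returning the best completed trial.");
    }

    return bestScore;
}

var best = RunExperiment(new Func<double>[] { () => 0.7, () => 0.9, () => throw new InvalidOperationException("simulated training failure") });
Console.WriteLine(best);
```

Because the if branch ends in throw;, any code after it only runs when a best result exists, which is exactly the case the else originally guarded.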

@tarekgh (Member) left a comment:

I added minor comments. In general the change LGTM as you explained it to me.

@LittleLittleCloud (Contributor Author) commented:

/azp run

@azure-pipelines commented:

Azure Pipelines successfully started running 2 pipeline(s).

@LittleLittleCloud LittleLittleCloud merged commit 9652e59 into dotnet:main Aug 25, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Sep 24, 2022
Labels
None yet
Projects
None yet

2 participants