Use SweepablePipeline #6285
Conversation
Codecov Report

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #6285      +/-   ##
==========================================
+ Coverage   68.52%   68.58%   +0.05%
==========================================
  Files        1170     1170
  Lines      246931   247158     +227
  Branches    25669    25675       +6
==========================================
+ Hits       169220   169512     +292
+ Misses      70961    70905      -56
+ Partials     6750     6741       -9
```

Flags with carried forward coverage won't be shown.
```csharp
public static AutoMLExperiment SetBinaryClassificationMetric(this AutoMLExperiment experiment, BinaryClassificationMetric metric, string labelColumn = "label", string predictedColumn = "PredictedLabel")
{
    var metricManager = new BinaryMetricManager(metric, predictedColumn, labelColumn);
```
`labelColumn` should come before `predictedColumn`, in order to be consistent with the `context.Binary.Evaluation` API. I'll update `BinaryMetricManager` as well, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved
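For illustration, the reordered signature might look like the following sketch (hypothetical; the final shape of `BinaryMetricManager` in this PR may differ):

```csharp
// Sketch only: labelColumn placed before predictedColumn, mirroring the
// argument ordering of the evaluation APIs. BinaryMetricManager's constructor
// is assumed here to be updated to take (metric, labelColumn, predictedColumn).
public static AutoMLExperiment SetBinaryClassificationMetric(this AutoMLExperiment experiment, BinaryClassificationMetric metric, string labelColumn = "label", string predictedColumn = "PredictedLabel")
{
    var metricManager = new BinaryMetricManager(metric, labelColumn, predictedColumn);
    // ... register metricManager with the experiment, as in the original method ...
    return experiment;
}
```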
```csharp
pipeline = pipeline.Append(Context.Auto().Featurizer(trainData, columnInformation, Features));
return pipeline.Append(Context.Auto().BinaryClassification(label, useSdca: useSdca, useFastTree: useFastTree, useLgbm: useLgbm, useLbfgs: uselbfgs, useFastForest: useFastForest, featureColumnName: Features));
```

```csharp
throw new ArgumentException("IMetricManager must be BinaryMetricManager and IDatasetManager must be either TrainTestSplitDatasetManager or CrossValidationDatasetManager");
```
"IMetricManager must be BinaryMetricManager and IDatasetManager must be either TrainTestSplitDatasetManager or CrossValidationDatasetManager"
nit: I am seeing this message will not be clear if I see it thrown. Maybe you can modify it a little to tell something like,
$"The runner metric manager is of type {_metricManager.GetType()} which expected to be of type BinaryMetricManage"
Resolved
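As a sketch, the suggested clearer message might be produced like this (hypothetical surrounding code, following the suggestion above):

```csharp
// Sketch: include the actual and expected types in the exception message,
// so the failure is self-explanatory when thrown.
if (!(_metricManager is BinaryMetricManager))
{
    throw new ArgumentException(
        $"The runner metric manager is of type {_metricManager.GetType()}, " +
        $"which was expected to be of type {typeof(BinaryMetricManager)}");
}
```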
```csharp
// @@ -136,7 +141,8 @@ internal MulticlassClassificationExperiment(MLContext context, MulticlassExperim
public override ExperimentResult<MulticlassClassificationMetrics> Execute(IDataView trainData, ColumnInformation columnInformation, IEstimator<ITransformer> preFeaturizer = null, IProgress<RunDetail<MulticlassClassificationMetrics>> progressHandler = null)
{
    var label = columnInformation.LabelColumnName;
    _experiment.SetEvaluateMetric(Settings.OptimizingMetric, label);
    TrialResultMonitor<MulticlassClassificationMetrics> monitor = null;
```
Resolved
```csharp
    }
}
```
```csharp
throw new ArgumentException("IMetricManager must be MultiMetricManager and IDatasetManager must be either TrainTestSplitDatasetManager or CrossValidationDatasetManager");
```
Resolved
```csharp
    throw;
}
else
```
That `else` will be hit when you get a training error but there is already a successful trial result (so `_bestTrialResult` is not null), in which case the current best result will be returned instead.

This is to avoid losing all available trial results when encountering a non-fatal error, like OOM. The more reliable way of doing that is, of course, detecting whether the exception from a trial is fatal and continuing training if it is not. But in that case we would need to cover all non-fatal cases, which is almost impossible and unnecessary. So, as a step back, in order not to lose the current training result, `AutoMLExperiment` simply 1) prints the exception and 2) returns `_currentBestTrial` if there is one when encountering any exception. Only when there is no completed trial does `AutoMLExperiment` throw.
The `if` block is throwing anyway, so there is no need for an explicit `else`.
OIC
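The simplification being discussed can be sketched as follows (illustrative shape only; member names like `_bestTrialResult` follow the discussion above, and the `throw;` sits inside the surrounding catch block as in the original code):

```csharp
// Since the if-branch always throws, the `else` keyword is redundant:
if (_bestTrialResult == null)
{
    // no trial completed successfully; rethrow the original error
    throw;
}

// reached only when a completed trial exists, so no explicit else is needed
return _bestTrialResult;
```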
I added minor comments. In general the change LGTM as you explained it to me.
/azp run

Azure Pipelines successfully started running 2 pipeline(s).
Check out the spec for `SweepablePipeline` in #6218.

`SweepablePipeline` is a combination of `MultiModelPipeline` and `SweepableEstimatorPipeline`: it supports a tree-like pipeline structure and supports an estimator-level search space using nested search spaces. In other words, `SweepablePipeline` puts estimator candidates into its own search space and makes them transparent to the tuner. In this way, it decouples tuners from the detailed implementation of pipelines or trainers, replacing them with `Parameter` and `SearchSpace`. With the help of `SweepablePipeline`, the hyper-parameter optimization process can be simplified to the following 3 steps:

1. `ITuner` samples a `parameter` from the `search space`.
2. `ITrialRunner` trains a model and calculates a score from the `parameter`.
3. `ITuner` updates the associated `parameter` with the score.

It also provides a uniform way to create a pipeline that includes multiple estimator candidates with search spaces.

With this PR, the classes that construct the AutoML.Net Sweepable API are simplified to:

- `ISweepable`
- `SweepableEstimator`: an estimator with a search space
- `SweepablePipeline`: a pipeline with a search space
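For illustration, the three steps above roughly correspond to a training loop like the following sketch (hypothetical shapes; the actual `ITuner`/`ITrialRunner` signatures in the PR may differ):

```csharp
// Pseudocode-style sketch of the tuner/runner loop described above.
// The tuner only ever sees Parameter and SearchSpace objects, never the
// concrete pipeline or trainer implementations.
while (!stopRequested)
{
    Parameter parameter = tuner.Propose(trialSettings);   // 1) sample a parameter from the search space
    TrialResult result = runner.Run(trialSettings);       // 2) train a model and compute its score
    tuner.Update(result);                                 // 3) feed the score back to the tuner
}
```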