Proposal: Experiment API #6118
Conversation
Codecov Report
@@ Coverage Diff @@
## main #6118 +/- ##
==========================================
+ Coverage 68.22% 68.62% +0.40%
==========================================
Files 1090 1142 +52
Lines 241442 246283 +4841
Branches 25149 25830 +681
==========================================
+ Hits 164719 169009 +4290
- Misses 70156 70638 +482
- Partials 6567 6636 +69
Looks great. Added some comments after an initial pass.
// Experiment api.
var pipeline, tuner;
var experiment = pipeline.CreateExperiment(trainTime = 100, trainDataset = "train.csv", split = "cv", folds = 10, metric = "AUC", tuner = tuner, monitor = monitor);
It'd be great to have an overload that takes in a class ExperimentOptions
where you can define these parameters:
var experimentOptions = new ExperimentOptions
{
TrainTime = 100,
TrainDataset = "train.csv",
Split = "cv",
Folds = 10,
Metric = "AUC",
Tuner = tuner,
Monitor = monitor
};
var experiment = pipeline.CreateExperiment(experimentOptions);
For the split parameter, it'd be good to have an enum so you have the option of using it like this:
Split = Split.CV
Same thing for the metric, it'd be great to leverage the existing metric classes. For example, for binary classification:
Metric = BinaryClassificationMetrics.AreaUnderRocCurve
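To make the suggestion above concrete, here is a rough sketch of what the options class and split enum could look like. None of these types exist yet; `ExperimentOptions`, `Split`, and the `ITuner`/`IMonitor` placeholder interfaces are all hypothetical names for this proposal:

```csharp
// Hypothetical sketch only -- these types are not part of ML.NET today.
public enum Split
{
    CV,         // cross-validation
    TrainTest   // single train/test split
}

public class ExperimentOptions
{
    public int TrainTime { get; set; }        // training budget, e.g. in seconds
    public string TrainDataset { get; set; }  // path to the training data
    public Split Split { get; set; }
    public int Folds { get; set; }            // only used when Split == Split.CV
    public string Metric { get; set; }
    public ITuner Tuner { get; set; }         // ITuner / IMonitor are placeholders
    public IMonitor Monitor { get; set; }
}
```

With an options class like this, the call site from the proposal would become `pipeline.CreateExperiment(new ExperimentOptions { TrainTime = 100, Split = Split.CV, ... })`, and new settings can be added later without breaking the `CreateExperiment` signature.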
I would hold off on using the existing metric classes for the metric parameter. Using a string will actually be easier because:
- ML.NET has different metric classes for different scenarios, which makes them hard to code against unless we also end up with a different experiment-creation API for each scenario
- using a string allows us to add metrics that are not supported by ML.NET (like forecasting, which doesn't have an evaluation metric)

The only downside is that there is little restriction on the input, but that can be solved by documentation or by having a metric enum instead.
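The metric enum mentioned as a compromise could be sketched like this. This is purely illustrative; the `Metric` enum and its members are hypothetical, chosen to show how one flat enum could cover every scenario while keeping a single `CreateExperiment` API:

```csharp
// Hypothetical sketch: a single flat enum restricts input more than a raw
// string, without needing a separate experiment API per scenario.
public enum Metric
{
    // binary classification
    AreaUnderRocCurve,
    Accuracy,
    F1Score,

    // regression
    RSquared,
    MeanAbsoluteError,

    // scenarios without a built-in ML.NET evaluator (e.g. forecasting)
    // can still add members here, which a string-keyed design also allows
}
```

Internally the enum values would still map to the existing metric classes where they exist, so this keeps the flexibility of strings with compile-time checking.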
/azp run
Azure Pipelines successfully started running 2 pipeline(s).
This proposal provides an easy way to create and train an AutoML experiment using a sweepable pipeline.
#5993
A quick notebook example for Experiment API
AutoMLE2EWithTable.ipynb.txt