Skip to content

add AutoML Sweepable API notebook #69

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Sep 23, 2022
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
277 changes: 277 additions & 0 deletions machine-learning/05 - AutoML Sweepable API.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,277 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## AutoML Sweepable API\n",
"\n",
"This Notebook shows how to use `Sweepable` API to fully customize the pipeline or search space in your AutoML task. In this notebook, you will learn\n",
"- use built-in `SweepableEstimator` to simplify your work.\n",
"- how to use `AutoML().CreateSweepableEstimator` to create `SweepableEstimator`.\n",
"- how to create `SweepablePipeline` for multiple trainer candidates.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Install Nuget packages and add using statement"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"dotnet_interactive": {
"language": "csharp"
},
"vscode": {
"languageId": "dotnet-interactive.csharp"
}
},
"outputs": [],
"source": [
"// using nightly-build\n",
"#i \"nuget:https://pkgs.dev.azure.com/dnceng/public/_packaging/MachineLearning/nuget/v3/index.json\"\n",
"#r \"nuget: Plotly.NET.Interactive, 3.0.2\"\n",
"#r \"nuget: Plotly.NET.CSharp, 0.0.1\"\n",
"#r \"nuget: Microsoft.ML.AutoML, 0.20.0-preview.22470.1\"\n",
"#r \"nuget: Microsoft.Data.Analysis, 0.20.0-preview.22470.1\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"dotnet_interactive": {
"language": "csharp"
},
"vscode": {
"languageId": "dotnet-interactive.csharp"
}
},
"outputs": [],
"source": [
"using static Microsoft.DotNet.Interactive.Formatting.PocketViewTags;\n",
"using Microsoft.Data.Analysis;\n",
"using System;\n",
"using System.IO;\n",
"using Microsoft.ML;\n",
"using Microsoft.ML.AutoML;\n",
"using Microsoft.ML.AutoML.CodeGen;\n",
"using Microsoft.ML.Trainers.LightGbm;\n",
"using Microsoft.ML.Data;\n",
"using Plotly.NET;\n",
"using Microsoft.ML.Transforms.TimeSeries;\n",
"using Microsoft.ML.SearchSpace;\n",
"using System.Diagnostics;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Use built-in sweepable estimators\n",
"\n",
"`AutoML` provides built-in sweepable estimator candidates for binary-classification, multi-class classification and regression. For those scenarios, you can simply use those candidates instead of creating `SweepableEstimator` from scratch."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"dotnet_interactive": {
"language": "csharp"
},
"vscode": {
"languageId": "dotnet-interactive.csharp"
}
},
"outputs": [],
"source": [
"var regressionTrainerCandidates = context.Auto().Regression();\n",
"var binaryClassificationTrainerCandidates = context.Auto().BinaryClassification();\n",
"var multiclassClassificationTrainerCandidates = context.Auto().MultiClassification();"
]
},
{
"cell_type": "markdown",
"metadata": {
"dotnet_interactive": {
"language": "csharp"
}
},
"source": [
"#### Use `AutoML().CreateSweepableEstimator` to create `SweepableEstimator`\n",
"\n",
"In case the built-in `SweepableEstimator` doesn't satisfy your requirement, you can call `CreateSweepableEstimator` to create a customized `SweepableEstimator`. A `SweepableEstimator` is nothing different than a normal `Estimator` plus `SearchSpace`. The following code shows how to create a sweepable `LightGbm` and `SDCA`.\n",
"\n",
"For simplicity, the built-in search space for `LightGbm` and `SDCA` is used but you can fully customize the search space however way you want. For more details on how to do that, please check [Parameter And SearchSpace](./Parameter%20and%20SearchSpace.ipynb)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"dotnet_interactive": {
"language": "csharp"
},
"vscode": {
"languageId": "dotnet-interactive.csharp"
}
},
"outputs": [],
"source": [
"var context = new MLContext();\n",
"var lgbmSearchSpace = new SearchSpace<LgbmOption>();\n",
"var sweepableLgbm = context.Auto().CreateSweepableEstimator((context, param) => {\n",
" var option = new LightGbmRegressionTrainer.Options()\n",
" {\n",
" NumberOfLeaves = param.NumberOfLeaves,\n",
" NumberOfIterations = param.NumberOfTrees,\n",
" MinimumExampleCountPerLeaf = param.MinimumExampleCountPerLeaf,\n",
" LearningRate = param.LearningRate,\n",
" LabelColumnName = \"Label\",\n",
" FeatureColumnName = \"Features\",\n",
" Booster = new GradientBooster.Options()\n",
" {\n",
" SubsampleFraction = param.SubsampleFraction,\n",
" FeatureFraction = param.FeatureFraction,\n",
" L1Regularization = param.L1Regularization,\n",
" L2Regularization = param.L2Regularization,\n",
" },\n",
" MaximumBinCountPerFeature = param.MaximumBinCountPerFeature,\n",
" };\n",
"\n",
" return context.Regression.Trainers.LightGbm(option);\n",
"}, lgbmSearchSpace);\n",
"\n",
"var sdcaSearchSpace = new SearchSpace<SdcaOption>();\n",
"var sweepableSdca = context.Auto().CreateSweepableEstimator((context, param) => {\n",
" return context.Regression.Trainers.Sdca(\"Label\", \"Features\", l1Regularization: param.L1Regularization, l2Regularization: param.L2Regularization);\n",
"}, sdcaSearchSpace);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Create `SweepablePipeline` with multiple trainer candidates.\n",
"\n",
"`SweepablePipeline` allows you to put multiple estimators as candidates to a certain stage. During AutoML sweeping, these candidates will be evaluated seperatly and the one with best metric will be picked. Note that the estimator doesn't necessarily need to be a trainer, it can be a trainer, transformer or even a `SweepablePipeline`, as long as they all have the same input and output schema.\n",
"\n",
"The following code shows how to create a `SweepablePipeline` with `sweepableSdca` and `sweepableLgbm` we created above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"dotnet_interactive": {
"language": "csharp"
},
"vscode": {
"languageId": "dotnet-interactive.csharp"
}
},
"outputs": [],
"source": [
"var sweepablePipeline = context.Transforms.Concatenate(\"Features\", \"X1\", \"X2\")\n",
" .Append(sweepableSdca, sweepableLgbm);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Config `AutoMLExperiment` using `sweepablePipeline`\n",
"In the next step, we are going to train `sweepablePipeline` on a generated non-linear dataset using `AutoMLExperiment`, which will sweeping both `sdca` and `lightGbm` on configured search space. Considering that `sdca` is a linear classifier, the winning model should be `lightGbm`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"dotnet_interactive": {
"language": "csharp"
},
"vscode": {
"languageId": "dotnet-interactive.csharp"
}
},
"outputs": [],
"source": [
"var rand = new Random(0);\n",
"var context =new MLContext(seed: 1);\n",
"var x1 = Enumerable.Range(0, 1000).Select(_x => rand.NextSingle() * 100).ToArray();\n",
"var x2 = x1.Select(_x => rand.NextSingle() * 100).ToArray();\n",
"var y = Enumerable.Zip(x1, x2).Select(_x => _x.Second * _x.First + (rand.NextSingle() - 0.5f) * 10).ToArray();\n",
"var df = new DataFrame();\n",
"df[\"X1\"] = DataFrameColumn.Create(\"X1\", x1);\n",
"df[\"X2\"] = DataFrameColumn.Create(\"X2\", x2);\n",
"df[\"Label\"] = DataFrameColumn.Create(\"Label\", y);\n",
"var trainTestSplit = context.Data.TrainTestSplit(df);\n",
"df.Head(10)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"dotnet_interactive": {
"language": "csharp"
},
"vscode": {
"languageId": "dotnet-interactive.csharp"
}
},
"outputs": [],
"source": [
"var monitor = new NotebookMonitor(sweepablePipeline);\n",
"var experiment = context.Auto().CreateExperiment();\n",
"experiment.SetDataset(df, 5)\n",
" .SetPipeline(sweepablePipeline)\n",
" .SetTrainingTimeInSeconds(50)\n",
" .SetRegressionMetric(RegressionMetric.RootMeanSquaredError)\n",
" .SetMonitor(monitor);\n",
"\n",
"// Configure Visualizer\t\t\t\n",
"monitor.SetUpdate(monitor.Display());\n",
"\n",
"var res = await experiment.RunAsync();\n",
"\n",
"// check the type of last trainer for winning model, which should be lightGbm\n",
"(res.Model as TransformerChain<ITransformer>).Last().GetType()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### See also\n",
"- [Training and AutoML](./03-Training%20and%20AutoML.ipynb)\n",
"- [Regression with Taxi Dataset](./E2E-Regression%20with%20Taxi%20Dataset.ipynb)\n",
"- [Classification with Iris Dataset](./E2E-Classification%20with%20Iris%20Dataset.ipynb)\n",
"- [Kaggle with Titanic Dataset](./REF-Kaggle%20with%20Titanic%20Dataset.ipynb)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".NET (C#)",
"language": "C#",
"name": ".net-csharp"
},
"language_info": {
"file_extension": ".cs",
"mimetype": "text/x-csharp",
"name": "C#",
"pygments_lexer": "csharp",
"version": "9.0"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}