@@ -106,52 +106,87 @@ jupyter notebook
106
106
<a name="samples"></a>
107
107
# Automated ML SDK Sample Notebooks
108
108
109
- - [ auto-ml-classification-credit-card-fraud.ipynb] ( classification-credit-card-fraud/auto-ml-classification-credit-card-fraud.ipynb )
110
- - Dataset: Kaggle's [ credit card fraud detection dataset] ( https://www.kaggle.com/mlg-ulb/creditcardfraud )
111
- - Simple example of using automated ML for classification to fraudulent credit card transactions
112
- - Uses azure compute for training
113
-
114
- - [ auto-ml-regression.ipynb] ( regression/auto-ml-regression.ipynb )
115
- - Dataset: Hardware Performance Dataset
116
- - Simple example of using automated ML for regression
117
- - Uses azure compute for training
118
-
119
- - [ auto-ml-regression-explanation-featurization.ipynb] ( regression-explanation-featurization/auto-ml-regression-explanation-featurization.ipynb )
109
+ ## Classification
110
+ - ** Classify Credit Card Fraud**
111
+ - Dataset: [ Kaggle's credit card fraud detection dataset] ( https://www.kaggle.com/mlg-ulb/creditcardfraud )
112
+ - ** [ Jupyter Notebook (remote run)] ( classification-credit-card-fraud/auto-ml-classification-credit-card-fraud.ipynb ) **
113
+ - run the experiment remotely on AML Compute cluster
114
+ - test the performance of the best model in the local environment
115
+ - ** [ Jupyter Notebook (local run)] ( local-run-classification-credit-card-fraud/auto-ml-classification-credit-card-fraud-local.ipynb ) **
116
+ - run experiment in the local environment
117
+ - use Mimic Explainer for computing feature importance
118
+ - deploy the best model along with the explainer to an Azure Kubernetes Service (AKS) cluster, which will compute the raw and engineered feature importances at inference time
119
+ - ** Predict Term Deposit Subscriptions in a Bank**
120
+ - Dataset: [ UCI's bank marketing dataset] ( https://www.kaggle.com/janiobachmann/bank-marketing-dataset )
121
+ - ** [ Jupyter Notebook] ( classification-bank-marketing-all-features/auto-ml-classification-bank-marketing-all-features.ipynb ) **
122
+ - run experiment remotely on AML Compute cluster to generate ONNX compatible models
123
+ - view the featurization steps that were applied during training
124
+ - view feature importance for the best model
125
+ - download the best model in ONNX format and use it for inferencing using ONNXRuntime
126
+ - deploy the best model in PKL format to Azure Container Instance (ACI)
127
+ - ** Predict Newsgroup based on Text from News Article**
128
+ - Dataset: [ 20 newsgroups text dataset] ( https://scikit-learn.org/0.19/datasets/twenty_newsgroups.html )
129
+ - ** [ Jupyter Notebook] ( classification-text-dnn/auto-ml-classification-text-dnn.ipynb ) **
130
+ - AutoML highlights here include using deep neural networks (DNNs) to create embedded features from text data
131
+ - AutoML will use Bidirectional Encoder Representations from Transformers (BERT) when a GPU compute is used
132
+ - AutoML will use a Bidirectional Long Short-Term Memory (BiLSTM) network when a CPU compute is used, so the choice of DNN matches the available hardware
133
+
134
+ ## Regression
135
+ - ** Predict Performance of Hardware Parts**
120
136
- Dataset: Hardware Performance Dataset
121
- - Shows featurization and excplanation
122
- - Uses azure compute for training
123
-
124
- - [ auto-ml-forecasting-energy-demand.ipynb] ( forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb )
125
- - Dataset: [ NYC energy demand data] ( forecasting-a/nyc_energy.csv )
126
- - Example of using automated ML for training a forecasting model
127
-
128
- - [ auto-ml-classification-credit-card-fraud-local.ipynb] ( local-run-classification-credit-card-fraud/auto-ml-classification-credit-card-fraud-local.ipynb )
129
- - Dataset: Kaggle's [ credit card fraud detection dataset] ( https://www.kaggle.com/mlg-ulb/creditcardfraud )
130
- - Simple example of using automated ML for classification to fraudulent credit card transactions
131
- - Uses local compute for training
132
-
133
- - [ auto-ml-classification-bank-marketing-all-features.ipynb] ( classification-bank-marketing-all-features/auto-ml-classification-bank-marketing-all-features.ipynb )
134
- - Dataset: UCI's [ bank marketing dataset] ( https://www.kaggle.com/janiobachmann/bank-marketing-dataset )
135
- - Simple example of using automated ML for classification to predict term deposit subscriptions for a bank
136
- - Uses azure compute for training
137
-
138
- - [ auto-ml-forecasting-orange-juice-sales.ipynb] ( forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb )
139
- - Dataset: [ Dominick's grocery sales of orange juice] ( forecasting-b/dominicks_OJ.csv )
140
- - Example of training an automated ML forecasting model on multiple time-series
141
-
142
- - [ auto-ml-forecasting-bike-share.ipynb] ( forecasting-bike-share/auto-ml-forecasting-bike-share.ipynb )
143
- - Dataset: forecasting for a bike-sharing
144
- - Example of training an automated ML forecasting model on multiple time-series
145
-
146
- - [ auto-ml-forecasting-function.ipynb] ( forecasting-forecast-function/auto-ml-forecasting-function.ipynb )
147
- - Example of training an automated ML forecasting model on multiple time-series
148
-
149
- - [ auto-ml-forecasting-beer-remote.ipynb] ( forecasting-beer-remote/auto-ml-forecasting-beer-remote.ipynb )
150
- - Example of training an automated ML forecasting model on multiple time-series
151
- - Beer Production Forecasting
152
-
153
- - [ auto-ml-continuous-retraining.ipynb] ( continuous-retraining/auto-ml-continuous-retraining.ipynb )
154
- - Continuous retraining using Pipelines and Time-Series TabularDataset
137
+ - ** [ Jupyter Notebook] ( regression/auto-ml-regression.ipynb ) **
138
+ - run the experiment remotely on AML Compute cluster
139
+ - get best trained model for a different metric than the one the experiment was optimized for
140
+ - test the performance of the best model in the local environment
141
+ - ** [ Jupyter Notebook (advanced)] ( regression-explanation-featurization/auto-ml-regression-explanation-featurization.ipynb ) **
142
+ - run the experiment remotely on AML Compute cluster
143
+ - customize featurization: override column purpose within the dataset, configure transformer parameters
144
+ - get best trained model for a different metric than the one the experiment was optimized for
145
+ - run a model explanation experiment on the remote cluster
146
+ - deploy the model along with the explainer and run online inferencing
147
+
148
+ ## Time Series Forecasting
149
+ - ** Forecast Energy Demand**
150
+ - Dataset: [ NYC energy demand data] ( http://mis.nyiso.com/public/P-58Blist.htm )
151
+ - ** [ Jupyter Notebook] ( forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb ) **
152
+ - run experiment remotely on AML Compute cluster
153
+ - use lags and rolling window features
154
+ - view the featurization steps that were applied during training
155
+ - get the best model, use it to forecast on test data and compare the accuracy of predictions against real data
156
+ - ** Forecast Orange Juice Sales (Multi-Series)**
157
+ - Dataset: [ Dominick's grocery sales of orange juice] ( forecasting-orange-juice-sales/dominicks_OJ.csv )
158
+ - ** [ Jupyter Notebook] ( forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb ) **
159
+ - run experiment remotely on AML Compute cluster
160
+ - customize time-series featurization: change column purpose and override transformer hyperparameters
161
+ - evaluate the performance of the best generated model locally
162
+ - deploy the best model as a webservice on Azure Container Instance (ACI)
163
+ - get online predictions from the deployed model
164
+ - ** Forecast Demand of a Bike-Sharing Service**
165
+ - Dataset: [ Bike demand data] ( forecasting-bike-share/bike-no.csv )
166
+ - ** [ Jupyter Notebook] ( forecasting-bike-share/auto-ml-forecasting-bike-share.ipynb ) **
167
+ - run experiment remotely on AML Compute cluster
168
+ - integrate holiday features
169
+ - run rolling forecast for test set that is longer than the forecast horizon
170
+ - compute metrics on the predictions from the remote forecast
171
+ - ** The Forecast Function Interface**
172
+ - Dataset: Generated for sample purposes
173
+ - ** [ Jupyter Notebook] ( forecasting-forecast-function/auto-ml-forecasting-function.ipynb ) **
174
+ - train a forecaster using a remote AML Compute cluster
175
+ - demonstrate the capabilities of the forecast function (e.g. forecasting farther into the horizon)
176
+ - generate confidence intervals
177
+ - ** Forecast Beverage Production**
178
+ - Dataset: [ Monthly beer production data] ( forecasting-beer-remote/Beer_no_valid_split_train.csv )
179
+ - ** [ Jupyter Notebook] ( forecasting-beer-remote/auto-ml-forecasting-beer-remote.ipynb ) **
180
+ - train using a remote AML Compute cluster
181
+ - enable the DNN learning model
182
+ - forecast on a remote compute cluster and compare different model performance
183
+ - ** Continuous Retraining with NOAA Weather Data**
184
+ - Dataset: [ NOAA weather data from Azure Open Datasets] ( https://azure.microsoft.com/en-us/services/open-datasets/ )
185
+ - ** [ Jupyter Notebook] ( continuous-retraining/auto-ml-continuous-retraining.ipynb ) **
186
+ - continuously retrain a model using Pipelines and AutoML
187
+ - create a Pipeline to upload a time series dataset to an Azure blob
188
+ - create a Pipeline to run an AutoML experiment and register the best resulting model in the Workspace
189
+ - publish the training pipeline created and schedule it to run daily
155
190
156
191
<a name="documentation"></a>
157
192
See [ Configure automated machine learning experiments] ( https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-auto-train ) to learn more about the settings and features available for automated machine learning experiments.
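
The energy demand entry in the list above mentions "lags and rolling window features". As a rough illustration of what such features look like, here is a minimal pure-Python sketch. It is not AutoML's implementation: the function name `add_lag_and_rolling_features`, its parameters, and the sample `demand` values are hypothetical and chosen only for this example. It assumes that both lags and rolling means are built from past values only, so that no feature uses the target it is meant to predict.

```python
from statistics import mean


def add_lag_and_rolling_features(values, lags=(1, 2), window=3):
    """Return one feature dict per time step (hypothetical helper).

    Lag k at step t is values[t - k]. The rolling mean at step t averages the
    `window` values strictly before t, so the current target is never used.
    Features that would need data from before the series start are None.
    """
    rows = []
    for t, y in enumerate(values):
        row = {"y": y}
        for k in lags:
            row[f"lag_{k}"] = values[t - k] if t - k >= 0 else None
        past = values[max(0, t - window):t]
        row[f"rolling_mean_{window}"] = mean(past) if len(past) == window else None
        rows.append(row)
    return rows


# Made-up demand readings, one per time step.
demand = [10, 12, 11, 15, 14]
features = add_lag_and_rolling_features(demand)
for row in features:
    print(row)
```

The first rows contain `None` wherever there is not enough history yet. In practice such rows are usually dropped or imputed before training; the notebooks themselves show which settings the SDK actually exposes for this.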