Commit d345ff6

update samples from Release-74 as a part of SDK release
1 parent 560dcac commit d345ff6

6 files changed

+793
-45
lines changed

how-to-use-azureml/automated-machine-learning/README.md

Lines changed: 80 additions & 45 deletions
@@ -106,52 +106,87 @@ jupyter notebook
106106
<a name="samples"></a>
107107
# Automated ML SDK Sample Notebooks
108108

109-
- [auto-ml-classification-credit-card-fraud.ipynb](classification-credit-card-fraud/auto-ml-classification-credit-card-fraud.ipynb)
110-
- Dataset: Kaggle's [credit card fraud detection dataset](https://www.kaggle.com/mlg-ulb/creditcardfraud)
111-
- Simple example of using automated ML for classification to fraudulent credit card transactions
112-
- Uses azure compute for training
113-
114-
- [auto-ml-regression.ipynb](regression/auto-ml-regression.ipynb)
115-
- Dataset: Hardware Performance Dataset
116-
- Simple example of using automated ML for regression
117-
- Uses azure compute for training
118-
119-
- [auto-ml-regression-explanation-featurization.ipynb](regression-explanation-featurization/auto-ml-regression-explanation-featurization.ipynb)
109+
## Classification
110+
- **Classify Credit Card Fraud**
111+
- Dataset: [Kaggle's credit card fraud detection dataset](https://www.kaggle.com/mlg-ulb/creditcardfraud)
112+
- **[Jupyter Notebook (remote run)](classification-credit-card-fraud/auto-ml-classification-credit-card-fraud.ipynb)**
113+
- run the experiment remotely on AML Compute cluster
114+
- test the performance of the best model in the local environment
115+
- **[Jupyter Notebook (local run)](local-run-classification-credit-card-fraud/auto-ml-classification-credit-card-fraud-local.ipynb)**
116+
- run experiment in the local environment
117+
- use Mimic Explainer for computing feature importance
118+
- deploy the best model along with the explainer to an Azure Kubernetes (AKS) cluster, which will compute the raw and engineered feature importances at inference time
119+
- **Predict Term Deposit Subscriptions in a Bank**
120+
- Dataset: [UCI's bank marketing dataset](https://www.kaggle.com/janiobachmann/bank-marketing-dataset)
121+
- **[Jupyter Notebook](classification-bank-marketing-all-features/auto-ml-classification-bank-marketing-all-features.ipynb)**
122+
- run experiment remotely on AML Compute cluster to generate ONNX compatible models
123+
- view the featurization steps that were applied during training
124+
- view feature importance for the best model
125+
- download the best model in ONNX format and use it for inferencing using ONNXRuntime
126+
- deploy the best model in PKL format to Azure Container Instance (ACI)
127+
- **Predict Newsgroup based on Text from News Article**
128+
- Dataset: [20 newsgroups text dataset](https://scikit-learn.org/0.19/datasets/twenty_newsgroups.html)
129+
- **[Jupyter Notebook](classification-text-dnn/auto-ml-classification-text-dnn.ipynb)**
130+
- AutoML highlights here include using deep neural networks (DNNs) to create embedded features from text data
131+
- AutoML will use Bidirectional Encoder Representations from Transformers (BERT) when a GPU compute is used
132+
- A Bidirectional Long Short-Term Memory network (BiLSTM) will be used when a CPU compute is used, so the choice of DNN is matched to the available hardware
133+
134+
## Regression
135+
- **Predict Performance of Hardware Parts**
120136
- Dataset: Hardware Performance Dataset
121-
- Shows featurization and excplanation
122-
- Uses azure compute for training
123-
124-
- [auto-ml-forecasting-energy-demand.ipynb](forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb)
125-
- Dataset: [NYC energy demand data](forecasting-a/nyc_energy.csv)
126-
- Example of using automated ML for training a forecasting model
127-
128-
- [auto-ml-classification-credit-card-fraud-local.ipynb](local-run-classification-credit-card-fraud/auto-ml-classification-credit-card-fraud-local.ipynb)
129-
- Dataset: Kaggle's [credit card fraud detection dataset](https://www.kaggle.com/mlg-ulb/creditcardfraud)
130-
- Simple example of using automated ML for classification to fraudulent credit card transactions
131-
- Uses local compute for training
132-
133-
- [auto-ml-classification-bank-marketing-all-features.ipynb](classification-bank-marketing-all-features/auto-ml-classification-bank-marketing-all-features.ipynb)
134-
- Dataset: UCI's [bank marketing dataset](https://www.kaggle.com/janiobachmann/bank-marketing-dataset)
135-
- Simple example of using automated ML for classification to predict term deposit subscriptions for a bank
136-
- Uses azure compute for training
137-
138-
- [auto-ml-forecasting-orange-juice-sales.ipynb](forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb)
139-
- Dataset: [Dominick's grocery sales of orange juice](forecasting-b/dominicks_OJ.csv)
140-
- Example of training an automated ML forecasting model on multiple time-series
141-
142-
- [auto-ml-forecasting-bike-share.ipynb](forecasting-bike-share/auto-ml-forecasting-bike-share.ipynb)
143-
- Dataset: forecasting for a bike-sharing
144-
- Example of training an automated ML forecasting model on multiple time-series
145-
146-
- [auto-ml-forecasting-function.ipynb](forecasting-forecast-function/auto-ml-forecasting-function.ipynb)
147-
- Example of training an automated ML forecasting model on multiple time-series
148-
149-
- [auto-ml-forecasting-beer-remote.ipynb](forecasting-beer-remote/auto-ml-forecasting-beer-remote.ipynb)
150-
- Example of training an automated ML forecasting model on multiple time-series
151-
- Beer Production Forecasting
152-
153-
- [auto-ml-continuous-retraining.ipynb](continuous-retraining/auto-ml-continuous-retraining.ipynb)
154-
- Continuous retraining using Pipelines and Time-Series TabularDataset
137+
- **[Jupyter Notebook](regression/auto-ml-regression.ipynb)**
138+
- run the experiment remotely on AML Compute cluster
139+
- get best trained model for a different metric than the one the experiment was optimized for
140+
- test the performance of the best model in the local environment
141+
- **[Jupyter Notebook (advanced)](regression-explanation-featurization/auto-ml-regression-explanation-featurization.ipynb)**
142+
- run the experiment remotely on AML Compute cluster
143+
- customize featurization: override column purpose within the dataset, configure transformer parameters
144+
- get best trained model for a different metric than the one the experiment was optimized for
145+
- run a model explanation experiment on the remote cluster
146+
- deploy the model along with the explainer and run online inferencing
147+
148+
## Time Series Forecasting
149+
- **Forecast Energy Demand**
150+
- Dataset: [NYC energy demand data](http://mis.nyiso.com/public/P-58Blist.htm)
151+
- **[Jupyter Notebook](forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb)**
152+
- run experiment remotely on AML Compute cluster
153+
- use lags and rolling window features
154+
- view the featurization steps that were applied during training
155+
- get the best model, use it to forecast on test data and compare the accuracy of predictions against real data
156+
- **Forecast Orange Juice Sales (Multi-Series)**
157+
- Dataset: [Dominick's grocery sales of orange juice](forecasting-orange-juice-sales/dominicks_OJ.csv)
158+
- **[Jupyter Notebook](forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb)**
159+
- run experiment remotely on AML Compute cluster
160+
- customize time-series featurization, change column purpose and override transformer hyperparameters
161+
- evaluate locally the performance of the generated best model
162+
- deploy the best model as a webservice on Azure Container Instance (ACI)
163+
- get online predictions from the deployed model
164+
- **Forecast Demand of a Bike-Sharing Service**
165+
- Dataset: [Bike demand data](forecasting-bike-share/bike-no.csv)
166+
- **[Jupyter Notebook](forecasting-bike-share/auto-ml-forecasting-bike-share.ipynb)**
167+
- run experiment remotely on AML Compute cluster
168+
- integrate holiday features
169+
- run rolling forecast for test set that is longer than the forecast horizon
170+
- compute metrics on the predictions from the remote forecast
171+
- **The Forecast Function Interface**
172+
- Dataset: Generated for sample purposes
173+
- **[Jupyter Notebook](forecasting-forecast-function/auto-ml-forecasting-function.ipynb)**
174+
- train a forecaster using a remote AML Compute cluster
175+
- capabilities of forecast function (e.g. forecast farther into the horizon)
176+
- generate confidence intervals
177+
- **Forecast Beverage Production**
178+
- Dataset: [Monthly beer production data](forecasting-beer-remote/Beer_no_valid_split_train.csv)
179+
- **[Jupyter Notebook](forecasting-beer-remote/auto-ml-forecasting-beer-remote.ipynb)**
180+
- train using a remote AML Compute cluster
181+
- enable the DNN learning model
182+
- forecast on a remote compute cluster and compare different model performance
183+
- **Continuous Retraining with NOAA Weather Data**
184+
- Dataset: [NOAA weather data from Azure Open Datasets](https://azure.microsoft.com/en-us/services/open-datasets/)
185+
- **[Jupyter Notebook](continuous-retraining/auto-ml-continuous-retraining.ipynb)**
186+
- continuously retrain a model using Pipelines and AutoML
187+
- create a Pipeline to upload a time series dataset to an Azure blob
188+
- create a Pipeline to run an AutoML experiment and register the best resulting model in the Workspace
189+
- publish the training pipeline created and schedule it to run daily
155190

156191
<a name="documentation"></a>
157192
See [Configure automated machine learning experiments](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-auto-train) to learn more about the settings and features available for automated machine learning experiments.
