update samples from Release-79 as a part of SDK release #1255

Merged
merged 1 commit on Dec 7, 2020
4 changes: 2 additions & 2 deletions NBSETUP.md
@@ -28,7 +28,7 @@ git clone https://github.com/Azure/MachineLearningNotebooks.git
pip install azureml-sdk[notebooks,tensorboard]

# install model explainability component
pip install azureml-sdk[explain]
pip install azureml-sdk[interpret]

# install automated ml components
pip install azureml-sdk[automl]
@@ -86,7 +86,7 @@ If you need additional Azure ML SDK components, you can either modify the Docker
pip install azureml-sdk[automl]

# install the core SDK and model explainability component
pip install azureml-sdk[explain]
pip install azureml-sdk[interpret]

# install the core SDK and experimental components
pip install azureml-sdk[contrib]
2 changes: 1 addition & 1 deletion configuration.ipynb
@@ -103,7 +103,7 @@
"source": [
"import azureml.core\n",
"\n",
"print(\"This notebook was created using version 1.18.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.19.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
9 changes: 6 additions & 3 deletions contrib/fairness/fairlearn-azureml-mitigation.ipynb
@@ -38,7 +38,7 @@
"## Introduction\n",
"This notebook shows how to use [Fairlearn (an open source fairness assessment and unfairness mitigation package)](http://fairlearn.github.io) and Azure Machine Learning Studio for a binary classification problem. This example uses the well-known adult census dataset. For the purposes of this notebook, we shall treat this as a loan decision problem. We will pretend that the label indicates whether or not each individual repaid a loan in the past. We will use the data to train a predictor to predict whether previously unseen individuals will repay a loan or not. The assumption is that the model predictions are used to decide whether an individual should be offered a loan. Its purpose is purely illustrative of a workflow including a fairness dashboard - in particular, we do **not** include a full discussion of the detailed issues which arise when considering fairness in machine learning. For such discussions, please [refer to the Fairlearn website](http://fairlearn.github.io/).\n",
"\n",
"We will apply the [grid search algorithm](https://fairlearn.github.io/api_reference/fairlearn.reductions.html#fairlearn.reductions.GridSearch) from the Fairlearn package using a specific notion of fairness called Demographic Parity. This produces a set of models, and we will view these in a dashboard both locally and in the Azure Machine Learning Studio.\n",
"We will apply the [grid search algorithm](https://fairlearn.github.io/master/api_reference/fairlearn.reductions.html#fairlearn.reductions.GridSearch) from the Fairlearn package using a specific notion of fairness called Demographic Parity. This produces a set of models, and we will view these in a dashboard both locally and in the Azure Machine Learning Studio.\n",
"\n",
"### Setup\n",
"\n",
@@ -98,8 +98,11 @@
"metadata": {},
"outputs": [],
"source": [
"from sklearn.datasets import fetch_openml\n",
"data = fetch_openml(data_id=1590, as_frame=True)\n",
"from utilities import fetch_openml_with_retries\n",
"\n",
"data = fetch_openml_with_retries(data_id=1590)\n",
" \n",
"# Extract the items we want\n",
"X_raw = data.data\n",
"Y = (data.target == '>50K') * 1\n",
"\n",
7 changes: 5 additions & 2 deletions contrib/fairness/upload-fairness-dashboard.ipynb
@@ -98,8 +98,11 @@
"metadata": {},
"outputs": [],
"source": [
"from sklearn.datasets import fetch_openml\n",
"data = fetch_openml(data_id=1590, as_frame=True)\n",
"from utilities import fetch_openml_with_retries\n",
"\n",
"data = fetch_openml_with_retries(data_id=1590)\n",
" \n",
"# Extract the items we want\n",
"X_raw = data.data\n",
"Y = (data.target == '>50K') * 1"
]
28 changes: 28 additions & 0 deletions contrib/fairness/utilities.py
@@ -0,0 +1,28 @@
# ---------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# ---------------------------------------------------------

"""Utilities for azureml-contrib-fairness notebooks."""

from sklearn.datasets import fetch_openml
import time


def fetch_openml_with_retries(data_id, max_retries=4, retry_delay=60):
"""Fetch a given dataset from OpenML with retries as specified."""
for i in range(max_retries):
try:
print("Download attempt {0} of {1}".format(i + 1, max_retries))
data = fetch_openml(data_id=data_id, as_frame=True)
break
except Exception as e:
print("Download attempt failed with exception:")
print(e)
if i + 1 != max_retries:
print("Will retry after {0} seconds".format(retry_delay))
time.sleep(retry_delay)
retry_delay = retry_delay * 2
else:
raise RuntimeError("Unable to download dataset from OpenML")

return data
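
A quick usage note: the two fairness notebooks above import this helper in place of calling fetch_openml directly, so a transient OpenML or network failure no longer kills the run. With the default arguments the worst case is four attempts spread over roughly seven minutes of waiting (60 + 120 + 240 seconds). A small usage sketch; the shorter retry settings here are only for illustration:

```python
# Usage sketch for the helper above; retry settings shortened for illustration.
from utilities import fetch_openml_with_retries

data = fetch_openml_with_retries(data_id=1590, max_retries=3, retry_delay=5)

X_raw = data.data                   # features as a pandas DataFrame
Y = (data.target == '>50K') * 1     # binary label, as in the notebooks
print(X_raw.shape, Y.mean())
```
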
6 changes: 3 additions & 3 deletions how-to-use-azureml/automated-machine-learning/automl_env.yml
@@ -3,7 +3,7 @@ dependencies:
# The python interpreter version.
# Currently Azure ML only supports 3.5.2 and later.
- pip<=19.3.1
- python>=3.5.2,<3.6.8
- python>=3.5.2,<3.8
- nb_conda
- boto3==1.15.18
- matplotlib==2.1.0
@@ -21,8 +21,8 @@ dependencies:

- pip:
# Required packages for AzureML execution, history, and data preparation.
- azureml-widgets~=1.18.0
- azureml-widgets~=1.19.0
- pytorch-transformers==1.0.0
- spacy==2.1.8
- https://aka.ms/automl-resources/packages/en_core_web_sm-2.1.0.tar.gz
- -r https://automlcesdkdataresources.blob.core.windows.net/validated-requirements/1.18.0/validated_win32_requirements.txt [--no-deps]
- -r https://automlcesdkdataresources.blob.core.windows.net/validated-requirements/1.19.0/validated_win32_requirements.txt [--no-deps]
@@ -3,7 +3,7 @@ dependencies:
# The python interpreter version.
# Currently Azure ML only supports 3.5.2 and later.
- pip<=19.3.1
- python>=3.5.2,<3.6.8
- python>=3.5.2,<3.8
- nb_conda
- boto3==1.15.18
- matplotlib==2.1.0
@@ -21,9 +21,9 @@ dependencies:

- pip:
# Required packages for AzureML execution, history, and data preparation.
- azureml-widgets~=1.18.0
- azureml-widgets~=1.19.0
- pytorch-transformers==1.0.0
- spacy==2.1.8
- https://aka.ms/automl-resources/packages/en_core_web_sm-2.1.0.tar.gz
- -r https://automlcesdkdataresources.blob.core.windows.net/validated-requirements/1.18.0/validated_linux_requirements.txt [--no-deps]
- -r https://automlcesdkdataresources.blob.core.windows.net/validated-requirements/1.19.0/validated_linux_requirements.txt [--no-deps]

@@ -4,7 +4,7 @@ dependencies:
# Currently Azure ML only supports 3.5.2 and later.
- pip<=19.3.1
- nomkl
- python>=3.5.2,<3.6.8
- python>=3.5.2,<3.8
- nb_conda
- boto3==1.15.18
- matplotlib==2.1.0
@@ -22,8 +22,8 @@ dependencies:

- pip:
# Required packages for AzureML execution, history, and data preparation.
- azureml-widgets~=1.18.0
- azureml-widgets~=1.19.0
- pytorch-transformers==1.0.0
- spacy==2.1.8
- https://aka.ms/automl-resources/packages/en_core_web_sm-2.1.0.tar.gz
- -r https://automlcesdkdataresources.blob.core.windows.net/validated-requirements/1.18.0/validated_darwin_requirements.txt [--no-deps]
- -r https://automlcesdkdataresources.blob.core.windows.net/validated-requirements/1.19.0/validated_darwin_requirements.txt [--no-deps]
@@ -105,7 +105,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.18.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.19.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
@@ -93,7 +93,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.18.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.19.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
@@ -96,7 +96,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.18.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.19.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
@@ -298,7 +298,7 @@
" compute_target=compute_target,\n",
" training_data=train_dataset,\n",
" label_column_name=target_column_name,\n",
" blocked_models = ['LightGBM'],\n",
" blocked_models = ['LightGBM', 'XGBoostClassifier'],\n",
" **automl_settings\n",
" )"
]
@@ -81,7 +81,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.18.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.19.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
@@ -68,6 +68,7 @@
"import logging\n",
"\n",
"from matplotlib import pyplot as plt\n",
"import json\n",
"import numpy as np\n",
"import pandas as pd\n",
" \n",
@@ -92,7 +93,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.18.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.19.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
@@ -322,6 +323,24 @@
"print(best_run)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Show hyperparameters\n",
"Show the model pipeline used for the best run with its hyperparameters."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run_properties = json.loads(best_run.get_details()['properties']['pipeline_script'])\n",
"print(json.dumps(run_properties, indent = 1)) "
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -113,7 +113,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.18.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.19.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
@@ -3,11 +3,11 @@
from azureml.core.conda_dependencies import CondaDependencies
from azureml.train.estimator import Estimator
from azureml.core.run import Run
from azureml.automl.core.shared import constants


def split_fraction_by_grain(df, fraction, time_column_name,
grain_column_names=None):

if not grain_column_names:
df['tmp_grain_column'] = 'grain'
grain_column_names = ['tmp_grain_column']
@@ -17,10 +17,10 @@ def split_fraction_by_grain(df, fraction, time_column_name,
.groupby(grain_column_names, group_keys=False))

df_head = df_grouped.apply(lambda dfg: dfg.iloc[:-int(len(dfg) *
fraction)] if fraction > 0 else dfg)
fraction)] if fraction > 0 else dfg)

df_tail = df_grouped.apply(lambda dfg: dfg.iloc[-int(len(dfg) *
fraction):] if fraction > 0 else dfg[:0])
fraction):] if fraction > 0 else dfg[:0])

if 'tmp_grain_column' in grain_column_names:
for df2 in (df, df_head, df_tail):
@@ -59,11 +59,13 @@ def get_result_df(remote_run):
'primary_metric', 'Score'])
goal_minimize = False
for run in children:
if('run_algorithm' in run.properties and 'score' in run.properties):
if run.get_status().lower() == constants.RunState.COMPLETE_RUN \
and 'run_algorithm' in run.properties and 'score' in run.properties:
# We only count in the completed child runs.
summary_df[run.id] = [run.id, run.properties['run_algorithm'],
run.properties['primary_metric'],
float(run.properties['score'])]
if('goal' in run.properties):
if ('goal' in run.properties):
goal_minimize = run.properties['goal'].split('_')[-1] == 'min'

summary_df = summary_df.T.sort_values(
@@ -118,7 +120,6 @@ def run_multiple_inferences(summary_df, train_experiment, test_experiment,
compute_target, script_folder, test_dataset,
lookback_dataset, max_horizon, target_column_name,
time_column_name, freq):

for run_name, run_summary in summary_df.iterrows():
print(run_name)
print(run_summary)
@@ -87,7 +87,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.18.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.19.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
@@ -1,22 +1,24 @@
import argparse
import azureml.train.automl
from azureml.core import Run
from azureml.core import Dataset, Run
from sklearn.externals import joblib


parser = argparse.ArgumentParser()
parser.add_argument(
'--target_column_name', type=str, dest='target_column_name',
help='Target Column Name')
parser.add_argument(
'--test_dataset', type=str, dest='test_dataset',
help='Test Dataset')

args = parser.parse_args()
target_column_name = args.target_column_name
test_dataset_id = args.test_dataset

run = Run.get_context()
# get input dataset by name
test_dataset = run.input_datasets['test_data']
ws = run.experiment.workspace

df = test_dataset.to_pandas_dataframe().reset_index(drop=True)
# get the input dataset by id
test_dataset = Dataset.get_by_id(ws, id=test_dataset_id)

X_test_df = test_dataset.drop_columns(columns=[target_column_name]).to_pandas_dataframe().reset_index(drop=True)
y_test_df = test_dataset.with_timestamp_columns(None).keep_columns(columns=[target_column_name]).to_pandas_dataframe()
@@ -1,29 +1,32 @@
from azureml.train.estimator import Estimator
from azureml.core import ScriptRunConfig


def run_rolling_forecast(test_experiment, compute_target, train_run, test_dataset,
target_column_name, inference_folder='./forecast'):
def run_rolling_forecast(test_experiment, compute_target, train_run,
test_dataset, target_column_name,
inference_folder='./forecast'):
train_run.download_file('outputs/model.pkl',
inference_folder + '/model.pkl')

inference_env = train_run.get_environment()

est = Estimator(source_directory=inference_folder,
entry_script='forecasting_script.py',
script_params={
'--target_column_name': target_column_name
},
inputs=[test_dataset.as_named_input('test_data')],
compute_target=compute_target,
environment_definition=inference_env)
config = ScriptRunConfig(source_directory=inference_folder,
script='forecasting_script.py',
arguments=['--target_column_name',
target_column_name,
'--test_dataset',
test_dataset.as_named_input(test_dataset.name)],
compute_target=compute_target,
environment=inference_env)

run = test_experiment.submit(est,
tags={
'training_run_id': train_run.id,
'run_algorithm': train_run.properties['run_algorithm'],
'valid_score': train_run.properties['score'],
'primary_metric': train_run.properties['primary_metric']
})
run = test_experiment.submit(config,
tags={'training_run_id':
train_run.id,
'run_algorithm':
train_run.properties['run_algorithm'],
'valid_score':
train_run.properties['score'],
'primary_metric':
train_run.properties['primary_metric']})

run.log("run_algorithm", run.tags['run_algorithm'])
return run
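
The rewrite above replaces the Estimator-based submission with ScriptRunConfig. One detail worth calling out: passing test_dataset.as_named_input(...) as a command-line argument means the script receives the dataset's ID as a string, which is why the updated forecasting_script.py rehydrates it with Dataset.get_by_id instead of reading run.input_datasets. A condensed sketch of the two sides of that hand-off; objects such as test_experiment, compute_target, inference_env and test_dataset are assumed to be defined by the calling notebook, as in run_remote_inference above:

```python
# Condensed sketch of the hand-off introduced above (submit side vs. script side).
from azureml.core import Dataset, Run, ScriptRunConfig

# --- submit side (as in run_remote_inference) ---
# The dataset argument is resolved to the dataset's ID string at runtime.
config = ScriptRunConfig(source_directory="./forecast",
                         script="forecasting_script.py",
                         arguments=["--target_column_name", "demand",   # assumed column name
                                    "--test_dataset",
                                    test_dataset.as_named_input(test_dataset.name)],
                         compute_target=compute_target,
                         environment=inference_env)
run = test_experiment.submit(config)

# --- script side (inside forecasting_script.py) ---
# args comes from the script's argparse parser shown earlier in this PR.
script_run = Run.get_context()
ws = script_run.experiment.workspace
test_dataset = Dataset.get_by_id(ws, id=args.test_dataset)
```
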
@@ -97,7 +97,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.18.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.19.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},