Adds more examples to customise AutoPyTorch. #124

Merged: 10 commits merged on Mar 16, 2021
6 changes: 4 additions & 2 deletions .github/workflows/examples.yml
@@ -30,6 +30,8 @@ jobs:
echo "::set-output name=BEFORE::$(git status --porcelain -b)"
- name: Run tests
run: |
python examples/example_tabular_classification.py
python examples/example_tabular_regression.py
python examples/tabular/20_basics/example_tabular_classification.py
Contributor:

Can you make sure that the documentation also reflects this change?
For example, we have to change this line.
Contributor Author:

Okay.

python examples/tabular/20_basics/example_tabular_regression.py
python examples/tabular/40_advanced/example_custom_configuration_space.py
python examples/tabular/40_advanced/example_resampling_strategy.py
python examples/example_image_classification.py
41 changes: 25 additions & 16 deletions autoPyTorch/api/tabular_classification.py
@@ -27,27 +27,36 @@ class TabularClassificationTask(BaseTask):
"""
Tabular Classification API to the pipelines.
Args:
seed (int): seed to be used for reproducibility.
n_jobs (int), (default=1): number of consecutive processes to spawn.
logging_config (Optional[Dict]): specifies configuration
for logging, if None, it is loaded from the logging.yaml
ensemble_size (int), (default=50): Number of models added to the ensemble built by
seed (int):
seed to be used for reproducibility.
n_jobs (int), (default=1):
number of consecutive processes to spawn.
logging_config (Optional[Dict]):
specifies configuration for logging, if None, it is loaded from the logging.yaml
ensemble_size (int), (default=50):
Number of models added to the ensemble built by
Ensemble selection from libraries of models.
Models are drawn with replacement.
ensemble_nbest (int), (default=50): only consider the ensemble_nbest
ensemble_nbest (int), (default=50):
only consider the ensemble_nbest
models to build the ensemble
max_models_on_disc (int), (default=50): maximum number of models saved to disc.
max_models_on_disc (int), (default=50):
maximum number of models saved to disc.
Also, controls the size of the ensemble as any additional models will be deleted.
Must be greater than or equal to 1.
temporary_directory (str): folder to store configuration output and log file
output_directory (str): folder to store predictions for optional test set
delete_tmp_folder_after_terminate (bool): determines whether to delete the temporary directory,
when finished
include_components (Optional[Dict]): If None, all possible components are used.
Otherwise specifies set of components to use.
exclude_components (Optional[Dict]): If None, all possible components are used.
Otherwise specifies set of components not to use. Incompatible with include
components
temporary_directory (str):
folder to store configuration output and log file
output_directory (str):
folder to store predictions for optional test set
delete_tmp_folder_after_terminate (bool):
determines whether to delete the temporary directory, when finished
include_components (Optional[Dict]):
If None, all possible components are used. Otherwise
specifies set of components to use.
exclude_components (Optional[Dict]):
If None, all possible components are used. Otherwise
specifies set of components not to use. Incompatible
with include components
"""
def __init__(
self,
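To put the arguments above in context, here is a minimal construction sketch (not part of the diff). It only uses parameters listed in the docstring; the dictionary layout for include_components (pipeline node name mapped to a list of component names) and the directory paths are assumptions for illustration, with the component name borrowed from the search-space example elsewhere in this PR.

from autoPyTorch.api.tabular_classification import TabularClassificationTask

# Minimal sketch, not the repository's own example. The include_components
# layout (node name -> allowed component names) is assumed for illustration.
api = TabularClassificationTask(
    seed=42,                                          # reproducibility
    ensemble_size=50,                                 # models kept by ensemble selection
    temporary_directory='./tmp/autoPyTorch_doc_tmp',  # hypothetical path
    output_directory='./tmp/autoPyTorch_doc_out',     # hypothetical path
    delete_tmp_folder_after_terminate=True,
    include_components={'network_backbone': ['ResNetBackbone']},
)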
2 changes: 1 addition & 1 deletion autoPyTorch/datasets/resampling_strategy.py
@@ -97,7 +97,7 @@ def holdout_validation(val_share: float, indices: np.ndarray, **kwargs: Any) ->

def stratified_holdout_validation(val_share: float, indices: np.ndarray, **kwargs: Any) \
-> Tuple[np.ndarray, np.ndarray]:
train, val = train_test_split(indices, test_size=val_share, shuffle=False, stratify=kwargs["stratify"])
train, val = train_test_split(indices, test_size=val_share, shuffle=True, stratify=kwargs["stratify"])
Collaborator:

I think we need random_state.

Contributor Author:

Yes, but we should do this via a different PR. I had raised an issue for this in Lucas' repository: LMZimmer/Auto-PyTorch_refactor#10.

Collaborator:

OK, could you post this on the issue page?

return train, val


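To illustrate the reproducibility point raised above: scikit-learn does not allow stratify together with shuffle=False (which is why this line flips the flag), and a fixed random_state pins the resulting split. The snippet below is a standalone sketch of train_test_split, not the repository's code; wiring random_state into stratified_holdout_validation is left to the follow-up issue.

import numpy as np
from sklearn.model_selection import train_test_split

# Standalone sketch: a fixed random_state makes the stratified,
# shuffled split reproducible across runs.
indices = np.arange(100)
stratify_labels = np.repeat([0, 1], 50)

train, val = train_test_split(
    indices,
    test_size=0.33,
    shuffle=True,               # stratification requires shuffling
    stratify=stratify_labels,
    random_state=1,             # the reviewer's requested addition
)
print(train.shape, val.shape)   # (67,) (33,)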
82 changes: 74 additions & 8 deletions autoPyTorch/utils/hyperparameter_search_space_update.py
@@ -6,7 +6,25 @@
from autoPyTorch.pipeline.components.base_component import autoPyTorchComponent


class HyperparameterSearchSpaceUpdate():
class HyperparameterSearchSpaceUpdate:
"""
Allows specifying update to the search space of a
particular hyperparameter.

Args:
node_name (str):
The name of the node in the pipeline
hyperparameter (str):
The name of the hyperparameter
value_range (Union[List, Tuple]):
In case of categorical hyperparameter, defines the new categorical choices.
In case of numerical hyperparameter, defines the new range
in the form of (LOWER, UPPER)
default_value (Union[int, float, str]):
New default value for the hyperparameter
log (bool) (default=False):
In case of numerical hyperparameters, whether to sample on a log scale
"""
def __init__(self, node_name: str, hyperparameter: str, value_range: Union[List, Tuple],
default_value: Union[int, float, str], log: bool = False) -> None:
self.node_name = node_name
@@ -16,6 +34,15 @@ def __init__(self, node_name: str, hyperparameter: str, value_range: Union[List,
self.default_value = default_value

def apply(self, pipeline: List[Tuple[str, Union[autoPyTorchComponent, autoPyTorchChoice]]]) -> None:
"""
Applies the update to the appropriate hyperparameter of the pipeline
Args:
pipeline (List[Tuple[str, Union[autoPyTorchComponent, autoPyTorchChoice]]]):
The named steps of the current autopytorch pipeline

Returns:
None
"""
[node[1]._apply_search_space_update(name=self.hyperparameter,
new_value_range=self.value_range,
log=self.log,
@@ -29,30 +56,69 @@ def __str__(self) -> str:
(" log" if self.log else ""))


class HyperparameterSearchSpaceUpdates():
class HyperparameterSearchSpaceUpdates:
""" Contains a collection of HyperparameterSearchSpaceUpdate """
def __init__(self, updates: Optional[List[HyperparameterSearchSpaceUpdate]] = None) -> None:
self.updates = updates if updates is not None else []

def apply(self, pipeline: List[Tuple[str, Union[autoPyTorchComponent, autoPyTorchChoice]]]) -> None:
"""
Iteratively applies updates to the pipeline

Args:
pipeline: (List[Tuple[str, Union[autoPyTorchComponent, autoPyTorchChoice]]]):
The named steps of the current autoPyTorch pipeline

Returns:
None
"""
for update in self.updates:
update.apply(pipeline)

def append(self, node_name: str, hyperparameter: str, value_range: Union[List, Tuple],
default_value: Union[int, float, str], log: bool = False) -> None:
"""
Add a new update

Args:
node_name (str):
The name of the node in the pipeline
hyperparameter (str):
The name of the hyperparameter
value_range (Union[List, Tuple]):
In case of categorical hyperparameter, defines the new categorical choices.
In case of numerical hyperparameter, defines the new range
in the form of (LOWER, UPPER)
default_value (Union[int, float, str]):
New default value for the hyperparameter
log (bool) (default=False):
In case of numerical hyperparameters, whether to sample on a log scale

Returns:
None
"""
self.updates.append(HyperparameterSearchSpaceUpdate(node_name=node_name,
hyperparameter=hyperparameter,
value_range=value_range,
default_value=default_value,
log=log))

def save_as_file(self, path: str) -> None:
"""
Save the updates as a file to reuse later

Args:
path (str): path of the file

Returns:
None
"""
with open(path, "w") as f:
with open(path, "w") as f:
for update in self.updates:
print(update.node_name, update.hyperparameter, # noqa: T001
str(update.value_range), "'{}'".format(update.default_value)
if isinstance(update.default_value, str) else update.default_value,
(" log" if update.log else ""), file=f)
for update in self.updates:
print(update.node_name, update.hyperparameter, # noqa: T001
str(update.value_range), "'{}'".format(update.default_value)
if isinstance(update.default_value, str) else update.default_value,
(" log" if update.log else ""), file=f)


def parse_hyperparameter_search_space_updates(updates_file: Optional[str]
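Since this PR moves the inline get_search_space_updates() helpers out of the basic examples and into examples/tabular/40_advanced/example_custom_configuration_space.py, a short usage sketch of the class documented above may help. It is assembled from calls that appear elsewhere in this diff; the file path passed to save_as_file is hypothetical.

from autoPyTorch.api.tabular_classification import TabularClassificationTask
from autoPyTorch.utils.hyperparameter_search_space_update import HyperparameterSearchSpaceUpdates

# Collect updates: numerical hyperparameters take a (LOWER, UPPER) range,
# categorical ones a list of choices.
updates = HyperparameterSearchSpaceUpdates()
updates.append(node_name="data_loader",
               hyperparameter="batch_size",
               value_range=[16, 512],
               default_value=32)
updates.append(node_name="network_backbone",
               hyperparameter="ResNetBackbone:dropout",
               value_range=[0, 0.5],
               default_value=0.2)

# Optionally persist the updates for later reuse.
updates.save_as_file("./search_space_updates.txt")   # hypothetical path

# Hand the collection to the estimator, which applies it to its pipeline.
api = TabularClassificationTask(search_space_updates=updates)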
4 changes: 2 additions & 2 deletions docs/conf.py
@@ -68,9 +68,9 @@

sphinx_gallery_conf = {
# path to the examples
'examples_dirs': '../examples',
'examples_dirs': ['../examples/tabular/20_basics', '../examples/tabular/40_advanced'],
# path where to save gallery generated examples
'gallery_dirs': 'examples',
'gallery_dirs': ['basics_tabular', 'advanced_tabular'],
#TODO: fix back/forward references for the examples.
#'doc_module': ('autoPyTorch'),
#'reference_url': {
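For context on this configuration change: when examples_dirs and gallery_dirs are given as lists, sphinx-gallery pairs them positionally (the lists must have the same length), so each source directory renders into its own gallery section. A sketch with the pairing spelled out:

# docs/conf.py sketch: entry i of 'examples_dirs' is rendered into entry i
# of 'gallery_dirs', giving the basic and advanced tabular examples
# separate gallery sections.
sphinx_gallery_conf = {
    'examples_dirs': ['../examples/tabular/20_basics',     # -> 'basics_tabular'
                      '../examples/tabular/40_advanced'],  # -> 'advanced_tabular'
    'gallery_dirs': ['basics_tabular', 'advanced_tabular'],
}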
8 changes: 8 additions & 0 deletions examples/tabular/20_basics/README.txt
@@ -0,0 +1,8 @@
.. _examples_tabular_basics:


==============================
Basic Tabular Dataset Examples
==============================

Basic examples for using *Auto-PyTorch* on tabular datasets
examples/tabular/20_basics/example_tabular_classification.py
@@ -22,32 +22,10 @@
import sklearn.model_selection

from autoPyTorch.api.tabular_classification import TabularClassificationTask
from autoPyTorch.utils.hyperparameter_search_space_update import HyperparameterSearchSpaceUpdates


def get_search_space_updates():
"""
Search space updates to the task can be added using HyperparameterSearchSpaceUpdates
Returns:
HyperparameterSearchSpaceUpdates
"""
updates = HyperparameterSearchSpaceUpdates()
updates.append(node_name="data_loader",
hyperparameter="batch_size",
value_range=[16, 512],
default_value=32)
updates.append(node_name="lr_scheduler",
hyperparameter="CosineAnnealingLR:T_max",
value_range=[50, 60],
default_value=55)
updates.append(node_name='network_backbone',
hyperparameter='ResNetBackbone:dropout',
value_range=[0, 0.5],
default_value=0.2)
return updates


if __name__ == '__main__':

############################################################################
# Data Loading
# ============
@@ -62,16 +40,23 @@ def get_search_space_updates():
# Build and fit a classifier
# ==========================
api = TabularClassificationTask(
delete_tmp_folder_after_terminate=False,
search_space_updates=get_search_space_updates()
temporary_directory='./tmp/autoPyTorch_example_tmp_01',
output_directory='./tmp/autoPyTorch_example_out_01',
# To maintain logs of the run, set the next two as False
delete_tmp_folder_after_terminate=True,
delete_output_folder_after_terminate=True
)

############################################################################
# Search for an ensemble of machine learning algorithms
# =====================================================
api.search(
X_train=X_train,
y_train=y_train,
X_test=X_test.copy(),
y_test=y_test.copy(),
optimize_metric='accuracy',
total_walltime_limit=500,
total_walltime_limit=300,
func_eval_time_limit=50
)

@@ -82,4 +67,5 @@ def get_search_space_updates():
y_pred = api.predict(X_test)
score = api.score(y_pred, y_test)
print(score)
# Print the final ensemble built by AutoPyTorch
print(api.show_models())
examples/tabular/20_basics/example_tabular_regression.py
@@ -3,17 +3,15 @@
Tabular Regression
======================

The following example shows how to fit a sample classification model
The following example shows how to fit a sample regression model
with AutoPyTorch
"""
import os
import tempfile as tmp
import typing
import warnings

from sklearn.datasets import make_regression

from autoPyTorch.data.tabular_feature_validator import TabularFeatureValidator
import sklearn.datasets
import sklearn.model_selection

os.environ['JOBLIB_TEMP_FOLDER'] = tmp.gettempdir()
os.environ['OMP_NUM_THREADS'] = '1'
@@ -23,54 +21,16 @@
warnings.simplefilter(action='ignore', category=UserWarning)
warnings.simplefilter(action='ignore', category=FutureWarning)

from sklearn import model_selection, preprocessing

from autoPyTorch.api.tabular_regression import TabularRegressionTask
from autoPyTorch.datasets.tabular_dataset import TabularDataset
from autoPyTorch.utils.hyperparameter_search_space_update import HyperparameterSearchSpaceUpdates


def get_search_space_updates():
"""
Search space updates to the task can be added using HyperparameterSearchSpaceUpdates
Returns:
HyperparameterSearchSpaceUpdates
"""
updates = HyperparameterSearchSpaceUpdates()
updates.append(node_name="data_loader",
hyperparameter="batch_size",
value_range=[16, 512],
default_value=32)
updates.append(node_name="lr_scheduler",
hyperparameter="CosineAnnealingLR:T_max",
value_range=[50, 60],
default_value=55)
updates.append(node_name='network_backbone',
hyperparameter='ResNetBackbone:dropout',
value_range=[0, 0.5],
default_value=0.2)
return updates


if __name__ == '__main__':

############################################################################
# Data Loading
# ============

# Get the training data for tabular regression
# X, y = datasets.fetch_openml(name="cholesterol", return_X_y=True)

# Use dummy data for now since there are problems with categorical columns
X, y = make_regression(
n_samples=5000,
n_features=4,
n_informative=3,
n_targets=1,
shuffle=True,
random_state=0
)

X_train, X_test, y_train, y_test = model_selection.train_test_split(
X, y = sklearn.datasets.fetch_openml(name='boston', return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
X,
y,
random_state=1,
@@ -89,16 +49,23 @@ def get_search_space_updates():
# Build and fit a regressor
# ==========================
api = TabularRegressionTask(
delete_tmp_folder_after_terminate=False,
search_space_updates=get_search_space_updates()
temporary_directory='./tmp/autoPyTorch_example_tmp_02',
output_directory='./tmp/autoPyTorch_example_out_02',
# To maintain logs of the run, set the next two as False
delete_tmp_folder_after_terminate=True,
delete_output_folder_after_terminate=True
)

############################################################################
# Search for an ensemble of machine learning algorithms
# =====================================================
api.search(
X_train=X_train,
y_train=y_train_scaled,
X_test=X_test.copy(),
y_test=y_test_scaled.copy(),
optimize_metric='r2',
total_walltime_limit=500,
total_walltime_limit=300,
func_eval_time_limit=50,
traditional_per_total_budget=0
)
@@ -114,3 +81,5 @@ def get_search_space_updates():
score = api.score(y_pred, y_test)

print(score)
# Print the final ensemble built by AutoPyTorch
print(api.show_models())
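The regression example searches on a scaled target (y_train_scaled) but scores against the original y_test; the scaling code itself sits in an elided part of the file. The sketch below shows one way such a round trip could look, assuming a StandardScaler and the fitted api object from the example above; the actual example may scale the target differently.

from sklearn.preprocessing import StandardScaler

# Hedged sketch of the target round trip; assumes `api`, `y_train`, `X_test`
# and `y_test` from the example above. The real example's scaling may differ.
scaler = StandardScaler()
y_train_scaled = scaler.fit_transform(y_train.to_numpy().reshape(-1, 1)).ravel()

# ... api.search(X_train=X_train, y_train=y_train_scaled, ...) as above ...

# Predictions come back on the scaled target, so invert the scaling before
# comparing against the original y_test.
y_pred_scaled = api.predict(X_test)
y_pred = scaler.inverse_transform(y_pred_scaled.reshape(-1, 1)).ravel()
score = api.score(y_pred, y_test)
print(score)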