Forecasting docs #442

Closed
wants to merge 7 commits into from
Time series forecasting (#434)
* new target scaler, allow NoNorm for MLP Encoder

* allow sampling full sequences

* integrate SeqBuilder to SequenceCollector

* restore SequenceBuilder to reduce memory usage

* move scaler to network

* lag sequence

* merge encoder and decoder as a single pipeline

* faster lag_seq builder

* maint

* new init, faster DeepAR inference in trainer

* more losses types

* maint

* new Transformer models,  allow RNN to do deepAR inference

* maint

* maint

* maint

* maint

* reduced search space for Transformer

* reduced init design

* maint

* maint

* maint

* maint

* faster forecasting

* maint

* allow single fidelity

* maint

* fix budget num_seq

* faster sampler and lagger

* maint

* maint

* maint deepAR

* maint

* maint

* cross validation

* allow holdout for smaller datasets

* smac4ac to smac4hpo

* maint

* maint

* allow to change decoder search space

* more resampling strategy, more options for MLP

* reduced NBEATS

* subsampler for val loader

* rng for dataloader sampler

* maint

* remove generator as it cannot be pickled

* allow lower fidelity to evaluate less test instances

* fix dummy forecaster issues

* maint

* add gluonts as requirement

* more data for val set for larger dataset

* maint

* maint

* fix nbeats decoder

* new dataset interface

* resolve conflict

* maint

* allow encoder to receive input from different sources

* multi blocks hp design

* maint

* correct hp updates

* first trial on nested conjunction

* maint

* fit for deep AR model (needs to be reverted when the issue in ConfigSpace is fixed!!!)

* adjust backbones to fit new structure

* further API changes

* tft temporal fusion decoder

* construct network

* cells for networks

* forecasting backbones

* maint

* maint

* move tft layer to backbone

* maint

* quantile loss

* maint

* maint

* maint

* maint

* maint

* maint

* forecasting init configs

* add forbidden

* maint

* maint

* maint

* remove shift data

* maint

* maint

* copy dataset_properties for each refit iteration

* maint and new init

* Tft forecasting with features (#6)

* time feature transform

* tft with time-varying features

* transform features allowed for all architecture

* repair mask for temporal fusion layer

* maint

* fix loss computation in QuantileLoss

* fixed scaler computation

* maint

* fix dataset

* adjust window_size to seasonality

* maint scaling

* fix incorrect Seq2Seq scaling

* fix sampling for seq2seq

* maint

* fix scaling in NBEATS

* move time feature computation to dataset

* maint

* fix feature computation

* maint

* multi-variant feature validator

* maint

* validator for multi-variant series

* feature validator

* multi-variant datasets

* observed targets

* structure adjustment

* refactor ts tasks and preprocessing

* allow nan in targets

* preprocessing for time series

* maint

* forecasting pipeline

* maint

* embedding and maint

* move targets to the tail of the features

* maint

* static features

* adjust scaler to static features

* remove static features from forward dict

* test transform

* maint

* test sets

* adjust dataset to allow future known features

* maint

* maint

* flake8

* synchronise with development

* recover timeseries

* maint

* maint

* limit memory usage tae

* revert test api

* test for targets

* not allow sparse forecasting target

* test for data validator

* test for validations

* test on TimeSeriesSequence

* maint

* test for resampling

* test for dataset 1

* test for datasets

* test on tae

* maint

* allow evaluator to evaluate test sets

* tests on losses

* test for metrics

* forecasting preprocessing

* maint

* finish test for preprocessing

* test for data loader

* tests for dataloader

* maint

* test for target scaling 1

* test for target scaler

* test for training loss

* maint

* test for network backbone

* test for backbone base

* test for flat encoder

* test for seq encoder

* test for seqencoder

* maint

* test for recurrent decoders

* test for network

* maint

* test for architecture

* test for pipelines

* fixed sampler

* maint sampler

* resolve conflict between embedding and net encoder

* fix scaling

* allow transform for test dataloader

* maint dataloader

* fix updates

* fix dataset

* tests on api, initial design on multi-variant

* maint

* fix dataloader

* move test with for loop to unittest.subtest

* flake 8 and update requirement

* mypy

* validator for pd dataframe

* allow series idx for api

* maint

* examples for forecasting

* fix mypy

* proper memory limitation for forecasting example

* fix pre-commit

* maint dataloader

* remove unused auto-regressive arguments

* fix pre-commit

* maint

* maint mypy

* mypy!!!

* pre-commit

* mypyyyyyyyyyyyyyyyyyyyyyyyy

* maint

* move forecasting requirements to extras_require

* bring eval_test to tae

* make rh2epm consistent with SMAC4HPO

* remove smac4ac from smbo

* revert changes in network

* revert changes in trainer

* revert format changes

* move constant_forecasting to constant

* additional annotate for base pipeline

* move forecasting check to tae

* maint time series refit dataset

* fix test

* workflow for extra requirements

* docs for time series dataset

* fix pre-commit

* docs for dataset

* maint docstring

* merge target scaler to one file

* fix forecasting init cfgs

* remove redundant pipeline configs

* maint

* SMAC4HPO instead of SMAC4AC in smbo (will be reverted further if study shows that SMAC4HPO is superior to SMAC4AC)

* fixed docstring for RNN and Transformer Decoder

* uniformed docstrings for smbo and base task

* correct encoder to decoder in decoder.init

* fix doc strings

* add license and docstrings for NBEATS heads

* allow memory limit to be None

* relax test load for forecasting

* fix docs

* fix pre-commit

* make test compatible with py37

* maint docstring

* split forecasting_eval_train_function from eval_train_function

* fix namespace for test_api from train_evaluator to tae

* maint test api for forecasting

* decrease number of ensemble size of test_time_series_forecasting to reduce test time

* flatten all the prediction for forecasting pipelines

* pre-commit fix

* fix docstrings and typing

* maint time series dataset docstrings

* maint warning message in time_series_forecasting_train_evaluator

* fix lines that are overlength

Co-authored-by: NHML23117 <nhmldeng@login03.css.lan>
Co-authored-by: Deng Difan <deng@p200300cd070f1f50dabbc1fffe9c6aa9.dip0.t-ipconnect.de>
3 people committed Jun 28, 2022
commit cbd09635218891b450910ab17162423613c5cd41
2 changes: 1 addition & 1 deletion .github/workflows/docs.yml
@@ -33,7 +33,7 @@ jobs:

- name: Install dependencies
run: |
pip install -e .[docs,examples]
pip install -e .[docs,examples,forecasting]

- name: Make docs
run: |
2 changes: 1 addition & 1 deletion .github/workflows/long_regression_test.yml
@@ -30,7 +30,7 @@ jobs:
- name: Install test dependencies
run: |
python -m pip install --upgrade pip
pip install -e .[test]
pip install -e .[forecasting,test]

- name: Run tests
run: |
4 changes: 2 additions & 2 deletions .github/workflows/pytest.yml
@@ -89,7 +89,7 @@ jobs:
run: |
git submodule update --init --recursive
python -m pip install --upgrade pip
pip install -e .[test]
pip install -e .[forecasting,test]

- name: Dist install
if: matrix.kind == 'dist'
@@ -98,7 +98,7 @@ jobs:

python setup.py sdist
last_dist=$(ls -t dist/autoPyTorch-*.tar.gz | head -n 1)
pip install $last_dist[test]
pip install $last_dist[forecasting,test]

- name: Store repository status
id: status-before
2 changes: 1 addition & 1 deletion README.md
@@ -256,4 +256,4 @@ Please refer to the branch `TPAMI.2021.3067763` to reproduce the paper *Auto-PyT

## Contact

Auto-PyTorch is developed by the [AutoML Group of the University of Freiburg](http://www.automl.org/).
Auto-PyTorch is developed by the [AutoML Groups of the University of Freiburg and Hannover](http://www.automl.org/).
79 changes: 55 additions & 24 deletions autoPyTorch/api/base_task.py
@@ -34,9 +34,12 @@
from autoPyTorch import metrics
from autoPyTorch.automl_common.common.utils.backend import Backend, create
from autoPyTorch.constants import (
FORECASTING_BUDGET_TYPE,
FORECASTING_TASKS,
REGRESSION_TASKS,
STRING_TO_OUTPUT_TYPES,
STRING_TO_TASK_TYPES,
TIMESERIES_FORECASTING,
)
from autoPyTorch.data.base_validator import BaseInputValidator
from autoPyTorch.data.utils import DatasetCompressionSpec
@@ -77,7 +80,8 @@ def _pipeline_predict(pipeline: BasePipeline,
X: Union[np.ndarray, pd.DataFrame],
batch_size: int,
logger: PicklableClientLogger,
task: int) -> np.ndarray:
task: int,
task_type: str = "") -> np.ndarray:
@typing.no_type_check
def send_warnings_to_log(
message, category, filename, lineno, file=None, line=None):
@@ -87,7 +91,7 @@ def send_warnings_to_log(
X_ = X.copy()
with warnings.catch_warnings():
warnings.showwarning = send_warnings_to_log
if task in REGRESSION_TASKS:
if task in REGRESSION_TASKS or task in FORECASTING_TASKS:
# Voting regressor does not support batch size
prediction = pipeline.predict(X_)
else:
@@ -101,13 +105,13 @@ def send_warnings_to_log(
prediction,
np.sum(prediction, axis=1)
))

if len(prediction.shape) < 1 or len(X_.shape) < 1 or \
X_.shape[0] < 1 or prediction.shape[0] != X_.shape[0]:
logger.warning(
"Prediction shape for model %s is %s while X_.shape is %s",
pipeline, str(prediction.shape), str(X_.shape)
)
if STRING_TO_TASK_TYPES.get(task_type, -1) != TIMESERIES_FORECASTING:
if len(prediction.shape) < 1 or len(X_.shape) < 1 or \
X_.shape[0] < 1 or prediction.shape[0] != X_.shape[0]:
logger.warning(
"Prediction shape for model %s is %s while X_.shape is %s",
pipeline, str(prediction.shape), str(X_.shape)
)
return prediction
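
The guard added around the shape check can be illustrated in isolation. This is a minimal sketch, not the Auto-PyTorch implementation: the integer codes and the `should_warn_on_shape` helper are assumptions made for illustration, and only the names `STRING_TO_TASK_TYPES` and `TIMESERIES_FORECASTING` come from the diff. Shapes are passed as plain tuples so the example has no dependencies.

```python
from typing import Tuple

# Assumed integer codes, for illustration only; the real values are
# defined in autoPyTorch.constants and may differ.
TIMESERIES_FORECASTING = 5
STRING_TO_TASK_TYPES = {
    "tabular_regression": 2,
    "timeseries_forecasting": TIMESERIES_FORECASTING,
}


def should_warn_on_shape(pred_shape: Tuple[int, ...],
                         x_shape: Tuple[int, ...],
                         task_type: str = "") -> bool:
    """Hypothetical helper: True when the shape warning would be logged."""
    # An empty or unknown task_type maps to -1, so the check stays active.
    if STRING_TO_TASK_TYPES.get(task_type, -1) == TIMESERIES_FORECASTING:
        return False
    return (len(pred_shape) < 1 or len(x_shape) < 1
            or x_shape[0] < 1 or pred_shape[0] != x_shape[0])
```

Because `task_type` defaults to an empty string, callers that do not pass it keep the previous behaviour; only forecasting pipelines, whose flattened predictions need not have one row per input row, skip the warning.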


@@ -218,6 +222,8 @@ def __init__(
self.search_space: Optional[ConfigurationSpace] = None
self._dataset_requirements: Optional[List[FitRequirement]] = None
self._metric: Optional[autoPyTorchMetric] = None
self._metrics_kwargs: Dict = {}

self._scoring_functions: Optional[List[autoPyTorchMetric]] = None
self._logger: Optional[PicklableClientLogger] = None
self.dataset_name: Optional[str] = None
@@ -737,7 +743,7 @@ def _do_dummy_prediction(self) -> None:
stats=stats,
memory_limit=memory_limit,
disable_file_output=self._disable_file_output,
all_supported_metrics=self._all_supported_metrics
all_supported_metrics=self._all_supported_metrics,
)

status, _, _, additional_info = ta.run(num_run, cutoff=self._time_for_task)
@@ -822,7 +828,7 @@ def _do_traditional_prediction(self, time_left: int, func_eval_time_limit_secs:
stats=stats,
memory_limit=memory_limit,
disable_file_output=self._disable_file_output,
all_supported_metrics=self._all_supported_metrics
all_supported_metrics=self._all_supported_metrics,
)
dask_futures.append([
classifier,
@@ -906,8 +912,8 @@ def _search(
optimize_metric: str,
dataset: BaseDataset,
budget_type: str = 'epochs',
min_budget: int = 5,
max_budget: int = 50,
min_budget: Union[int, float] = 5,
max_budget: Union[int, float] = 50,
total_walltime_limit: int = 100,
func_eval_time_limit_secs: Optional[int] = None,
enable_traditional_pipeline: bool = True,
@@ -920,7 +926,8 @@ def _search(
disable_file_output: Optional[List[Union[str, DisableFileOutputParameters]]] = None,
load_models: bool = True,
portfolio_selection: Optional[str] = None,
dask_client: Optional[dask.distributed.Client] = None
dask_client: Optional[dask.distributed.Client] = None,
**kwargs: Any
) -> 'BaseTask':
"""
Search for the best pipeline configuration for the given dataset.
@@ -1048,7 +1055,14 @@ def _search(
Additionally, the keyword 'greedy' is supported,
which would use the default portfolio from
`AutoPyTorch Tabular <https://arxiv.org/abs/2006.13799>`_

kwargs: Any
additional arguments that are customed by some specific task.
For instance, forecasting tasks require:
min_num_test_instances (int): minimal number of instances used to initialize a proxy validation set
suggested_init_models (List[str]): A set of initial models suggested by the users. Their
hyperparameters are determined by the default configurations
custom_init_setting_path (str): The path to the initial hyperparameter configurations set by
the users
Returns:
self
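
The `**kwargs` pass-through described in the docstring above can be sketched as follows. Both functions are hypothetical stand-ins (they are not the real `_search` or `AutoMLSMBO` signatures); only the three option names are taken from the docstring.

```python
from typing import Any, Dict, List, Optional


def make_optimizer_settings(
    task_type: str,
    min_num_test_instances: Optional[int] = None,
    suggested_init_models: Optional[List[str]] = None,
    custom_init_setting_path: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical stand-in for the optimizer constructor."""
    return {
        "task_type": task_type,
        "min_num_test_instances": min_num_test_instances,
        "suggested_init_models": suggested_init_models,
        "custom_init_setting_path": custom_init_setting_path,
    }


def search(task_type: str, budget_type: str = "epochs",
           **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical stand-in for _search: forwards task-specific options."""
    # Unknown keyword arguments fail here with a TypeError instead of
    # being silently dropped.
    settings = make_optimizer_settings(task_type, **kwargs)
    settings["budget_type"] = budget_type
    return settings
```

In this sketch the generic search entry point stays unaware of forecasting-specific options, and the receiving function decides which names are valid.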

@@ -1110,7 +1124,10 @@ def _search(
self.search_space = self.get_search_space(dataset)

# Incorporate budget to pipeline config
if budget_type not in ('epochs', 'runtime'):
if budget_type not in ('epochs', 'runtime') and (
budget_type in FORECASTING_BUDGET_TYPE
and STRING_TO_TASK_TYPES[self.task_type] != TIMESERIES_FORECASTING
):
raise ValueError("Budget type must be one ('epochs', 'runtime')"
f" yet {budget_type} was provided")
self.pipeline_options['budget_type'] = budget_type
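
As written in the hunk above, the condition appears to raise only when a forecasting budget type is used on a non-forecasting task; a value that is in neither tuple would pass without an error. A stricter check could look like the sketch below. The helper name, the `is_forecasting` flag and the members of `FORECASTING_BUDGET_TYPE` are assumptions for illustration, not the project's code.

```python
from typing import Tuple

STANDARD_BUDGET_TYPES: Tuple[str, ...] = ("epochs", "runtime")
# Assumed members, for illustration only; see autoPyTorch.constants.
FORECASTING_BUDGET_TYPE: Tuple[str, ...] = (
    "resolution", "num_seq", "num_sample_per_seq")


def validate_budget_type(budget_type: str, is_forecasting: bool) -> str:
    """Hypothetical helper: reject any budget type the task cannot use."""
    allowed = STANDARD_BUDGET_TYPES
    if is_forecasting:
        # Forecasting tasks accept the extra budget types as well.
        allowed = allowed + FORECASTING_BUDGET_TYPE
    if budget_type not in allowed:
        raise ValueError(
            f"Budget type must be one of {allowed} yet {budget_type} was provided")
    return budget_type
```

Building the allowed set first and testing membership once keeps the error path the same for misspelled values and for forecasting budgets used on other tasks.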
@@ -1216,6 +1233,7 @@ def _search(
precision=precision,
logger_port=self._logger_port,
pynisher_context=self._multiprocessing_context,
metrics_kwargs=self._metrics_kwargs,
)
self._stopwatch.stop_task(ensemble_task_name)

@@ -1229,7 +1247,6 @@ def _search(
if time_left_for_smac <= 0:
self._logger.warning(" Not starting SMAC because there is no time left")
else:

_proc_smac = AutoMLSMBO(
config_space=self.search_space,
dataset_name=str(dataset.dataset_name),
@@ -1259,6 +1276,8 @@ def _search(
search_space_updates=self.search_space_updates,
portfolio_selection=portfolio_selection,
pynisher_context=self._multiprocessing_context,
task_type=self.task_type,
**kwargs,
)
try:
run_history, self._results_manager.trajectory, budget_type = \
@@ -1323,19 +1342,30 @@ def _get_fit_dictionary(
dataset: BaseDataset,
split_id: int = 0
) -> Dict[str, Any]:
X_test = dataset.test_tensors[0].copy() if dataset.test_tensors is not None else None
y_test = dataset.test_tensors[1].copy() if dataset.test_tensors is not None else None
if dataset.test_tensors is not None:
X_test = dataset.test_tensors[0].copy() if dataset.test_tensors[0] is not None else None
y_test = dataset.test_tensors[1].copy() if dataset.test_tensors[1] is not None else None
else:
X_test = None
y_test = None

X_train = dataset.train_tensors[0].copy() if dataset.train_tensors[0] is not None else None
y_train = dataset.train_tensors[1].copy()
X: Dict[str, Any] = dict({'dataset_properties': dataset_properties,
'backend': self._backend,
'X_train': dataset.train_tensors[0].copy(),
'y_train': dataset.train_tensors[1].copy(),
'X_train': X_train,
'y_train': y_train,
'X_test': X_test,
'y_test': y_test,
'train_indices': dataset.splits[split_id][0],
'val_indices': dataset.splits[split_id][1],
'split_id': split_id,
'num_run': self._backend.get_next_num_run(),
})
if STRING_TO_TASK_TYPES[self.task_type] == TIMESERIES_FORECASTING:
warnings.warn("Currently Time Series Forecasting tasks do not allow computing metrics "
"during training. It will be automatically set as False")
self.pipeline_options["metrics_during_training"] = False
X.update(self.pipeline_options)
return X
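
The None handling in this hunk follows one pattern: copy a tensor only if it exists, and treat a missing test set as a pair of `None` values. A minimal sketch of that pattern, using hypothetical helper names and plain lists in place of arrays:

```python
from typing import Any, Optional, Tuple


def copy_or_none(tensor: Optional[Any]) -> Optional[Any]:
    """Hypothetical helper: copy the tensor, or pass None through."""
    return None if tensor is None else tensor.copy()


def unpack_test_tensors(
    test_tensors: Optional[Tuple[Optional[Any], Optional[Any]]],
) -> Tuple[Optional[Any], Optional[Any]]:
    """Return copies of (X_test, y_test); either element may be None."""
    if test_tensors is None:
        return None, None
    return copy_or_none(test_tensors[0]), copy_or_none(test_tensors[1])
```

The extra element-level check matters when a dataset carries test targets but no test features, which the diff now allows for both the train and test tensors.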

@@ -1398,7 +1428,7 @@ def refit(
# could alleviate the problem in algorithms that depend on
# the ordering of the data.
X = self._get_fit_dictionary(
dataset_properties=dataset_properties,
dataset_properties=copy.copy(dataset_properties),
dataset=dataset,
split_id=split_id)
fit_and_suppress_warnings(self._logger, model, X, y=None)
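
The switch to `copy.copy(dataset_properties)` gives each refit iteration its own top-level dictionary. The sketch below shows the effect with a hypothetical `fake_fit` step and made-up keys; it assumes the fit step only adds or replaces top-level keys.

```python
import copy
from typing import Any, Dict


def fake_fit(dataset_properties: Dict[str, Any]) -> None:
    """Hypothetical fit step that adds a top-level key to its input."""
    dataset_properties["fitted"] = True


# Made-up properties, for illustration only.
shared = {"task_type": "timeseries_forecasting", "features": ["a", "b"]}

for _ in range(2):  # two refit iterations
    # Each iteration mutates its own shallow copy, not the shared dict.
    fake_fit(copy.copy(shared))

unchanged = "fitted" not in shared
```

A shallow copy does not duplicate nested objects, so a fit step that mutates a nested list or dict in place would still affect later iterations; `copy.deepcopy` would be needed in that case.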
@@ -1630,7 +1660,7 @@ def fit_pipeline(
exclude=exclude_components,
search_space_updates=search_space_updates,
pipeline_config=pipeline_options,
pynisher_context=self._multiprocessing_context
pynisher_context=self._multiprocessing_context,
)

run_info, run_value = tae.run_wrapper(
@@ -1722,7 +1752,8 @@ def predict(

all_predictions = joblib.Parallel(n_jobs=n_jobs)(
joblib.delayed(_pipeline_predict)(
models[identifier], X_test, batch_size, self._logger, STRING_TO_TASK_TYPES[self.task_type]
models[identifier], X_test, batch_size, self._logger, STRING_TO_TASK_TYPES[self.task_type],
self.task_type
)
for identifier in self.ensemble_.get_selected_model_identifiers()
)