Commit 3d2e21e

Author: Github Actions
Difan Deng: Time series forecasting (#434)
1 parent f1f2435 commit 3d2e21e

File tree: 41 files changed (+1150, -30529 lines)

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n# Time Series Forecasting\n\nThe following example shows how to fit a sample forecasting model\nwith AutoPyTorch. This is only a dummy example because of the limited size of the dataset;\nit is therefore possible that the AutoPyTorch model does not perform as well as a dummy predictor.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import os\nimport tempfile as tmp\nimport warnings\nimport copy\n\nos.environ['JOBLIB_TEMP_FOLDER'] = tmp.gettempdir()\nos.environ['OMP_NUM_THREADS'] = '1'\nos.environ['OPENBLAS_NUM_THREADS'] = '1'\nos.environ['MKL_NUM_THREADS'] = '1'\n\nwarnings.simplefilter(action='ignore', category=UserWarning)\nwarnings.simplefilter(action='ignore', category=FutureWarning)\n\nfrom sktime.datasets import load_longley\ntargets, features = load_longley()\n\nforecasting_horizon = 3\n\n# Each series is one element of the list.\n# We take the last forecasting_horizon values as test targets and the values\n# before that as training targets. Normally, the values to be forecasted\n# directly follow the training set.\ny_train = [targets[: -forecasting_horizon]]\ny_test = [targets[-forecasting_horizon:]]\n\n# The same applies to the features. For univariate models, X_train and X_test\n# can be omitted.\nX_train = [features[: -forecasting_horizon]]\n# Here X_test contains the 'known future features': features whose future\n# values are known in advance. Features that are unknown in the future could\n# be replaced with NaNs or zeros (which will not be used by our networks).\n# If no feature is known beforehand, X_test can also be omitted.\nknown_future_features = list(features.columns)\nX_test = [features[-forecasting_horizon:]]\n\nstart_times = [targets.index.to_timestamp()[0]]\nfreq = '1Y'\n\nfrom autoPyTorch.api.time_series_forecasting import TimeSeriesForecastingTask"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Build and fit a forecaster\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "api = TimeSeriesForecastingTask()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Search for an ensemble of machine learning algorithms\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "api.search(\n    X_train=X_train,\n    y_train=copy.deepcopy(y_train),\n    X_test=X_test,\n    optimize_metric='mean_MASE_forecasting',\n    n_prediction_steps=forecasting_horizon,\n    memory_limit=16 * 1024,  # Currently, forecasting models request much more memory than they actually need\n    freq=freq,\n    start_times=start_times,\n    func_eval_time_limit_secs=50,\n    total_walltime_limit=60,\n    min_num_test_instances=1000,  # proxy validation set; only takes effect for tasks with more than 1000 series\n    known_future_features=known_future_features,\n)\n\n\nfrom autoPyTorch.datasets.time_series_dataset import TimeSeriesSequence\n\ntest_sets = []\n\n# We can construct the test sets from scratch\nfor feature, future_feature, target, start_time in zip(X_train, X_test, y_train, start_times):\n    test_sets.append(\n        TimeSeriesSequence(X=feature.values,\n                           Y=target.values,\n                           X_test=future_feature.values,\n                           start_time=start_time,\n                           is_test_set=True,\n                           # additional information required to construct a new time series sequence\n                           **api.dataset.sequences_builder_kwargs\n                           )\n    )\n# Alternatively, if we only want to forecast the values directly after X_train,\n# we can ask the data manager to generate a test set directly:\n# test_sets2 = api.dataset.generate_test_seqs()\n\npred = api.predict(test_sets)"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.12"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
Binary file not shown.
Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
"""
=======================
Time Series Forecasting
=======================

The following example shows how to fit a sample forecasting model
with AutoPyTorch. This is only a dummy example because of the limited size of the dataset;
it is therefore possible that the AutoPyTorch model does not perform as well as a dummy predictor.
"""
import os
import tempfile as tmp
import warnings
import copy

os.environ['JOBLIB_TEMP_FOLDER'] = tmp.gettempdir()
os.environ['OMP_NUM_THREADS'] = '1'
os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['MKL_NUM_THREADS'] = '1'

warnings.simplefilter(action='ignore', category=UserWarning)
warnings.simplefilter(action='ignore', category=FutureWarning)

from sktime.datasets import load_longley
targets, features = load_longley()

forecasting_horizon = 3

# Each series is one element of the list.
# We take the last forecasting_horizon values as test targets and the values
# before that as training targets. Normally, the values to be forecasted
# directly follow the training set.
y_train = [targets[: -forecasting_horizon]]
y_test = [targets[-forecasting_horizon:]]

# The same applies to the features. For univariate models, X_train and X_test
# can be omitted.
X_train = [features[: -forecasting_horizon]]
# Here X_test contains the 'known future features': features whose future
# values are known in advance. Features that are unknown in the future could
# be replaced with NaNs or zeros (which will not be used by our networks).
# If no feature is known beforehand, X_test can also be omitted.
known_future_features = list(features.columns)
X_test = [features[-forecasting_horizon:]]

start_times = [targets.index.to_timestamp()[0]]
freq = '1Y'

from autoPyTorch.api.time_series_forecasting import TimeSeriesForecastingTask

############################################################################
# Build and fit a forecaster
# ==========================
api = TimeSeriesForecastingTask()

############################################################################
# Search for an ensemble of machine learning algorithms
# =====================================================
api.search(
    X_train=X_train,
    y_train=copy.deepcopy(y_train),
    X_test=X_test,
    optimize_metric='mean_MASE_forecasting',
    n_prediction_steps=forecasting_horizon,
    memory_limit=16 * 1024,  # Currently, forecasting models request much more memory than they actually need
    freq=freq,
    start_times=start_times,
    func_eval_time_limit_secs=50,
    total_walltime_limit=60,
    min_num_test_instances=1000,  # proxy validation set; only takes effect for tasks with more than 1000 series
    known_future_features=known_future_features,
)


from autoPyTorch.datasets.time_series_dataset import TimeSeriesSequence

test_sets = []

# We can construct the test sets from scratch
for feature, future_feature, target, start_time in zip(X_train, X_test, y_train, start_times):
    test_sets.append(
        TimeSeriesSequence(X=feature.values,
                           Y=target.values,
                           X_test=future_feature.values,
                           start_time=start_time,
                           is_test_set=True,
                           # additional information required to construct a new time series sequence
                           **api.dataset.sequences_builder_kwargs
                           )
    )
# Alternatively, if we only want to forecast the values directly after X_train,
# we can ask the data manager to generate a test set directly:
# test_sets2 = api.dataset.generate_test_seqs()

pred = api.predict(test_sets)
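
Note that y_test is held out above but never used by the script. As a quick sanity check, the forecasts can be compared against it. The following is a minimal sketch, assuming pred is a list with one forecast array per series (matching the structure of y_test); the mase helper is a hand-rolled illustration in the spirit of the mean_MASE_forecasting metric above, not part of the AutoPyTorch API.

import numpy as np

# Hand-rolled (non-seasonal) MASE: forecast MAE scaled by the in-sample MAE
# of a naive one-step-ahead forecast. Illustrative helper, not AutoPyTorch API.
def mase(y_true, y_pred, y_insample):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    y_insample = np.asarray(y_insample, dtype=float)
    naive_mae = np.mean(np.abs(np.diff(y_insample)))  # naive forecast error on the training data
    return float(np.mean(np.abs(y_true - y_pred)) / naive_mae)

# One score per series; values below 1.0 beat the naive forecast.
for y_tr, y_te, y_hat in zip(y_train, y_test, pred):
    print(f"MASE over {forecasting_horizon} steps: {mase(y_te, y_hat, y_tr):.4f}")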
Binary file not shown.

development/_sources/examples/20_basics/example_image_classification.rst.txt

Lines changed: 13 additions & 15 deletions
Large diffs are not rendered by default.

development/_sources/examples/20_basics/example_tabular_classification.rst.txt

Lines changed: 12 additions & 13 deletions
@@ -134,7 +134,7 @@ Search for an ensemble of machine learning algorithms
 .. code-block:: none

-    <autoPyTorch.api.tabular_classification.TabularClassificationTask object at 0x7fdc1b398100>
+    <autoPyTorch.api.tabular_classification.TabularClassificationTask object at 0x7f70872c5490>

@@ -166,22 +166,21 @@ Print the final ensemble performance
 .. code-block:: none

     {'accuracy': 0.8670520231213873}
-    | | Preprocessing | Estimator | Weight |
-    |---:|:---|:---|---:|
-    | 0 | None | CBLearner | 0.32 |
-    | 1 | SimpleImputer,Variance Threshold,MinorityCoalescer,OneHotEncoder,QuantileTransformer,KitchenSink | no embedding,ShapedMLPBackbone,FullyConnectedHead,nn.Sequential | 0.2 |
-    | 2 | SimpleImputer,Variance Threshold,NoCoalescer,OneHotEncoder,StandardScaler,SRC | embedding,MLPBackbone,FullyConnectedHead,nn.Sequential | 0.2 |
-    | 3 | SimpleImputer,Variance Threshold,MinorityCoalescer,OneHotEncoder,NoScaler,KitchenSink | embedding,ResNetBackbone,FullyConnectedHead,nn.Sequential | 0.12 |
-    | 4 | SimpleImputer,Variance Threshold,MinorityCoalescer,OneHotEncoder,QuantileTransformer,KitchenSink | no embedding,ShapedMLPBackbone,FullyConnectedHead,nn.Sequential | 0.08 |
-    | 5 | SimpleImputer,Variance Threshold,MinorityCoalescer,OneHotEncoder,QuantileTransformer,KitchenSink | no embedding,ShapedMLPBackbone,FullyConnectedHead,nn.Sequential | 0.08 |
+    | | Preprocessing | Estimator | Weight |
+    |---:|:---|:---|---:|
+    | 0 | SimpleImputer,Variance Threshold,NoCoalescer,OneHotEncoder,MinMaxScaler,FastICA | no embedding,MLPBackbone,FullyConnectedHead,nn.Sequential | 0.56 |
+    | 1 | SimpleImputer,Variance Threshold,MinorityCoalescer,OneHotEncoder,Normalizer,KernelPCA | embedding,ShapedResNetBackbone,FullyConnectedHead,nn.Sequential | 0.38 |
+    | 2 | SimpleImputer,Variance Threshold,NoCoalescer,NoEncoder,StandardScaler,PCA | no embedding,MLPBackbone,FullyConnectedHead,nn.Sequential | 0.02 |
+    | 3 | None | CBLearner | 0.02 |
+    | 4 | None | SVMLearner | 0.02 |
     autoPyTorch results:
     Dataset name: Australian
     Optimisation Metric: accuracy
     Best validation score: 0.8713450292397661
-    Number of target algorithm runs: 21
-    Number of successful target algorithm runs: 19
+    Number of target algorithm runs: 28
+    Number of successful target algorithm runs: 27
     Number of crashed target algorithm runs: 0
-    Number of target algorithms that exceeded the time limit: 2
+    Number of target algorithms that exceeded the time limit: 1
     Number of target algorithms that exceeded the memory limit: 0

@@ -191,7 +190,7 @@ Print the final ensemble performance
 .. rst-class:: sphx-glr-timing

-    **Total running time of the script:** ( 5 minutes 20.003 seconds)
+    **Total running time of the script:** ( 5 minutes 37.043 seconds)

 .. _sphx_glr_download_examples_20_basics_example_tabular_classification.py:

development/_sources/examples/20_basics/example_tabular_regression.rst.txt

Lines changed: 5 additions & 5 deletions
@@ -125,7 +125,7 @@ Search for an ensemble of machine learning algorithms
 .. code-block:: none

-    <autoPyTorch.api.tabular_regression.TabularRegressionTask object at 0x7fdca57c8b50>
+    <autoPyTorch.api.tabular_regression.TabularRegressionTask object at 0x7f7113dcb220>

@@ -167,12 +167,12 @@ Print the final ensemble performance
     | 2 | SimpleImputer,Variance Threshold,NoCoalescer,OneHotEncoder,StandardScaler,NoFeaturePreprocessing | no embedding,ShapedMLPBackbone,FullyConnectedHead,nn.Sequential | 0.1 |
     | 3 | None | LGBMLearner | 0.04 |
     autoPyTorch results:
-    Dataset name: 7a5ffe66-f075-11ec-8806-a30cbc8a0bb8
+    Dataset name: a964bba4-f6c8-11ec-87fd-b1d4bc580917
     Optimisation Metric: r2
     Best validation score: 0.8670098636440993
-    Number of target algorithm runs: 24
+    Number of target algorithm runs: 23
     Number of successful target algorithm runs: 22
-    Number of crashed target algorithm runs: 1
+    Number of crashed target algorithm runs: 0
     Number of target algorithms that exceeded the time limit: 1
     Number of target algorithms that exceeded the memory limit: 0

@@ -183,7 +183,7 @@ Print the final ensemble performance
 .. rst-class:: sphx-glr-timing

-    **Total running time of the script:** ( 5 minutes 35.570 seconds)
+    **Total running time of the script:** ( 5 minutes 42.728 seconds)

 .. _sphx_glr_download_examples_20_basics_example_tabular_regression.py:
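
The ensemble table and the "autoPyTorch results" summary in the two diffs above are printed by the example scripts themselves through the task API. A minimal sketch, assuming a task object api that has already been fitted via api.search(...) as in the forecasting example; show_models() and sprint_statistics() are the calls used in the AutoPyTorch basic examples:

# Print the ensemble composition (the table above) and a run summary
# (the "autoPyTorch results" block). Assumes `api` has been fitted.
print(api.show_models())        # per-model preprocessing, estimator, and ensemble weight
print(api.sprint_statistics())  # dataset name, metric, best validation score, run counts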
