Timeseries Forecasting

This sample explores different methods for timeseries forecasting, including

Statistical algorithms - ARIMA and Auto ARIMA
Machine Learning - Random Forest and Azure AutoML
Off-the-shelf solutions - Azure Data Explorer and Power BI

It also demonstrates how to use Azure Machine Learning Service to train, register, and deploy a forecasting model as a web service.

We use the hourly NYC energy demand dataset between 2012 and 2017. The dataset csv is copied from here.

Set up

Use a Azure Data Science VM or any VM in Azure to set up a dev environment as documented here.
In the Azure ML Conda environment, in addition to the basic AzureML packages, also install the following packages:
- matplotlib
- azure-storage-blob
- pyramid-arima
- azureml-sdk[explain,automl]

Explore and clean the data

Use the process_data Jupyter notebook to explore the data. This notebook illustrates how to fill the time series with missing timeslots, remove outliers, and aggregate the data to handle the respective seasonality for different forecasting granularity -

hourly patterns repeated daily
daily patterns repeated weekly
monthly patterns repeated yearly

ARIMA

Use the arima Juputer notebook to explore how to test stationarity, and if data is not stationary, how to remove trend and seasonality to forecast on the residual and then add trend and seasonality back in the forecast.

Determining the parameters for ARIMA requires a lot of trial and error, even with the help of ACF (auto correlation function) and PACF (partial auto correlation function) graphs. Auto ARIMA tries different parameters automatically and often produces much better results with far less effort. It's also not necessary to make the data stationary for Auto ARIMA.

Machine Learning

With machine learning, we transform the data out of the timeseries domain into, for example, regression problems. It's not necessary to convert data to stationary for machine learning. The machine_learning Jupyter notebook explores Random Forest for forecasting by manually adding features such as lags and day of week. The sample dataset does include weather data which is often very helpful in this type of forecasting. We didn't use weather data in this case because we want to mimic datasets that don't have weather data available.

Azure AutoML forecasting is capable of fitting different ML models and choosing the best model with stack or voting ensemble. It's also not necessary to manually calcuate the lags for AutoML.

Putting it all together

Once you are happy exploring the data and models locally, you can use python scripts to operationalize the machine learning pipeline.

01_process_data.py - cleans and aggregates data for different forecasting granularity
02_submit_training.py - trains Azure AutoML forecasting models in Azure ML compute. Azure ML automatically tracks all the experiment runs and scales compute resources when running training jobs
03_register_and_deploy.py - once you are happy with a model, register it with Azure ML, and use Azure ML to automatically create a web service for forecasting
04_forecast_from_webservice.py - demonstrates how to call the deployed web service for forecasting

An Azure DevOps build pipeline, also as code in azure-pipelines.yml, submits a job to train the model in Azure ML.

An Azure DevOps release pipeline registers and deploys a model in Azure Container Instance so you can make a REST call for forecasting as described in 04_forecast_from_webservice.py.

Off-the-shelve solutions

Azure Data Explorer has powerful built-in time series analysis capabilities. timeseries.kql contains sample scripts that analyze the sample dataset for forecasting. For example, below chart is the output of Azure Data Explorer decomposing the sample data:

Power BI also has built-in capabilities for time series forecasting. The sample pbix file demonstrates how to use Power BI to forecast on the sample data:

Other considerations

Multiple series - single model or multiple models

If you have multiple series of data, most likely you'll need to train a model for each series. This could lead to hundreds or thousands of models which could be difficult to maintain. Consider grouping them based on similarity to reduce the number of models.

If you treat different series as different grains (for example, energy demand for power plant A and plant B), training just one model could possibly work if A and B share the same time range, granularity, scale of target value, and other features such as weather.

Other algorithms

The following statistical, machine learning, and deep learning models, although not demonstrated in this example, have also proven effective in time series forecasting use cases.

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
.vscode		.vscode
AzureDataExplorer		AzureDataExplorer
PowerBI		PowerBI
data		data
notebooks		notebooks
.gitignore		.gitignore
01_process_data.py		01_process_data.py
02_submit_training.py		02_submit_training.py
03_register_and_deploy.py		03_register_and_deploy.py
04_forecast_from_webservice.py		04_forecast_from_webservice.py
README.md		README.md
azure-pipelines.yml		azure-pipelines.yml
get_data.py		get_data.py
my_aml_workspace.json		my_aml_workspace.json
score.py		score.py
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Timeseries Forecasting

Set up

Explore and clean the data

ARIMA

Machine Learning

Putting it all together

Off-the-shelve solutions

Other considerations

Multiple series - single model or multiple models

Other algorithms

About

Releases

Packages

Languages

kkader/timeseries-forecast

Folders and files

Latest commit

History

Repository files navigation

Timeseries Forecasting

Set up

Explore and clean the data

ARIMA

Machine Learning

Putting it all together

Off-the-shelve solutions

Other considerations

Multiple series - single model or multiple models

Other algorithms

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages