- Introduction
- Time Series Graphics
- Time Series Decomposition
- Benchmark Forecasting Methods
- Time Series Regression Models
- Exponential Smoothing
- ARIMA Models
- Dynamic regression models
- Prophet Model
- Vector Autoregressions
- Neural Networks
Q. What is forecasting?
Answer
Forecasting is about predicting the future as accurately as possible, given all of the information available, including historical data and knowledge of any future events that might impact the forecasts.
Q. What do you mean by time series?
Answer
Anything that is observed sequentially over time is a time series. The observations can be at regular intervals of time (e.g. hourly, daily, monthly etc.) or irregular intervals.
Q. State the difference between univariate and multivariate time series?
Answer
Univariate time series refers to a single variable tracked over time, such as daily temperature readings in a city. In contrast, multivariate time series encompasses multiple variables measured over the same period, like daily temperature, humidity, and wind speed.
Q. Define following terms:
- Short-term forecasts
- Medium-term forecasts
- Long-term forecasts
Answer
Short-term forecasts
These are needed for the scheduling of personnel, production and transportation. As part of the scheduling process, forecasts of demand are often also required.
Medium-term forecasts
These are needed to determine future resource requirements, in order to purchase raw materials, hire personnel, or buy machinery and equipment.
Long-term forecasts
These are used in strategic planning. Such decisions must take account of market opportunities, environmental factors and internal resources.
Q. List down factors on which predictability of an event or quantity depends?
Answer
The predictability of an event or a quantity depends on several factors, including:
- how well we understand the factors that contribute to it
- how much data is available
- how similar the future is to the past
- whether the forecasts can affect the thing we are trying to forecast
Q. Is it correct to assume that forecasts are not possible in a changing environment?
Answer
No it is not correct. Forecasts rarely assume that the environment is unchanging. What is normally assumed is that the way in which the environment is changing will continue into the future.
Q. How would you approach forecasting if there is no available data, or if the data you have is not relevant to the forecasts?
Answer
In this scenario we can use qualitative forecasting methods. These methods are not purely guesswork—there are well-developed structured approaches to obtaining good forecasts without using historical data.
Q. When can we use Quantitative methods for forecasting use-cases?
Answer
Quantitative forecasting can be applied when two conditions are satisfied:
- Numerical information about the past is available
- It is reasonable to assume that some aspects of the past patterns will continue into the future
Q. State the difference between exogenous and endogenous variables?
Answer
Exogenous Variables: Determined outside the model and not influenced by other variables within it; serve as inputs.
Endogenous Variables: Determined by relationships within the model; influenced by other variables and represent the outcomes the model aims to explain.
Q. Which is the most common plot in time series EDA?
Answer
Time plot - the observations are plotted against the time of observation, with consecutive observations joined by straight lines.
*(Figure: Time plot)*
Q. What is the difference between seasonal plot and time plot?
Answer
A seasonal plot is similar to a time plot except that the data are plotted against individual seasons in which the data were observed.
*(Figure: Seasonal plot)*
Q. What is the benefit of seasonal plots?
Answer
A seasonal plot allows the underlying seasonal pattern to be seen more clearly, and is especially useful in identifying years in which the pattern changes.
Q. What are seasonal subseries plots?
Answer
An alternative plot that emphasises the seasonal patterns is where the data for each season are collected together in separate mini time plots.
*(Figure: Seasonal subseries plots)*
Q. Can we use scatter plots for time series EDA?
Answer
Yes, scatterplot helps us to visualise the relationship between the variables. For example we can study the relationship between demand and temperature by plotting one series against the other.
Q. What is the difference between correlation and autocorrelation?
Answer
Correlation measures the extent of a linear relationship between two variables, autocorrelation measures the linear relationship between lagged values of a time series.
Q. What is the autocorrelation function (ACF)?
Answer
The ACF is a plot of autocorrelation between a variable and itself separated by specified lags.
*(Figure: Autocorrelation function)*
Q. Write the expression for autocorrelation?
Answer
$$ r_k = \frac{\sum_{t=k+1}^{T}(y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^{T}(y_t - \bar{y})^2} $$
where $r_k$ is the autocorrelation at lag $k$, $T$ is the length of the time series, and $\bar{y}$ is the sample mean.
Note that the denominator uses the sum of squares of the full series, so the same divisor is applied at every lag.
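A minimal NumPy sketch of this formula (the series and lag count are illustrative):

```python
import numpy as np

def acf(y, max_lag):
    """Sample autocorrelations r_k for k = 1..max_lag, computed exactly
    as in the formula above (deviations from the overall mean, with the
    full-sample sum of squares in the denominator)."""
    y = np.asarray(y, dtype=float)
    ybar = y.mean()
    denom = np.sum((y - ybar) ** 2)
    return [float(np.sum((y[k:] - ybar) * (y[:-k] - ybar)) / denom)
            for k in range(1, max_lag + 1)]

# A trended series: nearby observations are close in value,
# so the low-lag autocorrelations are large and positive.
r = acf(np.arange(20.0), 3)
```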
Q. Define following terms:
- Trend
- Seasonal
- Cyclic
Answer
Trend
A trend exists when there is a long-term increase or decrease in the data. It does not have to be linear.
Seasonal
A seasonal pattern occurs when a time series is affected by seasonal factors such as the time of the year or the day of the week. Seasonality is always of a fixed and known period.
Cyclic
A cycle occurs when the data exhibit rises and falls that are not of a fixed frequency. These fluctuations are usually due to economic conditions, and are often related to the "business cycle".
Q. How can we check for trend in time series data using ACF plots?
Answer
When data have a trend, the autocorrelations for small lags tend to be large and positive because observations nearby in time are also nearby in value. So the ACF of a trended time series tends to have positive values that slowly decrease as the lags increase.
*(Figure: ACF plot for data with a trend)*
Q. How can we check for seasonality in time series data using ACF plots?
Answer
When data are seasonal, the autocorrelations will be larger for the seasonal lags (at multiples of the seasonal period) than for other lags.
*(Figure: ACF plot for seasonal data)*
Q. What does the ACF plot look like if the data has both trend and seasonality?
Answer
The slow decrease in the ACF as the lags increase is due to the trend, while the “scalloped” shape is due to the seasonality.
*(Figure: ACF plot for data with both trend and seasonality)*
Q. What does white noise mean in time series?
Answer
A time series that shows no autocorrelation is called white noise: the observations are uncorrelated across time, typically with zero mean and constant variance.
Q. What are the statistical properties of white noise?
Answer
- For a white noise series, we expect each autocorrelation to be close to zero.
- For a white noise series, we expect $95\%$ of the spikes in the ACF to lie within $\pm 1.96/\sqrt{T}$, where $T$ is the length of the time series.
Q. How could you check if the given time series is white noise?
Answer
We can use the fact that, for a white noise series, we expect $95\%$ of the spikes in the ACF to lie within $\pm 1.96/\sqrt{T}$.
It is common to plot these bounds on a graph of the ACF. If one or more large spikes lie outside these bounds, or if substantially more than 5% of spikes are outside them, then the series is probably not white noise.
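This check can be sketched in NumPy on a simulated Gaussian series (the seed and lag count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(42)
T = 1000
y = rng.normal(size=T)              # white noise by construction

ybar = y.mean()
denom = np.sum((y - ybar) ** 2)
r = np.array([np.sum((y[k:] - ybar) * (y[:-k] - ybar)) / denom
              for k in range(1, 21)])

bound = 1.96 / np.sqrt(T)           # the +/-1.96/sqrt(T) bounds
frac_inside = np.mean(np.abs(r) <= bound)
# For white noise, roughly 95% of spikes should fall inside the bounds.
```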
Q. What is time series decomposition?
Answer
Splitting a time series into several components, each representing an underlying pattern category.
- A trend-cycle component ($T_t$)
- A seasonal component ($S_t$)
- A remainder/residual component ($R_t$)
Q. Can a given time series possess more than one seasonal component?
Answer
For some time series (e.g., those that are observed at least daily), there can be more than one seasonal component, corresponding to the different seasonal periods.
Q. What are the benefits of time series decomposition?
Answer
- It helps improve understanding of the time series
- It can also be used to improve forecast accuracy.
Q. What kinds of adjustments can we make to time series data to simplify the patterns in it?
Answer
We can do four kinds of adjustments:
- calendar adjustments
- population adjustments
- inflation adjustments
- mathematical transformations.
Q. Why is it recommended to make adjustments or transformations to time series data before decomposing it?
Answer
The purpose of adjustments and transformations is to simplify the patterns in the historical data by removing known sources of variation, or by making the pattern more consistent across the whole data set. Simpler patterns are usually easier to model and lead to more accurate forecasts.
Q. What are some common mathematical transformations that can be applied to time series data?
Answer
We can apply the following transformations to a time series, depending on the scenario:
- Logarithmic transformations
- Power transformations
- Box-Cox transformations
Q. What are the benefits of using mathematical transformations?
Answer
Mathematical transformations are techniques used to modify data in ways that make it more suitable for analysis. They can help stabilize variance, reduce skewness, and make relationships within the data more linear or normally distributed.
Q. What is log transformation?
Answer
If we denote the original observations as $y_1, \dots, y_T$ and the transformed observations as $w_1, \dots, w_T$, then the log transformation is $w_t = \log(y_t)$. Logarithms are useful because changes in a log value correspond to relative (percentage) changes on the original scale.
Q. In which scenarios should we use log transformations?
Answer
Log transformations reduce right-skewness and stabilise variance, especially when the data grow exponentially. They are commonly used when the data range over several orders of magnitude.
Q. What is a power transformation?
Answer
It uses mapping as:
$$ y' = y^p $$
It increases or decreases the rate of change for different data values. Depending on the power $p$ (for example, $p = 2$ for a square transformation or $p = -1$ for a reciprocal transformation), this transformation can reduce skewness or stabilise variance.
Q. What are Box-Cox transformations?
Answer
A useful family of transformations, which includes both logarithms and power transformations, is the family of Box-Cox transformations. These depend on the parameter $\lambda$:
$$ y(\lambda) = \begin{cases} \frac{y^\lambda - 1}{\lambda} & \text{if } \lambda \neq 0 \\ \ln(y) & \text{if } \lambda = 0 \end{cases} $$
where:
- $y$ is the original data,
- $\lambda$ is the transformation parameter.
For $\lambda = 0$, the natural logarithm of $y$ is used, which is a common transformation when data show exponential growth or multiplicative seasonality. The value of $\lambda$ can be estimated from the data to best simplify the patterns for further analysis.
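The piecewise definition can be written directly as a sketch (in practice, `scipy.stats.boxcox` can also estimate $\lambda$ from the data):

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a single positive value y, following the
    piecewise definition above."""
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam
```

For $\lambda$ near zero the power branch approaches the log branch, which is why the family is continuous in $\lambda$.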
Q. What are additive and multiplicative models in time series decomposition?
Answer
In time series decomposition, additive and multiplicative models are used to break down a time series into its components: trend, seasonality, and residual (or noise).
Q. Explain additive model in time series decomposition?
Answer
In an additive model, the time series value at any given time, $Y_t$, is the sum of three components:
- Trend ($T_t$) – the long-term increase or decrease in the data.
- Seasonal ($S_t$) – the repeating short-term cycle in the data (such as monthly or yearly seasonality).
- Residual ($R_t$) – the remaining random variation or noise.
$$ Y_t = T_t + S_t + R_t $$
Q. When should we use additive model for time series decomposition?
Answer
An additive model is appropriate when the trend and seasonal variations are relatively constant over time. This means that the amplitude (size) of seasonal variations does not change with the level of the time series. For instance, if sales increase by a constant amount each year, an additive model might be suitable.
Q. Explain multiplicative model in time series decomposition?
Answer
In a multiplicative model, the time series value at any given time, $Y_t$, is the product of three components:
- Trend ($T_t$) – the long-term increase or decrease in the data.
- Seasonal ($S_t$) – the repeating short-term cycle in the data (such as monthly or yearly seasonality).
- Residual ($R_t$) – the remaining random variation or noise.
$$ Y_t = T_t \times S_t \times R_t $$
Q. When should we use multiplicative model for time series decomposition?
Answer
A multiplicative model is appropriate when the seasonal variations change proportionally to the trend level. In this case, seasonal fluctuations grow or shrink as the trend rises or falls. For example, if sales increase by a certain percentage each year, a multiplicative model would be more suitable.
Q. How do we determine whether to use an additive or multiplicative model?
Answer
- Additive Model: Use when seasonality is roughly constant, regardless of the level of the trend.
- Multiplicative Model: Use when seasonality varies proportionally with the trend.
Q. How does a log transformation allow additive decomposition to approximate a multiplicative decomposition?
Answer
An alternative to using a multiplicative decomposition is to first transform the data until the variation in the series appears to be stable over time, then use an additive decomposition. When a log transformation has been used, this is equivalent to using a multiplicative decomposition on the original data because
$$ y_t = S_t \times T_t \times R_t $$
Taking logarithms of both sides gives an additive relationship:
$$ \log{y_t} = \log{S_t} + \log{T_t} + \log{R_t} $$
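A quick numeric check of this identity, using made-up component values:

```python
import math

# Illustrative (hypothetical) component values for one time point
S_t, T_t, R_t = 1.2, 50.0, 0.98
y_t = S_t * T_t * R_t                               # multiplicative model
lhs = math.log(y_t)                                 # log of the product
rhs = math.log(S_t) + math.log(T_t) + math.log(R_t) # additive in logs
```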
Q. What is seasonally adjusted data?
Answer
If the seasonal component is removed from the original data, the resulting values are the "seasonally adjusted" data. For an additive decomposition, the seasonally adjusted data are given by $y_t - \hat{S}_t$, and for a multiplicative decomposition by $y_t / \hat{S}_t$.
Q. Explain moving average smoothing in time series decomposition?
Answer
A moving average of order $m$ (an $m$-MA) estimates the trend-cycle by averaging values of the series within $k$ periods of $t$:
$$ \hat{T}_t = \frac{1}{m}\sum_{j=-k}^{k} y_{t+j} $$
where $m = 2k+1$. Observations that are nearby in time are likely to be close in value, so the average eliminates some of the randomness, leaving a smooth trend-cycle component.
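A sketch of a centered $m$-MA for odd $m$, on toy data:

```python
import numpy as np

def moving_average(y, m):
    """Centered m-MA trend-cycle estimate for odd m = 2k+1; the first
    and last k points have no estimate (NaN)."""
    y = np.asarray(y, dtype=float)
    k = (m - 1) // 2
    out = np.full(len(y), np.nan)
    for t in range(k, len(y) - k):
        out[t] = y[t - k:t + k + 1].mean()
    return out

trend = moving_average([1, 2, 3, 4, 5, 6, 7], m=3)
```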
Q. How does the order $m$ affect the moving average estimate?
Answer
The order of the moving average determines the smoothness of the trend-cycle estimate: the larger the order, the smoother the resulting curve.
Q. In an m-order moving average, is symmetry important?
Answer
In an m-order moving average, symmetry is important because it ensures that each data point is treated equally, minimizing lag and providing a more accurate representation of the trend at a given time.
Q. Explain weighted moving averages?
Answer
Combinations of moving averages result in weighted moving averages. In general, a weighted $m$-MA can be written as:
$$ \hat{T}_t = \sum_{j=-k}^{k} a_j y_{t+j} $$
where $k = (m-1)/2$ and the weights are given by $a_{-k}, \dots, a_k$. It is important that the weights all sum to one and that they are symmetric, so that $a_j = a_{-j}$.
Q. What is the major advantage of using weighted moving averages over m-MA?
Answer
A major advantage of weighted moving averages is that they yield a smoother estimate of the trend-cycle. Instead of observations entering and leaving the calculation at full weight, their weights slowly increase and then slowly decrease, resulting in a smoother curve.
Q. How can we use m-MA for time series decomposition?
Answer
An m-Moving Average (m-MA) can be used for time series decomposition by helping to separate the trend and seasonal components from the data.
- Identify the Seasonal Period:
- Determine the period of seasonality in your data, such as daily, monthly, or quarterly, depending on the time series. The chosen value of $m$ is usually equal to this seasonal period.
- Calculate the Moving Average:
- Apply an m-point moving average to smooth the data. The choice of m depends on the frequency of seasonality. For example, with monthly data and yearly seasonality, you’d use a 12-point moving average.
- For symmetric smoothing, use a centered moving average, where you calculate the average over equal numbers of points before and after a central point
- Extract the Trend Component:
- The resulting moving average values represent the trend component, which shows the general direction of the series (upward, downward, or flat).
- Isolate the Seasonal Component:
- To find the seasonal component, divide the original time series values by the trend (for a multiplicative model) or subtract the trend values (for an additive model).
- This can be done across several periods to get the average seasonal pattern, which smooths out random variations.
- Calculate the Residual Component:
- After extracting the trend and seasonal components, the residual (or irregular) component can be determined by subtracting the seasonal component from the detrended data in the additive model, or dividing it in the multiplicative model.
- The residual component represents the noise or random variation left in the data after removing the trend and seasonality.
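The steps above can be sketched for an additive decomposition, here on a hypothetical quarterly series with a known pattern, using a centered 2×$m$ moving average for the trend:

```python
import numpy as np

def classical_additive_decompose(y, m):
    """Sketch of classical additive decomposition for an even seasonal
    period m, using a centered 2xm-MA for the trend-cycle."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    half = m // 2
    # Steps 1-2: trend-cycle via a centered 2xm moving average
    w = np.ones(m + 1)
    w[0] = w[-1] = 0.5
    w /= m
    trend = np.full(n, np.nan)
    for t in range(half, n - half):
        trend[t] = np.dot(w, y[t - half:t + half + 1])
    # Step 3: detrend, then average each season's detrended values
    detrended = y - trend
    idx = np.array([np.nanmean(detrended[s::m]) for s in range(m)])
    idx -= idx.mean()                 # seasonal effects sum to zero
    seasonal = idx[np.arange(n) % m]
    # Step 4: the remainder is what is left over
    remainder = y - trend - seasonal
    return trend, seasonal, remainder

# Hypothetical quarterly series: linear trend plus a fixed pattern.
m = 4
t = np.arange(24)
pattern = np.array([3.0, -1.0, -2.0, 0.0])   # sums to zero
y = 10 + 0.5 * t + pattern[t % m]
trend, seasonal, remainder = classical_additive_decompose(y, m)
```

On this purely additive toy series the recovered components match the construction exactly (away from the NaN edges).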
Q. What are the limitations of classical time series decomposition?
Answer
Classical time series decomposition has these main limitations:
- Missing Trend Estimates: It doesn’t estimate the trend-cycle for the first and last few data points.
- Over-Smoothing: Rapid changes in data are often smoothed out, losing detail.
- Fixed Seasonality: Assumes seasonality is constant over time, which fails with evolving patterns.
- Outlier Sensitivity: Not robust to unusual or extreme values, which can skew the results.
Q. How does STL decomposition work?
Answer
STL is a versatile and robust method for decomposing time series. STL is an acronym for "Seasonal and Trend decomposition using Loess", while loess is a method for estimating nonlinear relationships. STL was designed to handle data that exhibit non-linear patterns and allows for changing seasonality, unlike classical decomposition methods.
Here's how STL decomposition works:
- Loess (Locally Estimated Scatterplot Smoothing): This is a non-parametric technique that uses local weighted regression to smooth parts of the data. STL uses Loess to estimate both the trend and seasonal components.
- Seasonal Estimation:
- STL first removes the rough trend by applying a Loess smoother to the entire series. This detrended series is then used to estimate the seasonal component, again using Loess smoothing but focusing on the seasonal cycle's length.
- The seasonality is allowed to change over time, and STL handles this by breaking the series into cycles and smoothing each separately.
- Trend Estimation:
- Once the seasonal component is subtracted from the original data, what remains (original minus seasonal) is used to estimate the trend using another Loess smoother.
- This step focuses on longer periods than the seasonal estimation to capture the overall direction or trend without short-term fluctuations.
- Residual Calculation:
- The residual component is simply calculated by subtracting both the estimated seasonal and trend components from the original time series.
- Iterative Procedure:
- STL performs these steps iteratively, refining the estimates of trend and seasonality to minimize the residual component. This iterative approach allows STL to adapt to complex and changing patterns in the data.
Q. What are the advantages of using STL over classical decomposition?
Answer
STL has several advantages over classical decomposition:
- The seasonal component is allowed to change over time, and the rate of change can be controlled by the user.
- The smoothness of the trend-cycle can also be controlled by the user.
- It can be robust to outliers (i.e., the user can specify a robust decomposition), so that occasional unusual observations will not affect the estimates of the trend-cycle and seasonal components.
Q. What are the limitations of STL decomposition?
Answer
STL (Seasonal-Trend Decomposition using Loess) is a powerful decomposition method, but it does have some limitations:
- Computational Intensity: It requires significant computational resources, especially for large datasets.
- Lack of Forecasting Capability: STL doesn't directly provide forecasting models; it's primarily for decomposition.
- It does not handle trading day or calendar variation automatically, and it only provides facilities for additive decompositions.
Q. How can we use time series decomposition to measure the strength of trend and seasonality in a time series?
Answer
A time series decomposition can be used to measure the strength of trend and seasonality in a time series:
$$ y_t = T_t + S_t + R_t $$
Strength of Trend
For strongly trended data, the seasonally adjusted data should have much more variation than the remainder component.
$$ F_T = \max\left(0,\ 1 - \frac{\text{Var}(R_t)}{\text{Var}(T_t + R_t)}\right) $$
This will give a measure of the strength of the trend between 0 and 1.
Strength of seasonality
The strength of seasonality is defined similarly, but with respect to the detrended data rather than the seasonally adjusted data:
$$ F_S = \max\left(0,\ 1 - \frac{\text{Var}(R_t)}{\text{Var}(S_t + R_t)}\right) $$
A series with seasonal strength $F_S$ close to 0 exhibits almost no seasonality, while strong seasonality gives $F_S$ close to 1 because $\text{Var}(R_t)$ is much smaller than $\text{Var}(S_t + R_t)$.
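Both measures can be computed directly from decomposition components (synthetic components for illustration):

```python
import numpy as np

def strengths(trend, seasonal, remainder):
    """F_T and F_S as defined above, computed from the components of
    an additive decomposition."""
    FT = max(0.0, 1.0 - np.var(remainder) / np.var(trend + remainder))
    FS = max(0.0, 1.0 - np.var(remainder) / np.var(seasonal + remainder))
    return FT, FS

# Synthetic components: strong trend, strong seasonality, small noise.
rng = np.random.default_rng(0)
n = 48
trend = 0.5 * np.arange(n)
seasonal = np.tile([5.0, 0.0, -5.0, 0.0], n // 4)
remainder = rng.normal(scale=0.1, size=n)
FT, FS = strengths(trend, seasonal, remainder)
```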
Q. What are some simple forecasting methods?
Answer
- Mean method
- Naive Method
- Seasonal Naive Method
- Drift Method
Q. Explain mean method in time series forecasting?
Answer
The mean method assumes that the forecasts of all future values are equal to the average of the historical data. If we denote the historical data by $y_1, \dots, y_T$, then the forecasts are:
$$ \hat{y}_{T+h|T} = \bar{y} = (y_1 + ... + y_T)/T $$
*(Figure: Mean forecasts applied to clay brick production in Australia)*
Q. How does the naive method work in forecasting?
Answer
For naïve forecasts, we simply set all forecasts to be the value of the last observation. That is,
$$ \hat{y}_{T+h|T} = y_T $$
Note that naïve forecasts are also called random walk forecasts, because this method is optimal when the data follow a random walk.
*(Figure: Naïve forecasts applied to clay brick production in Australia)*
Q. How does the seasonal naive method work in forecasting?
Answer
We set each forecast to be equal to the last observed value from the same season (e.g., the same month of the previous year).
$$ \hat{y}_{T+h|T} = y_{T+h-m(k+1)} $$
where $m$ is the seasonal period and $k$ is the integer part of $(h-1)/m$, i.e., the number of complete years in the forecast period prior to time $T+h$.
*(Figure: Seasonal naïve forecasts applied to clay brick production in Australia)*
Q. How does the drift method work in forecasting?
Answer
A variation on the naïve method is to allow the forecasts to increase or decrease over time, where the amount of change over time (called the drift) is set to be the average change seen in the historical data.
The forecast for time $T+h$ is given by:
$$ \hat{y}_{T+h|T} = y_T + \frac{h}{T-1}\sum_{t=2}^{T}(y_t - y_{t-1}) = y_T + h\left(\frac{y_T - y_1}{T-1}\right) $$
This is equivalent to drawing a line between the first and last observations, and extrapolating it into the future.
*(Figure: Drift forecasts applied to clay brick production in Australia)*
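The four benchmark methods can be sketched in a few lines (toy series; the seasonal indexing follows the formulas above):

```python
import numpy as np

def mean_forecast(y, h):
    # all forecasts equal the historical average
    return np.full(h, np.mean(y))

def naive_forecast(y, h):
    # all forecasts equal the last observation
    return np.full(h, y[-1])

def seasonal_naive_forecast(y, h, m):
    # last observed value from the same season
    return np.array([y[len(y) - m + (i % m)] for i in range(h)])

def drift_forecast(y, h):
    # extrapolate the average historical change (the drift)
    slope = (y[-1] - y[0]) / (len(y) - 1)
    return y[-1] + slope * np.arange(1, h + 1)

y = np.array([10.0, 12.0, 14.0, 16.0])
```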
Q. What do you mean by residual in time series model?
Answer
The "residuals" in a time series model are what is left over after fitting a model. The residuals are equal to the difference between the observations and the corresponding fitted values:
$$ e_t = y_t - \hat{y}_t $$
Residuals are useful in checking whether a model has adequately captured the information in the data.
Q. What properties should innovation residuals have to indicate a good forecasting method?
Answer
A good forecasting method will yield innovation residuals with the following properties:
Essential properties:
- The innovation residuals are uncorrelated. If there are correlations between innovation residuals, then there is information left in the residuals which should be used in computing forecasts.
- The innovation residuals have zero mean. If they have a mean other than zero, then the forecasts are biased.
Good to have:
- The innovation residuals have constant variance. This is known as “homoscedasticity”.
- The innovation residuals are normally distributed.
Q. How do you determine the prediction interval for forecasted values?
Answer
Most time series models produce normally distributed forecasts — that is, we assume that the distribution of possible future values follows a normal distribution. A prediction interval gives an interval within which we expect $y_{T+h}$ to lie with a specified probability. For example, assuming normally distributed forecast errors, a 95% prediction interval for the $h$-step forecast is
$$ \hat{y}_{T+h|T} \pm 1.96\hat{\sigma}_h $$
where $\hat{\sigma}_h$ is an estimate of the standard deviation of the $h$-step forecast distribution.
More generally, a prediction interval can be written as
$$ \hat{y}_{T+h|T} \pm c\hat{\sigma}_h $$
where the multiplier $c$ depends on the coverage probability (e.g., $c = 1.28$ for 80% and $c = 1.96$ for 95% intervals).
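A minimal helper for such an interval (the point forecast and $\hat{\sigma}_h$ below are illustrative values):

```python
def prediction_interval(point_forecast, sigma_h, c=1.96):
    """Interval y_hat +/- c * sigma_h; c = 1.96 gives roughly 95%
    coverage when forecast errors are normally distributed."""
    return point_forecast - c * sigma_h, point_forecast + c * sigma_h

# Hypothetical forecast of 100 with h-step standard deviation 5.
lo, hi = prediction_interval(100.0, 5.0)
```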
Q. Express the standard deviation of the forecast distribution in case of one step prediction?
Answer
When forecasting one step ahead, the standard deviation of the forecast distribution can be estimated using the standard deviation of the residuals given by
$$ \hat{\sigma} = \sqrt{\frac{1}{T - K - M}\sum_{t=1}^{T}e_{t}^2} $$
where $K$ is the number of parameters estimated in the forecasting method and $M$ is the number of missing values in the residuals.
Q. What happens with prediction intervals in case of multi-step forecasting?
Answer
Prediction intervals usually increase in length as the forecast horizon increases. The further ahead we forecast, the more uncertainty is associated with the forecast, and thus the wider the prediction intervals.
Q. For benchmark methods write the standard deviation expression for h-step forecast distribution?
Answer
For the four benchmark methods, it is possible to mathematically derive the forecast standard deviation under the assumption of uncorrelated residuals.
| Forecasting Method | $h$-step forecast standard deviation |
|---|---|
| Mean | $\hat{\sigma}_h = \hat{\sigma}\sqrt{1 + 1/T}$ |
| Naïve | $\hat{\sigma}_h = \hat{\sigma}\sqrt{h}$ |
| Seasonal naïve | $\hat{\sigma}_h = \hat{\sigma}\sqrt{k+1}$ |
| Drift | $\hat{\sigma}_h = \hat{\sigma}\sqrt{h(1 + h/T)}$ |

where $\hat{\sigma}$ is the residual standard deviation and $k$ is the integer part of $(h-1)/m$.
Q. If the residuals from a fitted forecasting model do not exhibit a normal distribution, how would you establish prediction intervals for the forecasted values?
Answer
When assuming a normal distribution for residuals is not suitable, an alternative approach is to use bootstrapping. This method assumes that the residuals are uncorrelated and have a consistent variance.
Q. Can time series decomposition be utilized for forecasting, and if so, what is the method for doing so?
Answer
Yes, Time series decomposition can be a useful step in producing forecasts. Assuming an additive decomposition, the decomposed time series can be written as
$$ y_t = \hat{S}_t + \hat{A}_t $$
where $\hat{S}_t$ is the estimated seasonal component and $\hat{A}_t = \hat{T}_t + \hat{R}_t$ is the seasonally adjusted component.
To forecast a decomposed time series, we forecast the seasonal component, $\hat{S}_t$, and the seasonally adjusted component, $\hat{A}_t$, separately. It is usually assumed that the seasonal component is unchanging, or changing extremely slowly, so it is forecast by simply taking the last year of the estimated component (a seasonal naïve method).
To forecast the seasonally adjusted component, any non-seasonal forecasting method may be used. For example, the drift method, or Holt’s method, or a non-seasonal ARIMA model may be used.
Q. How do forecast errors differ from residuals?
Answer
Forecast errors are different from residuals in two ways. First, residuals are calculated on the training set while forecast errors are calculated on the test set. Second, residuals are based on one-step forecasts while forecast errors can involve multi-step forecasts.
Q. What are different techniques for measuring forecast accuracy?
Answer
We can measure forecast accuracy by summarising the forecast errors in the following ways:
Scale-dependent errors
- Mean absolute error (MAE)
- Root mean squared error (RMSE)
Percentage errors
- Mean absolute percentage error (MAPE)
Scaled Errors
Scaled errors were proposed by Hyndman & Koehler (2006) as an alternative to using percentage errors when comparing forecast accuracy across series with different units. They proposed scaling the errors based on the training MAE from a simple forecast method.
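These metrics are short to implement (toy numbers; the MASE scaling follows the naïve-method training MAE described above):

```python
import numpy as np

def mae(e):
    return float(np.mean(np.abs(e)))

def rmse(e):
    return float(np.sqrt(np.mean(np.asarray(e) ** 2)))

def mape(y, yhat):
    return float(100 * np.mean(np.abs((y - yhat) / y)))

def mase(y, yhat, y_train, m=1):
    """Mean absolute scaled error: test-set errors scaled by the
    training MAE of the (seasonal) naive method."""
    scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    return float(np.mean(np.abs(y - yhat)) / scale)

y_train = np.array([1.0, 3.0, 5.0, 7.0])
y_test = np.array([9.0, 11.0])
y_pred = np.array([8.0, 12.0])
e = y_test - y_pred
```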
Q. Is it feasible to apply the cross-validation technique to evaluate the accuracy of forecasts?
Answer
Yes. In this procedure, there are a series of test sets, each consisting of a single observation. The corresponding training set consists only of observations that occurred prior to the observation that forms the test set.
*(Figure: Cross-validation for time series forecasts)*
With time series forecasting, one-step forecasts may not be as relevant as multi-step forecasts. In this case, the cross-validation procedure based on a rolling forecasting origin can be modified to allow multi-step errors to be used. Suppose that we are interested in models that produce good 4-step-ahead forecasts. Then the corresponding diagram is shown below:
*(Figure: Cross-validation for time series multi-step forecasts)*
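A sketch of rolling-origin evaluation, here paired with a naïve forecaster (function and parameter names are illustrative):

```python
import numpy as np

def rolling_origin_errors(y, forecast_fn, min_train, h=1):
    """Rolling forecasting origin: for each origin i, train on y[:i]
    and score the h-step-ahead forecast against y[i + h - 1]."""
    errors = []
    for i in range(min_train, len(y) - h + 1):
        fc = forecast_fn(y[:i], h)          # h forecasts from origin i
        errors.append(y[i + h - 1] - fc[h - 1])
    return np.array(errors)

def naive(y, h):
    return np.full(h, y[-1])

y = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
one_step_errors = rolling_origin_errors(y, naive, min_train=2, h=1)
two_step_errors = rolling_origin_errors(y, naive, min_train=2, h=2)
```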
Q. What assumptions do we make when using a linear regression model for forecasting?
Answer
When using a linear regression model, we assume:
- Model Approximation: The model is a reasonable approximation to reality; that is, the relationship between the forecast variable and the predictor variables is linear.
- Assumptions about the Errors:
- Mean Zero: The errors have a mean of zero to avoid systematic bias in forecasts.
- No Autocorrelation: The errors are not autocorrelated, ensuring forecasts are efficient without missed information in the data.
- Unrelated to Predictors: The errors are unrelated to predictor variables, suggesting that all relevant information is captured within the model's systematic part.
- Normal Distribution with Constant Variance: It is helpful for the errors to be normally distributed with constant variance ($\sigma^2$) to facilitate prediction interval calculations.
Q. Explain least squares principle?
Answer
The least squares principle provides a way of choosing the coefficients effectively by minimising the sum of the squared errors. That is, we choose the values of $\beta_0, \beta_1, \dots, \beta_k$ that minimise:
$$ \sum_{t=1}^T \eta_{t}^2 = \sum_{t=1}^T(y_t - \beta_0 - \beta_1x_{1, t} - ... - \beta_k x_{k, t})^2 $$
This is called least squares estimation because it gives the least value for the sum of squared errors.
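A tiny worked example of least squares estimation using NumPy's `lstsq` (synthetic data with a known intercept and slope):

```python
import numpy as np

# Design matrix with an intercept column; lstsq minimises the sum of
# squared errors ||y - X @ beta||^2.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 + 3.0 * x                       # exact linear relationship
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
```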
Q. What are some typical predictors used in time series regression models?
Answer
There are several useful predictors that occur frequently when using regression for time series data.
- Trend: It is common for time series data to be trending. A linear trend can be modelled by simply using
$x_{1, t}=t$ as predictor,
$$ y_t = \beta_0 + \beta_1t + \eta_t $$
- Dummy variables
- Public holiday
- Seasonal dummy variables
- Day of the week
- Weekends
- Week of the month
- Month
- Quarter
- Intervention variables: It is often necessary to model interventions that may have affected the variable to be forecast.
- Competitor activity
- Advertising expenditure
- Industrial action
- Trading days: The number of trading days in a month can vary considerably and can have a substantial effect on sales data.
- number of Mondays/Sundays in month
- Distributed lags
- Rolling stats
Q. What is Akaike's Information Criterion (AIC)?
Answer
AIC is defined as:
$$ \text{AIC} = T\log(\frac{SSE}{T}) + 2(k+2) $$
$$ \text{SSE} = \sum_{t=1}^T \eta_{t}^2 $$
where $T$ is the number of observations used for estimation and $k$ is the number of predictors in the model.
The $k+2$ part of the equation arises because there are $k+2$ parameters in the model: the $k$ coefficients for the predictors, the intercept, and the variance of the residuals. The idea is to penalise the fit (SSE) with the number of parameters that need to be estimated.
Q. How can the AIC score be interpreted?
Answer
The model with the minimum value of the AIC is often the best model for forecasting.
Q. Why do we need to adjust bias in AIC score?
Answer
For small values of $T$, the AIC tends to select too many predictors, so a bias-corrected version of the AIC has been developed:
$$ \text{AIC}_c = \text{AIC} + \frac{2(k+2)(k+3)}{T-k-3} $$
Q. What is Bayesian Information Criterion (BIC)?
Answer
Schwarz’s Bayesian Information Criterion (usually abbreviated to BIC, SBIC or SC):
$$ \text{BIC} = T\log{\frac{SSE}{T}} + (k+2)\log(T) $$
Q. How does AIC differ from BIC?
Answer
The BIC penalises the number of parameters more heavily than the AIC, so it tends to select models with fewer predictors. If the value of $T$ is large enough, minimising the BIC is similar to leave-$v$-out cross-validation with $v = T[1 - 1/(\log(T) - 1)]$.
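The three criteria can be computed directly from a model's SSE (the values of $T$, SSE and $k$ below are hypothetical):

```python
import numpy as np

def aic(sse, T, k):
    return T * np.log(sse / T) + 2 * (k + 2)

def aicc(sse, T, k):
    return aic(sse, T, k) + 2 * (k + 2) * (k + 3) / (T - k - 3)

def bic(sse, T, k):
    return T * np.log(sse / T) + (k + 2) * np.log(T)

# Hypothetical fit: T = 100 observations, SSE = 50, k = 2 predictors.
# BIC's log(T) ~ 4.6 penalty per parameter exceeds AIC's 2, so BIC
# penalises extra predictors more heavily.
T, sse, k = 100, 50.0, 2
```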
Q. What are exponential smoothing methods?
Answer
Exponential smoothing methods are weighted averages of past observations, with the weights decaying exponentially as the observations get older.
Q. What is simple exponential smoothing method?
Answer
Simple Exponential Weighted Average, or simple exponential smoothing, is a forecasting method that assigns larger weights to more recent observations and gradually decreases the weights for older observations.
$$ \hat{y}_{T+1 | T} = \alpha y_T + \alpha(1-\alpha)y_{T-1}+\alpha(1-\alpha)^2y_{T-2}+... $$
where $0 \le \alpha \le 1$ is the smoothing parameter.
Q. What occurs in simple exponential smoothing when $\alpha = 1$?
Answer
For the extreme case where $\alpha = 1$, $\hat{y}_{T+1|T} = y_T$; that is, the forecasts are equal to the naive forecasts (the last observed value).
Q. Write the component form of simple exponential smoothing?
Answer
For simple exponential smoothing, the only component included is the level, $l_t$.
The component form of simple exponential smoothing is given by:
$$ \text{Forecast equation:} \quad \hat{y}_{t+h | t} = l_t $$
$$ \text{Smoothing equation:} \quad l_t = \alpha y_t + (1-\alpha)l_{t-1} $$
where $l_t$ is the level (the smoothed value) of the series at time $t$.
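The component form above can be sketched as a simple recursion. The initial level `l0` is an assumption here; in practice it is estimated along with $\alpha$:

```python
# Sketch of simple exponential smoothing in component form:
# l_t = alpha*y_t + (1-alpha)*l_{t-1}; the forecast is the final level.

def ses_forecast(y, alpha, l0):
    level = l0
    for obs in y:
        level = alpha * obs + (1 - alpha) * level
    return level  # flat forecast: y-hat_{T+h|T} = l_T for every h

# With alpha = 1 the forecast collapses to the naive forecast (last value).
f_naive = ses_forecast([3.0, 5.0, 9.0], alpha=1.0, l0=0.0)

# On a constant series the level stays at that constant.
f_const = ses_forecast([4.0] * 10, alpha=0.3, l0=4.0)
```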
Q. Why can’t we use the exponential smoothing method for data with seasonality and trend?
Answer
Simple exponential smoothing has a "flat" forecast function:
$$ \hat{y}_{T+h | T} = \hat{y}_{T+1 | T} = l_T, \quad h = 2, 3, \dots $$
That is, all forecasts take the same value, equal to the last level component.
Q. How does Holt's linear trend method work?
Answer
Holt extended simple exponential smoothing to allow the forecasting of data with a trend. This method involves a forecast equation and two smoothing equations (one for the level and one for the trend):
$$ \text{Forecast equation:} \quad \hat{y}_{t+h | t} = l_t + hb_t $$
$$ \text{Level equation:} \quad l_t = \alpha y_t + (1-\alpha)(l_{t-1} + b_{t-1}) $$
$$ \text{Trend equation:} \quad b_t = \beta^* (l_t - l_{t-1}) + (1-\beta^*)b_{t-1} $$
where $l_t$ is an estimate of the level of the series at time $t$, $b_t$ is an estimate of the trend (slope) at time $t$, $\alpha$ is the level smoothing parameter, and $\beta^*$ is the trend smoothing parameter.
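A minimal sketch of Holt's method follows. The initial level and trend (`l0`, `b0`) are assumptions here; in practice they are estimated together with the smoothing parameters:

```python
# Sketch of Holt's linear trend method with assumed initial states.

def holt_forecast(y, alpha, beta, l0, b0, h):
    level, trend = l0, b0
    for obs in y:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + h * trend  # y-hat_{T+h|T} = l_T + h*b_T

# On an exactly linear series y_t = 2 + 3t, with matching initial states,
# the method simply extends the line: the 5-step-ahead forecast from T = 10
# is 2 + 3*15 = 47.
fc = holt_forecast([2 + 3 * t for t in range(1, 11)],
                   alpha=0.5, beta=0.5, l0=2.0, b0=3.0, h=5)
```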
Q. What is the main issue with Holt's linear trend method?
Answer
The forecasts generated by Holt’s linear method display a constant trend (increasing or decreasing) indefinitely into the future. Empirical evidence indicates that these methods tend to over-forecast, especially for longer forecast horizons.
Q. Explain working of damped trend methods?
Answer
In conjunction with the smoothing parameters $\alpha$ and $\beta^*$, this method also includes a damping parameter $0 < \phi < 1$:
$$ \text{Forecast equation:} \quad \hat{y}_{t+h | t} = l_t + (\phi + \phi^2 + ... + \phi^h)b_t $$
$$ \text{Level equation:} \quad l_t = \alpha y_t + (1-\alpha)(l_{t-1} + \phi b_{t-1}) $$
$$ \text{Trend equation:} \quad b_t = \beta^* (l_t - l_{t-1}) + (1-\beta^*)\phi b_{t-1} $$
If $\phi = 1$, the method is identical to Holt's linear method. For values between 0 and 1, $\phi$ dampens the trend so that it approaches a constant some time in the future.
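The damping can be seen directly from the forecast equation: the sum $\phi + \phi^2 + \dots + \phi^h$ approaches $\phi/(1-\phi)$ as $h$ grows, so the forecasts level off instead of trending forever. A sketch with made-up state values:

```python
# Sketch of the damped-trend forecast equation. level/trend values
# are illustrative, not estimated from data.

def damped_forecast(level, trend, phi, h):
    damped_sum = sum(phi ** i for i in range(1, h + 1))
    return level + damped_sum * trend

short = damped_forecast(10.0, 1.0, phi=0.9, h=1)    # 10 + 0.9 = 10.9
far   = damped_forecast(10.0, 1.0, phi=0.9, h=500)  # approaches 10 + 0.9/0.1 = 19
```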
Q. What is Holt-Winter's method?
Answer
Holt and Winters extended Holt’s method to capture seasonality. The Holt-Winters seasonal method comprises the forecast equation and three smoothing equations:
- Level: $l_t$
- Trend: $b_t$
- Seasonal component: $s_t$
Holt-Winters’ additive method
$$ \hat{y}_{t+h|t} = \ell_t + h b_t + s_{t + h - m(k+1)} $$
$$ \ell_t = \alpha (y_t - s_{t-m}) + (1 - \alpha)(\ell_{t-1} + b_{t-1}) $$
$$ b_t = \beta^* (\ell_t - \ell_{t-1}) + (1 - \beta^*) b_{t-1} $$
$$ s_t = \gamma (y_t - \ell_{t-1} - b_{t-1}) + (1 - \gamma) s_{t-m} $$
where $m$ is the period of the seasonality (e.g. $m = 4$ for quarterly data), and $k$ is the integer part of $(h-1)/m$, which ensures that the seasonal estimates used for forecasting come from the final year of the sample.
Q. In the damped Holt-Winters method with multiplicative seasonality, what role does the parameter $\phi$ play, and what would be the effect on the trend if $\phi = 1$ versus $\phi < 1$?
Answer
The parameter $\phi$ in the damped Holt-Winters method controls the damping of the trend component over time. When $\phi = 1$, there is no damping, meaning the trend continues to grow (or decline) linearly indefinitely, as in the traditional Holt-Winters method. In contrast, if $\phi < 1$, the trend is damped, causing the effect of the trend to decrease exponentially over time, which often leads to more stable and realistic long-term forecasts for seasonal data.
This damping effect is beneficial for situations where a continuously increasing or decreasing trend is not expected to persist indefinitely, helping the model produce more accurate and robust forecasts for seasonal data with a moderated trend component.
Q. What are state space models in the context of time series analysis?
Answer
State space models in time series analysis consist of two main components: a measurement equation and state equations. The measurement equation describes the observed data, capturing the relationship between the observed values and the underlying states of the model. The state equations describe the dynamics of these unobserved components, such as the level, trend, and seasonal variations, detailing how they evolve over time. This framework allows for a comprehensive modeling of time series data, accommodating changes in different states to better forecast future values.
Q. What is the difference between methods and models?
Answer
- Methods are algorithms that return point forecasts.
- A statistical model is a stochastic (or random) data generating process that can produce an entire forecast distribution.
Q. What is the forecast error in simple exponential smoothing model?
Answer
Forecast error is given by:
$$ e_t = y_t - \hat{y}_{t | t-1} $$
$$ e_t = y_t - l_{t-1} $$
Q. Write the expression for SES with additive errors?
Answer
Simple exponential smoothing with additive errors can be written in state space form:
- Measurement equation: It captures relationship between observations and states $$ y_t = l_{t-1} + \eta_t $$
- State equation: Evolution of states through time $$ l_t = l_{t-1} + \alpha \eta_t $$
Q. For an additive error model, maximising the likelihood (assuming normally distributed errors) gives the same results as minimising the sum of squared errors?
Answer
True
Q. For a multiplicative error model, maximising the likelihood (assuming normally distributed errors) gives the same results as minimising the sum of squared errors?
Answer
False
Q. Write the expressions of AIC, $\text{AIC}_c$ and BIC for ETS models?
Answer
For ETS models, Akaike’s Information Criterion (AIC) is defined as
$$ AIC = -2\log{(L)} + 2k $$
where L is the likelihood of the model and k is the total number of parameters and initial states that have been estimated (including the residual variance).
The AIC corrected for small sample bias ($\text{AIC}_c$) is defined as
$$ AIC_c = AIC + \frac{2k(k+1)}{T - k -1} $$
and the Bayesian Information Criterion (BIC) is
$$ BIC = AIC + k[\log{(T)} - 2] $$
Q. What is the main difference between ARIMA models and exponential smoothing models?
Answer
Exponential smoothing models are based on a description of the trend and seasonality in the data, ARIMA models aim to describe the autocorrelations in the data.
Q. What do you mean by stationary time series?
Answer
A stationary time series is one whose statistical properties do not depend on the time at which the series is observed. Thus, time series with trends, or with seasonality, are not stationary — the trend and seasonality will affect the value of the time series at different times.
Q. When is a time series considered non-stationary?
Answer
Non-stationary time series exhibit trends or seasonal patterns that affect their statistical properties across different time periods. For instance, a time series with a trend will have a mean that changes over time, and a series with seasonality will show variations at regular intervals, influencing the series' behavior and making standard analyses challenging without adjustments to account for these dependencies.
Q. Is white noise series stationary?
Answer
Yes, a white noise series is stationary: it does not matter when you observe it; it should look much the same at any point in time.
Q. Is a time series that exhibits cyclic behavior but lacks any trend or seasonality considered stationary?
Answer
Yes, a time series with cyclic behaviour (but with no trend or seasonality) is stationary. This is because the cycles are not of a fixed length, so before we observe the series we cannot be sure where the peaks and troughs of the cycles will be.
Q. How can we make non-stationary time series stationary?
Answer
To make a non-stationary time series stationary — compute the differences between consecutive observations. This is known as differencing.
Q. How does differencing transform a non-stationary time series into a stationary one?
Answer
Differencing can help stabilize the mean of a time series by removing changes in the level of a time series, and therefore eliminating (or reducing) trend and seasonality.
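A minimal sketch of the idea: differencing a linearly trending series leaves a constant series, whose mean no longer depends on time:

```python
# Sketch: first differencing removes a linear trend, leaving a
# constant (stationary-looking) series.

def difference(y, lag=1):
    return [y[t] - y[t - lag] for t in range(lag, len(y))]

trended = [2 + 3 * t for t in range(20)]   # mean changes over time
diffed = difference(trended)               # every difference equals the slope, 3
```

Note that the differenced series is one observation shorter than the original, since the first value has no predecessor to subtract.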
Q. How can we stabilize the variance of a time series?
Answer
Transformations such as logarithms can help to stabilise the variance of a time series.
Q. How can one determine if a time series is stationary?
Answer
An ACF (Autocorrelation Function) plot can help identify whether a time series is stationary. For a stationary time series, the ACF typically declines to zero fairly rapidly, indicating a lack of long-term correlations. In contrast, the ACF of non-stationary time series tends to decrease slowly, suggesting persistent correlations over time. Additionally, in non-stationary data, the first lag autocorrelation, $r_1$, is often significantly large and positive.
Q. What is second order differencing?
Answer
Second-order differencing is a technique used in time series analysis to make a non-stationary series stationary. It involves applying differencing twice to the original time series data.
- The first-order difference of a time series is calculated by subtracting the previous observation from the current observation:
$$ Y'_{t} = Y_t - Y_{t-1} $$
- The second-order difference is calculated as:
$$ Y''_{t} = Y'_{t} - Y'_{t-1} = Y_t - 2Y_{t-1} + Y_{t-2} $$
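The identity above can be checked numerically: differencing twice gives the same result as the one-shot formula $Y''_t = Y_t - 2Y_{t-1} + Y_{t-2}$:

```python
# Sketch: second-order differencing two ways.

def difference(y, lag=1):
    return [y[t] - y[t - lag] for t in range(lag, len(y))]

y = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]

twice = difference(difference(y))                               # diff of diff
direct = [y[t] - 2 * y[t - 1] + y[t - 2] for t in range(2, len(y))]
```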
Q. When is Second-Order Differencing Used?
Answer
Second-order differencing is useful when the data has a second-degree trend or when the first-order differencing does not sufficiently stabilize the mean of the series. This makes the series suitable for various forecasting methods, such as ARIMA models, which assume the data is stationary.
Q. What is seasonal differencing?
Answer
A seasonal difference is the difference between an observation and the previous observation from the same season. So
$$ y'_{t} = y_t - y_{t-m} $$
where m = the number of seasons.
Note that these are also called lag-m differences, as we subtract the observation after a lag of $m$ periods.
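A sketch of the effect: for a series made of a repeating seasonal pattern plus a linear trend, the lag-$m$ difference removes the seasonality entirely, leaving a constant:

```python
# Sketch: seasonal (lag-m) differencing removes a repeating pattern.
# Here y_t = seasonal term (period m = 4) + linear trend 0.5*t.

m = 4
season = [10.0, -3.0, 5.0, -12.0]
y = [season[t % m] + 0.5 * t for t in range(24)]

seasonal_diff = [y[t] - y[t - m] for t in range(m, len(y))]
# each lag-4 difference equals 0.5*m = 2.0: the seasonal pattern is gone
```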
Q. Why is it important to avoid applying more differencing than necessary in time series analysis?
Answer
Applying more differences than necessary in time series analysis can lead to false dynamics or autocorrelations that aren't actually present in the original data. This can distort the true structure of the series, potentially leading to incorrect interpretations and inaccurate forecasts. Therefore, it’s essential to use the minimum number of differences needed to achieve stationarity, as excessive differencing can artificially introduce complexity and obscure the real patterns in the data.
Q. How can we determine the required order of differencing?
Answer
We can conduct some statistical tests to determine the required order of differencing:
- Augmented Dickey-Fuller (ADF) test: the null hypothesis is that the data are non-stationary and non-seasonal
- KPSS test: the null hypothesis is that the data are stationary and non-seasonal
Q. How can the $d$th-order difference be expressed using backshift notation?
Answer
In general, a $d$th-order difference can be written as:
$$ (1-B)^d y_t $$
Q. What does autoregression indicate?
Answer
The term autoregression indicates that it is a regression of the variable against itself.
Q. What are the differences between a linear regression model and an autoregression model?
Answer
In a linear regression model, the variable of interest is forecasted using a linear combination of predictors. In contrast, an autoregression model forecasts the variable of interest by using a linear combination of its past values.
Q. State the expression of an autoregressive model of order $p$?
Answer
An autoregressive model of order $p$ can be written as:
$$ y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + ... + \phi_p y_{t-p} + \eta_t $$
where $\eta_t$ is white noise. This is like a multiple regression but with lagged values of $y_t$ as predictors; we refer to it as an AR($p$) model.
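A small sketch of AR(1) behaviour under the stationarity condition $|\phi_1| < 1$: with the noise switched off, the recursion pulls the series back towards its long-run mean $c/(1-\phi_1)$, no matter where it starts. The coefficient values are illustrative:

```python
# Sketch: the deterministic part of an AR(1), y_t = c + phi*y_{t-1},
# converges to the process mean c/(1-phi) when |phi| < 1.

c, phi = 2.0, 0.8
y = 100.0                 # start deliberately far from the mean
for _ in range(200):
    y = c + phi * y

mean = c / (1 - phi)      # long-run mean = 2/0.2 = 10
```

If $|\phi| \ge 1$ the same recursion diverges, which is one way to see why the stationarity constraints on the parameters matter.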
Q. Why are autoregressive models typically restricted to stationary data?
Answer
Autoregressive models are typically restricted to stationary data because their assumptions and predictions rely on the properties of stationarity. In stationary data, statistical properties like mean, variance, and autocorrelation remain constant over time, allowing the autoregressive model to accurately capture relationships based on past values. If the data is non-stationary, these properties may change over time, leading to unreliable forecasts and model instability.
Q. What are the parameter constraints for stationarity in AR(1) and AR(2) models?
Answer
For the model to remain stationary, certain constraints are placed on the parameters:
- For an AR(1) model: $-1 < \phi_1 < 1$.
- For an AR(2) model: $-1 < \phi_2 < 1$, $\phi_1 + \phi_2 < 1$, and $\phi_2 - \phi_1 < 1$.
Q. How does a moving average model work?
Answer
A moving average model uses past forecast errors in a regression-like model:
$$ y_t = c + \eta_t + \theta_1 \eta_{t-1} + \theta_2 \eta_{t-2} + ... + \theta_q \eta_{t-q} $$
where $\eta_t$ is white noise. We refer to this as an MA($q$) model.
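A sketch of what makes MA models distinctive: a shock to the error series affects only the current observation and the next $q$ observations, then vanishes, unlike an AR model where a shock decays gradually. The parameter values below are illustrative:

```python
# Sketch: MA(1) model y_t = c + e_t + theta*e_{t-1}, driven by a
# supplied error sequence. A unit shock at t=0 affects y_0 and y_1 only.

def ma1(errors, c=0.0, theta=0.5):
    out = []
    prev = 0.0          # assume e_0's predecessor is zero
    for e in errors:
        out.append(c + e + theta * prev)
        prev = e
    return out

impulse_response = ma1([1.0, 0.0, 0.0, 0.0])  # -> [1.0, 0.5, 0.0, 0.0]
```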
Q. What is the difference between moving average model and moving average smoothing?
Answer
A moving average model is used for forecasting future values, while moving average smoothing is used for estimating the trend-cycle of past values.
Q. In the context of ARIMA what does integration mean?
Answer
Integration is the reverse of differencing.
Q. What does ARIMA stand for?
Answer
ARIMA is an acronym for AutoRegressive Integrated Moving Average.
Q. What is a non-seasonal ARIMA model?
Answer
A non-seasonal ARIMA model combines differencing with autoregression and a moving average model.
The full model ARIMA(p, d, q) can be written as:
$$ y'_t = c + \phi_1 y'_{t-1} + ... + \phi_p y'_{t-p} + \theta_1 \eta_{t-1} + ... + \theta_q \eta_{t-q} + \eta_t $$
where $y'_t$ is the differenced series (it may have been differenced more than once).
Here, $p$ is the order of the autoregressive part, $d$ is the degree of first differencing involved, and $q$ is the order of the moving average part.
Q. What is partial autocorrelation?
Answer
Partial autocorrelation measures the relationship between $y_t$ and $y_{t-k}$ after removing the effects of the intermediate lags $1, 2, \dots, k-1$.
Q. Why do we need partial autocorrelation instead of just autocorrelation?
Answer
Autocorrelations measure the relationship between $y_t$ and $y_{t-k}$ for different values of $k$. If $y_t$ and $y_{t-1}$ are correlated, then $y_{t-1}$ and $y_{t-2}$ will also be correlated. Consequently, $y_t$ and $y_{t-2}$ might appear correlated, not due to any unique information in $y_{t-2}$, but because both are connected to $y_{t-1}$. This indirect connection does not necessarily indicate a direct influence of $y_{t-2}$ on $y_t$ for forecasting purposes.
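The point can be made concrete with the Durbin-Levinson recursion, which converts autocorrelations into partial autocorrelations. For an AR(1)-shaped ACF ($\rho_k = \phi^k$), only the first partial autocorrelation is non-zero: the apparent lag-2 correlation is entirely indirect. The function name is illustrative:

```python
# Sketch: partial autocorrelations from autocorrelations via the
# Durbin-Levinson recursion.

def pacf_from_acf(rho):
    """rho = [rho_1, rho_2, ...]; returns [pacf_1, pacf_2, ...]."""
    pacf, phi_prev = [], []
    for k, r in enumerate(rho, start=1):
        if k == 1:
            phi_kk = r
            phi_curr = [phi_kk]
        else:
            num = r - sum(phi_prev[j] * rho[k - 2 - j] for j in range(k - 1))
            den = 1 - sum(phi_prev[j] * rho[j] for j in range(k - 1))
            phi_kk = num / den
            phi_curr = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                        for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
        phi_prev = phi_curr
    return pacf

# Theoretical ACF of an AR(1) with phi = 0.8: rho_k = 0.8**k.
p = pacf_from_acf([0.8, 0.64, 0.512])
# PACF is 0.8 at lag 1 and (numerically) zero afterwards.
```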
Q. How can we determine the parameters of ARIMA models?
Answer
Using maximum likelihood estimation (MLE)
Q. How does seasonal ARIMA (SARIMA) work?
Answer
The SARIMA (Seasonal ARIMA) model is an extension of the ARIMA model that accounts for seasonal variations in time series data. It combines both non-seasonal and seasonal factors in a multiplicative model. The general equation for a SARIMA model is typically expressed as follows:
A seasonal ARIMA model is written as $\text{ARIMA}(p, d, q)(P, D, Q)_m$, where $(p, d, q)$ are the non-seasonal orders, $(P, D, Q)$ are the seasonal orders, and $m$ is the number of observations per seasonal cycle. The seasonal terms are similar to the non-seasonal ones, but involve backshifts of the seasonal period.
Q. How does the value of $d$ affect the prediction interval in ARIMA models?
Answer
Higher values of $d$ (e.g., $d = 1$ or $d = 2$) mean that the data is being differenced more, which can lead to a loss of information about the level of the original series. As a result, while forecasts may be less biased, the prediction intervals can become wider due to increased uncertainty.
Q. State the difference between ARIMA and ETS models?
Answer
Feature | ARIMA | ETS |
---|---|---|
Model Structure | Uses past values and errors to model autocorrelations. | Explicitly models error, trend, and seasonality components. |
Components | Defined by orders $p$, $d$, $q$ for non-seasonal; $P$, $D$, $Q$ for seasonal. | Models can be additive or multiplicative for each component. |
Data Requirements | Requires data to be stationary. | No stationarity requirement; handles trends and seasonality directly. |
Complexity | Requires identification of model orders, which can be complex. | Simpler setup with systematic component selection. |
Usage | Robust across various datasets, adaptable for complex patterns. | Easier to use when clear seasonal and trend patterns are present. |
Predictive Performance | Versatile but may require careful optimization. | Generally effective, particularly in datasets with strong seasonal components. |
Theoretical Basis | Focuses on sequential dependencies in data. | Decomposes time series into error, trend, and seasonal elements. |
Ideal Use Case | Best for datasets where autocorrelation modeling is critical. | Preferred for clear, straightforward seasonal and trend forecasting. |
Q. What are the differences between ARIMA models and regression models?
Answer
Feature | ARIMA Models | Regression Models |
---|---|---|
Purpose | Time series forecasting | Assessing relationships between variables |
Data Type | Time series data | Various data types (cross-sectional, time series, panel) |
Assumptions | Stationarity; relies on autocorrelation | Linear relationships; independent and identically distributed residuals |
Model Structure | Comprises AR, I (differencing), and MA components | Linear equation with coefficients for independent variables |
Forecasting Method | Based on past values and patterns in time series | Based on established relationships between dependent and independent variables |
Focus | Temporal dependencies | Relationships and influences among variables |
Q. What are the limitations of regression and ARIMA models, and how do dynamic regression models address these needs?
Answer
Regression models excel at incorporating relevant predictor variables but often overlook the subtle time series dynamics, such as trends and seasonality, that are essential for accurate forecasting. On the other hand, ARIMA models effectively capture these temporal patterns using past observations but fail to include other important external factors, like holidays, competitor actions, or economic changes, which can significantly influence the data.
To overcome these limitations, dynamic regression models merge the strengths of both approaches. They allow for the inclusion of external variables while also accounting for time-dependent behaviors, enabling a more comprehensive analysis. This integration enhances forecast accuracy by leveraging both the historical patterns inherent in the time series and the additional context provided by relevant predictors.
Q. How do dynamic regression models differ from regression models?
Answer
A traditional regression model takes the form
$$ y_t = \beta_0 + \beta_1 x_{1, t} + .. + \beta_k x_{k, t} + \epsilon_t $$
where $\epsilon_t$ is usually assumed to be an uncorrelated (white noise) error series.
Dynamic regression allows the errors from a regression to contain autocorrelation. To emphasise this change in perspective, we will replace $\epsilon_t$ with $\eta_t$ in the regression equation.
If, for example, $\eta_t$ follows an ARIMA(1,1,1) model, we can write:
$$ y_t = \beta_0 + \beta_1 x_{1, t} + .. + \beta_k x_{k, t} + \eta_t $$
$$ (1 - \phi_1 B)(1 - B)\eta_t = (1 + \theta_1 B)\epsilon_t $$
Where
Note that the model includes two error terms: the error from the regression model, $\eta_t$, and the error from the ARIMA model, $\epsilon_t$. Only the errors from the ARIMA model are assumed to be white noise.
Q. What must be checked before estimating a regression with ARMA errors?
Answer
All of the variables in the model, including $y_t$ and the predictors $x_{1,t}, \ldots, x_{k,t}$, must be stationary. If any of these are non-stationary, the estimated coefficients may not be consistent. However, if the non-stationary variables are co-integrated and there exists a stationary linear combination, then the estimated coefficients will be consistent.
Q. How do you forecast using a regression model with ARIMA errors?
Answer
To forecast using a regression model with ARIMA errors, you need to forecast both the regression part and the ARIMA part of the model and then combine the results. If the predictors are known into the future, forecasting is straightforward. However, if the predictors are unknown, you must either model them separately or use assumed future values for each predictor.
Q. What is dynamic harmonic regression?
Answer
Harmonic regression is a type of regression analysis used to model periodic or seasonal data. It incorporates sine and cosine terms to capture the cyclical patterns within the data. This method is particularly useful when the data exhibits regular fluctuations, such as daily, monthly, or yearly trends.
The model typically includes terms like:
$$ y_t = \beta_0 + \beta_1 \cos(2\pi ft) + \beta_2 \sin(2\pi ft) + \epsilon_t $$
where $f$ is the frequency of the cycles.
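As a sketch, the Fourier predictors used in harmonic regression can be generated directly; `fourier_terms` is a hypothetical helper, with $m$ the seasonal period and $K$ the number of sine/cosine pairs:

```python
import math

# Sketch: Fourier terms for harmonic regression with seasonal period m.
# Larger K allows a wigglier (less smooth) seasonal pattern.

def fourier_terms(t, m, K):
    terms = []
    for k in range(1, K + 1):
        terms.append(math.sin(2 * math.pi * k * t / m))
        terms.append(math.cos(2 * math.pi * k * t / m))
    return terms

# The terms repeat with period m, which is what lets a linear model
# in these predictors capture a seasonal pattern of any period.
a = fourier_terms(3, m=12, K=2)
b = fourier_terms(3 + 12, m=12, K=2)   # same values one cycle later
```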
Q. What are the benefits of dynamic harmonic regression?
Answer
- It allows any length of seasonality;
- for data with more than one seasonal period, Fourier terms of different frequencies can be included;
- the smoothness of the seasonal pattern can be controlled by $K$, the number of Fourier sine and cosine pairs: the seasonal pattern is smoother for smaller values of $K$;
- the short-term dynamics are easily handled with a simple ARMA error.
Q. What are lagged predictors?
Answer
Lagged predictors are predictors that have an impact that is not immediate. For example, an advertising campaign's impact on sales may continue for some time after the campaign ends.
Q. What challenges arise when forecasting higher frequency time series data with complicated seasonal patterns?
Answer
Higher frequency time series data, such as daily and hourly data, often exhibit complex seasonal patterns, including multiple types of seasonality (e.g., daily, weekly, and annual). For example, daily data can have both weekly and annual patterns, while hourly data typically includes daily, weekly, and annual seasonality. Additionally, weekly data poses challenges because there is not a whole number of weeks in a year, resulting in an average annual seasonal period of approximately $365.25/7 \approx 52.18$.
Q. What is the Prophet model, and what are its key features?
Answer
The Prophet model, developed by Facebook, is designed for forecasting time series data, particularly daily data with strong seasonal patterns and holiday effects. It can handle various types of seasonal data and works best with several seasons of historical data. The model can be represented as:
$$ y_t = g(t) + s(t) + h(t) + \epsilon_t $$
where $g(t)$ describes a piecewise-linear trend, $s(t)$ captures seasonal patterns using Fourier terms, $h(t)$ accounts for holiday effects, and $\epsilon_t$ is a white noise error term.
Key features include:
- Automatic selection of knots (changepoints) for the trend.
- Optional logistic function for setting an upper trend bound.
- Default settings of order 10 for annual seasonality and order 3 for weekly seasonality.
- Use of a Bayesian approach for model estimation, allowing for automatic selection of changepoints and other characteristics.
Q. What is vector autoregression?
Answer
Vector autoregression (VAR) is a statistical model designed to capture the relationships among multiple variables as they evolve over time. As a type of stochastic process model, VAR extends the concept of univariate autoregressive models to accommodate multivariate time series data.
Q. How does the VAR model address stationarity and non-stationarity in time series data?
Answer
If the series are stationary, we forecast them by fitting a VAR to the data directly (known as a “VAR in levels”). If the series are non-stationary, we take differences of the data in order to make them stationary, then fit a VAR model (known as a “VAR in differences”).
Q. Write the governing equation of 2-dimensional VAR(1) model?
Answer
2-dimensional VAR(1) model as:
$$ y_{1, t} = c_1 + \phi_{11,1}y_{1,t-1} + \phi_{12,1}y_{2,t-1} + \epsilon_{1, t} $$
$$ y_{2, t} = c_2 + \phi_{21,1}y_{1,t-1} + \phi_{22,1}y_{2,t-1} + \epsilon_{2, t} $$
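A sketch of what the two equations above do jointly: with the noise terms set to zero, iterating a stable 2-dimensional VAR(1) converges to the mean vector $(I - \Phi)^{-1}c$. The coefficient values are made-up illustrative numbers:

```python
# Sketch: deterministic part of a 2-d VAR(1), y_t = c + Phi*y_{t-1}.
# The chosen Phi has eigenvalues 0.6 and 0.3, so the system is stable.

c = [1.0, 0.5]
phi = [[0.5, 0.2],
       [0.1, 0.4]]

y = [10.0, -10.0]                      # start far from the mean
for _ in range(200):
    y = [c[0] + phi[0][0] * y[0] + phi[0][1] * y[1],
         c[1] + phi[1][0] * y[0] + phi[1][1] * y[1]]

# Analytic fixed point: solve (I - Phi) mu = c for the 2x2 case.
det = (1 - phi[0][0]) * (1 - phi[1][1]) - phi[0][1] * phi[1][0]
mu = [((1 - phi[1][1]) * c[0] + phi[0][1] * c[1]) / det,
      (phi[1][0] * c[0] + (1 - phi[0][0]) * c[1]) / det]
```

Note how each variable's update uses the lagged values of both variables, which is exactly the cross-dependence that distinguishes a VAR from two separate AR(1) models.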
Q. What are the challenges of dealing with weekly, daily and sub-daily data?
Answer
Weekly data
Working with weekly data is challenging because the seasonal period is large and non-integer, averaging 52.18 weeks in a year. Most methods require the seasonal period to be an integer, and even approximating it to 52 can lead to inefficiencies in handling such a long seasonal cycle.
Daily and sub-daily data
They involve multiple seasonal patterns, and so we need to use a method that handles such complex seasonality.
Q. How to handle missing values in time series?
Answer
We can use the following approaches for handling missing data:
- Assess potential bias: Determine if the missing data could introduce bias in your forecasts. For instance, if missing sales data occurs on public holidays, it may lead to underestimating sales on the following day.
- Use dynamic regression models: For situations where the missingness is related to specific events (like public holidays), implement dynamic regression models with dummy variables to indicate relevant days, as automated methods may not adequately address such context-specific effects.
- Random missingness: If the missing values are random (e.g., due to recording errors), and their absence is not informative for forecasting, you can handle them more easily, possibly through imputation methods.
- Remove unusual observations: If certain unusual observations are removed from the dataset, this may create missing values, which can also be managed depending on the context.