The anomalize R package is now available in timetk:
- anomalize(): One function that breaks down, identifies, and cleans anomalies.
- plot_anomalies(): Visualize the anomalies and anomaly bands.
- plot_anomalies_decomp(): Visualize the time series decomposition. Make adjustments as needed.
- plot_anomalies_cleaned(): Visualize the before/after of cleaning anomalies.

Note - anomalize(.method): Only .method = "stl" is supported at this time. The "twitter" method is also planned.
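The "stl" method flags anomalies in the remainder of a seasonal decomposition. The base R sketch below illustrates that general idea only and is not timetk's implementation. The helper name flag_anomalies_stl(), the robust STL fit, and the 3 * IQR fence are assumptions chosen for illustration.

```r
# Hypothetical helper (not part of timetk): flag anomalies as unusually
# large STL remainders, using an assumed 3 * IQR fence.
flag_anomalies_stl <- function(x, frequency, iqr_mult = 3) {
  # Robust STL downweights outliers so they stay in the remainder
  fit <- stats::stl(stats::ts(x, frequency = frequency),
                    s.window = "periodic", robust = TRUE)
  remainder <- as.numeric(fit$time.series[, "remainder"])

  # Interquartile-range fences on the remainder
  q <- stats::quantile(remainder, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  anomaly <- remainder < q[1] - iqr_mult * iqr | remainder > q[2] + iqr_mult * iqr

  data.frame(observed = x, remainder = remainder, anomaly = anomaly)
}

# Four years of noisy monthly data with one injected spike at position 30
set.seed(123)
x <- 10 + sin(2 * pi * (1:48) / 12) + stats::rnorm(48, sd = 0.2)
x[30] <- x[30] + 10
res <- flag_anomalies_stl(x, frequency = 12)
which(res$anomaly)
```

The robust fit matters here. A non-robust STL would partly absorb the spike into the seasonal and trend components and shrink its remainder.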
- Removed dependency on tidymodels (#154, @olivroy).
- Updated the forecasting vignette to use glmnet for time series forecasting.
CRAN Fixes:
- tzdata time zone fixes:
  - GB -> Europe/London
  - NZ -> Pacific/Auckland
  - US/Eastern -> America/New_York
  - US/Pacific -> America/Los_Angeles
- Add @aliases to timetk-package
- Remove support for robets
- Remove tidyquant from examples
- Remove tidyverse from examples
- Add FANG dataset to timetk (port from tidyquant)
- CRAN: fix return, dontrun -> donttest, options(max.print)
New Features
- plot_time_series(): Gets new arguments to specify .x_intercept and .x_intercept_color. #131
Fixes
- Fix error in plot_time_series() when .group_names is not found. #121
- Merge variable checking update needed for recipes >= 1.0.3. #132
- Expose the facet_trelliscope() plotting parameters in:
  - plot_time_series()
  - plot_time_series_boxplot()
  - plot_anomaly_diagnostics()
New Features
Many of the plotting functions have been upgraded for use with trelliscopejs, making it easier to visualize many time series.
- plot_time_series():
  - Gets a new argument trelliscope: Used for visualizing many time series.
  - Gets a new argument .facet_strip_remove to remove facet strips, since trelliscope is automatically labeled.
  - Gets a new argument .facet_nrow to adjust the grid with trelliscope.
  - The default of .facet_collapse was changed from TRUE to FALSE for better compatibility with Trelliscope JS. This may cause some plots with multiple groups to take up extra space in the strip.
- plot_time_series_boxplot():
  - Gets a new argument trelliscope: Used for visualizing many time series.
  - Gets a new argument .facet_strip_remove to remove facet strips, since trelliscope is automatically labeled.
  - Gets a new argument .facet_nrow to adjust the grid with trelliscope.
  - The default of .facet_collapse was changed from TRUE to FALSE for better compatibility with Trelliscope JS. This may cause some plots with multiple groups to take up extra space in the strip.
- plot_anomaly_diagnostics():
  - Gets a new argument trelliscope: Used for visualizing many time series.
  - Gets a new argument .facet_strip_remove to remove facet strips, since trelliscope is automatically labeled.
  - Gets a new argument .facet_nrow to adjust the grid with trelliscope.
  - The default of .facet_collapse was changed from TRUE to FALSE for better compatibility with Trelliscope JS. This may cause some plots with multiple groups to take up extra space in the strip.
Updates & Bug Fixes
- Recipes steps (e.g. step_timeseries_signature()) use the new recipes::print_step() function. Requires recipes >= 0.2.0. #110
- The offset parameter in step_log_interval() was not working properly. Now fixed. #103
Potential Breaking Changes
- The default of .facet_collapse was changed from TRUE to FALSE for better compatibility with Trelliscope JS. This may cause some plots with multiple groups to take up extra space in the strip.
New Features
- tk_tsfeatures(): A new function that makes it easy to generate a time series feature matrix using tsfeatures. The main benefit is that you can pipe time series data in tibbles with dplyr groups. The features will be produced by group. #95 #84
- plot_time_series_boxplot(): A new function that makes plotting time series boxplots simple, using a .period argument for time series aggregation.
New Vignettes
- Time Series Clustering: Uses the new tk_tsfeatures() function to perform time series clustering. #95 #84
- Time Series Visualization: Updated to include plot_time_series_boxplot() and plot_time_series_regression().
Improvements
Improvements for point forecasting when the target is n periods into the future:
- time_series_cv(), time_series_split(): New parameter point_forecast. This is useful for testing / assessing the n-th prediction in the future. When set to TRUE, the assessment set is a single point: the last value in assess.
Fixes
- Updates for rlang > 0.4.11 (dev version). #98
- plot_time_series(): Smoother no longer fails when a time series has 1 observation. #106
Improvements
- summarize_by_time(): Added a .week_start argument to allow specifying .week_start = 1 for a Monday start. The default is 7 (Sunday start). This can also be changed in lubridate by setting the lubridate.week.start option.
- Plotting Functions:
  - Several plotting functions gain a new .facet_dir argument for adjusting the direction of facet_wrap(dir). #94
  - Plot ACF Diagnostics (plot_acf_diagnostics()): Change default parameter to .show_white_noise_bars = TRUE. #85
  - plot_time_series_regression(): The show_summary option now works for group-wise models when visualizing groups.
- Time Series CV (time_series_cv()): Add label for tune_results.
- Improve speed of pad_by_time(). #93
Bug Fixes
- tk_make_timeseries() and tk_make_future_timeseries() are now able to handle end of months. #72
- tk_tbl.zoo(): Fix an issue when readr::type_convert() produces warning messages about not having character columns in inputs. #89
- plot_time_series_regression(): Fixed an issue when lags are added to .formula. Lags are now padded with NA.
- step_fourier() and fourier_vec(): Fixed an issue with step_fourier() failing with one observation. Added a scale_factor argument to override date sequences with the stored scale factor. #77
Improvements
- tk_augment_slidify(), tk_augment_lags(), tk_augment_leads(), tk_augment_differences(): Now work with multiple columns (passed via .value) and tidyselect (e.g. contains()).
Fixes
- Reduce "New names" messages.
#> New names:
#> * NA -> ...1
- Remove dependency on lazyeval. #24
- Fix deprecated functions: select_() used with tk_xts_(). #52
New Functions
- filter_period() (#64): Applies filtering expressions within time-based periods (windows).
- slice_period() (#64): Applies slices within time-based periods (windows).
- condense_period() (#64): Converts a periodicity from a higher (e.g. daily) to a lower (e.g. monthly) frequency. Similar to xts::to.period() and tibbletime::as_period().
- tk_augment_leads() and lead_vec() (#65): Added to make it easier / more obvious how to create leads.
Fixes
- time_series_cv(): Fix bug with panel data. Train/test splits were only returning the first observation in the final time stamp; they now return all observations.
- future_frame() and tk_make_future_timeseries(): Now sort the incoming index to ensure the dates returned go into the future.
- tk_augment_lags() and tk_augment_slidify(): Now overwrite column names to match the behavior of tk_augment_fourier() and tk_augment_differences().
Improvements
- time_series_cv(): Now works with time series groups. This is great for working with panel data.
- future_frame(): Gets a new argument called .bind_data. When set to TRUE, it binds the incoming data with the future frame.
Miscellaneous
- Tune startup messages (#63)
- step_slidify_augment(): A variant of step_slidify() that adds multiple rolling columns inside of a recipe.
Bug Fixes
- Add warning when %+time% and %-time% return missing values.
- Fix issues with tk_make_timeseries() and tk_make_future_timeseries() providing odd results for regular time series. GitHub Issue 60
New Functionality
- tk_time_series_cv_plan(): Now works with k-fold cross validation objects from the vfold_cv() function.
- pad_by_time(): Added a new argument .fill_na_direction to specify a tidyr::fill() strategy for filling missing data.
Bug Fixes
- Augment functions (e.g. tk_augment_lags()): Fix bug with grouped functions not being exported.
- Vectorized Functions: Compatibility with ts class.
New Functions
- step_log_interval(): Extends log_interval_vec() for recipes preprocessing.
Parallel Processing
- Parallel backend for use with tune and recipes
Bug Fixes
- log_interval_vec(): Correct the messaging complement.
- ts_cv_split: Helper to show time series cross validation splits in list explorer.
New Functions
- mutate_by_time(): For applying mutates by time windows.
- log_interval_vec() & log_interval_inv_vec(): For constrained interval forecasting.
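Constrained interval forecasting keeps forecasts between known lower and upper limits. A common approach is the scaled logit transform y = log((x - lower) / (upper - x)): model on the transformed scale, then invert. The sketch below assumes that form plus an additive offset. The helper names are hypothetical, and timetk's own argument handling (for example, how limits are chosen) may differ.

```r
# Hypothetical helpers (not timetk's code): scaled logit transform and inverse.
# Assumed form: y = log((x + offset - lower) / (upper - (x + offset)))
log_interval <- function(x, lower, upper, offset = 0) {
  z <- x + offset
  stopifnot(all(z > lower & z < upper))  # values must lie strictly inside
  log((z - lower) / (upper - z))
}

log_interval_inv <- function(y, lower, upper, offset = 0) {
  # plogis(y) = exp(y) / (1 + exp(y)) maps the real line back onto (0, 1)
  (upper - lower) * stats::plogis(y) + lower - offset
}

x <- c(5, 20, 50, 80, 95)
y <- log_interval(x, lower = 0, upper = 100)   # y[3] is 0 at the midpoint
x_back <- log_interval_inv(y, lower = 0, upper = 100)
all.equal(x_back, x)  # TRUE
```

Any real-valued forecast on the y scale maps back inside (lower, upper), so back-transformed forecasts respect the limits.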
Improvements
- plot_acf_diagnostics(): A new argument, .show_white_noise_bars, for adding white noise bars to an ACF / PACF plot.
- pad_by_time(): New arguments .start_date and .end_date for expanding/contracting the padding windows.
New Functions
- plot_time_series_regression(): Convenience function to visualize & explore features using linear regression (stats::lm() formula).
- time_series_split(): A convenient way to return a single split from time_series_cv(). Returns the split in the same format as rsample::initial_time_split().
Improvements
- Auto-detect date and date-time: Affects summarise_by_time(), filter_by_time(), tk_summary_diagnostics()
- tk_time_series_cv_plan(): Allow a single resample from rsample::initial_time_split or timetk::time_series_split
- Updated Vignette: The vignette "Forecasting Using the Time Series Signature" has been updated with modeltime and tidymodels.
Plotting Improvements
- All plotting functions now support tab completion (a minor breaking change was needed to do so; see Breaking Changes below)
- plot_time_series():
  - Add .legend_show to toggle legends on/off.
  - Permit numeric index (fix issue with smoother failing)
Breaking Changes
- Tab Completion: Replace ... with .facet_vars or .ccf_vars. This change is needed to improve tab completion. It affects:
  - plot_time_series()
  - plot_acf_diagnostics()
  - plot_anomaly_diagnostics()
  - plot_seasonal_diagnostics()
  - plot_stl_diagnostics()
Bug Fixes
- fourier_vec() and step_fourier_vec(): Add error if observations have zero difference. Issue #40.
New Interactive Plotting Functions
- plot_anomaly_diagnostics(): Visualize anomalies for one or more time series.
New Data Wrangling Functions
- future_frame(): Make a future tibble from an existing time-based tibble.
New Diagnostic / Data Processing Functions
- tk_anomaly_diagnostics(): Group-wise anomaly detection and diagnostics. A wrapper for the anomalize R package functions without importing anomalize.
New Vectorized Functions:
- ts_clean_vec(): Replace outliers & missing values in a time series.
- standardize_vec(): Centers and scales a time series to mean 0, standard deviation 1.
- normalize_vec(): Normalizes a time series to range (0, 1).
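Both rescalings are one-line formulas. This base R sketch uses hypothetical helper names and computes the centering and scaling parameters from the data itself. It is not timetk's implementation.

```r
# Hypothetical helpers (not timetk's code)
standardize <- function(x) (x - mean(x)) / stats::sd(x)      # mean 0, sd 1
normalize   <- function(x) (x - min(x)) / (max(x) - min(x))  # min -> 0, max -> 1

x <- c(2, 4, 6, 8, 10)
standardize(x)
normalize(x)   # 0.00 0.25 0.50 0.75 1.00
```

To invert a transform later (for example, after forecasting on the rescaled series), the parameters used here (mean and sd, or min and max) must be kept.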
New Recipes Preprocessing Steps:
- step_ts_pad(): Preprocessing for padding time series data. Adds rows to fill in gaps and can be used with step_ts_impute() to interpolate going from low to high frequency!
- step_ts_clean(): Preprocessing step for cleaning outliers and imputing missing values in a time series.
New Parsing Functions
- parse_date2() and parse_datetime2(): These are similar to readr::parse_date() and lubridate::as_date() in that they parse character vectors to dates and date-times. The key advantage is speed: parse_date2() uses the anytime package, which processes dates with the C++ Boost.Date_Time library.
Improvements:
- plot_acf_diagnostics(): The .lags argument now handles time-based phrases (e.g. .lags = "1 month").
- time_series_cv(): Implements time-based phrases (e.g. initial = "5 years" and assess = "1 year").
- tk_make_future_timeseries(): The n_future argument has been deprecated in favor of a new length_out argument that accepts both numeric input (e.g. length_out = 12) and time-based phrases (e.g. length_out = "12 months"). A major improvement is that a numeric value defines the number of timestamps returned, even if weekends or holidays are removed, so you can always anticipate the length. (Issue #19)
- diff_vec(): Now reports the initial values used in the differencing calculation.
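A numeric length_out counts returned timestamps, not calendar days. The sketch below illustrates that behavior for weekends only (holidays are ignored). future_weekdays() is a hypothetical helper, not timetk's implementation.

```r
# Hypothetical helper: the next `n` weekday dates after `last_date`.
# The result always has length n, however many weekend days are skipped.
future_weekdays <- function(last_date, n) {
  out <- as.Date(character(0))
  d <- last_date
  while (length(out) < n) {
    d <- d + 1
    # as.POSIXlt()$wday: 0 = Sunday, 6 = Saturday
    if (!(as.POSIXlt(d)$wday %in% c(0, 6))) out <- c(out, d)
  }
  out
}

future_weekdays(as.Date("2024-01-05"), 3)  # a Friday -> 2024-01-08, 2024-01-09, 2024-01-10
```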
Bug Fixes:
- plot_time_series():
  - Fix name collision when .value = .value.
- tk_make_future_timeseries():
  - Respect timezones
- time_series_cv():
  - Fix incorrect calculation of starts/stops
  - Make skip = 1 the default; skip = 0 does not make sense.
  - Fix issue with skip adding 1 to stops.
  - Fix printing method
- plot_time_series_cv_plan() & tk_time_series_cv_plan():
  - Prevent name collisions when underlying data has column "id" or "splits"
- tk_make_future_timeseries():
  - Fix bug when day of month doesn't exist (lubridate period() returns NA). Fix implemented with ceiling_date().
- pad_by_time():
  - Fix pad_value so it only inserts pad values where a new row was inserted.
- step_ts_clean(), step_ts_impute():
  - Fix issue with lambda = NULL
Breaking Changes:
These should not be of major impact since the 1.0.0 version was just released.
- Renamed impute_ts_vec() to ts_impute_vec() for consistency with ts_clean_vec()
- Renamed step_impute_ts() to step_ts_impute() for consistency with the underlying function
- Renamed roll_apply_vec() to slidify_vec() for consistency with slidify() & relationship to the slider R package
- Renamed step_roll_apply to step_slidify() for consistency with slidify() & relationship to the slider R package
- Renamed tk_augment_roll_apply to tk_augment_slidify() for consistency with slidify() & relationship to the slider R package
- plot_time_series_cv_plan() and tk_time_series_cv_plan(): Changed argument from .rset to .data.
New Interactive Plotting Functions:
- plot_time_series() - A workhorse time-series plotting function that generates interactive plotly plots, consolidates 20+ lines of ggplot2 code, and scales well to many time series using dplyr groups.
- plot_acf_diagnostics() - Visualize the ACF, PACF, and any number of CCFs in one plot for multiple time series. Interactive plotly by default.
- plot_seasonal_diagnostics() - Visualize multiple seasonality features for one or more time series. Interactive plotly by default.
- plot_stl_diagnostics() - Visualize STL decomposition features for one or more time series.
- plot_time_series_cv_plan() - Visualize the time series cross validation plan made with time_series_cv().
New Time Series Data Wrangling:
- summarise_by_time() - A time-based variant of dplyr::summarise() for flexible summarization using common time-based criteria.
- filter_by_time() - A time-based variant of dplyr::filter() for flexible filtering by time ranges.
- pad_by_time() - Insert time series rows with regularly spaced timestamps.
- slidify() - Make any function a rolling / sliding function.
- between_time() - A time-based variant of dplyr::between() for flexible time-range detection.
- add_time() - Add time to a time series index. Shifts an index by a period.
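pad_by_time() inserts rows so the index is regularly spaced. As an illustration only, the base R sketch below pads a daily series. The helper pad_daily() and the column names date and value are assumptions. It fills only the rows it inserts, so values that were already missing stay NA.

```r
# Hypothetical helper (not timetk's code): pad a daily series to a regular grid.
# Assumes a data frame with a Date column `date` and a numeric column `value`.
pad_daily <- function(df, pad_value = NA) {
  full <- data.frame(date = seq(min(df$date), max(df$date), by = "day"))
  out <- merge(full, df, by = "date", all.x = TRUE)  # left join keeps every day
  inserted <- !(out$date %in% df$date)              # rows that were not in the input
  out$value[inserted] <- pad_value
  out
}

df <- data.frame(
  date  = as.Date(c("2024-01-01", "2024-01-02", "2024-01-05")),
  value = c(1, NA, 5)
)
pad_daily(df, pad_value = 0)  # Jan 3 and Jan 4 get 0; the original NA on Jan 2 stays NA
```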
New Recipe Functions:
Feature Generators:
- step_holiday_signature() - New recipe step for adding 130 holiday features based on individual holidays, locales, and stock exchanges / business holidays.
- step_fourier() - New recipe step for adding Fourier transforms as seasonal features for time series data.
- step_roll_apply() - New recipe step for adding rolling summary functions. Similar to recipes::step_window() but more flexible, since it can apply any summary function.
- step_smooth() - New recipe step for adding Local Polynomial Regression (LOESS) for smoothing noisy time series.
- step_diff() - New recipe step for adding multiple differenced columns. Similar to recipes::step_lag().
- step_box_cox() - New recipe step for transforming predictors. Similar to step_BoxCox(), with improvements for forecasting, including the "guerrero" method for lambda selection and handling of negative data.
- step_impute_ts() - New recipe step for imputing a time series.
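Fourier features represent a seasonal pattern with sine and cosine pairs at harmonics of the seasonal period. A minimal sketch of that construction for an integer time index follows. fourier_terms() is hypothetical, and the sketch does not reproduce how timetk scales date indexes.

```r
# Hypothetical helper (not timetk's code): sine/cosine pairs for harmonics 1..K
# of a seasonal `period`, assuming `t` is an integer time index in the same units.
fourier_terms <- function(t, period, K) {
  out <- list()
  for (k in seq_len(K)) {
    out[[paste0("sin", k)]] <- sin(2 * pi * k * t / period)
    out[[paste0("cos", k)]] <- cos(2 * pi * k * t / period)
  }
  as.data.frame(out)
}

# Two harmonics of a 12-step (e.g. monthly) season over two years
feats <- fourier_terms(t = 1:24, period = 12, K = 2)
names(feats)  # "sin1" "cos1" "sin2" "cos2"
```

A larger K adds higher-frequency harmonics, so the features can follow sharper seasonal shapes.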
New Rsample Functions
- time_series_cv() - Create rsample cross validation sets for time series. This function produces a sampling plan starting with the most recent time series observations, rolling backwards.
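The backwards-rolling plan can be pictured with plain index vectors. The sketch below uses a fixed-size (sliding) training window. The helper and its arguments are hypothetical and do not reproduce time_series_cv()'s full set of options.

```r
# Hypothetical helper (not timetk's code): build up to `slices` train/test
# index splits, starting from the most recent observations and stepping back.
rolling_splits_backward <- function(n, initial, assess, skip = 1, slices = 3) {
  splits <- list()
  end_assess <- n
  for (i in seq_len(slices)) {
    start_assess <- end_assess - assess + 1
    end_train <- start_assess - 1
    start_train <- end_train - initial + 1
    if (start_train < 1) break  # not enough history left for another split
    splits[[i]] <- list(train = start_train:end_train,
                        test  = start_assess:end_assess)
    end_assess <- end_assess - skip  # shift the window back in time
  }
  splits
}

s <- rolling_splits_backward(n = 20, initial = 10, assess = 3, skip = 3, slices = 3)
s[[1]]$test  # 18 19 20 (the most recent observations)
s[[2]]$test  # 15 16 17
```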
New Vector Functions:
These functions are useful on their own inside of mutate() and power many of the new plotting and recipes functions.
- roll_apply_vec() - Vectorized rolling apply function - wraps slider::slide_vec()
- smooth_vec() - Vectorized smoothing function - Applies Local Polynomial Regression (LOESS)
- diff_vec() and diff_inv_vec() - Vectorized differencing function. Pads NA's by default (unlike stats::diff).
- lag_vec() - Vectorized lag functions. Returns both lags and leads (negative lags) by adjusting the .lag argument.
- box_cox_vec(), box_cox_inv_vec(), & auto_lambda() - Vectorized Box Cox transformation. Leverages forecast::BoxCox.lambda() for automatic lambda selection.
- fourier_vec() - Vectorized Fourier series calculation.
- impute_ts_vec() - Vectorized imputation of missing values for time series. Leverages forecast::na.interp().
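Padding keeps the output aligned with the input, and a negative lag acts as a lead. Both ideas are sketched below in base R with hypothetical helper names. The sketch assumes abs(lag) < length(x) and is not timetk's implementation.

```r
# Hypothetical helpers (not timetk's code): NA-padded lag/lead and difference.
lag_pad <- function(x, lag = 1) {
  n <- length(x)
  if (lag >= 0) {
    c(rep(NA, lag), x[seq_len(n - lag)])  # positive lag: shift values forward
  } else {
    lead <- -lag
    c(x[-seq_len(lead)], rep(NA, lead))   # negative lag: a lead
  }
}
diff_pad <- function(x, lag = 1) x - lag_pad(x, lag)

x <- c(10, 12, 15, 19)
stats::diff(x)   # 2 3 4     (length 3)
diff_pad(x)      # NA 2 3 4  (same length as x)
lag_pad(x, -1)   # 12 15 19 NA
```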
New Augment Functions:
All of the functions are designed for scale. They respect dplyr::group_by().
- tk_augment_holiday_signature() - Add holiday features to a data.frame using only a time-series index.
- tk_augment_roll_apply() - Add multiple columns of rolling window calculations to a data.frame.
- tk_augment_differences() - Add multiple columns of differences to a data.frame.
- tk_augment_lags() - Add multiple columns of lags to a data.frame.
- tk_augment_fourier() - Add multiple columns of Fourier series to a data.frame.
New Make Functions:
Make date and date-time sequences between start and end dates.
- tk_make_timeseries() - Super flexible function for creating daily and sub-daily time series.
- tk_make_weekday_sequence() - Weekday sequence that accounts for both stripping weekends and holidays.
- tk_make_holiday_sequence() - Makes a sequence of dates corresponding to business holidays in calendars from timeDate (common non-working days).
- tk_make_weekend_sequence() - Sequence of weekend dates, Saturday and Sunday (common non-working days).
New Get Functions:
- tk_get_holiday_signature() - Get 100+ holiday features using only a time-series index.
- tk_get_frequency() and tk_get_trend() - Automatic frequency and trend calculation from a time series index.
New Diagnostic / Data Processing Functions
- tk_summary_diagnostics() - Group-wise time series summary.
- tk_acf_diagnostics() - The data preparation function for plot_acf_diagnostics()
- tk_seasonal_diagnostics() - The data preparation function for plot_seasonal_diagnostics()
- tk_stl_diagnostics() - Group-wise STL Decomposition (Season, Trend, Remainder). Data prep for plot_stl_diagnostics().
- tk_time_series_cv_plan - The data preparation function for plot_time_series_cv_plan()
New Datasets
- M4 Competition - Sample "economic" datasets from hourly, daily, weekly, monthly, quarterly, and yearly.
- Walmart Recruiting Retail Sales Forecasting Competition - Sample of 7 retail time series
- Web Traffic Forecasting (Wikipedia) Competition - Sample of 10 website time series
- Taylor's Energy Demand - Single time series with 30-minute interval of energy demand
- UCI Bike Sharing Daily - A time series consisting of Capital Bikesharing Transaction Counts and related time-based features.
Improvements:
- tk_make_future_timeseries() - Now accepts n_future as a time-based phrase like "12 seconds" or "1 year".
Bug Fixes:
- Don't set timezone on date
- Accommodate recent changes to lubridate::tz<-, which now returns POSIXct when used with Date objects. Fixed in PR32 by @vspinu.
Potential Breaking Changes:
- tk_augment_timeseries_signature() - Changed from data to .data to prevent name collisions when piping.
New Features:
- recipes Integration - Ability to apply time series feature engineering in the tidymodels machine learning workflow.
- step_timeseries_signature() - New step_timeseries_signature() for adding date and date-time features.
- New Vignette - "Time Series Machine Learning" (previously forecasting using the time series signature)
Bug Fixes:
- xts::indexTZ is deprecated. Use tzone instead.
- Replace arrange_ with arrange.
- Fix failing tests due to tidyquant 1.0.0 upgrade (single stocks now return an extra symbol column).
- Compatibility with tidyquant v0.5.7
- Removed dependency on tidyverse
- Dependency cleanup - removed devtools and other unnecessary dependencies.
- Added timeSeries to Suggests to satisfy a CRAN issue.
- Renamed package to timetk (formerly timekit).
- Improvements:
  - Fixed issue with back-ticked date columns
  - Update pkgdown
  - Support for robets