Introducing Modeltime: Tidy Time Series Forecasting using Tidymodels
Written by Matt Dancho
I'm beyond excited to introduce modeltime, a new time series forecasting package designed to speed up model evaluation, selection, and forecasting. modeltime does this by integrating the tidymodels machine learning ecosystem of packages into a streamlined workflow for tidyverse forecasting. Follow the updated modeltime article to get started with modeltime.
- We'll first showcase the Modeltime Ecosystem at a glance
- We'll then explain the benefits of modeltime
- Then we'll go through a full Modeltime Workflow where you'll build Automatic (Prophet, ARIMA), Machine Learning (Elastic Net, Random Forest), and Hybrid (Prophet-XGBoost) Models with Modeltime
If you like what you see, I have an Advanced Time Series Course where you will become the time-series expert for your organization by learning modeltime and timetk.
Time Series Forecasting Article Guide:
This article is part of a series of software announcements on the Modeltime Forecasting Ecosystem.
- (Start Here) Modeltime: Tidy Time Series Forecasting using Tidymodels
- Modeltime H2O: Forecasting with H2O AutoML
- Modeltime Ensemble: Time Series Forecast Stacking
- Modeltime Recursive: Tidy Autoregressive Forecasting
- Hyperparameter Tuning Forecasts in Parallel with Modeltime
- Time Series Forecasting Course: Now Available
Like these articles? Register to stay in the know on new cutting-edge R software like modeltime.
Meet the Modeltime Ecosystem
A growing ecosystem for tidymodels forecasting
Modeltime is part of a growing ecosystem of forecasting packages. The main purpose of the Modeltime Ecosystem is to develop scalable forecasting systems.
Modeltime
The forecasting framework for the tidymodels ecosystem
modeltime is a new package designed for rapidly developing and testing time series models using machine learning models, classical models, and automated models. There are three key benefits:
- Systematic Workflow for Forecasting. Learn a few key functions like modeltime_table(), modeltime_calibrate(), and modeltime_refit() to develop and train time series models.
- Unlocks Tidymodels for Forecasting. Gain the benefit of all of the parsnip models, including boost_tree() (XGBoost, C5.0), linear_reg() (GLMnet, Stan, Linear Regression), rand_forest() (Random Forest), and more.
- New Time Series Boosted Models, including Boosted ARIMA (arima_boost()) and Boosted Prophet (prophet_boost()), which can improve accuracy by applying an XGBoost model to the errors.
Get the Cheat Sheet
As you go through this tutorial, it may help to use the Ultimate R Cheat Sheet. Page 3 Covers the Modeltime Forecasting Ecosystem with links to key documentation.
Forecasting Ecosystem Links (Ultimate R Cheat Sheet)
Getting Started
Let's kick the tires on modeltime
Install modeltime, then load the following libraries.
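A minimal setup sketch (assuming you want the CRAN release of modeltime and the meta-packages loaded below):

```r
# Install modeltime from CRAN
install.packages("modeltime")

# Load the libraries used throughout this tutorial
library(tidymodels)  # parsnip, recipes, workflows, rsample, ...
library(modeltime)   # the time series modeling framework
library(timetk)      # time series wrangling, splitting, and plotting
library(tidyverse)   # dplyr, ggplot2, purrr, and friends
library(lubridate)   # date-time helpers
```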
Get Your Data
Forecasting daily bike transactions
We'll start with a bike_sharing_daily time series data set that includes bike transactions. We'll simplify the data set to a univariate time series with columns "date" and "value".
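Here is a sketch of that step. It assumes the bike_sharing_daily data set shipped with timetk, with its dteday (date) and cnt (count) columns; if your copy names them differently, adjust the select() call.

```r
# Simplify to a univariate series with "date" and "value" columns
bike_transactions_tbl <- timetk::bike_sharing_daily %>%
  select(dteday, cnt) %>%          # keep the date and daily count columns
  set_names(c("date", "value"))    # rename to the generic names used below

bike_transactions_tbl
```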
Next, visualize the dataset with the plot_time_series() function. Toggle .interactive = TRUE to get a plotly interactive plot. FALSE returns a ggplot2 static plot.
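For example, using the tibble built above:

```r
# Static ggplot2 plot of daily bike transactions
bike_transactions_tbl %>%
  plot_time_series(date, value, .interactive = FALSE)
```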
Train / Test
Split your time series into training and testing sets
Next, use time_series_split() to make a train/test set (a sketch follows the list).
- Setting assess = "3 months" tells the function to use the last 3 months of data as the testing set.
- Setting cumulative = TRUE tells the sampling to use all of the prior data as the training set.
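```r
# Hold out the last 3 months for testing; train on everything prior
splits <- bike_transactions_tbl %>%
  time_series_split(assess = "3 months", cumulative = TRUE)
```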
Next, visualize the train/test split.
- tk_time_series_cv_plan(): Converts the splits object to a data frame
- plot_time_series_cv_plan(): Plots the time series sampling data using the "date" and "value" columns.
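Putting those two functions together:

```r
# Convert the split to a data frame and plot the train/test plan
splits %>%
  tk_time_series_cv_plan() %>%
  plot_time_series_cv_plan(date, value, .interactive = FALSE)
```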
Modeling
This is exciting.
Now for the fun part! Let's make some models using functions from modeltime and parsnip.
1. Automatic Models
Automatic models are generally modeling approaches that have been automated. This includes the "Auto ARIMA" and "Auto ETS" functions from forecast and the "Prophet" algorithm from prophet. These algorithms have been integrated into modeltime. The process is simple to set up:
- Model Spec: Use a specification function (e.g. arima_reg(), prophet_reg()) to initialize the algorithm and key parameters
- Engine: Set an engine using one of the engines available for the Model Spec.
- Fit Model: Fit the model to the training data
Letβs make several models to see this process in action.
Auto ARIMA
Here's the basic Auto ARIMA model fitting process (a full sketch follows the list).
- Model Spec: arima_reg() <– This sets up your general model algorithm and key parameters
- Set Engine: set_engine("auto_arima") <– This selects the specific package-function to use, and you can add any function-level arguments here.
- Fit Model: fit(value ~ date, training(splits)) <– All modeltime models require a date column to be a regressor.
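```r
# Auto ARIMA: model spec -> engine -> fit on the training set
model_fit_arima <- arima_reg() %>%
  set_engine("auto_arima") %>%
  fit(value ~ date, data = training(splits))

model_fit_arima
```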
Prophet
Prophet is specified just like Auto ARIMA. Note that I've changed to prophet_reg(), and I'm supplying seasonality_yearly = TRUE.
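A sketch of the Prophet fit:

```r
# Prophet with yearly seasonality enabled
model_fit_prophet <- prophet_reg(seasonality_yearly = TRUE) %>%
  set_engine("prophet") %>%
  fit(value ~ date, data = training(splits))

model_fit_prophet
```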
2. Machine Learning Models
Machine learning models are more complex than the automated models. This complexity typically requires a workflow (sometimes called a pipeline in other languages). The general process goes like this:
- Create Preprocessing Recipe
- Create Model Specifications
- Use Workflow to combine Model Spec and Preprocessing, and Fit Model
Preprocessing Recipe
First, I'll create a preprocessing recipe using recipe() and adding time series steps. The process uses the "date" column to create 45 new features that I'd like to model. These include time-series signature features and Fourier series terms.
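One possible recipe along these lines is sketched below. The exact steps (and therefore the exact feature count) are an assumption on my part; treat it as a starting point rather than the definitive recipe.

```r
# Derive calendar/signature features and Fourier terms from the date column
recipe_spec <- recipe(value ~ date, data = training(splits)) %>%
  step_timeseries_signature(date) %>%                 # timetk calendar features
  step_rm(contains("iso"), contains("xts"),
          contains("hour"), contains("minute"),
          contains("second"), contains("am.pm")) %>%  # drop sub-daily / unneeded columns
  step_fourier(date, period = 365, K = 5) %>%         # yearly Fourier series terms
  step_dummy(all_nominal())                           # one-hot encode factor features

# Preview the engineered features
recipe_spec %>% prep() %>% juice()
```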
With a recipe in hand, we can set up our machine learning pipelines.
Elastic Net
Making an Elastic Net model is easy to do. Just set up your model spec using linear_reg() and set_engine("glmnet"). Note that we have not fitted the model yet (as we did in previous steps).
Next, make a fitted workflow (sketched below):
- Start with a workflow()
- Add a Model Spec: add_model(model_spec_glmnet)
- Add Preprocessing: add_recipe(recipe_spec %>% step_rm(date)) <– Note that I'm removing the "date" column since machine learning algorithms don't typically know how to deal with date or date-time features
- Fit the Workflow: fit(training(splits))
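A sketch of the spec plus fitted workflow (the penalty and mixture values are illustrative, not tuned):

```r
# Elastic Net spec: glmnet requires a penalty; mixture blends lasso/ridge
model_spec_glmnet <- linear_reg(penalty = 0.01, mixture = 0.5) %>%
  set_engine("glmnet")

# Combine the model spec with the recipe (date removed) and fit
workflow_fit_glmnet <- workflow() %>%
  add_model(model_spec_glmnet) %>%
  add_recipe(recipe_spec %>% step_rm(date)) %>%
  fit(training(splits))
```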
Random Forest
We can fit a Random Forest using a process similar to the Elastic Net.
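For example (the trees and min_n values are illustrative, and the "ranger" engine would work just as well as "randomForest"):

```r
# Random Forest spec and fitted workflow (same recipe, date removed)
model_spec_rf <- rand_forest(mode = "regression", trees = 500, min_n = 50) %>%
  set_engine("randomForest")

workflow_fit_rf <- workflow() %>%
  add_model(model_spec_rf) %>%
  add_recipe(recipe_spec %>% step_rm(date)) %>%
  fit(training(splits))
```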
3. Hybrid ML Models
I've included several hybrid models (e.g. arima_boost() and prophet_boost()) that combine automated algorithms with machine learning. I'll showcase prophet_boost() next!
Prophet Boost
The Prophet Boost algorithm combines Prophet with XGBoost to get the best of both worlds (i.e. Prophet Automation + Machine Learning). The algorithm works by:
- First modeling the univariate series using Prophet
- Then modeling the Prophet residuals with XGBoost, using the regressors supplied via the preprocessing recipe (remember, our recipe generated 45 new features)
We can set the model up using a workflow just like with the machine learning algorithms.
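A sketch follows. Note that the date column stays in the recipe this time: Prophet models the series from the date itself, while XGBoost picks up the engineered regressors.

```r
# Prophet Boost: Prophet fits the series, XGBoost models the residuals
model_spec_prophet_boost <- prophet_boost(seasonality_yearly = TRUE) %>%
  set_engine("prophet_xgboost")

workflow_fit_prophet_boost <- workflow() %>%
  add_model(model_spec_prophet_boost) %>%
  add_recipe(recipe_spec) %>%
  fit(training(splits))
```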
The Modeltime Workflow
Speed up model evaluation and selection with modeltime
The modeltime workflow is designed to speed up model evaluation and selection. Now that we have several time series models, let's analyze them and forecast the future with the modeltime workflow.
Modeltime Table
The Modeltime Table organizes the models with IDs and creates generic descriptions to help us keep track of our models. Let's add the models to a modeltime_table().
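Collecting the five fitted models (the object names follow the sketches above):

```r
# Organize the fitted models in a Modeltime Table
model_table <- modeltime_table(
  model_fit_arima,
  model_fit_prophet,
  workflow_fit_glmnet,
  workflow_fit_rf,
  workflow_fit_prophet_boost
)

model_table
```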
Calibration
Model calibration is used to quantify error and estimate confidence intervals. We'll perform model calibration on the out-of-sample data (aka the testing set) with the modeltime_calibrate() function. Two new columns are generated (".type" and ".calibration_data"), the most important of which is ".calibration_data". This includes the actual values, fitted values, and residuals for the testing set.
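For example:

```r
# Calibrate every model in the table against the hold-out testing set
calibration_table <- model_table %>%
  modeltime_calibrate(new_data = testing(splits))

calibration_table
```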
Forecast (Testing Set)
With calibrated data, we can visualize the testing predictions (forecast).
- Use modeltime_forecast() to generate the forecast data for the testing set as a tibble.
- Use plot_modeltime_forecast() to visualize the results in interactive and static plot formats.
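For example:

```r
# Forecast over the calibration (testing) window and overlay the actuals
calibration_table %>%
  modeltime_forecast(actual_data = bike_transactions_tbl) %>%
  plot_modeltime_forecast(.interactive = FALSE)
```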
Accuracy (Testing Set)
Next, calculate the testing accuracy to compare the models.
- Use modeltime_accuracy() to generate the out-of-sample accuracy metrics as a tibble.
- Use table_modeltime_accuracy() to generate interactive and static accuracy tables, as shown below.
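The accuracy table below can be produced along these lines:

```r
# Out-of-sample accuracy metrics, rendered as a table
calibration_table %>%
  modeltime_accuracy() %>%
  table_modeltime_accuracy(.interactive = FALSE)
```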
| .model_id | .model_desc               | .type | mae     | mape   | mase | smape | rmse    | rsq  |
|-----------|---------------------------|-------|---------|--------|------|-------|---------|------|
| 1         | ARIMA(0,1,3) WITH DRIFT   | Test  | 2540.11 | 474.89 | 2.74 | 46.00 | 3188.09 | 0.39 |
| 2         | PROPHET                   | Test  | 1221.18 | 365.13 | 1.32 | 28.68 | 1764.93 | 0.44 |
| 3         | GLMNET                    | Test  | 1197.06 | 340.57 | 1.29 | 28.44 | 1650.87 | 0.49 |
| 4         | RANDOMFOREST              | Test  | 1309.79 | 327.88 | 1.42 | 30.24 | 1809.05 | 0.47 |
| 5         | PROPHET W/ XGBOOST ERRORS | Test  | 1189.28 | 332.44 | 1.28 | 28.48 | 1644.25 | 0.55 |
Analyze Results
From the accuracy measures and forecast results, we see that:
- The Auto ARIMA model is not a good fit for this data.
- The best model is Prophet + XGBoost.
Let's exclude the Auto ARIMA from our final models, then make future forecasts with the remaining models.
Refit and Forecast Forward
Refitting is a best practice before forecasting the future (a sketch follows the list).
- modeltime_refit(): Re-train the models on the full data set (bike_transactions_tbl).
- modeltime_forecast(): For models that only depend on the "date" feature, we can use h (horizon) to forecast forward. Setting h = "12 months" forecasts the next 12 months of data.
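A sketch of the final step: drop the Auto ARIMA model (model ID 1 in the accuracy table), refit on the full data set, and forecast 12 months ahead.

```r
# Remove Auto ARIMA, refit on all data, and forecast the next year
calibration_table %>%
  filter(.model_id != 1) %>%
  modeltime_refit(data = bike_transactions_tbl) %>%
  modeltime_forecast(h = "12 months", actual_data = bike_transactions_tbl) %>%
  plot_modeltime_forecast(.interactive = FALSE)
```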
It gets better
You've just scratched the surface; here's what's coming…
The modeltime package functionality is much more feature-rich than what we've covered here (I couldn't possibly cover everything in this post).
Here's what I didn't cover:
- Feature Engineering: The art of time series analysis is feature engineering. Modeltime works with cutting-edge time-series preprocessing tools, including those in the recipes and timetk packages.
- Hyperparameter Tuning: ARIMA models and machine learning models can be tuned. There's a right way and a wrong way (and it's not the same for both types).
- Scalability: Training multiple time series groups and automation is a huge need in organizations. You need to know how to scale your analyses to thousands of time series.
- Strengths and Weaknesses: Did you know certain machine learning models are better for trend or seasonality, but not both? Why is ARIMA way better for certain datasets? When will Random Forest and XGBoost fail?
- Deep Learning: Recurrent Neural Networks (RNNs) have been crushing time series competitions. Will they work for business data? How can you implement them?
So how are you ever going to learn time series analysis and forecasting?
You're probably thinking:
- There's so much to learn
- My time is precious
- I'll never learn time series
I have good news that will put those doubts behind you.
You can learn time series analysis and forecasting in hours with my state-of-the-art time series forecasting course.
High-Performance Time Series Course
Become the time series expert in your organization.
My High-Performance Time Series Forecasting in R course is available now. You'll learn timetk and modeltime, plus the most powerful time series forecasting techniques available, like GluonTS Deep Learning. Become the time series domain expert in your organization.
High-Performance Time Series Course.
You will learn:
- Time Series Foundations - Visualization, Preprocessing, Noise Reduction, & Anomaly Detection
- Feature Engineering using lagged variables & external regressors
- Hyperparameter Tuning - For both sequential and non-sequential models
- Time Series Cross-Validation (TSCV)
- Ensembling Multiple Machine Learning & Univariate Modeling Techniques (Competition Winner)
- Deep Learning with GluonTS (Competition Winner)
- and more.
Unlock the High-Performance Time Series Course
Project Roadmap, Future Work, and Contributing to Modeltime
Modeltime is a growing ecosystem of packages that work together for forecasting and time series analysis. Here are several useful links:
Have questions about modeltime?
Make a comment in the chat below.
And, if you plan on using modeltime for your business, it's a no-brainer: join my Time Series Course (it's really insane).