Introducing Modeltime Recursive: Tidy Autoregressive Forecasting with Lags
Written by Matt Dancho and Alberto González Almuiña on April 8, 2021
I’m super excited to introduce
modeltime::recursive(), the new autoregressive forecast solution that allows you to convert any
tidymodels regression algorithm into an autoregressive forecasting algorithm. Think of Recursive as a Lag Management Tool.
The new Autoregressive Machine Learning (AR-ML) Forecasting Solution handles lags for one or more time series and was just greatly improved in Modeltime 0.5.0 (just released 🎉). This Tidy Forecasting Tutorial introduces
modeltime::recursive(): our new Tidy Autoregressive Forecast Solution.
- We’ll quickly introduce you to the challenges with Autoregressive Modeling.
- Then, we’ll showcase modeltime::recursive() in the Tidy Autoregressive Forecast Tutorial.
If you like what you see, I have an Advanced Time Series Course where you will learn the foundations of the growing Modeltime Ecosystem.
Time Series Forecasting Article Guide:
This article is part of a series of software announcements on the Modeltime Forecasting Ecosystem.
The Problem with Autoregressive Forecasting: Lags make Missing Values
Forecasting with autoregressive features is a challenge. The problem is that lags create missing values that show up as NA in the future data.
This isn’t a new problem. Algorithms like ARIMA have managed this internally for decades, but only for one time series at a time, which forces us to loop through every time series to make predictions. This iterative approach does not scale to modern machine learning on many time series.
The new challenge is how do we manage this for multiple time series? If you have more than one time series, this quickly becomes a forecasting nightmare that will make your head spin. Then multiply this by the number of different modeling algorithms you want to experiment with, and, well, you get the picture…
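To make the problem concrete, here is a minimal base R sketch using toy numbers (not the tutorial data). A series is extended three steps into the future, and only the first future row has a usable lag-1 value. The remaining lags stay NA until earlier predictions are filled in.

```r
# Toy series: 4 observed values followed by 3 future rows to forecast.
value <- c(10, 12, 15, 14, NA, NA, NA)

# Lag-1 feature: each row sees the previous row's value.
lag1 <- c(NA, head(value, -1))

toy <- data.frame(step = seq_along(value), value = value, lag1 = lag1)
print(toy)

# Count the future rows (value is NA) whose lag-1 feature is also missing.
missing_future_lags <- sum(is.na(toy$value) & is.na(toy$lag1))
missing_future_lags
```

With 24 lags, several series, and several candidate models, this bookkeeping is what a lag management tool has to automate.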
modeltime::recursive(): A new function that is capable of turning any Tidymodels regression algorithm into an autoregressive forecaster.
It’s a Lag Management Tool that handles the lagged predictions on one or more time series.
Autoregressive forecasting with lag management.
Modeltime 0.5.0 includes a new and improved
modeltime::recursive() function that turns any
tidymodels regression algorithm into an autoregressive forecaster.
Recursive is a new way to manage lagged regressors used in autoregressive forecasting.
✅ Any Tidymodel can become Autoregressive.
Recursive can be used with any regression model that is part of the tidymodels ecosystem (e.g. XGBoost, Random Forest, GLMNET).
✅ Works with Multiple Time Series.
Recursive works on single time series and multiple time series (panel data).
✅ Works with Ensembles.
Recursive can also be used in Ensembles (Recursive Ensembles) with
modeltime.ensemble 0.4.0 (just released, yay! 🎉).
What do you need to do to get Recursive?
Simply upgrade to modeltime and modeltime.ensemble. Both were just released to CRAN.
This version of the tutorial uses the “development version” of both packages. We are improving this software a lot as we grow the Modeltime Ecosystem. If you’d like to install the development version with the latest features:
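A minimal setup sketch, assuming the remotes package is installed and that the development versions live in the business-science GitHub organization:

```r
# CRAN releases
install.packages(c("modeltime", "modeltime.ensemble"))

# Development versions (assumes the remotes package is available)
remotes::install_github("business-science/modeltime")
remotes::install_github("business-science/modeltime.ensemble")
```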
Autoregressive Forecast Tutorial
recursive() with Modeltime Ensemble
Here’s what we’re making:
- A Recursive Ensemble with modeltime.ensemble
- That uses two sub-models: 40% GLMNET and 60% XGBOOST
- With Lags 1-24 as the main features, using modeltime::recursive() to manage the process
- We will forecast 24 months (2 years) using lags < forecast horizon
Get the Cheat Sheet
As you go through this tutorial, you will see many references to the Ultimate R Cheat Sheet. The Ultimate R Cheat Sheet covers the Modeltime Forecasting Ecosystem with links to key documentation. You can download the Ultimate R Cheat Sheet for free.
We’ll be focusing on three key packages: timetk, modeltime, and modeltime.ensemble. Links to the documentation are included in the cheat sheet (every package has a hyperlink, and some even have “CS” links to their cheat sheets).
80/20 Recursive Terminology
Things you’ll want to be aware of as we go through this tutorial
This is an Autoregressive Forecast. We are using short-term lags (Lags < Forecast Horizon). These short-term lags are the key features of the model. They are powerful predictors, but they create missing values (NA) in the future data. We use modeltime::recursive() to manage the predictions, updating the lags after each prediction step.
We are processing Multi-Time Series using a single model. The model processes in batches (panels) that are separated by an ID feature. This is a scalable approach to modeling many time series.
The model will predict (forecast) iteratively in batches: 1 time stamp x 4 time series = 4 predictions per loop.
The iteration continues until all 24 future time stamps have been predicted.
This process is highly scalable. The loop size is determined by the forecast horizon, and not the number of time series. So if you have 1000 time series, but your forecast horizon is only 24 months, the recursive prediction loop is only 24 iterations.
During this iterative process, a transformer function is used to create lagged values. We are responsible for defining the transformer function, but we have a lot of tools in
timetk that help us create the Transformer Function:
- You’ll see
- There is also
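The recursive loop described above can be sketched in base R. This is a conceptual illustration only, not how modeltime::recursive() is implemented: the data is synthetic, the model is a plain lm(), and add_lags() is a hypothetical stand-in for the Transformer Function. Note that the loop runs once per future time stamp, regardless of how many series are in the panel.

```r
set.seed(123)
n_obs   <- 30  # observed time stamps per series
horizon <- 6   # future time stamps to forecast
ids     <- c("A", "B")

# Synthetic panel, sorted by id and then time.
panel <- data.frame(
  id    = rep(ids, each = n_obs),
  t     = rep(seq_len(n_obs), times = length(ids)),
  value = c(100 + cumsum(rnorm(n_obs)), 50 + cumsum(rnorm(n_obs)))
)

# Hypothetical transformer: adds lag-1 and lag-2 of `value` within each id.
add_lags <- function(df) {
  lag_by_id <- function(k) {
    ave(df$value, df$id, FUN = function(x) c(rep(NA, k), head(x, -k)))
  }
  df$lag1 <- lag_by_id(1)
  df$lag2 <- lag_by_id(2)
  df
}

# Fit one model on all series at once, using only rows with complete lags.
fit <- lm(value ~ lag1 + lag2 + id, data = na.omit(add_lags(panel)))

# Append empty future rows, then restore the id/time ordering.
future <- data.frame(
  id    = rep(ids, each = horizon),
  t     = rep(n_obs + seq_len(horizon), times = length(ids)),
  value = NA_real_
)
full <- rbind(panel, future)
full <- full[order(full$id, full$t), ]

n_iter <- 0
for (step in n_obs + seq_len(horizon)) {
  lagged <- add_lags(full)    # refresh lags using the latest predictions
  rows   <- lagged$t == step  # one time stamp x all series
  full$value[rows] <- predict(fit, newdata = lagged[rows, ])
  n_iter <- n_iter + 1
}

n_iter                      # equals the horizon, not the number of series
tail(full[full$id == "A", ], horizon)
```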
First, we need to load the necessary libraries:
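A plausible set of library calls, inferred from the packages named in this tutorial. The glmnet and xgboost packages are assumed here because they are the engines behind the two sub-models, and the tidyverse is assumed for data wrangling.

```r
# Modeling
library(tidymodels)
library(modeltime)
library(modeltime.ensemble)
library(glmnet)
library(xgboost)

# Data wrangling and time series helpers
library(tidyverse)
library(timetk)
```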
We’ll use the
m4_monthly dataset, which has four monthly time series:
- This is a single data frame
- That contains 4 time series
- Each time series is identified with an “id”
- The date and value columns specify the timestamp data and the target (feature we are predicting)
We can get a visual using
timetk::plot_time_series(). Refer to the Ultimate R Cheat Sheet for documentation under time series visualization.
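A short sketch of the visualization, assuming one facet per series in a two-column layout and a static (non-interactive) plot:

```r
library(dplyr)
library(timetk)

# Group by series so that each id gets its own facet.
p <- m4_monthly %>%
  group_by(id) %>%
  plot_time_series(date, value, .facet_ncol = 2, .interactive = FALSE)

p
```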
We can get a sense of the structure of the data.
- The “id” feature separates the panels.
- The “date” feature contains the timestamp information
- The “value” feature is our target for prediction (forecasting)
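One way to inspect that structure is sketched below. The summary table and its column names are illustrative additions, not part of the original tutorial.

```r
library(dplyr)
library(timetk)

# Column names and types
glimpse(m4_monthly)

# Number of observations and date range per series
m4_summary <- m4_monthly %>%
  group_by(id) %>%
  summarise(n_obs = n(), start = min(date), end = max(date))

m4_summary
```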
First, we select a forecast horizon of 24 months and extend the data frame with the future_frame() function from the timetk package (Refer to the Ultimate R Cheat Sheet).
We do this to create a future dataset, which we can distinguish because its values will be NA.
The data has been extended by 24 x 4 = 96 rows.
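A sketch of this step, where the names FORECAST_HORIZON and m4_extended are chosen here for illustration. Grouping by id makes future_frame() append 24 future time stamps to each series.

```r
library(dplyr)
library(timetk)

FORECAST_HORIZON <- 24

m4_extended <- m4_monthly %>%
  group_by(id) %>%
  future_frame(
    .date_var   = date,
    .length_out = FORECAST_HORIZON,
    .bind_data  = TRUE   # keep the observed rows and append the future rows
  ) %>%
  ungroup()

# The appended rows have a date but no value yet.
added_rows <- nrow(m4_extended) - nrow(m4_monthly)
added_rows
```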
Then we create a Transformer Function that will be in charge of generating the lags for each time series up to each forecasting horizon. Note that this time we use grouped lags to generate lags by group. This is important when we have multiple time series. Make sure to ungroup after the lagging process.
Then, we apply the function and divide the data into training and future sets. Note that the tail of the data has NA values in the lagged regressors, which makes the problem a Recursive Forecasting problem.
We split into training data and future data.
- The train data is prepared for training.
- The future data will be used later when we forecast.
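The Transformer Function and the split can be sketched as follows. The object names (lag_transformer_grouped, m4_lags, train_data, future_data) are illustrative, and tk_augment_lags() from timetk is assumed as the lag helper. The data extension is repeated so the block runs on its own.

```r
library(dplyr)
library(tidyr)
library(timetk)

FORECAST_HORIZON <- 24

# Extended data: observed rows plus 24 empty future rows per series
m4_extended <- m4_monthly %>%
  group_by(id) %>%
  future_frame(.date_var = date, .length_out = FORECAST_HORIZON, .bind_data = TRUE) %>%
  ungroup()

# Transformer Function: lags 1-24 of `value`, computed within each series
lag_transformer_grouped <- function(data) {
  data %>%
    group_by(id) %>%
    tk_augment_lags(value, .lags = 1:FORECAST_HORIZON) %>%
    ungroup()
}

m4_lags <- lag_transformer_grouped(m4_extended)

# Training set: rows where the target and all lags are known
train_data <- m4_lags %>% drop_na()

# Future set: rows that still need to be predicted
future_data <- m4_lags %>% filter(is.na(value))
```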
Training the Submodels
Next, we are going to create two models that we will then join into an ensemble.
The first model is an Elastic Net (GLMNET) model: An elastic net is a regularized version of linear regression that applies a penalty to the lagged regressors, preventing bad lags from dominating the results. This can show an improvement versus a standard Linear Regression.
The second model is an XGBOOST model: An xgboost model is a tree-based algorithm that is very different in how it models vs a linear model. It’s much better for non-linear data (e.g. seasonality).
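A sketch of the two sub-models using parsnip. The hyperparameter values (penalty = 1, learn_rate = 0.35) are illustrative assumptions rather than tuned settings, and the training data is rebuilt in a compact form so the block runs on its own.

```r
library(tidymodels)
library(tidyverse)
library(timetk)

# Setup repeated from the data preparation steps
FORECAST_HORIZON <- 24
train_data <- m4_monthly %>%
  group_by(id) %>%
  tk_augment_lags(value, .lags = 1:FORECAST_HORIZON) %>%
  ungroup() %>%
  drop_na()

# Elastic Net: penalized linear regression on the lagged regressors
model_fit_glmnet <- linear_reg(penalty = 1) %>%
  set_engine("glmnet") %>%
  fit(value ~ ., data = train_data)

# XGBoost: gradient boosted trees
model_fit_xgboost <- boost_tree(mode = "regression", learn_rate = 0.35) %>%
  set_engine("xgboost") %>%
  fit(value ~ ., data = train_data)
```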
Create a Recursive Ensemble
The next step is to create an ensemble with
modeltime.ensemble (Refer to the Ultimate R Cheat Sheet).
We’ll use a Weighted Ensemble, ensemble_weighted(), with a 40/60 loading (GLMNET-to-XGBOOST).
Right after that we use the
recursive() function to create the recursive model:
- transform: The transform function gets passed to recursive(), which tells the predictions how to generate the lagged features.
- train_tail: We have to use the panel_tail() function to create the train_tail by group.
- id: This indicates how the time series panels are grouped within the incoming dataset.
Next, we add the recursive ensemble to the
modeltime_table(), which organizes one or more models prior to forecasting. Refer to the Ultimate R Cheat Sheet for the full Modeltime Documentation with Workflow.
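Putting the ensemble, recursive(), and modeltime_table() steps together might look like the sketch below. The object names and hyperparameters are illustrative assumptions, and the setup is repeated so the block runs on its own. The loadings c(4, 6) are rescaled by ensemble_weighted() to the 40/60 weighting.

```r
library(tidymodels)
library(modeltime)
library(modeltime.ensemble)
library(tidyverse)
library(timetk)

# Setup repeated from earlier steps
FORECAST_HORIZON <- 24
lag_transformer_grouped <- function(data) {
  data %>%
    group_by(id) %>%
    tk_augment_lags(value, .lags = 1:FORECAST_HORIZON) %>%
    ungroup()
}
train_data <- m4_monthly %>% lag_transformer_grouped() %>% drop_na()

model_fit_glmnet <- linear_reg(penalty = 1) %>%
  set_engine("glmnet") %>%
  fit(value ~ ., data = train_data)

model_fit_xgboost <- boost_tree(mode = "regression", learn_rate = 0.35) %>%
  set_engine("xgboost") %>%
  fit(value ~ ., data = train_data)

# 40/60 weighted ensemble, then wrapped with recursive() for lag management
recursive_ensemble_panel <- modeltime_table(model_fit_glmnet, model_fit_xgboost) %>%
  ensemble_weighted(loadings = c(4, 6)) %>%
  recursive(
    transform  = lag_transformer_grouped,
    train_tail = panel_tail(train_data, id, FORECAST_HORIZON),
    id         = "id"
  )

# Organize the recursive ensemble in a Modeltime Table
model_tbl <- modeltime_table(recursive_ensemble_panel)
model_tbl
```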
Forecast the Ensemble
Finally, we forecast over our dataset and visualize the forecast by following the Modeltime Workflow.
- Use modeltime_forecast() to make the forecast
- Use plot_modeltime_forecast() to visualize the predictions
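An end-to-end sketch of the forecasting step is shown below. Earlier setup is condensed and repeated so the block runs on its own. The object names, hyperparameters, and plot options are illustrative assumptions.

```r
library(tidymodels)
library(modeltime)
library(modeltime.ensemble)
library(tidyverse)
library(timetk)

# Condensed setup: lags, future frame, train/future split
FORECAST_HORIZON <- 24
lag_transformer_grouped <- function(data) {
  data %>%
    group_by(id) %>%
    tk_augment_lags(value, .lags = 1:FORECAST_HORIZON) %>%
    ungroup()
}
m4_lags <- m4_monthly %>%
  group_by(id) %>%
  future_frame(.date_var = date, .length_out = FORECAST_HORIZON, .bind_data = TRUE) %>%
  ungroup() %>%
  lag_transformer_grouped()
train_data  <- m4_lags %>% drop_na()
future_data <- m4_lags %>% filter(is.na(value))

# Condensed setup: sub-models and recursive ensemble
model_fit_glmnet <- linear_reg(penalty = 1) %>%
  set_engine("glmnet") %>%
  fit(value ~ ., data = train_data)
model_fit_xgboost <- boost_tree(mode = "regression", learn_rate = 0.35) %>%
  set_engine("xgboost") %>%
  fit(value ~ ., data = train_data)
recursive_ensemble_panel <- modeltime_table(model_fit_glmnet, model_fit_xgboost) %>%
  ensemble_weighted(loadings = c(4, 6)) %>%
  recursive(
    transform  = lag_transformer_grouped,
    train_tail = panel_tail(train_data, id, FORECAST_HORIZON),
    id         = "id"
  )
model_tbl <- modeltime_table(recursive_ensemble_panel)

# Forecast the 24 future months for every series
forecast_tbl <- model_tbl %>%
  modeltime_forecast(
    new_data    = future_data,
    actual_data = m4_lags,
    keep_data   = TRUE   # keeps the id column for grouping in the plot
  )

# Visualize one facet per series
forecast_tbl %>%
  group_by(id) %>%
  plot_modeltime_forecast(
    .interactive        = FALSE,
    .conf_interval_show = FALSE,
    .facet_ncol         = 2
  )
```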
It gets better
You’ve just scratched the surface. Here’s what’s coming…
The Modeltime Ecosystem functionality is much more feature-rich than what we’ve covered here (I couldn’t possibly cover everything in this post). 😀
Here’s what I didn’t cover:
Feature Engineering: We can make this forecast much more accurate by including features from competition-winning strategies
Hyperparameter Tuning for Time Series: We can vastly improve our models by tuning them, but you need to understand how to tune for time series.
Deep Learning: We can use GluonTS Deep Learning for developing high-performance, scalable forecasts.
So how are you ever going to learn time series analysis and forecasting?
You’re probably thinking:
- There’s so much to learn
- My time is precious
- I’ll never learn time series
I have good news that will put those doubts behind you.
You can learn time series analysis and forecasting in hours with my state-of-the-art time series forecasting course. 👇
High-Performance Time Series Course
Become the time series expert in your organization.
My High-Performance Time Series Forecasting in R course is available now. You’ll learn
modeltime plus the most powerful time series forecasting techniques available like GluonTS Deep Learning. Become the time series domain expert in your organization.
You will learn:
- Time Series Foundations - Visualization, Preprocessing, Noise Reduction, & Anomaly Detection
- Feature Engineering using lagged variables & external regressors
- Hyperparameter Tuning - For both sequential and non-sequential models
- Time Series Cross-Validation (TSCV)
- Ensembling Multiple Machine Learning & Univariate Modeling Techniques (Competition Winner)
- Deep Learning with GluonTS (Competition Winner)
- and more.
Project Roadmap, Future Work, and Contributing to Modeltime
Modeltime is a growing ecosystem of packages that work together for forecasting and time series analysis. Here are several useful links:
Modeltime Ecosystem Roadmap on GitHub - See the past development and future trajectory. Did we miss something? Make a suggestion.
Business Science data science blog - I announce all Modeltime Software happenings
I’d like to acknowledge several Business Science University students who are part of the BSU Modeltime Dev Team and have helped develop modeltime::recursive(). Their efforts are truly appreciated.
- Alberto González Almuiña has helped BIG TIME with the development of modeltime::recursive(), contributing the panel forecast software design.
Have questions about Modeltime Recursive?
Make a comment in the chat below. 👇
And, if you plan on using
modeltime for your business, it’s a no-brainer - Join my Time Series Course.