Time Series in 5-Minutes, Part 4: Seasonality
Written by Matt Dancho
Have 5-minutes? Then let’s learn time series. In this short article series, I highlight how you can get up to speed quickly on important aspects of time series analysis. Today we are focusing on seasonality in time series data.
This article has been updated. View the updated Time Series in 5-Minutes article at Business Science.
Time Series in 5-Minutes
Articles in this Series
I just released timetk 2.0.0 (read the release announcement). A ton of new functionality has been added. We’ll discuss some of the key pieces in this article series:
Let’s Learn Time Series Seasonality
A collection of tools for working with time series in R
Time series data wrangling is an essential skill for any forecaster. timetk includes the essential data wrangling tools. In this tutorial we’ll learn to analyze seasonality within time series data.
Seasonality is the presence of variations that occur at specific regular intervals, such as weekly, monthly, or quarterly. It can be caused by factors such as weather or holidays, and it consists of periodic, repetitive patterns in a time series.
This tutorial focuses on 3 new functions for visualizing time series diagnostics:
- ACF Diagnostics: plot_acf_diagnostics()
- Seasonality Diagnostics: plot_seasonal_diagnostics()
- STL Diagnostics: plot_stl_diagnostics()
Let’s Get Started
plot_acf_diagnostics() returns the ACF and PACF of a target and, optionally, the CCFs of one or more lagged predictors in interactive plotly plots. It also scales to multiple time series using dplyr::group_by().
- ACF = Autocorrelation between a target variable and lagged versions of itself.
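- PACF = Partial Autocorrelation between a target variable and a lagged version of itself after removing the dependence on the intermediate lags, which highlights key seasonalities.
- CCF = Cross-Correlation between a target variable and lagged predictors, showing how lagged predictors can be used to predict the target.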
Lags (.lags) can be specified as any of the following:
- A time-based phrase indicating a duration (e.g. 2 months)
- A maximum lag (e.g. .lags = 28)
- A sequence of lags (e.g. .lags = 7:28)
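As a minimal sketch of the three ways to specify lags, the code below applies each form to a single series. It assumes the m4_hourly dataset that ships with timetk (columns id, date, value) and picks the "H10" series purely for illustration; .interactive = FALSE is used so each call returns a static ggplot instead of a plotly widget.

```r
library(dplyr)
library(timetk)

# One hourly series from the m4_hourly example data (illustrative choice)
hourly <- m4_hourly %>% filter(id == "H10")

# 1. Time-based phrase: two days' worth of hourly lags
p_phrase <- hourly %>%
    plot_acf_diagnostics(date, value, .lags = "2 days", .interactive = FALSE)

# 2. Maximum lag: lags up to 28
p_max <- hourly %>%
    plot_acf_diagnostics(date, value, .lags = 28, .interactive = FALSE)

# 3. Sequence of lags: only lags 7 through 28
p_seq <- hourly %>%
    plot_acf_diagnostics(date, value, .lags = 7:28, .interactive = FALSE)
```

Printing any of the three objects (for example p_seq) draws the corresponding ACF and PACF panels.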
Scales to Multiple Time Series with Groups
plot_acf_diagnostics() works with grouped_df’s, meaning you can group your time series by one or more categorical columns with dplyr::group_by() and then apply plot_acf_diagnostics() to return group-wise lag diagnostics.
Special Note on Groups
Unlike other plotting utilities, the .facet_vars argument is NOT included. Use dplyr::group_by() for processing multiple time series groups.
Calculating the White Noise Significance Bars
The formula for the significance bars is +2/sqrt(T) and -2/sqrt(T), where T is the length of the time series. For a white noise time series, about 95% of the sample autocorrelations should fall within this range. Those that fall outside it may be significant autocorrelations.
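To make the formula concrete, here is a small base R sketch (no timetk required) that simulates a white noise series and checks how many sample autocorrelations land inside the bars. The series length, seed, and number of lags are arbitrary choices for illustration.

```r
set.seed(123)                 # reproducible simulated data
n <- 500                      # T, the length of the series
white_noise <- rnorm(n)

# Significance bars sit at +bound and -bound
bound <- 2 / sqrt(n)

# Sample autocorrelations for lags 1 to 40 (lag 0 is dropped, it is always 1)
acf_values <- acf(white_noise, lag.max = 40, plot = FALSE)$acf[-1]

# Share of autocorrelations inside the bars; for white noise this
# should be roughly 0.95, though any single sample will vary
share_inside <- mean(abs(acf_values) <= bound)
share_inside
```

With T = 500 the bars sit at roughly plus and minus 0.089, so a real series whose autocorrelation at some lag is well above that is worth a closer look.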
Grouped ACF Diagnostics
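A sketch of group-wise ACF and PACF diagnostics, assuming the m4_hourly example dataset bundled with timetk, which holds several hourly series identified by the id column. Set .interactive = TRUE to get the plotly version.

```r
library(dplyr)
library(timetk)

p_grouped_acf <- m4_hourly %>%
    group_by(id) %>%                  # one set of diagnostics per series
    plot_acf_diagnostics(
        date, value,                  # ACF and PACF of value
        .lags        = "7 days",      # 7 days of hourly lags
        .interactive = FALSE          # FALSE returns a ggplot
    )

p_grouped_acf
```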
Grouped CCF Plots
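A sketch of adding CCFs for lagged predictors, assuming the walmart_sales_weekly example dataset bundled with timetk. The choice of Temperature and Fuel_Price as predictors is illustrative only.

```r
library(dplyr)
library(timetk)

p_grouped_ccf <- walmart_sales_weekly %>%
    select(id, Date, Weekly_Sales, Temperature, Fuel_Price) %>%
    group_by(id) %>%                                  # one row of panels per id
    plot_acf_diagnostics(
        Date, Weekly_Sales,                           # ACF and PACF of the target
        .ccf_vars    = c(Temperature, Fuel_Price),    # CCFs against the target
        .lags        = "3 months",                    # 3 months of weekly lags
        .interactive = FALSE
    )

p_grouped_ccf
```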
plot_seasonal_diagnostics() is an interactive and scalable function for visualizing time series seasonality.
Automatic Feature Selection
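Internal calculations are performed to detect a sub-range of features to include using the following logic:
- The minimum feature is selected based on the median difference between consecutive timestamps.
- The maximum feature is selected based on having 2 full periods.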
Example: Hourly timestamp data that lasts more than 2 weeks will have the following features: “hour”, “wday.lbl”, and “week”.
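One way to see which features the automatic selection picks is to call tk_seasonal_diagnostics(), the data-preparation counterpart of the plotting function, and look at the columns it returns. This is a sketch that assumes the m4_hourly example data and the "H10" series; the exact feature columns depend on the scale and length of the series you pass in.

```r
library(dplyr)
library(timetk)

# One hourly series spanning several weeks (illustrative choice)
hourly <- m4_hourly %>% filter(id == "H10")

# .feature_set defaults to "auto", so the features are chosen for us
features_tbl <- hourly %>%
    tk_seasonal_diagnostics(date, value)

# Inspect which seasonal feature columns were generated
names(features_tbl)

# The same selection drives the plot
p_seasonal <- hourly %>%
    plot_seasonal_diagnostics(date, value, .interactive = FALSE)
```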
Scalable with Grouped Data Frames
This function respects grouped data.frames and tibbles that were made with dplyr::group_by().
For grouped data, the automatic feature selection returned is a collection of all features within the sub-groups. This means extra features are returned even though they may be meaningless for some of the groups.
The .value parameter respects transformations (e.g. .value = log(sales)).
Grouped Seasonal Visualizations
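A sketch of grouped seasonal plots, again assuming the m4_hourly example data. The second call shows a transformation passed directly to .value; log1p() is used here as an illustrative choice because it is safe if a series contains zeros.

```r
library(dplyr)
library(timetk)

# Seasonal diagnostics for every series, faceted by group
p_grouped <- m4_hourly %>%
    group_by(id) %>%
    plot_seasonal_diagnostics(date, value, .interactive = FALSE)

# Same plot on a transformed target
p_grouped_log <- m4_hourly %>%
    group_by(id) %>%
    plot_seasonal_diagnostics(date, log1p(value), .interactive = FALSE)

p_grouped
```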
The plot_stl_diagnostics() function generates a Seasonal-Trend-Loess decomposition. The function is “tidy” in the sense that it works on data frames and is designed to work with dplyr groups.
The STL method implements time series decomposition using the underlying stats::stl(). The decomposition separates the “season” and “trend” components from the “observed” values, leaving the “remainder”.
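To show what the decomposition means, this base R sketch calls stats::stl() directly on the built-in co2 dataset and confirms that the three components add back up to the observed values. The dataset and the s.window = "periodic" setting are illustrative choices, not what timetk uses internally.

```r
# Monthly atmospheric CO2 concentrations, shipped with base R
fit <- stats::stl(co2, s.window = "periodic")

# A matrix with columns "seasonal", "trend", and "remainder"
components <- fit$time.series
colnames(components)

# observed = seasonal + trend + remainder
reconstructed <- as.numeric(rowSums(components))
all.equal(reconstructed, as.numeric(co2))
```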
Frequency & Trend Selection
The user can control two parameters:
- The .frequency parameter adjusts the “season” component that is removed from the “observed” values.
- The .trend parameter adjusts the trend window (the t.window parameter from stl()) that is used.
The user may supply both .frequency and .trend as time-based durations (e.g. “6 weeks”), numeric values (e.g. 180), or “auto”, which automatically selects the frequency and/or trend based on the scale of the time series.
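A sketch of both ways to set these parameters, assuming the m4_hourly example data. The manual durations ("1 day" and "1 week") are illustrative guesses for hourly data, not recommended defaults.

```r
library(dplyr)
library(timetk)

# Automatic frequency and trend selection, one decomposition per group
p_stl_auto <- m4_hourly %>%
    group_by(id) %>%
    plot_stl_diagnostics(
        date, value,
        .frequency   = "auto",
        .trend       = "auto",
        .feature_set = c("observed", "season", "trend", "remainder"),
        .interactive = FALSE
    )

# Manual selection with time-based durations on a single series
p_stl_manual <- m4_hourly %>%
    filter(id == "H10") %>%
    plot_stl_diagnostics(
        date, value,
        .frequency   = "1 day",     # daily seasonal cycle
        .trend       = "1 week",    # weekly trend window
        .interactive = FALSE
    )

p_stl_auto
```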
Have questions on using Timetk for time series?
Make a comment in the chat below. 👇
And, if you plan on using timetk for your business, it’s a no-brainer - Join the Time Series Course.