Tidy Time Series Analysis, Part 2: Rolling Functions
Written by Matt Dancho on July 23, 2017
In the second part of a series on Tidy Time Series Analysis, we’ll again use tidyquant to investigate CRAN downloads, this time focusing on rolling functions. If you haven’t checked out the previous post on period apply functions, you may want to review it to get up to speed. Both zoo and TTR have a number of “roll” and “run” functions, respectively, that are integrated with tidyquant. In this post, we’ll focus on the rollapply function from zoo because of its flexibility in applying custom functions across rolling windows. If you like what you read, please follow us on social media to stay up on the latest Business Science news, events and information! As always, we are interested in both expanding our network of data scientists and seeking new clients interested in applying data science to business and finance.
Part of a 4 part series:
 Part 1: Tidy Period Apply
 Part 2: Tidy Rolling Functions
 Part 3: Tidy Rolling Correlations
 Part 4: Lags and Autocorrelations
An example of the visualization we can create using the rollapply function with tq_mutate():
Libraries Needed
We’ll primarily be using two libraries today.
CRAN tidyverse Downloads
We’ll be using the same “tidyverse” dataset as the last post. The script below gets the package downloads for the first half of 2017. The data is very noisy, meaning it’s difficult to identify trends. We’ll see how rolling functions can help shortly.
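The retrieval script itself isn’t shown in this excerpt. Here is a minimal sketch of that step, assuming the cranlogs package; the package list and date range are inferred from the packages discussed later in the post, not taken verbatim:

```r
# Hypothetical sketch of the data-retrieval step. Assumes the cranlogs package;
# the package list and date range are inferred from the post, not taken verbatim.
pkgs <- c(
    "tidyr", "lubridate", "dplyr", "tidyquant",
    "ggplot2", "purrr", "stringr", "knitr"
)

from <- "2017-01-01"  # first half of 2017, per the text
to   <- "2017-06-30"

if (requireNamespace("cranlogs", quietly = TRUE)) {
    # Returns a data frame with columns: date, count, package
    tidyverse_downloads <- cranlogs::cran_downloads(
        packages = pkgs, from = from, to = to
    )
}
```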
Rolling Window Calculations
What are rolling window calculations, and why do we care? In time series analysis, nothing is static. A correlation may exist for a subset of time or an average may vary from one day to the next. Rolling calculations simply apply functions to a fixed-width subset of this data (aka a window), advancing the window by one observation for each calculation. There are a few common reasons you may want to use a rolling calculation in time series analysis:
- Measuring the central tendency over time (mean, median)
- Measuring the volatility over time (sd, var)
- Detecting changes in trend (fast vs slow moving averages)
- Measuring a relationship between two time series over time (cor, cov)
The most common example of a rolling window calculation is a moving average. Here’s a nice illustration of a 3-month rolling window calculation from Chandoo.org.
A moving average allows us to visualize how an average changes over time, which is very useful in cutting through the noise to detect a trend in a time series dataset. Further, by varying the window (the number of observations included in the rolling calculation), we can vary the sensitivity of the window calculation. This is useful in comparing fast and slow moving averages (shown later).
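To make the mechanics concrete, here’s a minimal base-R sketch of a right-aligned rolling mean (zoo’s rollapply does this and much more; this is just the concept):

```r
# A right-aligned rolling mean: each output point is the mean of the current
# observation and the (k - 1) observations before it. The first k - 1 positions
# have no complete window, so they are returned as NA.
roll_mean <- function(x, k) {
    n   <- length(x)
    out <- rep(NA_real_, n)
    for (i in k:n) {
        out[i] <- mean(x[(i - k + 1):i])
    }
    out
}

roll_mean(c(2, 4, 6, 8, 10), k = 3)
# NA NA  4  6  8
```

Varying k changes the sensitivity: a small k tracks the series closely (fast), while a large k smooths more aggressively (slow).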
Combining a rolling mean with a rolling standard deviation can help detect regions of abnormal volatility and consolidation. This is the concept behind Bollinger Bands in the financial industry. The bands can be useful in detecting breakouts in trend for many time series, not just financial.
Time Series Functions
The xts, zoo, and TTR packages have some great functions that enable working with time series. Today, we’ll take a look at the rolling (or running) functions from the zoo and TTR packages. These rolling functions are helper functions that enable the application of other functions across a rolling window. What “other functions” can be supplied? Any function that returns a numeric vector, such as scalars (mean, median, sd, min, max, etc.) or vectors (quantile, summary, and custom functions). The rolling (or running) functions are in the format roll[apply or fun name] for zoo or run[Fun] for TTR. You can see which functions are integrated into the tidyquant package below:
We’ll investigate the rollapply function from the zoo package because it allows us to use custom functions that we create!
Tidy Implementation of Time Series Functions
We’ll be using the tq_mutate() function to apply time series functions in a “tidy” way. The tq_mutate() function always adds columns to the existing data frame (rather than returning a new data frame like tq_transmute()). It’s well suited for tasks that result in column-wise dimension changes (not row-wise changes such as periodicity changes; use tq_transmute for those!). It integrates with a number of financial and time series packages. We can see which apply functions will work by investigating the list of available functions returned by tq_mutate_fun_options().
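For instance, the zoo functions available through tq_mutate() can be inspected like so (a quick check, assuming tidyquant is installed):

```r
# Inspect which zoo rolling functions tidyquant has integrated.
if (requireNamespace("tidyquant", quietly = TRUE)) {
    print(tidyquant::tq_mutate_fun_options()$zoo)
    # A character vector of zoo function names, including "rollapply" and "rollmean"
}
```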
Tidy Application of Rolling Functions
As we saw in the tidyverse daily download graph above, it can be difficult to understand changes in trends just by visualizing the data. We can use rolling functions to better understand how trends are changing over time.
Rolling Mean: Inspecting Fast and Slow Moving Averages
Suppose we’d like to investigate if significant changes in trend are taking place among the package downloads such that future downloads are likely to continue to increase, decrease or stay the same. One way to do this is to use moving averages. Rather than try to sift through the noise, we can use a combination of a fast and slow moving average to detect momentum.
We’ll create a fast moving average with width = 28 days (just enough to detrend the data) and a slow moving average with width = 84 days (slow window = 3X fast window). To do this we apply two calls to tq_mutate(), the first for the 28-day (fast) and the second for the 84-day (slow) moving average. There are three groups of arguments we need to supply:
- tq_mutate args: These select the column to apply the mutation to (“count”) and the mutation function (mutate_fun) to apply (rollapply from zoo).
- rollapply args: These set the width, align = "right" (aligns with the end of the data frame), and the FUN we wish to apply (mean in this case).
- FUN args: These are arguments that get passed to the function. In this case we want to set na.rm = TRUE so NA values are skipped if present.
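Putting the three argument groups together, the calls look roughly like this. This is a sketch on synthetic download counts (the real post uses the CRAN data; the data, seed, and output column names here are assumptions):

```r
# Sketch: fast (28-day) and slow (84-day) rolling means via tq_mutate() + rollapply.
# Synthetic data stands in for the real CRAN download counts.
if (requireNamespace("tidyquant", quietly = TRUE) &&
    requireNamespace("dplyr", quietly = TRUE)) {
    library(tidyquant)
    library(dplyr)

    set.seed(123)
    downloads <- tibble(
        package = "example_pkg",
        date    = seq(as.Date("2017-01-01"), as.Date("2017-06-30"), by = "day"),
        count   = rpois(181, lambda = 1000)
    ) %>%
        group_by(package)

    downloads_rollmean <- downloads %>%
        # Fast moving average
        tq_mutate(
            select     = count,       # tq_mutate args
            mutate_fun = rollapply,
            width      = 28,          # rollapply args
            align      = "right",
            FUN        = mean,
            na.rm      = TRUE,        # mean args
            col_rename = "mean_28"
        ) %>%
        # Slow moving average
        tq_mutate(
            select     = count,
            mutate_fun = rollapply,
            width      = 84,
            align      = "right",
            FUN        = mean,
            na.rm      = TRUE,
            col_rename = "mean_84"
        )
}
```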
I add an additional tq_mutate arg, col_rename, at the end to rename the column. This is my preference, but it can be placed with the other tq_mutate args above.
The output is a little difficult to see. We’ll need to zoom in a little more to detect momentum. Let’s drop the “count” data from the plots and inspect just the moving averages. What we are looking for are points where the fast trend is above (has momentum) or below (is slowing) the slow trend. In addition, we want to inspect for crossover, which indicates shifts in trend.
We can see that several packages have strong upward momentum (purrr and lubridate). Others, such as dplyr, knitr and tidyr, seem to be cycling in a range. Still others, such as ggplot2 and stringr, have short-term downward trends (keep in mind these packages are getting the most downloads of the bunch). Finally, note that this is only a six-month window of data. The long-term trends may be much different than the short-term ones, but we’ll leave that for another day.
Rolling Custom Functions: Useful for multiple statistics
You may find in your analytic endeavors that you want more than one statistic. Well, you’re in luck with custom functions! In this example, we’ll create a custom function, custom_stat_fun_2(), that returns four statistics:
- mean
- standard deviation
- 95% confidence interval bounds (mean +/- 2SD), which account for the remaining two statistics
The custom function can then be applied in the same way that mean was applied.
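One possible body for custom_stat_fun_2() (the exact implementation isn’t shown in this excerpt; the key point is that it returns a named numeric vector, which rollapply can spread into multiple columns):

```r
# Hypothetical implementation: returns the mean, standard deviation, and the
# lower/upper 95% bands (mean -/+ 2 SD) as a named numeric vector.
custom_stat_fun_2 <- function(x, na.rm = TRUE) {
    m <- mean(x, na.rm = na.rm)
    s <- sd(x, na.rm = na.rm)
    c(
        mean  = m,
        sd    = s,
        lo.95 = m - 2 * s,  # lower 95% band
        hi.95 = m + 2 * s   # upper 95% band
    )
}

custom_stat_fun_2(c(8, 10, 12))
#  mean    sd lo.95 hi.95
#    10     2     6    14
```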
Now for the fun part: performing the “tidy” rollapply. Let’s apply custom_stat_fun_2() to groups using tq_mutate() and the rolling function rollapply(). The process is almost identical to the process of applying mean(), with the main exception that we need to set by.column = FALSE to prevent a “length of dimnames [2]” error. The output returned is a “tidy” data frame with each statistic in its own column.
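In sketch form, again on synthetic data (by.column = FALSE is the important part; the function body and column names here are assumptions):

```r
# Sketch: rolling multi-statistic calculation with tq_mutate() + rollapply.
# by.column = FALSE passes the whole window to the function, which returns a
# named vector; each element becomes its own column in the output.
if (requireNamespace("tidyquant", quietly = TRUE) &&
    requireNamespace("dplyr", quietly = TRUE)) {
    library(tidyquant)
    library(dplyr)

    # Hypothetical multi-statistic function (mean, sd, 95% bands)
    stat_fun <- function(x, na.rm = TRUE) {
        m <- mean(x, na.rm = na.rm)
        s <- sd(x, na.rm = na.rm)
        c(mean = m, sd = s, lo.95 = m - 2 * s, hi.95 = m + 2 * s)
    }

    set.seed(42)
    downloads <- tibble(
        package = "example_pkg",
        date    = seq(as.Date("2017-01-01"), by = "day", length.out = 90),
        count   = rpois(90, lambda = 1000)
    ) %>%
        group_by(package)

    downloads_rollstats <- downloads %>%
        tq_mutate(
            select     = count,
            mutate_fun = rollapply,
            width      = 28,
            align      = "right",
            by.column  = FALSE,  # required when FUN returns multiple values
            FUN        = stat_fun,
            na.rm      = TRUE
        )
    # downloads_rollstats gains one column per statistic returned by stat_fun
}
```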
We now have the data needed to visualize the rolling average (trend) and the 95% confidence bands (volatility). If you’re familiar with finance, this is actually the concept behind Bollinger Bands. While we’re not trading stocks here, we can see some similarities. We can see periods of consolidation and periods of high variability. Many of the high-variability periods occur when the package downloads are rapidly increasing. For example, lubridate, purrr and tidyquant all had spikes in downloads, causing the 95% confidence interval (CI) bands to widen.
Conclusions
The rollapply functions from zoo and TTR can be used to apply rolling window calculations. The tq_mutate() function from tidyquant enables efficient and “tidy” application of the functions. We were able to use the rollapply functions to visualize averages and standard deviations on a rolling basis, which gave us a better perspective of the dynamic trends. Using custom functions, there’s no limit to the statistics we can apply to rolling windows. In fact, rolling correlations, regressions, and more complicated statistics can be applied, which will be the subject of the next posts. Stay tuned! ;)
Business Science University
Enjoy data science for business? We do too. This is why we created Business Science University, where we teach you how to do Data Science For Business (#DS4B) just like us!
Our first DS4B course (HR 201) is now available!
Who is this course for?
Anyone that is interested in applying data science in a business context (we call this DS4B). All you need is basic R, dplyr, and ggplot2 experience. If you understood this article, you are qualified.
What do you get out of it?
You learn everything you need to know about how to apply data science in a business context:

- Using ROI-driven data science taught from consulting experience!
- Solve high-impact problems (e.g. $15M Employee Attrition Problem)
- Use advanced, bleeding-edge machine learning algorithms (e.g. H2O, LIME)
- Apply systematic data science frameworks (e.g. Business Science Problem Framework)
“If you’ve been looking for a program like this, I’m happy to say it’s finally here! This is what I needed when I first began data science years ago. It’s why I created Business Science University.”
Matt Dancho, Founder of Business Science
DS4B Virtual Workshop: Predicting Employee Attrition
Did you know that an organization that loses 200 high performing employees per year is essentially losing $15M/year in lost productivity? Many organizations don’t realize this because it’s an indirect cost. It goes unnoticed. What if you could use data science to predict and explain turnover in a way that managers could make better decisions and executives would see results? You will learn the tools to do so in our Virtual Workshop. Here’s an example of a Shiny app you will create.
Shiny App That Predicts Attrition and Recommends Management Strategies, Taught in HR 301
Our first Data Science For Business (HR 201) Virtual Workshop teaches you how to solve this employee attrition problem in four courses that are fully integrated:
- HR 201: Predicting Employee Attrition with h2o and lime
- HR 301: Building A Shiny Web Application
- HR 302: Data Story Telling With RMarkdown Reports and Presentations
- HR 303: Building An R Package For Your Organization, tidyattrition
The Virtual Workshop is intended for intermediate and advanced R users. It’s code intensive (like these articles), but also teaches you fundamentals of data science consulting, including CRISP-DM and the Business Science Problem Framework. The content bridges the gap between data science and the business, making you even more effective and improving your organization in the process.
Interested? Enroll in Business Science University today!
Follow Business Science on Social Media
 Connect with @bizScienc on twitter!
 Like us on Facebook!
 Follow us on LinkedIn!
 Sign up for our insights blog to stay updated!
 If you like our software, star our GitHub packages :)