Demo Week: Tidy Time Series Analysis with tibbletime
Written by Matt Dancho on October 26, 2017
We’re into the fourth day of Business Science Demo Week. We have a really cool one in store today: tibbletime, which uses a new tbl_time class that is time-aware! For those who may have missed it, every day this week we are demo-ing an R package: tidyquant (Monday), timetk (Tuesday), sweep (Wednesday), tibbletime (Thursday) and h2o (Friday)! That’s five packages in five days! We’ll give you intel on what you need to know about these packages to go from zero to hero. Let’s take tibbletime for a spin!
Demo Week Demos:
- class(Monday) <- tidyquant
- class(Tuesday) <- timetk
- class(Wednesday) <- sweep
- class(Thursday) <- tibbletime
- class(Friday) <- h2o + timetk
tibbletime: What’s It Used For?
The future of “tidy” time series analysis: New class tbl_time rests on top of tbl and makes tibbles time aware.
Time Series Functions: Can use a series of “tidy” time series functions designed specifically for tbl_time objects. Some of them are:
time_filter(): Succinctly filter a tbl_time object by date.
time_summarise(): Similar to dplyr::summarise but with the added benefit of being able to summarise by a time period such as “yearly” or “monthly”.
tmap(): The family of tmap functions transform a tbl_time input by applying a function to each column at a specified time interval.
as_period(): Convert a tbl_time object from daily to monthly, from minute data to hourly, and more. This allows the user to easily aggregate data to a less granular level.
time_collapse(): When time_collapse is used, the index of a tbl_time object is altered so that all dates that fall in a period share a common date.
rollify(): Modify a function so that it calculates a value (or a set of values) at specific time intervals. This can be used for rolling averages and other rolling calculations inside the tidyverse.
create_series(): Use shorthand notation to quickly initialize a tbl_time object containing a date column with a regularly spaced time series.
The tibbletime package is under active development, and because of this we recommend downloading the package from GitHub using devtools. You’ll get the latest functionality with all of the features demo-ed in this article.
Once installed, load the following libraries:
tibbletime: Enables creation of time-aware tibbles. Can use new tbl_time functions.
tidyquant: Loads the tidyverse, and is used to get data with tq_get().
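A minimal setup sketch, assuming devtools is already installed. The GitHub repository path in the commented install line is an assumption (it is not stated in the text above), and dplyr is loaded explicitly in case the installed tidyquant version does not attach it.

```r
# One-time install of the development version (uncomment to run).
# The repository path is assumed, not taken from the article text.
# devtools::install_github("business-science/tibbletime")

library(tibbletime) # time-aware tibbles (tbl_time)
library(tidyquant)  # data retrieval helpers
library(dplyr)      # pipes and data manipulation verbs
```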
We’ll download the daily stock prices for the FANG stocks (FB, AMZN, NFLX, GOOG) using tq_get().
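The download might look roughly like the sketch below. It assumes tidyquant's tq_get() with get = "stock.prices"; the date range and the object names FANG_symbols and FANG_tbl_d are illustrative choices, and the call needs network access. Ticker symbols can change over time, so the symbol list may need adjusting.

```r
library(tidyquant)
library(dplyr)

# Symbols as listed in the article
FANG_symbols <- c("FB", "AMZN", "NFLX", "GOOG")

# Assumed date range; adjust as needed
FANG_tbl_d <- FANG_symbols %>%
    tq_get(get = "stock.prices", from = "2014-01-01", to = "2016-12-31") %>%
    group_by(symbol)

FANG_tbl_d
```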
We set up a function to plot facets by symbol that can be reused throughout this article. For those unfamiliar with the rlang package and tidyeval framework, it’s not necessary to understand them for this article. Just recognize that we are creating a ggplot2 function that creates plots that are faceted by “symbol” by specifying the data frame, x, y, and group (if present).
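One way such a helper could be written is sketched below. This is not necessarily the exact function used for the plots in this article: the geom, theme, and facet layout are assumptions, and the toy data frame exists only to show a call.

```r
library(ggplot2)
library(rlang)

ggplot_facet_by_symbol <- function(data, x, y, group = NULL) {
    # Capture the unquoted column names supplied by the caller
    x_expr     <- enquo(x)
    y_expr     <- enquo(y)
    group_expr <- enquo(group)

    # Only map the group aesthetic when a group column was supplied
    if (quo_is_null(group_expr)) {
        mapping <- aes(x = !!x_expr, y = !!y_expr, color = symbol)
    } else {
        mapping <- aes(x = !!x_expr, y = !!y_expr, color = symbol,
                       group = !!group_expr)
    }

    ggplot(data, mapping) +
        geom_line() +
        facet_wrap(~ symbol, ncol = 2, scales = "free_y") +
        theme_minimal() +
        theme(legend.position = "none")
}

# Toy data (made-up values) to demonstrate the call
toy <- data.frame(
    symbol   = rep(c("AAA", "BBB"), each = 3),
    date     = rep(as.Date("2014-01-01") + 0:2, times = 2),
    adjusted = c(1, 2, 3, 3, 2, 1)
)

p <- ggplot_facet_by_symbol(toy, x = date, y = adjusted)
```

Printing p draws one panel per symbol. The data frame must contain a column named symbol, since the facet and colour mapping refer to it directly.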
We can quickly visualize our data with our plotting function, ggplot_facet_by_symbol(). Let’s have a look at the “adjusted” stock prices by “date”.
Now that we see what data we are dealing with, let’s move on to the tibbletime demo.
We’ll test out the following functions today:
time_filter: Tidy Time Filtering
time_summarise: Tidy Time-Based Summarization
as_period: Flexible Periodicity Change
rollify: Turn Any Function Into A Rolling Function
Initialize a Tibble-Time Object
Before we can use these new functions, we need to create a tbl_time object. The new class operates almost identically to a normal tibble object. However, under the hood it tracks the time information.
Use the as_tbl_time() function to initialize the object. Specify index = date, which tells the tbl_time object which index to track.
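As a small illustration of the conversion, the sketch below uses a synthetic grouped tibble in place of the downloaded stock data. The object names prices and prices_time and all values are made up for the example.

```r
library(dplyr)
library(tibbletime)

# Synthetic stand-in for the grouped stock data (values are made up)
prices <- tibble(
    symbol   = rep(c("AAA", "BBB"), each = 4),
    date     = rep(as.Date("2014-01-01") + 0:3, times = 2),
    adjusted = c(10, 11, 12, 13, 20, 19, 21, 22)
) %>%
    group_by(symbol)

# Tell tibbletime which column is the time index
prices_time <- prices %>%
    as_tbl_time(index = date)

class(prices_time)
prices_time
```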
We can print the tbl_time object. It looks almost identical to a grouped tibble. Note that “Index: date” informs us that the “time tibble” is initialized properly.
We can plot it with our plotting function, ggplot_facet_by_symbol(), and we see the tbl_time object reacts the same as the tbl object.
Special Time Series Functions
Let’s see what we can do with the new tbl_time object.
The time_filter() function is used to succinctly filter a tbl_time object by date. It uses a function format (e.g. “date_operator_start ~ date_operator_end”). We specify the date operators in normal YYYY-MM-DD + HH:MM:SS format, but there is also powerful shorthand to more efficiently subset by date.
Suppose we’d like to filter all observations between “2014-06-01” and “2014-06-15”, inclusive. We can do this using the function notation, time_filter(2014-06-01 ~ 2014-06-15).
We can do the same by month. Suppose we just want observations in March, 2014. Use the shorthand functional notation “~ 2014-03”.
The tbl_time object also responds to bracket notation [. Here we collect all dates in 2014 for each of the groups.
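The three filtering styles above can be sketched on a synthetic daily series as follows. This assumes the tibbletime version described in this article, where time_filter() and unquoted date formulas are available; other versions may use a different function name or expect quoted dates, so check the documentation of the installed version. The data and object names are made up.

```r
library(dplyr)
library(tibbletime)

# Two years of synthetic daily data (values are made up)
prices_time <- tibble(
    date     = seq(as.Date("2014-01-01"), as.Date("2015-12-31"), by = "day"),
    adjusted = as.numeric(seq_along(date))
) %>%
    as_tbl_time(index = date)

# Explicit start ~ end range, inclusive of both dates
june_first_half <- prices_time %>%
    time_filter(2014-06-01 ~ 2014-06-15)

# One-sided shorthand: everything in March 2014
march_2014 <- prices_time %>%
    time_filter(~ 2014-03)

# Bracket notation: everything in 2014
year_2014 <- prices_time[~ 2014]
```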
The time_summarise() function is similar to dplyr::summarise but with the added benefit of being able to summarise by a time period such as “yearly” or “monthly”.
The really cool thing about time_summarise() is that we can use the functional notation to define the period to summarize over. For example, if we want bimonthly, or every two months, we can use the notation “2~m”. Similarly, we could do every 20 days as “20~d”. The summarization options are endless.
Let’s plot the min, max, and median on a bi-monthly frequency (2~m) with time_summarise(). This is really cool!
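A sketch of the summarization step on synthetic data is shown below. It assumes the time_summarise() function and period notation described in this article, which may not be available under the same name in every tibbletime version; the column names adj_min, adj_max, and adj_median are illustrative.

```r
library(dplyr)
library(tibbletime)

# Two years of synthetic daily data (values are made up)
prices_time <- tibble(
    date     = seq(as.Date("2014-01-01"), as.Date("2015-12-31"), by = "day"),
    adjusted = as.numeric(seq_along(date))
) %>%
    as_tbl_time(index = date)

# Named period: one summary row per year
yearly <- prices_time %>%
    time_summarise(
        period     = "yearly",
        adj_min    = min(adjusted),
        adj_max    = max(adjusted),
        adj_median = median(adjusted)
    )

# Functional notation: one summary row per two-month block
bimonthly <- prices_time %>%
    time_summarise(
        period     = 2~m,
        adj_min    = min(adjusted),
        adj_max    = max(adjusted),
        adj_median = median(adjusted)
    )
```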
Those interested in furthering their understanding of time_summarise() can check out the time_summarise function documentation.
The next function, as_period(), enables changing the period of a tbl_time object. Two advantages to using this method over traditional approaches:
- The functions are flexible: “yearly” == “y” == “1~y”
- The functional notation allows for endless periodicity change combinations, for example:
  - “15~d” to change to 15-day periodicity
  - “2~m” to change to bi-monthly periodicity
  - “4~m” to change to tri-annual (trimesters)
  - “6~m” to change to bi-annual
To start off, let’s do a simple monthly periodicity change.
Let’s step it up a notch. What about bi-monthly? Just use the functional notation, “2~m”.
Let’s keep going. What about bi-annually? Just use “6~m”.
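The three periodicity changes can be sketched together on synthetic data. As with the other examples, this assumes the period notation described in this article (a name such as "monthly", or a formula such as 2~m); the accepted period syntax may differ in other tibbletime versions.

```r
library(dplyr)
library(tibbletime)

# Two years of synthetic daily data (values are made up)
prices_time <- tibble(
    date     = seq(as.Date("2014-01-01"), as.Date("2015-12-31"), by = "day"),
    adjusted = as.numeric(seq_along(date))
) %>%
    as_tbl_time(index = date)

# Daily to monthly
monthly <- prices_time %>%
    as_period(period = "monthly")

# Daily to every two months
bimonthly <- prices_time %>%
    as_period(period = 2~m)

# Daily to every six months
biannual <- prices_time %>%
    as_period(period = 6~m)
```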
The possibilities are endless with the functional notation. Interested learners can check out the vignette on periodicity change with tibbletime.
The rollify() function is an adverb (a special type of function in the tidyverse that modifies another function). What rollify() does is turn any function into a rolling version of itself.
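For instance, a 5-period rolling mean could be built as in the sketch below. The window size, the name mean_roll_5, and the synthetic data are illustrative choices.

```r
library(dplyr)
library(tibbletime)

# Turn mean() into a rolling mean over a 5-observation window
mean_roll_5 <- rollify(mean, window = 5)

# Ten days of synthetic prices (values are made up)
prices <- tibble(
    date     = as.Date("2014-01-01") + 0:9,
    adjusted = as.numeric(1:10)
)

# The rolling function can be used like any other function inside mutate()
rolled <- prices %>%
    mutate(mean_5 = mean_roll_5(adjusted))

rolled
```

The first four values of mean_5 are NA because a full window of five observations is not yet available.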
We can even do more complicated rolling functions such as correlations. We use the functional form .f = ~ fun(.x, .y, ...) within rollify().
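A rolling correlation might be set up as in the following sketch. The window size, the name cor_roll_5, and the two synthetic series are assumptions; series b is deliberately an exact linear function of a so that the result is easy to check.

```r
library(dplyr)
library(tibbletime)

# Two-argument rolling function: .x and .y are the two input columns
cor_roll_5 <- rollify(~ cor(.x, .y), window = 5)

# Synthetic return series; b is an exact linear function of a
returns <- tibble(
    date = as.Date("2014-01-01") + 0:7,
    a    = c(0.01, -0.02, 0.03, 0.00, 0.02, -0.01, 0.04, 0.01),
    b    = 2 * a + 0.001
)

rolled <- returns %>%
    mutate(cor_ab = cor_roll_5(a, b))

rolled
```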
We can even return multiple results. For example, we can create a rolling quantile.
First, create a function that returns a tibble of quantiles.
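A sketch of such a helper, built on base R's quantile(); the function name quantile_tbl and the column names are illustrative.

```r
library(dplyr)

# Return the default quantiles (0%, 25%, 50%, 75%, 100%) as a two-column tibble
quantile_tbl <- function(x) {
    q <- quantile(x)
    tibble(
        quantile_name  = names(q),
        quantile_value = unname(q)
    )
}

# Quick check on a simple vector
q_check <- quantile_tbl(1:100)
q_check
```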
Great, it works. Next, use rollify to create a rolling version. We set unlist = FALSE to return a list-column.
Next, apply the rolling quantile function within mutate() to get a rolling quantile. Make sure you select(), filter() and unnest() to remove unnecessary columns, filter NA values, and unnest the list-column (“rolling_quantile”). Each date now has five values, one for each quantile.
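Putting the rolling quantile steps together on synthetic data might look like the sketch below. The helper quantile_tbl is redefined so the example stands alone; the name roll_quantile_5, the window size, and the data are illustrative.

```r
library(dplyr)
library(tidyr)
library(tibbletime)

# Helper returning a tibble of quantiles for a numeric vector
quantile_tbl <- function(x) {
    q <- quantile(x)
    tibble(quantile_name = names(q), quantile_value = unname(q))
}

# Rolling version; unlist = FALSE keeps each window's tibble in a list-column
roll_quantile_5 <- rollify(quantile_tbl, window = 5, unlist = FALSE)

# Ten days of synthetic prices (values are made up)
prices <- tibble(
    date     = as.Date("2014-01-01") + 0:9,
    adjusted = as.numeric(1:10)
)

result <- prices %>%
    mutate(rolling_quantile = roll_quantile_5(adjusted)) %>%
    select(date, rolling_quantile) %>%     # drop columns that are not needed
    filter(!is.na(rolling_quantile)) %>%   # drop incomplete leading windows
    unnest(cols = rolling_quantile)        # expand to one row per quantile

result
```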
Finally, we can plot the results.
Interested learners can continue exploring rollify by checking out our vignette on rolling functions with rollify.
This package is currently under active development. Don’t be shocked if the functionality increases soon… Davis Vaughan is working hard to expand the capability of tibbletime. Reproducible bug reports are welcome!
Interested learners can check out the following links to further their understanding of tibbletime:
- Business Science Software Website
- tibbletime documentation
- tibbletime GitHub Page
- Business Science Insights Blog
We have a busy couple of weeks. In addition to Demo Week, we have:
!!TONIGHT!! Thursday, October 26 at 7PM EST, Matt will be giving a FREE LIVE #DataTalk on Machine Learning for Recruitment and Reducing Employee Attrition. You can sign up for a reminder at the Experian Data Lab website.
On Friday, November 3rd, Matt will be presenting at the EARL Conference on HR Analytics: Using Machine Learning to Predict Employee Turnover.
Based on recent demand, we are considering offering application-specific machine learning courses for Data Scientists. The content will be business problems similar to our popular articles.
The student will learn from Business Science how to implement cutting edge data science to solve business problems. Please let us know if you are interested. You can leave comments as to what you would like to see at the bottom of the post in Disqus.
Business Science specializes in “ROI-driven data science”. Our focus is machine learning and data science in business applications. We help businesses that seek to add this competitive advantage but may not have the resources currently to implement predictive analytics. Business Science works with clients primarily in small to medium size businesses, guiding these organizations in expanding predictive analytics while executing on ROI generating projects. Visit the Business Science website or contact us to learn more!