Demo Week: Tidy Time Series Analysis with tibbletime
Written by Matt Dancho
We’re into the fourth day of Business Science Demo Week. We have a really cool one in store today: tibbletime, which uses a new tbl_time class that is time-aware! For those that may have missed it, every day this week we are demo-ing an R package: tidyquant (Monday), timetk (Tuesday), sweep (Wednesday), tibbletime (Thursday) and h2o (Friday)! That’s five packages in five days! We’ll give you intel on what you need to know about these packages to go from zero to hero. Let’s take tibbletime for a spin!
tibbletime: What’s It Used For?
- The future of “tidy” time series analysis: New class tbl_time rests on top of tbl and makes tibbles time aware.
- Time Series Functions: Can use a series of “tidy” time series functions designed specifically for tbl_time objects. Some of them are:
  - time_filter(): Succinctly filter a tbl_time object by date.
  - time_summarise(): Similar to dplyr::summarise but with the added benefit of being able to summarise by a time period such as “yearly” or “monthly”.
  - tmap(): The family of tmap functions transform a tbl_time input by applying a function to each column at a specified time interval.
  - as_period(): Convert a tbl_time object from daily to monthly, from minute data to hourly, and more. This allows the user to easily aggregate data to a less granular level.
  - time_collapse(): When time_collapse is used, the index of a tbl_time object is altered so that all dates that fall in a period share a common date.
  - rollify(): Modify a function so that it calculates a value (or a set of values) at specific time intervals. This can be used for rolling averages and other rolling calculations inside the tidyverse framework.
  - create_series(): Use shorthand notation to quickly initialize a tbl_time object containing a date column with a regularly spaced time series.
Load Libraries
The tibbletime package is under active development, and because of this we recommend downloading the package from GitHub using devtools. You’ll get the latest functionality with all of the features demo-ed in this article.
Once installed, load the following libraries:
- tibbletime: Enables creation of time-aware tibbles. Can use new tbl_time functions.
- tidyquant: Loads tidyverse, and is used to get data with tq_get().
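A minimal sketch of the setup step. The install lines are commented out so the snippet can be re-run safely; the GitHub repository path business-science/tibbletime is an assumption about where the development version lives.

```r
# One-time installation (uncomment to run); the repository path is an assumption.
# install.packages("devtools")
# devtools::install_github("business-science/tibbletime")

# Load the packages used throughout the demo
library(tibbletime)  # time-aware tibbles (tbl_time)
library(tidyquant)   # tq_get() for downloading stock prices
```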
Data
We’ll download the daily stock prices for the FANG stocks (FB, AMZN, NFLX, GOOG) using tq_get().
We set up a function to plot facets by symbol that can be reused throughout this article. For those unfamiliar with the rlang package and the tidyeval framework, it’s not necessary to understand them for this article. Just recognize that we are creating a ggplot2 function that creates plots that are faceted by “symbol” by specifying the data frame, x, y, and group (if present).
We can quickly visualize our data with our plotting function, ggplot_facet_by_symbol. Let’s have a look at the “adjusted” stock prices by “date”.
Now that we see what data we are dealing with, let’s move on to the tibbletime demo.
DEMO: tibbletime
We’ll test out the following functions today: time_filter(), time_summarise(), as_period(), and rollify().
Initialize a Tibble-Time Object
Before we can use these new functions, we need to create a tbl_time object. The new class operates almost identically to a normal tibble object. However, under the hood it tracks the time information.
Use the as_tbl_time() function to initialize the object. Specify index = date, which tells the tbl_time object which index to track.
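Here’s a rough sketch of the initialization step. Since the downloaded prices aren’t reproduced here, it uses a small made-up data frame; the prices object, its symbols and its values are hypothetical stand-ins with the same symbol/date/adjusted layout.

```r
library(dplyr)
library(tibble)
library(tibbletime)

# Hypothetical stand-in for the downloaded stock prices
prices <- tibble(
  symbol   = rep(c("AAA", "BBB"), each = 5),
  date     = rep(seq(as.Date("2017-01-02"), by = "day", length.out = 5), times = 2),
  adjusted = c(10, 11, 12, 11, 13, 20, 19, 21, 22, 23)
)

# Tell tibbletime which column is the time index, then group by symbol
prices_tbl_time <- prices %>%
  as_tbl_time(index = date) %>%
  group_by(symbol)

prices_tbl_time
```

Printing the result should show the index column in the header alongside the grouping information.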
We can print the tbl_time object. It looks almost identical to a grouped tibble. Note that “Index: date” informs us that the “time tibble” is initialized properly.
We can plot it with our plotting function, ggplot_facet_by_symbol(), and we see the tbl_time object reacts the same as the tbl object.
Special Time Series Functions
Let’s see what we can do with the new tbl_time object.
time_filter
The time_filter() function is used to succinctly filter a tbl_time object by date. It uses a formula notation (e.g. “date_operator_start ~ date_operator_end”). We specify the dates in the usual YYYY-MM-DD HH:MM:SS format, but there is also a powerful shorthand to subset by date more efficiently.
Suppose we’d like to keep all observations from “2014-06-01” through “2014-06-15”, inclusive. We can do this using the notation time_filter(2014-06-01 ~ 2014-06-15).
We can do the same by month. Suppose we just want observations in March, 2014. Use the shorthand functional notation “~ 2014-03”.
The tbl_time object also responds to bracket notation [. Here we collect all dates in 2014 for each of the groups.
The time_filter() function has a lot of capability and useful shorthand. Those interested should check out the time_filter vignette and the time_filter function documentation.
time_summarise
The time_summarise() function is similar to dplyr::summarise but with the added benefit of being able to summarise by a time period such as “yearly” or “monthly”.
The really cool thing about time_summarise() is that we can use the functional notation to define the period to summarize over. For example, if we want bimonthly (every two months) summaries, we can use “2~m”. Similarly, we could summarise every 20 days with “20~d”. The summarization options are endless.
Let’s plot the min, max, and median on a Bi-Monthly frequency (2~m) with time_summarise(). This is really cool!!
Those interested in furthering their understanding of time_summarise() can check out the time_summarise function documentation.
as_period
The next function, as_period(), enables changing the period of a tbl_time object. Two advantages to using this method over traditional approaches:
- The functions are flexible: “yearly” == “y” == “1~y”
- The functional notation allows for endless periodicity change combinations, for example:
  - “15~d” to change to 15-day periodicity
  - “2~m” to change to bi-monthly periodicity (every two months)
  - “4~m” to change to tri-annual periodicity (three periods per year)
  - “6~m” to change to semi-annual periodicity (two periods per year)
To start off, let’s do a simple monthly periodicity change.
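As a rough illustration of a monthly periodicity change, the sketch below uses a made-up daily series rather than the stock data; the daily object and its values are hypothetical. Only the “monthly” period string from the text is used, since the exact shorthand for other periods may depend on the installed tibbletime version.

```r
library(dplyr)
library(tibble)
library(tibbletime)

# Hypothetical daily series covering January through March 2017 (90 days)
daily <- tibble(
  date  = seq(as.Date("2017-01-01"), by = "day", length.out = 90),
  value = seq_len(90)
) %>%
  as_tbl_time(index = date)

# Reduce the daily data to one observation per month
monthly <- as_period(daily, period = "monthly")

monthly
```

The result keeps one row per month instead of one row per day, which is the “less granular level” described above.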
Let’s step it up a notch. What about bi-monthly? Just use the functional notation, “2~m”.
Let’s keep going. What about semi-annually (every six months)? Just use “6~m”.
The possibilities are endless with the functional notation. Interested learners can check out the vignette on periodicity change with tibbletime.
rollify
The rollify() function is an adverb (a special type of function in the tidyverse that modifies another function). What rollify() does is turn any function into a rolling version of itself.
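A minimal sketch of the idea, using a plain numeric vector instead of the stock prices so it stays self-contained. The names mean_roll_5 and x are illustrative, and the window of 5 is an arbitrary choice.

```r
library(tibbletime)

# Turn mean() into a rolling mean over a 5-observation window
mean_roll_5 <- rollify(mean, window = 5)

# Illustrative input; inside a pipeline this would be a column in mutate()
x <- 1:10
rolled <- mean_roll_5(x)

rolled
```

The first window - 1 positions have no complete window, so they come back as NA. In a tidyverse pipeline the same rolling function can be called inside mutate(), e.g. mutate(roll_mean = mean_roll_5(adjusted)).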
We can even do more complicated rolling functions such as correlations. We use the functional form .f = ~ fun(.x, .y, ...) within rollify().
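Here’s a small sketch of a two-input rolling function built with that functional form. The vectors a and b are made up (b is an exact multiple of a, so every complete window is perfectly correlated), and the window of 5 is arbitrary.

```r
library(tibbletime)

# Rolling correlation: .x and .y are the two rolled inputs
cor_roll_5 <- rollify(~ cor(.x, .y), window = 5)

# Illustrative inputs; b is a linear function of a
a <- as.numeric(1:10)
b <- 2 * a

rolling_cor <- cor_roll_5(a, b)

rolling_cor
```

With real data the two inputs would typically be two columns passed inside mutate().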
We can even return multiple results. For example, we can create a rolling quantile.
First, create a function that returns a tibble of quantiles.
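One way such a function might look is sketched below. The name quantile_tbl and the column names are illustrative choices, and it relies on the default probabilities of quantile() (0%, 25%, 50%, 75%, 100%).

```r
library(tibble)

# Return the default quantiles of x as a two-column tibble
quantile_tbl <- function(x) {
  q <- quantile(x)
  tibble(
    quantile_name  = names(q),
    quantile_value = unname(q)
  )
}

# Quick check on an illustrative vector
quantile_check <- quantile_tbl(1:100)

quantile_check
```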
Great, it works. Next, use rollify to create a rolling version. We set unlist = FALSE to return a list-column.
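A sketch of the rolling version, under the same assumptions as before: quantile_tbl is an illustrative helper (redefined here so the snippet is self-contained), the input is a made-up vector, and the short window of 5 is chosen only to keep the example small.

```r
library(tibble)
library(tibbletime)

# Illustrative helper returning the default quantiles as a tibble
quantile_tbl <- function(x) {
  q <- quantile(x)
  tibble(
    quantile_name  = names(q),
    quantile_value = unname(q)
  )
}

# unlist = FALSE keeps each window's tibble as a separate list element
rolling_quantile <- rollify(quantile_tbl, window = 5, unlist = FALSE)

# Illustrative input; inside mutate() this result becomes a list-column
rolled_quantiles <- rolling_quantile(1:10)

# The first complete window (observations 1 to 5)
rolled_quantiles[[5]]
```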
Next, apply the rolling quantile function within mutate() to get a rolling quantile. Make sure you select(), filter() and unnest() to remove unnecessary columns, filter NA values, and unnest the list-column (“rolling_quantile”). Each date now has five rows, one for each quantile.
Finally, we can plot the results.
Interested learners can continue exploring rollify by checking out our vignette on rolling functions with rollify.
Changes Coming
This package is currently under active development. Don’t be shocked if the functionality increases soon… Davis Vaughan is working hard to expand the capability of tibbletime. Reproducible bug reports are welcome!
Next Steps
Interested learners can check out the following links to further their understanding of tibbletime:
Announcements
We have a busy couple of weeks. In addition to Demo Week, we have:
DataTalk
!!TONIGHT!! Thursday, October 26 at 7PM EST, Matt will be giving a FREE LIVE #DataTalk on Machine Learning for Recruitment and Reducing Employee Attrition. You can sign up for a reminder at the Experian Data Lab website.
EARL
On Friday, November 3rd, Matt will be presenting at the EARL Conference on HR Analytics: Using Machine Learning to Predict Employee Turnover.
Courses
Based on recent demand, we are considering offering application-specific machine learning courses for Data Scientists. The content will be business problems similar to our popular articles:
The student will learn from Business Science how to implement cutting edge data science to solve business problems. Please let us know if you are interested. You can leave comments as to what you would like to see at the bottom of the post in Disqus.
About Business Science
Business Science specializes in “ROI-driven data science”. Our focus is machine learning and data science in business applications. We help businesses that seek to add this competitive advantage but may not have the resources currently to implement predictive analytics. Business Science works with clients primarily in small to medium size businesses, guiding these organizations in expanding predictive analytics while executing on ROI generating projects. Visit the Business Science website or contact us to learn more!