# Demo Week: Tidy Time Series Analysis with tibbletime

Written by Matt Dancho on October 26, 2017

We’re into the fourth day of Business Science Demo Week. We have a really cool one in store today: tibbletime, which uses a new tbl_time class that is time-aware!! For those that may have missed it, every day this week we are demo-ing an R package: tidyquant (Monday), timetk (Tuesday), sweep (Wednesday), tibbletime (Thursday) and h2o (Friday)! That’s five packages in five days! We’ll give you intel on what you need to know about these packages to go from zero to hero. Let’s take tibbletime for a spin!

#### Demo Week Demos:

Sign up for our free "5 Topic Friday" Newsletter. Every week, I'll send you the five coolest topics in data science for business that I've found that week. These could be new R packages, free books, or just some fun to end the week on.

# tibbletime: What’s It Used For?

1. The future of “tidy” time series analysis: New class tbl_time rests on top of tbl and makes tibbles time aware.

2. Time Series Functions: Can use a series of “tidy” time series functions designed specifically for tbl_time objects. Some of them are:

• time_filter(): Succinctly filter a tbl_time object by date.

• time_summarise(): Similar to dplyr::summarise but with the added benefit of being able to summarise by a time period such as “yearly” or “monthly”.

• tmap(): The family of tmap functions transform a tbl_time input by applying a function to each column at a specified time interval.

• as_period(): Convert a tbl_time object from daily to monthly, from minute data to hourly, and more. This allows the user to easily aggregate data to a less granular level.

• time_collapse(): When time_collapse is used, the index of a tbl_time object is altered so that all dates that fall in a period share a common date.

• rollify(): Modify a function so that it calculates a value (or a set of values) at specific time intervals. This can be used for rolling averages and other rolling calculations inside the tidyverse framework.

• create_series(): Use shorthand notation to quickly initialize a tbl_time object containing a date column with a regularly spaced time series.

The tibbletime package is under active development, and because of this we recommend downloading the package from GitHub using devtools. You’ll get the latest functionality with all of the features demo-ed in this article.
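Installing from GitHub might look like the following sketch (it assumes you have the devtools package installed and that the repository lives under the business-science GitHub organization):

```r
# Install the development version of tibbletime from GitHub
# (repository location is an assumption)
devtools::install_github("business-science/tibbletime")
```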

Once installed, load the following libraries:

• tibbletime: Enables creation of time-aware tibbles. Can use new tbl_time functions.
• tidyquant: Loads tidyverse, and is used to get data with tq_get().
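Loading both libraries:

```r
# Load the libraries used throughout this article
library(tibbletime) # time-aware tibbles and the new tbl_time functions
library(tidyquant)  # loads the tidyverse and provides tq_get()
```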

# Data

We’ll download the daily stock prices for the FANG stocks (FB, AMZN, NFLX, GOOG) using tq_get().
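A sketch of the download step; the object name and the date range here are illustrative assumptions:

```r
# Get daily stock prices for the FANG symbols
# (object name and date range are illustrative)
FANG_symbols <- c("FB", "AMZN", "NFLX", "GOOG")

FANG_tbl_d <- FANG_symbols %>%
    tq_get(get = "stock.prices", from = "2014-01-01", to = "2016-12-31")
```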

We set up a function to plot facets by symbol that can be reused throughout this article. For those unfamiliar with the rlang package and the tidyeval framework, it's not necessary to understand them for this article. Just recognize that we are creating a ggplot2 function that creates plots faceted by "symbol" by specifying the data frame, x, y, and group (if present).
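A hedged sketch of such a function (the internals are a reconstruction using rlang's quasiquotation, not necessarily the article's exact code):

```r
# Reusable plotting function: a line chart faceted by "symbol".
# Column-capturing details are an assumption.
ggplot_facet_by_symbol <- function(data, x, y, group = NULL) {

    # Capture the column expressions supplied by the caller
    x     <- rlang::enquo(x)
    y     <- rlang::enquo(y)
    group <- rlang::enquo(group)

    g <- data %>%
        ggplot(aes(x = !! x, y = !! y, color = symbol)) +
        geom_line() +
        facet_wrap(~ symbol, ncol = 2, scales = "free_y") +
        scale_color_tq() +
        theme_tq() +
        theme(legend.position = "none")

    # Add a group aesthetic only when one is supplied
    if (!rlang::quo_is_null(group)) {
        g <- g + aes(group = !! group)
    }

    g
}
```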

We can quickly visualize our data with our plotting function, ggplot_facet_by_symbol. Let’s have a look at the “adjusted” stock prices by “date”.
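Assuming the daily prices are stored in an object named FANG_tbl_d (an illustrative name), the call might look like:

```r
# Visualize the adjusted stock prices by date, faceted by symbol
FANG_tbl_d %>%
    ggplot_facet_by_symbol(x = date, y = adjusted)
```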

Now that we see what data we are dealing with, let’s move onto the tibbletime demo.

# DEMO: tibbletime

We'll test out the following functions today:

• time_filter()

• time_summarise()

• as_period()

• rollify()

## Initialize a Tibble-Time Object

Before we can use these new functions, we need to create a tbl_time object. The new class operates almost identically to a normal tibble object. However, under the hood it tracks the time information.

Use the as_tbl_time() function to initialize the object. Specify index = date, which tells the tbl_time object which index to track.
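A sketch of the conversion, assuming the daily prices are in FANG_tbl_d (we also group by symbol, which is why the printed object resembles a grouped tibble):

```r
# Convert to a time-aware tibble, tracking "date" as the index
FANG_tbl_time_d <- FANG_tbl_d %>%
    as_tbl_time(index = date) %>%
    group_by(symbol)
```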

We can print the tbl_time object. It looks almost identical to a grouped tibble. Note that "Index: date" informs us that the "time tibble" is initialized properly.

We can plot it with our plotting function, ggplot_facet_by_symbol(), and we see the tbl_time object reacts the same as the tbl object.

## Special Time Series Functions

Let’s see what we can do with the new tbl_time object.

#### time_filter

The time_filter() function is used to succinctly filter a tbl_time object by date. It uses a functional format (e.g. “date_operator_start ~ date_operator_end”). We specify the date operators in the normal YYYY-MM-DD format (optionally plus HH:MM:SS), but there is also powerful shorthand to more efficiently subset by date.

Suppose we’d like to filter all observations inclusive of “2014-06-01” and “2014-06-15”. We can do this using the function notation, time_filter(2014-06-01 ~ 2014-06-15).
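In code (assuming the grouped tbl_time object is named FANG_tbl_time_d):

```r
# Filter all observations between 2014-06-01 and 2014-06-15, inclusive
FANG_tbl_time_d %>%
    time_filter(2014-06-01 ~ 2014-06-15)
```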

We can do the same by month. Suppose we just want observations in March, 2014. Use the shorthand functional notation “~ 2014-03”.
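The one-sided shorthand looks like this (object name is an assumption):

```r
# Keep only observations from March 2014 using one-sided shorthand
FANG_tbl_time_d %>%
    time_filter(~ 2014-03)
```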

The tbl_time object also responds to bracket notation [. Here we collect all dates in 2014 for each of the groups.
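For example (object name is an assumption):

```r
# Bracket notation: all dates in 2014, for each group
FANG_tbl_time_d["2014"]
```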

The time_filter() has a lot of capability and useful shorthand. Those interested should check out the time_filter vignette and the time_filter function documentation.

#### time_summarise

The time_summarise() function is similar to dplyr::summarise but with the added benefit of being able to summarise by a time period such as “yearly” or “monthly”.

The really cool thing about time_summarise() is that we can use the functional notation to define the period to summarize over. For example, if we want bimonthly, or every two months, we can use the notation for 2 months: “2~m”. Similarly, we could do every 20 days as “20~d”. The summarization options are endless.

Let’s plot the min, max, and median on a Bi-Monthly frequency (2~m) with time_summarise(). This is really cool!!
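A sketch of the summarise step, following the article's "2~m" shorthand (the exact form the period argument takes, and the object names, are assumptions):

```r
# Bi-monthly min / median / max of the adjusted price
FANG_min_max_by_2m <- FANG_tbl_time_d %>%
    time_summarise(
        period  = "2~m",
        adj_min = min(adjusted),
        adj_med = median(adjusted),
        adj_max = max(adjusted)
    )
```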

Those interested in furthering their understanding of time_summarise() can check out the time_summarise function documentation.

#### as_period

The next function, as_period(), enables changing the period of a tbl_time object. Two advantages to using this method over traditional approaches:

1. The functions are flexible: “yearly” == “y” == “1~y”
2. The functional notation allows for endless periodicity change combinations, for example:

• “15~d” to change to 15-day periodicity
• “2~m” to change to bi-monthly periodicity
• “4~m” to change to tri-annual periodicity (trimesters)
• “6~m” to change to bi-annual

To start off, let’s do a simple monthly periodicity change.
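For example (object name is an assumption):

```r
# Change from daily to monthly periodicity
FANG_tbl_time_d %>%
    as_period("monthly")
```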

Let’s step it up a notch. What about bi-monthly? Just use the functional notation, “2~m”.
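In code (object name is an assumption):

```r
# Bi-monthly periodicity via the functional notation
FANG_tbl_time_d %>%
    as_period("2~m")
```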

Let’s keep going. What about bi-annually? Just use “6~m”.
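In code (object name is an assumption):

```r
# Bi-annual periodicity
FANG_tbl_time_d %>%
    as_period("6~m")
```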

The possibilities are endless with the functional notation. Interested learners can check out the vignette on periodicity change with tibbletime.

#### rollify

The rollify() function is an adverb (a special type of function in the tidyverse that modifies another function). What rollify() does is turn any function into a rolling version of itself.
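For example, mean() becomes a rolling mean (window length and object names are illustrative):

```r
# Turn mean() into a 5-period rolling mean...
rolling_mean_5 <- rollify(mean, window = 5)

# ...then apply it like any other function inside mutate()
FANG_tbl_time_d %>%
    mutate(mean_5 = rolling_mean_5(adjusted))
```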

We can even do more complicated rolling functions such as correlations. We use the functional form .f = ~ fun(.x, .y, ...) within rollify().
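A sketch of a rolling correlation (the choice of columns is illustrative):

```r
# A 5-period rolling correlation, using the purrr-style formula notation
rolling_cor_5 <- rollify(~ cor(.x, .y), window = 5)

# .x and .y map to the first and second arguments
FANG_tbl_time_d %>%
    mutate(cor_oc = rolling_cor_5(open, close))
```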

We can even return multiple results. For example, we can create a rolling quantile.

First, create a function that returns a tibble of quantiles.

Great, it works. Next, use rollify to create a rolling version. We set unlist = FALSE to return a list-column.

Next, apply the rolling quantile function within mutate() to get a rolling quantile. Make sure you select(), filter() and unnest() to remove unnecessary columns, filter NA values, and unnest the list-column (“rolling_quantile”). Each date now has five values for each quantile.
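The three steps above might look like the following sketch (window length and object names are assumptions):

```r
# Step 1: a function that returns a tibble of the five default quantiles
quantile_tbl <- function(x) {
    q <- quantile(x)
    tibble(
        quantile_name  = names(q),
        quantile_value = q
    )
}

# Step 2: a rolling version; unlist = FALSE returns a list-column
rolling_quantile <- rollify(quantile_tbl, window = 5, unlist = FALSE)

# Step 3: apply within mutate(), clean up, and unnest the list-column
FANG_rolling_q <- FANG_tbl_time_d %>%
    mutate(rolling_quantile = rolling_quantile(adjusted)) %>%
    select(symbol, date, rolling_quantile) %>%
    filter(!is.na(rolling_quantile)) %>%
    unnest()
```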

Finally, we can plot the results.
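Assuming the unnested result is stored as FANG_rolling_q (an illustrative name), the plot call might be:

```r
# Plot the rolling quantiles: one line per quantile, faceted by symbol
FANG_rolling_q %>%
    ggplot_facet_by_symbol(x = date, y = quantile_value, group = quantile_name)
```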

Interested learners can continue exploring rollify by checking out our vignette on rolling functions with rollify.

## Changes Coming

This package is currently under active development. Don’t be shocked if the functionality increases soon… Davis Vaughan is working hard to expand the capability of tibbletime. Reproducible bug reports are welcome!

# Next Steps

Interested learners can check out the following links to further their understanding of tibbletime:

# Announcements

We have a busy couple of weeks. In addition to Demo Week, we have:

#### DataTalk

!!TONIGHT!! Thursday, October 26 at 7PM EST, Matt will be giving a FREE LIVE #DataTalk on Machine Learning for Recruitment and Reducing Employee Attrition. You can sign up for a reminder at the Experian Data Lab website.

#### EARL

On Friday, November 3rd, Matt will be presenting at the EARL Conference on HR Analytics: Using Machine Learning to Predict Employee Turnover.

#### Courses

Based on recent demand, we are considering offering application-specific machine learning courses for Data Scientists. The content will be business problems similar to our popular articles:

The student will learn from Business Science how to implement cutting edge data science to solve business problems. Please let us know if you are interested. You can leave comments as to what you would like to see at the bottom of the post in Disqus.