# It's tibbletime v0.0.2: Time-Aware Tibbles, New Functions, Weather Analysis and More

Written by Davis Vaughan on October 8, 2017

Today we are introducing tibbletime v0.0.2, and we’ve got a ton of new features in store for you. We have functions for converting to flexible time periods with the ~period formula~ and for creating custom rolling functions with rollify() (plus a bunch more new functionality!). We’ll take the new functionality for a spin with some weather data (from the weatherData package). However, the new tools make tibbletime useful in a number of broad applications such as forecasting, financial analysis, business analysis and more! We truly view tibbletime as the next phase of time series analysis in the tidyverse. If you like what we do, please connect with us on social media to stay up to date on the latest Business Science news, events and information!

## Introduction

We are excited to announce the release of tibbletime v0.0.2 on CRAN. Loads of new functionality has been added, including:

• Generic period support: Perform time-based calculations by a number of supported periods using a new ~period formula~.

• Creating series: Use create_series() to quickly create a tbl_time object initialized with a regular time series.

• Rolling calculations: Turn any function into a rolling version of itself with rollify().

• A number of smaller tweaks and helper functions to make life easier.

As we further develop tibbletime, it is becoming clearer that the package is a tool that should be used in addition to the rest of the tidyverse. The combination of the two makes time series analysis in the tidyverse much easier to do!

## In this post

Today we will take a look at weather data for New York and San Francisco from 2013. It will be an exploratory analysis to show off some of the new features in tibbletime, but the package itself has much broader application. As we will see, tibbletime’s time-based functionality can be a valuable data manipulation tool to help with:

• Product and sales forecasting

• Financial analysis with custom rolling functions

• Grouping data into time buckets to analyze change over time, which is great for any part of an organization including sales, marketing, manufacturing, and HR!

### Data and packages

The datasets used are from a neat package called weatherData. While weatherData has functionality to pull weather data for a number of cities, we will use the built-in datasets. We encourage you to explore the weatherData API if you’re interested in collecting weather data.

To get started, load the following packages:

• tibbletime: Time-aware tibbles for the tidyverse
• tidyverse: Loads packages including dplyr, tidyr, purrr, and ggplot2
• corrr: Tidy correlations
• weatherData: Slick package for getting weather data

Also, load the datasets from weatherData, “NewYork2013” and “SFO2013”.

### Combine and convert

To tidy up, we first join our data sets together using bind_rows(). Passing a named list of tibbles along with specifying the .id argument allows bind_rows() to create a new City reference column for us.
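As a minimal sketch of this step, the example below binds two small stand-in tibbles. The values are made up for illustration and the column names are assumptions; the real NewYork2013 and SFO2013 datasets are much larger.

```r
library(dplyr)

# Toy stand-ins for the two city datasets (made-up values, not real readings)
nyc <- tibble(
  Time        = as.POSIXct(c("2013-01-01 00:51:00", "2013-01-01 01:51:00"), tz = "UTC"),
  Temperature = c(41, 39.9)
)
sfo <- tibble(
  Time        = as.POSIXct(c("2013-01-01 00:56:00", "2013-01-01 01:56:00"), tz = "UTC"),
  Temperature = c(48.9, 48)
)

# The names of the list become the values of the new "City" column
weather <- bind_rows(list(NYC = nyc, SFO = sfo), .id = "City")
weather
```

The City column is placed first, and each row keeps the name of the list element it came from.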

Next, we will convert to tbl_time and group by our City variable. We can tell that this is a tbl_time object from the Index: Time line that gets printed along with the tibble.

### Period conversion

The first new idea to introduce is the ~period formula~. This tells the tibbletime functions how you want to time-group your data. It is specified as multiple ~ period, with examples being 1~d for “every 1 day,” and 4~m for “every 4 months.”
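To make the syntax concrete: a period formula is an ordinary two-sided R formula, so its two halves can be pulled apart with base R. The helper parse_period() below is a hypothetical name used only for illustration; it is not a tibbletime function and does not show how the package parses periods internally.

```r
# A period formula such as 4~m is a two-sided R formula:
# the left-hand side is the multiple, the right-hand side is the period.
period <- 4~m

# parse_period() is a hypothetical helper for illustration only
parse_period <- function(f) {
  stopifnot(inherits(f, "formula"), length(f) == 3)
  list(
    multiple = eval(f[[2]]),          # left-hand side, here the number 4
    period   = as.character(f[[3]])   # right-hand side as text, here "m"
  )
}

parsed <- parse_period(period)
parsed$multiple  # 4
parsed$period    # "m"
```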

In our original data, it looks like weather is an hourly dataset, with each new data point coming in on the 51st minute of the hour for NYC and the 56th minute for SFO. Unfortunately, a number of points don’t follow this. Check out the following rows:

What we want is one row per hour, but in this case we get two rows for NYC hour 12. We can use as_period() to ensure that we have only one row for each hour.

Now that we have our data in an hourly format, we probably don’t care about which minute it came in on. We can floor the date column using a helper function, time_floor(). Credit to Hadley Wickham because this is essentially a convenient wrapper around lubridate::floor_date(). Setting the period to 1~h floors each timestamp to the start of the hour it falls in.
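The two steps can be approximated with dplyr and lubridate alone. This is a rough sketch on made-up readings, assuming that keeping the first observation in each hour is the desired behavior; it does not call as_period() or time_floor().

```r
library(dplyr)
library(lubridate)

# Made-up readings: two rows fall inside hour 12
obs <- tibble(
  Time = as.POSIXct(
    c("2013-01-01 11:51:00", "2013-01-01 12:29:00", "2013-01-01 12:51:00"),
    tz = "UTC"
  ),
  Temperature = c(40, 41, 42)
)

hourly <- obs %>%
  group_by(Hour = floor_date(Time, "hour")) %>%  # bucket rows by the hour they fall in
  slice(1) %>%                                   # keep the first row in each bucket
  ungroup() %>%
  mutate(Time = Hour) %>%                        # replace the timestamp with the floored hour
  select(-Hour)

hourly
```

The result has one row for hour 11 and one row for hour 12, with timestamps of 11:00 and 12:00.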

### Visualize the data

Now that we have cleaned up a bit, let’s visualize the data.

Seems like hourly data is a bit overwhelming for this kind of chart. Let’s convert to daily and try again.

That’s better. It looks like NYC has a much wider range of temperatures than SFO. Both seem to be hotter in summer months.

### Period-based summaries

The dplyr::summarise() function is very useful for taking grouped summaries. time_summarise() takes this a step further by allowing you to summarise by period.

Below we take a look at the average and standard deviation of the temperatures calculated at monthly and bimonthly intervals.
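For intuition, a period-based summary is a grouped summary where the group is a time bucket. The sketch below reproduces the monthly case with plain dplyr and lubridate on made-up values; it is an approximation of the idea and does not call time_summarise().

```r
library(dplyr)
library(lubridate)

# Made-up temperatures, two observations in each of two months
temps <- tibble(
  Time = as.POSIXct(
    c("2013-01-10", "2013-01-20", "2013-02-10", "2013-02-20"),
    tz = "UTC"
  ),
  Temperature = c(30, 40, 35, 45)
)

monthly <- temps %>%
  group_by(Month = floor_date(Time, "month")) %>%  # one group per calendar month
  summarise(
    mean_temp = mean(Temperature),
    sd_temp   = sd(Temperature)
  )

monthly
```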

### A closer look at July

July seemed to be one of the hottest months for NYC, so let’s take a closer look at it.

To grab just the July dates, use time_filter(). If you haven’t seen this before, a time formula is used to specify the dates to filter for. The one-sided formula below expands to include all dates from 2013-07-01 00:00:00 through 2013-07-31 23:59:59.
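Written out by hand in base R, the expanded range corresponds to a filter with two inclusive endpoints. The timestamps below are made up to show the boundary behavior; this sketch does not call time_filter().

```r
# Made-up timestamps on either side of the July boundaries
times <- as.POSIXct(
  c("2013-06-30 23:59:59", "2013-07-01 00:00:00",
    "2013-07-31 23:59:59", "2013-08-01 00:00:00"),
  tz = "UTC"
)

# Endpoints of the expanded range, both inclusive
from <- as.POSIXct("2013-07-01 00:00:00", tz = "UTC")
to   <- as.POSIXct("2013-07-31 23:59:59", tz = "UTC")

in_july <- times >= from & times <= to
times[in_july]  # keeps only the two middle timestamps
```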

To visualize July’s weather, we will make a boxplot of the temperatures. Specifically, we will slice July into intervals of 2 days, and create a series of boxplots based on the data inside those intervals. To do this, we will use time_collapse(), which collapses a column of dates into a column of the same length, but where every row in a time interval shares the same date. You can use this resulting column for further grouping or labeling operations.
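The idea of collapsing can be sketched in base R. Here each date is labeled with the first date of its 2-day bucket; that choice of representative date is an assumption made for this sketch, and time_collapse() itself may pick a different date within each interval.

```r
# Six consecutive days, July 1 through July 6
dates <- as.Date("2013-07-01") + 0:5

# Index of the 2-day bucket each date falls in: 0, 0, 1, 1, 2, 2
bucket <- as.integer(dates - min(dates)) %/% 2

# Every row in a bucket gets the same date (here, the bucket's first day)
collapsed <- min(dates) + bucket * 2

data.frame(dates, collapsed)
```

The collapsed column has the same length as the input, so it can be passed straight to a grouping step or used as the x-axis label of a boxplot.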

Let’s visualize to see if we can gain any insights. Wow, San Fran maintained a constant cool average of 60 degrees in the hottest month of the year!

### Period and rolling correlations

Finally, we will look at the correlation of temperatures in our two cities in a few different ways.

First, let’s look at the overall correlation. The corrr package provides a nice way to accomplish this with data frames.

Next, let’s look at monthly correlations. The general idea will be to nest each month into its own data frame, apply correlate() to each nested data frame, and then unnest the results. We will use time_nest() to easily perform the monthly nesting.

For each month, calculate the correlation tibble and then focus() on the NYC column. Then unnest and floor the results.

It seems that summer and fall months tend to have higher correlation than colder months.

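Finally, we will calculate the rolling correlation of NYC and SFO temperatures. The “width” of our roll will be 360 hours since we are in hourly format. Note that at 24 observations per day, 360 hours spans 15 days, so this window covers about half a month rather than a full one.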

There are a number of ways to do this, but for this example we introduce rollify(), which takes any function that you give it and creates a rolling version of it. The first argument to rollify() is the function that you want to modify, and it is passed to rollify() in the same way that you would pass a function to purrr::map(). The second argument is the window size. Call the rolling function just as you would call a non-rolling version of cor() from inside mutate().
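To show what a rolling version of a function does, here is a small base R function factory. The name rollify_sketch() is hypothetical, and this is not tibbletime’s implementation; it assumes a right-aligned window that returns NA until enough observations are available.

```r
# rollify_sketch() is a hypothetical stand-in, written only to illustrate the idea
rollify_sketch <- function(.f, window) {
  function(...) {
    args <- list(...)                # one or more equal-length vectors
    n    <- length(args[[1]])
    out  <- rep(NA_real_, n)         # positions before the first full window stay NA
    if (n < window) return(out)
    for (i in seq(window, n)) {
      idx    <- (i - window + 1):i   # the `window` most recent positions
      out[i] <- do.call(.f, lapply(args, function(v) v[idx]))
    }
    out
  }
}

rolling_mean <- rollify_sketch(mean, window = 3)
rolling_mean(c(1, 2, 3, 4, 5))  # NA NA 2 3 4

# Works with functions of two vectors as well, such as cor()
rolling_cor <- rollify_sketch(cor, window = 3)
roll_result <- rolling_cor(c(1, 2, 3, 4, 5), c(2, 4, 6, 8, 10))
```

Because the returned function takes vectors and gives back a vector of the same length, it can be called inside mutate() like any other vectorized function.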

It looks like the correlation is not stable throughout the year, so that initial correlation value of .65 definitely has to be taken with a grain of salt!

### Rolling Functions: Pros/Cons and Recommendations

There are a number of ways to do rolling functions, and our recommendation depends on your needs. If you are interested in:

• Flexibility: Use rollify(). You can turn any function into a “tidy” rolling function. Think everything from rolling statistics to rolling regressions. Whatever you can dream up, it can do. The speed is fast, but not quite as fast as other Rcpp-based libraries.

• Performance: Use the roll package, which uses RcppParallel as its backend, making it the fastest option available. The only downside is flexibility, since you cannot create custom rolling functions and are bound to those available.

### Wrapping up

We’ve touched on a few of the new features in tibbletime v0.0.2. Notably:

• rollify() for rolling functions

• as_period() with generic periods

• time_collapse() for collapsing date columns

A full change log can be found in the NEWS file on GitHub or CRAN.

We are always open to new ideas and encourage you to submit an issue on our GitHub repo.

Have fun with tibbletime!

## Final thoughts

Mind you, this is only v0.0.2. We have a lot of work to do, but we couldn’t wait any longer to share this. Feel free to kick the tires on tibbletime, and let us know your thoughts. Please submit any comments, issues or bug reports to us on GitHub. Enjoy!