# Time Series in 5-Minutes, Part 1: Data Wrangling and Rolling Calculations

Written by Matt Dancho

Have 5-minutes? Then letâ€™s learn time series. In this short articles series, I highlight how you can get up to speed quickly on important aspects of time series analysis. Today we are focusing preparing data for timeseries analysis rolling calculations.

## Time Series in 5-Mintues Articles in this Series

I just released timetk 2.0.0 (read the release announcement). A ton of new functionality has been added. Weâ€™ll discuss some of the key pieces in this article series:

# Have 5-Minutes? Then letâ€™s learn Rolling Calculations

A collection of tools for working with time series in R Time series data wrangling is an essential skill for any forecaster. timetk includes the essential data wrangling tools. In this tutorial:

• Summarise by Time - For time-based aggregations
• Filter by Time - For complex time-based filtering
• Pad by Time - For filling in gaps and going from low to high frequency
• Slidify - For turning any function into a sliding (rolling) function

• Imputation - Needed for Padding (See Low to High Frequency)
• Advanced Filtering - Using the new add time %+time infix operation (See Padding Data: Low to High Frequency)
• Visualization - plot_time_series() for all visualizations

# Data

This tutorial will use the FANG dataset:

• Daily
• Irregular (missing business holidays and weekends)
• 4 groups (FB, AMZN, NFLX, and GOOG).

The volume column contains the trade volume (number of times the stock was transacted) for the day.

# Summarize by Time

summarise_by_time() aggregates by a period. Itâ€™s great for:

• Period Aggregation - SUM()
• Period Smoothing - AVERAGE(), FIRST(), LAST()

## Period Summarization

Objective: Get the total trade volume by quarter

• Use SUM()
• Aggregate using .by = "quarter"

## Period Smoothing

Objective: Get the first value in each month

• We can use FIRST() to get the first value, which has the effect of reducing the data (i.e. smoothing). We could use AVERAGE() or MEDIAN().
• Use the summarization by time: .by = "month" to aggregate by month.

# Filter By Time

Used to quickly filter a continuous time range.

## Time Range Filtering

Objective: Get the adjusted stock prices in the 3rd quarter of 2013.

• .start_date = "2013-09": Converts to â€ś2013-09-01
• .end_date = "2013": Converts to â€ś2013-12-31
• A more advanced example of filtering using %+time and %-time is shown in â€śPadding Data: Low to High Frequencyâ€ť.

Used to fill in (pad) gaps and to go from from low frequency to high frequency. This function uses the awesome padr library for filling and expanding timestamps.

## Fill in Gaps

Objective: Make an irregular series regular.

• We will leave padded values as NA.
• We can add a value using .pad_value or we can impute using a function like ts_impute_vec() (shown next).

## Low to High Frequency

Objective: Go from Daily to Hourly timestamp intervals for 1 month from the start date. Impute the missing values.

• .by = "hour" pads from daily to hourly
• Imputation of hourly data is accomplished with ts_impute_vec(), which performs linear interpolation when period = 1.
• Filtering is accomplished using:
• â€śstartâ€ť: A special keyword that signals the start of a series
• FIRST(date) %+time% "1 month": Selecting the first date in the sequence then using a special infix operation, %+time%, called â€śadd timeâ€ť. In this case I add â€ś1 monthâ€ť.

# Sliding (Rolling) Calculations

We have a new function, slidify() that turns any function into a sliding (rolling) window function. It takes concepts from tibbletime::rollify() and it improves them with the R package slider.

## Rolling Mean

Objective: Calculate a â€ścenteredâ€ť simple rolling average with partial window rolling and the start and end windows.

• slidify() turns the AVERAGE() function into a rolling average.

For simple rolling calculations (rolling average), we can accomplish this operation faster with slidify_vec() - A vectorized rolling function for simple summary rolls (e.g. mean(), sd(), sum(), etc)

## Rolling Regression

Objective: Calculate a rolling regression.

• This is a complex sliding (rolling) calculation that requires multiple columns to be involved.
• slidify() is built for this.
• Use the multi-variable purrr ..1, ..2, ..3, etc notation to setup a function

# Have questions on using Timetk for time series?

Make a comment in the chat below. đź‘‡

And, if you plan on using timetk for your business, itâ€™s a no-brainer - Join the Time Series Course.