Recreating RViews' “Reproducible Finance With R: Sector Correlations”
Written by Davis Vaughan
The folks at RStudio have a segment on their RViews blog, “Reproducible Finance with R”, that we at Business Science are very fond of! In the spirit of reproducibility, we thought it would be appropriate to recreate the RViews post “Reproducible Finance with R: Sector Correlations”. This time, however, the tidyquant package is used to streamline much of the original code. The main advantage of tidyquant is that it bridges the gap between the best resources for collecting and manipulating quantitative data (xts, quantmod, and TTR) and the data modeling workflow and infrastructure of the tidyverse. Applied here, tidyquant cuts the code down by about half and simplifies the workflow.
Correlating Sector ETF Returns to the SP500
The folks at RStudio have a new segment in RViews (the RStudio blog) called “Reproducible Finance with R”. For their first installment of 2017, they looked at how sector Exchange Traded Fund (ETF) returns correlate with the broader market, using the SPDR S&P 500 ETF (“SPY”) as a proxy for the SP500 index. The RViews post can be found here, and here's a snapshot of the final chart comparing the correlation between the SP500 (overall market) and the Technology ETF over time.
Source: Reproducible Finance with R: Sector Correlations
Today, the newest member of the Business Science (BizSci) team, Davis Vaughan, shows how you can use our R financial package, tidyquant, to streamline the RViews Sector Correlations analysis. I hope you'll join me in welcoming Davis to our team! You can follow him on Twitter, LinkedIn and GitHub.
Matt Dancho, Director of Product Development @ BizSci
Let’s start by loading some packages.
We’ll use the same ETF tickers as RViews.
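The ticker table itself was a code block in the original post. As a hedged stand-in, here is a base-R data frame of the nine long-standing SPDR sector ETFs plus SPY. The exact list and the "Index" label for SPY are our assumptions, so check the RViews post for its actual tickers. SPY is placed last, consistent with the later note that it is the last group in the data.

```r
# Hypothetical reconstruction of the ticker table: nine SPDR sector ETFs
# plus SPY as the market proxy. The "Index" label for SPY is an assumption.
etf_tickers <- data.frame(
  ticker = c("XLY", "XLP", "XLE", "XLF", "XLV",
             "XLI", "XLB", "XLK", "XLU", "SPY"),
  sector = c("Consumer Discretionary", "Consumer Staples", "Energy",
             "Financials", "Health Care", "Industrials", "Materials",
             "Information Technology", "Utilities", "Index"),
  stringsAsFactors = FALSE
)

# SPY is last, so it will also be the last group once prices are stacked
print(etf_tickers)
```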
Alright, now is where things get interesting! Let’s take a peek at the differences in how you might get data to solve this problem.
Here's the RViews code snippet. RViews created a special function to import closing prices using getSymbols() directly from the quantmod package. They then used periodReturn() to convert these prices to weekly log returns. Internal to the function, there are two calls to lapply(): one to get closing prices and one to get weekly log returns. Pretty complicated. Kudos to the author, Jonathan Regenstein, at RStudio for figuring this out.
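To make "weekly log returns" concrete, here is a minimal base-R sketch of the idea, not the RViews code and not quantmod's periodReturn(): take the last close of each calendar week and compute log(P_t / P_(t-1)) between consecutive week-end closes. The function name weekly_log_returns() and the toy prices are hypothetical, and the sketch simply drops the first week because it has no prior weekly close.

```r
# Minimal sketch: weekly log returns from daily closes (base R only).
# Assumes `dates` is a sorted Date vector and `close` holds matching prices.
weekly_log_returns <- function(dates, close) {
  stopifnot(length(dates) == length(close), !is.unsorted(dates))
  # cut.Date labels each date by the Monday that starts its week
  week <- cut(dates, breaks = "week")
  # Position of the last observation in each week
  week_end <- which(!duplicated(week, fromLast = TRUE))
  wk_close <- close[week_end]
  # log(P_t / P_(t-1)) between consecutive week-end closes;
  # the first week has no prior close, so it is dropped
  data.frame(
    date       = dates[week_end][-1],
    log_return = diff(log(wk_close))
  )
}

# Toy data: three full weeks starting Monday 2017-01-02, price up 10% per week
dates <- seq(as.Date("2017-01-02"), by = "day", length.out = 21)
close <- rep(c(100, 110, 121), each = 7)
weekly_log_returns(dates, close)
```

Each of the two remaining weeks gets a log return of log(1.1), dated at that week's last observation.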
And here's the BizSci version using tidyquant. We first get the ETF prices using tq_get() and group the prices by ticker and sector. Then we use tq_transform() to get the period returns from the stock prices.
If you are new to tidyquant, tq_transform() is used when the output has a different periodicity than the input. It accepts ohlc_fun = Cl and transform_fun = periodReturn, along with any additional periodReturn() arguments passed through the ... argument. This tells the function to use the closing price to calculate period returns and to return the result as a new tibble. Note that you would typically use ohlc_fun = Ad (adjusted prices) for period returns, since raw closing prices are distorted by stock splits; for these ETFs, however, we do not expect splits.
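Why adjusted prices matter for returns can be shown with a small worked example. The prices below are hypothetical: a 2-for-1 split occurs between the second and third days, so the raw close halves while the split-adjusted series divides all earlier prices by 2. Only the raw series shows a spurious return of log(0.5), about -69%.

```r
# Hypothetical prices around a 2-for-1 split between day 2 and day 3
close    <- c(100, 102, 51)  # raw closes: the split halves the quoted price
adjusted <- c(50, 51, 51)    # split-adjusted: pre-split prices divided by 2

ret_close    <- diff(log(close))     # day 3 looks like a -69% crash
ret_adjusted <- diff(log(adjusted))  # day 3 correctly shows no change

round(ret_close, 4)
round(ret_adjusted, 4)
```

The first return is identical in both series (log(1.02)), because a split scales prices on both sides of an unaffected period by the same factor.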
It's that easy! No need for lapply() or any special functions! It's all been taken care of for you. Grouping by ticker (and sector, to keep the column) allows us to perform the transform on each group separately, but with one line of code. Also, notice that the data is preserved in a tidy format, as opposed to the xts format that RViews uses.
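Conceptually, "group, then transform each group" is split-apply-combine on long (tidy) data. The base-R sketch below is not how tidyquant is implemented; the helper apply_by_group() and the toy prices are hypothetical. It also mimics how extra arguments are forwarded through ... to the transform function, the way tq_transform() forwards arguments to periodReturn().

```r
# Hypothetical helper: split a long data frame by a key column, apply `fun`
# to each piece (forwarding extra arguments via ...), then stack the results.
apply_by_group <- function(df, group_col, fun, ...) {
  pieces <- split(df, df[[group_col]])
  out <- lapply(names(pieces), function(g) {
    res <- fun(pieces[[g]], ...)
    res[[group_col]] <- rep(g, nrow(res))  # keep the group label
    res
  })
  res <- do.call(rbind, out)
  rownames(res) <- NULL
  res
}

# Per-group transform; `price_col` arrives through apply_by_group's ...
simple_log_returns <- function(d, price_col = "close") {
  p <- d[[price_col]]
  data.frame(date = d$date[-1], log_return = diff(log(p)))
}

# Toy long-format prices for two tickers (hypothetical values)
prices <- data.frame(
  ticker = rep(c("SPY", "XLK"), each = 3),
  date   = rep(as.Date("2017-01-06") + c(0, 7, 14), times = 2),
  adj    = c(100, 110, 121, 50, 55, 50)
)

rets <- apply_by_group(prices, "ticker", simple_log_returns, price_col = "adj")
rets
```

The result stays in long format (one row per ticker and date), which is what makes the later join and rolling steps straightforward.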
Additionally, RViews intends to create a flexdashboard from their notebook:
…this Notebook will be the first step toward an flexdashboard that lets us do more interactive exploration – choosing different sector ETFs and rolling windows
-Jonathan Regenstein, RStudio
It would now be easy to create a function wrapping the process like RViews did, allowing the user to just enter the tibble of tickers. This could be useful in the flexdashboard they will create, but for this post we chose not to do that.
The next step is to calculate rolling correlations between the SP500 index (“SPY”) ETF returns and the sector-specific ETF returns.
Here’s how RViews solved this problem in two steps.
Step 1: Create a Sector Index Correlation Function
A function is a nice approach, but the downside is that it only works for one component unless you use the purrr package to map the function. A special function was again created to merge the sector and SPY returns and then apply the rolling correlation using rollapply() with another special function. Very well done, but complicated.
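The rollapply()-plus-custom-function pattern can be sketched in base R: slide a fixed-width window down the rows of a merged two-column matrix and hand each window to an arbitrary function, here one that correlates the two columns. This is not zoo's implementation; roll_rows(), window_cor(), and the toy numbers are hypothetical. Windows are right-aligned and the first width - 1 positions are NA, similar in spirit to rollapply(..., by.column = FALSE, fill = NA, align = "right").

```r
# Hypothetical rolling helper: apply `fun` to each right-aligned window of rows
roll_rows <- function(m, width, fun) {
  n <- nrow(m)
  out <- rep(NA_real_, n)  # first width - 1 positions have no full window
  if (n >= width) {
    for (i in width:n) {
      out[i] <- fun(m[(i - width + 1):i, , drop = FALSE])
    }
  }
  out
}

# The "special function": correlation between the two columns of one window
window_cor <- function(w) cor(w[, 1], w[, 2])

# Toy merged returns: sector in column 1, SPY in column 2 (hypothetical values)
merged <- cbind(sector = c(1, 2, 3, 4, 5, 6),
                spy    = c(2, 4, 5, 4, 5, 7))

rolling <- roll_rows(merged, width = 3, fun = window_cor)
rolling
```

The generality is the point: any window function works, which is also why it takes two pieces of custom code compared with a dedicated rolling-correlation function.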
Step 2: Apply the function to the data
RViews applies the special function to the “Information Technology” ETF only. Again, purrr is needed to map across all ETFs if desired, which is an additional step.
And here's how we solved it using tidyquant.
Step 1: Merge SPY ETF Weekly Returns with Sector ETF Weekly Returns
First, we add the weekly returns for the “SPY” index (which is currently the last group in the tibble) as its own column. This is what our correlations will be calculated against. To do this, we have to isolate the “SPY” data and merge it with the original data. The easiest way is to filter() the “SPY” weekly returns and then join them (inner_join()) as a new column using by = "date" as the merge key.
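The filter-then-inner-join step has a direct base-R analogue, sketched below with hypothetical returns: subset the SPY rows, rename the return column, and merge() on date. With its default all = FALSE, merge() performs an inner join, so dates missing from the SPY table are dropped, just as with inner_join().

```r
# Toy long-format weekly returns (hypothetical values); XLK has one extra date
returns <- data.frame(
  ticker        = c("XLK", "XLK", "XLK", "SPY", "SPY"),
  date          = as.Date("2017-01-06") + c(0, 7, 14, 7, 14),
  weekly_return = c(0.010, -0.020, 0.015, 0.005, 0.012)
)

# Isolate SPY and rename its return column so it joins as a new column
spy <- returns[returns$ticker == "SPY", c("date", "weekly_return")]
names(spy)[2] <- "spy_return"

# Inner join on date: only dates present in both tables survive
joined <- merge(returns, spy, by = "date")
joined <- joined[order(joined$ticker, joined$date), ]
joined
```

Note that SPY's own rows stay in the result and are paired with themselves, so SPY's rolling correlation will be 1 by construction.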
Step 2: Use tq_mutate_xy() to apply runCor()
Now what? RViews used the more generic rollapply() function and then wrote a custom function for the correlations. While this is definitely possible using tq_mutate(), it's easier to use the runCor() function from the TTR package through tq_mutate_xy() instead. If you are new to tidyquant, the mutate functions do exactly what we need:
tq_mutate() applies the functions from TTR using OHLCV-style data and notation. It accepts mutate_fun to apply a function to OHLCV inputs. We don't use this version because it can't accept non-OHLC data or apply functions that require two primary arguments.
tq_mutate_xy() works with TTR functions that require two primary arguments (x and y). It's also used when the data is not in OHLCV format. Here, we face both situations. It accepts x, y, and mutate_fun arguments, which handles our situation perfectly!
On its own, runCor() is called as runCor(x, y, n = 10), so we use tq_mutate_xy() to pass in the x and y arguments and then pass n = 20 through the ... argument. As an aside, you may be wondering what the col_rename argument is. Simply put, it renames the mutation output, which is surprisingly handy because it saves a separate renaming step.
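For intuition about what runCor(x, y, n) returns, here is a hedged base-R sketch with the same output shape: a vector as long as the inputs, NA for the first n - 1 positions, and the correlation of the trailing n observations elsewhere. It is not TTR's implementation; run_cor() and add_rolling_cor() are hypothetical, and col_rename only imitates the tidyquant argument of the same name. The toy series are deterministic but otherwise arbitrary.

```r
# Hypothetical runCor-like helper: trailing-window correlation of x and y
run_cor <- function(x, y, n = 10) {
  stopifnot(length(x) == length(y), n >= 2)
  out <- rep(NA_real_, length(x))  # NA until a full window is available
  if (length(x) >= n) {
    for (i in n:length(x)) {
      idx <- (i - n + 1):i
      out[i] <- cor(x[idx], y[idx])
    }
  }
  out
}

# Add the rolling correlation as a new column named by `col_rename`
add_rolling_cor <- function(df, x_col, y_col, n, col_rename = "cor") {
  df[[col_rename]] <- run_cor(df[[x_col]], df[[y_col]], n = n)
  df
}

# Toy series (hypothetical), using the post's window of n = 20
x  <- (1:25) + (1:25) %% 3
y  <- 2 * x + (1:25) %% 5
df <- data.frame(x = x, y = y)
out <- add_rolling_cor(df, "x", "y", n = 20, col_rename = "cor")
tail(out)
```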
There’s an added bonus…
As opposed to the RViews function, we actually calculated the rolling correlations for all of the ETF groups in the tibble, not just the one that you pass in! This is typically desired because the user is usually interested in understanding how all groups within a data set correlate to a baseline as opposed to just one.
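Computing the rolling correlation for every group at once is, again, split-apply-combine. The base-R sketch below runs a trailing-window correlation against SPY returns within each sector and stacks the results. The helper run_cor(), the two sectors, and all return values are hypothetical, and a window of n = 3 is used only so the toy data stays small.

```r
# Hypothetical trailing-window correlation (NA until a full window exists)
run_cor <- function(x, y, n) {
  vapply(seq_along(x), function(i) {
    if (i < n) return(NA_real_)
    idx <- (i - n + 1):i
    cor(x[idx], y[idx])
  }, numeric(1))
}

# Toy joined data: sector returns alongside SPY returns (hypothetical values)
spy <- c(0.010, -0.020, 0.015, 0.005, -0.010, 0.020)
panel <- data.frame(
  sector        = rep(c("Energy", "Information Technology"), each = 6),
  date          = rep(as.Date("2017-01-06") + 7 * (0:5), times = 2),
  weekly_return = c(0.020, -0.010, 0.000, 0.010, -0.030, 0.010,
                    0.012, -0.025, 0.020, 0.004, -0.012, 0.025),
  spy_return    = rep(spy, times = 2)
)

# Apply the rolling correlation within each sector, then stack the pieces
panel_cor <- do.call(rbind, lapply(split(panel, panel$sector), function(d) {
  d <- d[order(d$date), ]
  d$cor <- run_cor(d$weekly_return, d$spy_return, n = 3)
  d
}))
rownames(panel_cor) <- NULL
panel_cor
```

Adding a sector only adds rows to the input; the grouped computation itself does not change.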
Finally, let's recreate the Dygraph for the “Information Technology” sector. Dygraphs take an xts object as input, which is NOT the format we are currently in (we are in tibble format). The most useful function here is as_xts(), a tidyquant function that provides an easy way to convert from tibbles to xts. Selecting just the “date” and “cor” columns from the input and specifying date_col = date in as_xts() allows us to use the same code as RViews to create the Dygraph.
And that's it! Hopefully you have seen that tidyquant is a great way to streamline and even scale your financial analysis workflow. And we have only scratched the surface of what it can do! You can get the stable release of tidyquant from CRAN and the development release from GitHub. Stay tuned for more to come!