Become a Data Scientist and accelerate your career in 6-months or less.
5-10 Hours Per Week. 80/20 Tools. End-To-End Business Projects.
Recreating RView's ''Reproducible Finance With R: Sector Correlations''
Written by Davis Vaughan
The folks at RStudio have a segment on their RViews blog, “Reproducible Finance with R”, one that we at Business Science are very fond of! In the spirit of reproducibility, we thought that it would be appropriate to recreate the RViews post, “Reproducible Finance with R: Sector Correlations”. This time, however, the tidyquant package will be used to streamline much of the code that is currently used. The main advantage of tidyquant is to bridge the gap between the best quantitative resources for collecting and manipulating quantitative data: xts, zoo, quantmod and TTR, and the data modeling workflow and infrastructure of the tidyverse. When implemented, tidyquantcuts the code down by about half and simplifies the workflow.
The folks at RStudio have a new segment in RViews (The RStudio blog) called “Reproducible Finance with R”. For their first installation of 2017 they looked at how Sector Exchange Traded Fund (ETF) returns correlate to the broader market using the Spider SP500 ETF (“SPY”) as a proxy for the SP500 index. The RViews post can be found here, and here’s a snapshot of the final chart comparing the correlation between the SP500 (overall market) and the Technology ETF over time.
Today, the newest member of the Business Science (BizSci) team, Davis Vaughan, shows how you can implement our R financial package, tidyquant, to streamline the RViews Sector Correlations analysis. I hope you’ll join me in welcoming Davis to our team! You can follow him on twitter, LinkedIn and GitHub.
Matt Dancho, Director of Product Development @ BizSci
Let’s start by loading some packages.
We’ll use the same ETF tickers as RViews.
Alright, now is where things get interesting! Let’s take a peek at the differences in how you might get data to solve this problem.
Here’s the RViews code snippet. RViews created a special function to import closing prices using getSymbols() directly from the quantmod package. They then used periodReturn() to convert these prices to weekly log returns. Internal to the function, there are calls to getSymbols and do.call with merge and lapply twice, once to get closing prices and once to get log weekly returns. Pretty complicated. Kudos to the author, Jonathan Regenstein, at RStudio for figuring this out.
And, the BizSci version using tidyquant. We first get the ETF prices using tq_get() and group the prices by ticker and sector. Then we use tq_transform() to get the period returns from the stock prices.
If you are new to tidyquant, tq_transform() is used when the return is in a different periodicity than the input. It accepts ohlc_fun = Cl and transform_fun = periodReturn , along with any additional periodReturn args passed by way of .... This tells the function to use the closing price to calculate period returns and return the result as a new tibble. Note that typically you would used the ohlc_fun = Ad for period returns since stock splits are present in closing prices, but for an ETF we should not have splits.
It’s that easy! No need for do.call(), lapply(), or any special functions! It’s all been taken care of for you. Grouping by ticker (and sector to keep the column) allows us to perform the transform on each group separately, but with one line of code. Also, notice that the data is preserved in a tidy format, as opposed to the xts format that RViews uses.
Additionally, RViews intends to create a flexdashboard from their notebook:
…this Notebook will be the first step toward an flexdashboard that lets us do more interactive exploration – choosing different sector ETFs and rolling windows
-Jonathan Regenstein, RStudio
It would be easy to now create a function wrapping the process like RViews did, allowing the user to just enter the tibble of tickers. This could be useful in the flexdashboard that they will create, but for this post, we chose not do to that.
The next step is to calculate rolling correlations between the SP500 index (“SPY”) ETF returns and the sector-specific ETF returns.
Here’s how RViews solved this problem in two steps.
Step 1: Create a Sector Index Correlation Function
A function is a nice approach, but the downside is it only works for one component unless you use the purrr package to map the function. A special function was again created to merge the sector and SPY returns and then apply the rolling correlation using rollapply() with another special function. Very well done, but complicated.
Step 2: Apply the function to the data
RViews applies the special function across the “Information Technology” ETF only. Again, purrr is needed to map across all ETF’s if desired, which is an additional step.
First, we add the weekly returns for the “SPY” index (which is currently the last group in the tibble) as it’s own column. This is what our correlations will be calculated against. To do this, we will have to isolate that “SPY” data, and merge it with the original data. The easiest way is to filter() the “SPY” weekly returns and then join (inner_join()) as a new column using by = "date" as the merge key.
Step 2: Use tq_mutate_xy() to apply runCor()
Now what? RViews used the more generic rollapply() function, and then created the function for correlations. While this is definitely possible using tq_mutate, it’s easier to just use the runCor() function from the TTR package through tq_mutate_xy() instead. If you are new to tidyquant, the mutate functions will do exactly what we need:
tq_mutate() aggregates the functions from quantmod, xts, zoo, and TTR using OHLCV style data and notation. It accepts ohlc_fun and mutate_fun to apply a function to OHLCV inputs. We don’t use this version because it can’t accept non-OHLC data or apply functions that require two primary arguments.
tq_mutate_xy() works with functions from quantmod, xts, zoo, and TTR packages that require two arguments (x and y). It’s also used when you have data that is not in OHLCV format. Here, we face both situations. It accepts x, y, and mutate_fun args, which handles our situation perfectly!
The usage of runCor by itself looks like: runCor(x, y, n = 10) so we will use tq_mutate_xy() to pass in the x and y arguments, and then pass through n = 20 using the .... As an aside, you may be wondering what the col_rename argument is. Simply put it renames the mutation output, which is surprisingly handy by eliminating one extra line of code.
There’s an added bonus…
As opposed to the RViews function, we actually calculated the rolling correlations for all of the ETF groups in the tibble, not just the one that you pass in! This is typically desired because the user is usually interested in understanding how all groups within a data set correlate to a baseline as opposed to just one.
Finally, let’s recreate the Dygraph for the “Information Technology” sector. Dygraphs take an xts object as input, which is NOT the format we are in currently (we are in tibble format). The most useful function here is as_xts(), a tidyquant function that provides an easy way to convert from tibbles to xts. Selecting just the “date” and “cor” columns from the input and specifying the date_col = date in as_xts() allows us to use the same code as RViews to create the Dygraph.
And that’s it! Hopefully you have seen that tidyquant is a great way to streamline and even scale your financial analysis workflow. And, we have only scratched the surface of what it can do! You can check out the stable release of tidyquant from CRAN, and the development release from Github. Stay tuned for more to come!