tidyquant 0.4.0: PerformanceAnalytics, Improved Documentation, ggplot2 Themes and More
Written by Matt Dancho
I’m excited to announce the release of tidyquant version 0.4.0!!! The release is yet again sizable. It includes integration with the PerformanceAnalytics package, which now enables full financial analyses to be performed without ever leaving the “tidyverse” (i.e. with DATA FRAMES). The integration includes the ability to perform performance analysis and portfolio attribution at scale (i.e. with many stocks or many portfolios at once)! But wait, there’s more… In addition to an introduction vignette, we created five (yes, five!) topic-specific vignettes designed to reduce the learning curve for financial data scientists. We also have new ggplot2 themes to assist with creating beautiful and meaningful financial charts. We included tq_get support for “compound getters” so multiple data sources can be brought into a nested data frame all at once. Last, we have added new tq_exchange() functions to make collecting stock data with tq_get even easier. I’ll briefly touch on several of the updates. The package is open source, and you can view the code on the tidyquant GitHub page.
First, update tidyquant to v0.4.0 and load the FANG data set, which will be used in the examples. The FANG data set contains the historical stock prices for FB, AMZN, NFLX, and GOOG from the beginning of 2013 through the end of 2016.
I also recommend the open-source RStudio IDE, which makes R programming easy and efficient, especially for financial analysis.
tidyquant: Bringing financial analysis to the tidyverse
Before I dive into the updates, if you are new to tidyquant, there are a few core functions that you need to be aware of:
Getting Financial Data from the web:
tq_get(). This is a one-stop shop for getting web-based financial data in a “tidy” data frame format. Get data for daily stock prices (historical), key statistics (real-time), key ratios (historical), financial statements, dividends, splits, economic data from FRED, and FOREX rates from Oanda.
Manipulating Financial Data:
tq_mutate(). Integration for many financial functions from
tq_mutate() is used to add a column to the data frame, and
tq_transmute() is used to return a new data frame, which is necessary for periodicity changes. Important: In v0.4.0,
tq_transform() for consistency with
Coercing Data To and From xts and tibble:
as_xts(). There are a ton of Stack Overflow articles on converting data frames to and from xts. These two functions can be used to answer 99% of these questions.
Performance Analysis and Portfolio Analysis:
tq_portfolio(). The newest additions to the
tidyquant family integrate
tq_performance() converts investment returns into performance metrics.
tq_portfolio() aggregates a group (or multiple groups) of asset returns into one or more portfolios.
To learn more, browse the new and improved vignettes.
We’ve got some neat examples to show off the new capabilities:
- PerformanceAnalytics Integration
- New User-Friendly Vignettes
- New ggplot2 Themes
- “Compound Getters” in tq_get
- tq_index and tq_exchange
1: PerformanceAnalytics Integration
The PerformanceAnalytics package does two things very well. First, it enables performance analysis of investment returns using a wide variety of metrics that are detailed in the text “Practical Portfolio Performance Measurement and Attribution” by Carl Bacon. Second, it enables portfolio aggregation, the process of aggregating a weighted group of stocks or investments into a single set of returns. When combined, this functionality enables portfolio attribution, a set of techniques used to explain a portfolio’s performance versus a benchmark.
The next few examples show off some of the basic capabilities, but they only scratch the surface of what is possible. Below is a figure demonstrating multiple portfolio analysis, which is an advanced topic discussed in the vignette.
A: Stock Performance Analysis
The Sharpe ratio is commonly used in finance as a measure of return per unit of risk. The larger the value, the better the reward-to-risk trade-off. The PerformanceAnalytics package contains a function
SharpeRatio.modified) that can be used to quickly calculate the Sharpe ratio from a set of returns. We’ll use tq_performance to calculate the Sharpe ratio in a “tidy” way, using the PerformanceAnalytics integration. Call tq_performance_fun_options() to see a full list of integrated functions. Spoiler alert: there are 128 functions divided into 14 categories.
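To make the metric concrete, the standard-deviation form of the Sharpe ratio is the mean excess return divided by the standard deviation of returns. The base-R sketch below computes it for a made-up vector of monthly returns, assuming simple per-period returns and a constant per-period risk-free rate `rf`; the helper `sharpe_ratio()` is hypothetical. It only illustrates the arithmetic and is not a reimplementation of `SharpeRatio()`, which also offers VaR- and ES-based denominators.

```r
# Hypothetical helper: Sharpe ratio in its standard-deviation form,
#   (mean return - risk-free rate) / standard deviation of returns.
# Assumes `returns` are simple per-period returns and `rf` is a constant
# per-period risk-free rate in the same units.
sharpe_ratio <- function(returns, rf = 0) {
  mean(returns - rf) / sd(returns)
}

# Illustrative monthly returns (made-up numbers, not FANG data)
monthly_returns <- c(0.02, -0.01, 0.03, 0.015, -0.005, 0.01)

sharpe_ratio(monthly_returns)              # zero risk-free rate
sharpe_ratio(monthly_returns, rf = 0.001)  # 0.1% per month risk-free rate
```

A higher risk-free rate lowers the numerator, so the second call returns a smaller ratio than the first.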
tq_performance() allows us to apply SharpeRatio to “tidy” data frames. The tq_performance() function uses
Rb to specify the asset returns and baseline returns, respectively. These values get passed to the performance_fun, which in our case will be
... allows the user to pass additional arguments to the underlying PerformanceAnalytics function. The arguments are shown below.
To understand the end goal, we need to analyze the SharpeRatio function. Its arguments are displayed below. It takes R, a set of returns; Rf, the risk-free rate; p, the confidence level; FUN, which selects the denominator (the default returns the Sharpe ratio using all three: "StdDev", "VaR", and "ES"); and a few other arguments that are not used in this example. It’s important to recognize that R in the SharpeRatio() function is specified using the asset returns (Ra) in the tq_performance() function. The baseline returns argument (Rb) in the tq_performance() function is not needed, since a baseline is not required to calculate SharpeRatio. Just keep in mind that you will see either R or the combination of Ra, Rb in the PerformanceAnalytics function arguments, which indicates whether or not Rb is required.
Now that we understand the function, we can easily begin the task of getting the Sharpe ratios for the “FANG” stocks. It involves three steps:
- Get data with tq_get (already done since we have FANG loaded). Make sure to group by symbol if the tibble includes prices for multiple stocks.
- Transmute to period returns with tq_transmute(mutate_fun = periodReturn).
- Calculate the Sharpe ratio with tq_performance(performance_fun = SharpeRatio).
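For readers curious about what the three steps do under the hood, here is a rough base-R analogue: split a long-format price table by symbol (the group-by step), convert each symbol’s daily prices to monthly returns (the tq_transmute step), and compute one Sharpe ratio per symbol (the tq_performance step). The prices, symbols, and helper names (`to_monthly_returns()`, `sharpe_ratio()`) are hypothetical, and the return definition is a simplification that may not match periodReturn in every detail, such as how the first period is treated.

```r
# Hypothetical daily adjusted prices for two made-up symbols, in "tidy"
# long format (one row per symbol and date), sorted by date within symbol
prices <- data.frame(
  symbol   = rep(c("AAA", "BBB"), each = 6),
  date     = as.Date(rep(c("2016-01-04", "2016-01-29", "2016-02-01",
                           "2016-02-29", "2016-03-01", "2016-03-31"), 2)),
  adjusted = c(100, 102, 101, 105, 104, 110,
                50,  49,  51,  52,  50,  48)
)

# Step 2 analogue: monthly returns from each month's last adjusted price.
# The first month only serves as the starting point and gets no return.
to_monthly_returns <- function(df) {
  month <- format(df$date, "%Y-%m")  # "YYYY-MM" labels sort chronologically
  last_price <- tapply(df$adjusted, month, function(p) p[length(p)])
  month_end <- setNames(as.vector(last_price), names(last_price))
  r <- diff(month_end) / head(month_end, -1)
  data.frame(symbol = df$symbol[1], month = names(r), Ra = as.numeric(r))
}

# Step 3 analogue: Sharpe ratio (standard-deviation form, zero risk-free rate)
sharpe_ratio <- function(returns, rf = 0) mean(returns - rf) / sd(returns)

# Group-by analogue: split by symbol, apply the steps, collect the results
returns_by_symbol <- lapply(split(prices, prices$symbol), to_monthly_returns)
sharpe_by_symbol  <- sapply(returns_by_symbol, function(x) sharpe_ratio(x$Ra))
sharpe_by_symbol
```

The split/apply/combine shape is the point here: adding more symbols to `prices` requires no change to the code, which mirrors how the grouped tidyquant workflow scales.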
It’s very easy to get performance metrics for multiple stocks. Next, we’ll take a look at portfolio performance.
B: Basic Portfolio Performance
Combining a group of assets into a portfolio is one of the most useful techniques for controlling risk versus reward. The blending of assets naturally diversifies and can reduce downside risk. Further, portfolio attribution is a set of techniques used to analyze a portfolio or set of portfolios against a benchmark. The newest vignette, Performance Analysis with tidyquant, breaks the process into several steps shown in the workflow diagram below.
The process for a single portfolio aggregation without a benchmark is shown below. Portfolio aggregation requires a set of weights that can be applied to the various assets (stocks) in the portfolio. Our portfolio consists of FB, AMZN, NFLX, and GOOG. Passing weights of 50%, 25%, 25%, and 0%, respectively, blends the individual asset returns into one set of portfolio returns.
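Aggregation itself boils down to a weighted sum. Assuming the portfolio is rebalanced back to its target weights every period, each period’s portfolio return is the sum of weight × asset return. The sketch below applies the 50/25/25/0 weights to made-up monthly returns (not actual FANG data). The every-period rebalancing is an assumption of this sketch: PerformanceAnalytics’ Return.portfolio() only rebalances when asked to via its rebalance_on argument, so a buy-and-hold aggregation would give somewhat different numbers.

```r
# Illustrative monthly returns for the four FANG symbols
# (made-up round numbers, NOT actual FANG returns)
asset_returns <- rbind(
  "2016-01" = c(FB =  0.04, AMZN =  0.02, NFLX = -0.02, GOOG =  0.01),
  "2016-02" = c(FB = -0.02, AMZN =  0.04, NFLX =  0.00, GOOG =  0.03),
  "2016-03" = c(FB =  0.01, AMZN = -0.04, NFLX =  0.08, GOOG = -0.01)
)

# Weights from the example: 50% FB, 25% AMZN, 25% NFLX, 0% GOOG
weights <- c(FB = 0.50, AMZN = 0.25, NFLX = 0.25, GOOG = 0.00)
stopifnot(abs(sum(weights) - 1) < 1e-12)  # weights should sum to 100%

# Portfolio return per period = sum over assets of weight * asset return.
# Assumes the portfolio is rebalanced to the target weights every period.
portfolio_returns <- drop(asset_returns[, names(weights)] %*% weights)
portfolio_returns
```

With a 0% weight, GOOG’s returns have no effect on the result, which is an easy sanity check when experimenting with other weight vectors.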
At this point, it’s nice to visualize the result using a wealth index, which shows the growth of the portfolio. The wealth index is actually an option in tq_portfolio, but it can also be created from the portfolio returns by taking the cumulative product of one plus each period’s return with the cumprod() function.
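The wealth index is just a running product: starting from 1, each period multiplies the previous value by (1 + return). A minimal sketch with made-up portfolio returns follows; multiply by an initial investment to express it in currency. The option built into tq_portfolio may differ in details such as the starting value.

```r
# Portfolio returns per period (illustrative values, not actual results)
portfolio_returns <- c(0.02, 0.00, 0.015, -0.01)

# Wealth index: growth of 1 unit invested, compounding each period's return
wealth_index <- cumprod(1 + portfolio_returns)
wealth_index

# Scale to a starting investment, e.g. $10,000
10000 * wealth_index
```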
We can even get some performance metrics using PerformanceAnalytics functions. The table functions are the most useful since they calculate groups of portfolio attribution metrics. Eighteen different table functions are available. We’ll use the table.Stats function, which returns a “tidy” set of 15 summary statistics on the stock returns, including arithmetic mean, standard deviation, skewness, kurtosis, and more.
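To show what a few of those statistics mean, here is a base-R sketch that computes a subset of them (observations, minimum, median, arithmetic and geometric mean, maximum, standard deviation, skewness, and excess kurtosis) for made-up returns. The skewness and kurtosis use simple population-moment formulas chosen for this sketch; table.Stats may use different estimators, so expect small differences from its output.

```r
# Illustrative portfolio returns (made-up values, not actual results)
returns <- c(0.02, 0.00, 0.015, -0.01, 0.03, -0.005)

# Skewness and excess kurtosis from population moments (simple definitions
# assumed for this sketch; table.Stats may use different estimators)
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}
excess_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

n <- length(returns)
summary_stats <- data.frame(
  statistic = c("Observations", "Minimum", "Median", "Arithmetic Mean",
                "Geometric Mean", "Maximum", "Stdev", "Skewness",
                "Excess Kurtosis"),
  value = c(n, min(returns), median(returns), mean(returns),
            prod(1 + returns)^(1 / n) - 1,  # per-period geometric mean
            max(returns), sd(returns),
            skewness(returns), excess_kurtosis(returns))
)
summary_stats
```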
There’s also capability for performance attribution (comparing portfolio performance against a benchmark) and scaling analyses to multiple portfolios. For those interested in furthering the analysis, please visit the new vignette, Performance Analysis with tidyquant.
2: New User-Friendly Vignettes
Financial analysis can be overwhelming due to the depth and breadth of various topics. Add to it a new package with new functions and workflows, and the task can seem impossible. The good news is we understand.
We are actively taking steps to reduce the learning curve so you can get up to speed quickly. While the work is not done yet, we believe that the vignettes are a good place to start. The goal is to break down complex tasks without overloading the user with everything at once. There is now one main “introduction” that links to five topic-specific vignettes. Each topical vignette covers the basics behind the package including real-world examples so you can see how the package can be implemented. You can access the new vignettes here.
3: New ggplot2 Themes
tidyquant ships with some new themes to assist with creating beautiful and meaningful financial charts: theme_tq() and some extra fun ones including
theme_tq_green(). To coordinate aesthetic colors and fills with the appropriate theme, we’ve added scale_color_tq(theme = "light"). You can modify the theme argument to get the colors to correspond with the different themes. In addition, we have
palette_green() for those interested in using the color palette. Here’s a quick example.
For those interested in learning more about the tidyquant charting capabilities, please visit the updated vignette, Charting with tidyquant.
4: “Compound Getters” in tq_get
Compound getters are a nice tool for those looking to get multiple data sets for one stock symbol. For example, one may want the “key.ratios” and the “key.stats”, which provide key fundamental and financial ratio data on a historical and real-time basis, respectively. You can now pull this information in one call to tq_get using a “compound getter”.
Let’s examine what’s in the “key.ratios” column using
Like peeling away layers, we can see what’s inside. Let’s do one more
We can do the same thing with the “key.stats”. Set .drop = TRUE to remove the “key.ratios” column.
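A “compound getter” result is a nested data frame: ordinary columns such as the symbol, plus list columns whose cells each hold an entire data frame. The base-R sketch below builds a small nested table with made-up contents and flattens one list column, which is roughly what unnesting does. `unnest_list_column()` is a hypothetical helper, not part of tidyquant or tidyr.

```r
# Hypothetical nested result: one row per symbol, with a list column whose
# cells each hold a whole data frame (contents are made up, not real ratios)
nested <- data.frame(symbol = c("AAA", "BBB"))
nested$key.ratios <- list(
  data.frame(year = 2015:2016, value = c(1.1, 1.3)),
  data.frame(year = 2015:2016, value = c(0.8, 0.9))
)

# Hypothetical base-R "unnest": repeat each symbol once per row of its
# inner data frame, then stack the pieces into one flat data frame
unnest_list_column <- function(df, col) {
  pieces <- Map(function(sym, inner) cbind(symbol = sym, inner),
                df$symbol, df[[col]])
  out <- do.call(rbind, unname(pieces))
  rownames(out) <- NULL
  out
}

flat <- unnest_list_column(nested, "key.ratios")
flat
```

Keeping the symbol next to every inner row is what makes the flattened result usable for grouped analysis afterwards.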
The benefit of “compound getters” is that all your data is stored in one data frame. To access it, you can simply unnest the list columns. Additionally, the “compound getters” can be scaled in the same way that a single get can be scaled: with a vector of stock symbols or a data frame of stock symbols with the symbols in the first column. See the next section for scaling using the new
5: tq_index and tq_exchange
We got some really good feedback from a certain someone at RStudio about using two calls to tq_get() in a row: one to retrieve an index of stock symbols (e.g. “SP500”) and then one to scale the retrieval of data for those stock symbols. The advice was really good because (1) it was ugly having two calls to tq_get() in a row and (2), more importantly, it got us thinking about how we could improve scaling data collection. Here’s the significant change from the “old way” to the “new way”.
The separation of a stock list from a call to retrieve the data for each of the stocks is fundamentally a good idea because now we can have more lists. For example, if you want to download stock prices for every stock listed on the NASDAQ exchange, you can use the new tq_exchange("NASDAQ") to retrieve the stock list and then pipe (%>%) the list to tq_get. (Warning: this could take 10-20 minutes to download the stock prices for all 3,169 stock symbols.)
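The “stock list first, then get the data” design can be sketched without any packages: take the symbols from the first column of a stock list and apply a getter to each one, tagging every row with its symbol. Here `get_prices()` is a hypothetical stub standing in for a web download (it fakes a failure for the symbol "BAD"), and `get_all()` is likewise hypothetical. It skips failed symbols so one bad download does not abort a batch of thousands; that behavior is a design choice of this sketch, not a description of tq_get.

```r
# Hypothetical stand-in for a web download: returns a tiny price table for
# one symbol (fake values) and simulates a failure for "BAD"
get_prices <- function(symbol) {
  if (symbol == "BAD") stop("download failed for ", symbol)
  data.frame(date = as.Date("2016-12-30") + 0:1, adjusted = c(10, 11))
}

# Hypothetical stock list with the symbols in the first column, as an
# exchange or index lookup might return
stock_list <- data.frame(symbol  = c("AAA", "BAD", "BBB"),
                         company = c("Alpha", "Broken", "Beta"))

# Apply the getter to every symbol, keep the symbol as a column, and skip
# symbols whose download fails instead of aborting the whole batch
get_all <- function(symbols) {
  pieces <- lapply(symbols, function(s) {
    res <- tryCatch(get_prices(s), error = function(e) NULL)
    if (is.null(res)) return(NULL)
    cbind(symbol = s, res)
  })
  out <- do.call(rbind, pieces)  # NULL entries are dropped by rbind
  rownames(out) <- NULL
  out
}

all_prices <- get_all(stock_list[[1]])
all_prices
```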
The combination of
tq_exchange now gives the user access to a wide range of stock lists. To get the full list of options, use
This is an exciting release for a few reasons. First, the PerformanceAnalytics integration fills a big gap that now allows full financial analysis to be performed within the “tidyverse” (i.e. using data frames only). You can start a workflow with a symbol or set of symbols and through piping (
tq_performance can end with performance metrics all in a few lines of code. Previously this was impossible.
Second, portfolio attribution and performance analysis are now possible in the “tidyverse”. This is very interesting because, with the data science workflow discussed in R for Data Science, the scale at which portfolios can be modeled and analyzed is nearly limitless (refer to many models and the
Third, data science is a rapidly evolving field with new people joining the community by the second. With this influx we recognize it’s important to reduce the learning curve for “financial data scientists”, those looking to apply data science to finance. As a result, we are actively taking steps to reduce the learning curve. The first step of providing a set of improved vignettes is complete. We will continue to focus on this area in the future.
This post was meant to give users and potential users a flavor of the new additions to tidyquant v0.4.0. We took a peek at the new PerformanceAnalytics integration, which enables performance analysis and portfolio aggregation. We introduced the new vignettes, which are topical and are designed to get users up to speed quickly. We discussed several other important new features, such as new ggplot2 themes, the new support for “compound getters” in tq_get, and the new tq_exchange functions for retrieving stock lists. There are a number of other changes not specifically addressed. Those interested can view the NEWS here.
Tidyquant Vignettes: This overview just scratches the surface of tidyquant. The vignettes explain much, much more!
R for Data Science: A free book that thoroughly covers the “tidyverse”. A prerequisite for maximizing your abilities with