tidyquant 0.3.0: ggplot2 Enhancements, Real-Time Data, and More
Written on January 22, 2017
tidyquant, version 0.3.0, is a pretty sizable release that includes a little bit for everyone, including new financial charting and moving average geoms for use with ggplot2, a new tq_get get option called "key.stats" for retrieving real-time stock information, and several nice integrations that improve the ease of scaling your analyses. If you're not already familiar with tidyquant, it integrates the best quantitative resources for collecting and analyzing quantitative data, such as TTR, with the tidyverse, allowing for seamless interaction between each. I’ll briefly touch on some of the updates by going through some neat examples. The package is open source, and you can view the code on the tidyquant GitHub page.
Table of Contents
- v0.3.0 Updates
- Further Reading
tidyquant: Bringing financial analysis to the tidyverse
v0.3.0 Updates
When I said this was a big release, I wasn’t kidding. We have some major enhancements:
- Financial Visualizations for ggplot2: Candlestick charts, barcharts, moving averages and Bollinger Bands can be used in the ggplot “grammar of graphics” workflow. There’s a new vignette, Charting with tidyquant, that details the new financial charting capabilities.
- Key stats from Yahoo Finance: Users can now get 55 different key statistics in real-time from Yahoo Finance with the new "key.stats" get option. The statistics include Bid, Ask, Day’s High, Day’s Low, Last Trade Price, Current P/E Ratio, and many more, most of which change throughout the day. With the addition of the key statistics, tq_get is now truly a one-stop shop for financial information. The user can now get:
  - Real-time key stock statistics with get = "key.stats"
  - Historical key ratios and financial information over the past 10 years with get = "key.ratios"
  - Quarterly and annual financial statement data with get = "financials"
  - Historical daily stock prices with get = "stock.prices"
  - Stock indexes for 18 different indexes with get = "stock.index"
  - And more!
- Enhancements that Make Scaling Financial Analysis Simple:
  - tq_get now accepts multiple stocks in the form of either a character vector (e.g. c("AAPL", "GOOG", "FB")) or a data frame with the stocks in the first column. This means scaling is ridiculously simple now. A call to tq_get(c("AAPL", "GOOG", "FB"), get = "stock.prices") now gets 10 years of daily stock prices for all three stocks in one data frame!
  - tq_mutate and tq_transform now work with grouped data frames. This means that you can extend the TTR functions to grouped data frames the same way that you can with dplyr::mutate. In addition, you can now more easily rename the transformed / mutated columns with the col_rename argument. All of this saves you time and requires less code!
This concludes the major changes. Now, let’s go through some examples!
First, update to tidyquant v0.3.0.
I also recommend the open-source RStudio IDE, which makes R Programming easy and efficient.
We’ve got some neat examples to show off the new capabilities:
- Enhanced Financial Data Visualizations: We’ll check out how to use the new tidyquant geoms for ggplot2, which provide great visualizations for time-series and stock data!
- Working with Key Statistics: We’ll investigate the new get = "key.stats", which enables access to real-time, intraday trading information!
- Scaling Your Analysis: We’ll test out some of the new scaling features that make it even easier to scale your analysis from one security to many!
I absolutely love these new ggplot geoms that come packaged with tidyquant, and I’m really excited to show them off! Two new chart types come packaged with tidyquant: geom_candlestick and geom_barchart (not to be confused with geom_bar). In this post, we’ll focus on the candlestick chart, but the barchart works in a very similar manner.
Before we start, let’s get some data using
tq_get. The first call gets a single stock (nothing new here), and the second call retrieves the FANG stocks using the new scaling functionality by piping (
%>%) a character vector of symbols to
tq_get (there are other ways too!).
Before v0.3.0, we used geom_line to create a line chart. Note that coord_x_date is a new tidyquant coordinate function that enables zooming in on part of the chart without out-of-bounds data loss (scale_x_date is similar but causes out-of-bounds data loss, which wreaks havoc on moving average geoms).
With tidyquant, we can replace geom_line with geom_candlestick to create a beautiful candlestick chart that shows open, high, low, close, and direction visually. The only real difference is that we need to specify the aesthetic arguments open, high, low, and close. Everything else can stay the same.
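Here is a minimal sketch of the candlestick layer described above. The data is synthetic (a seeded random walk, not real prices), and the names ohlc and p are placeholders; with real data you would pass the tibble returned by tq_get instead. It assumes tidyquant and ggplot2 are installed.

```r
library(tidyquant)  # provides geom_candlestick() and coord_x_date()
library(ggplot2)

# Synthetic OHLC data: a seeded random walk standing in for real prices
set.seed(42)
n_days <- 120
close  <- 100 + cumsum(rnorm(n_days))
open   <- c(100, head(close, -1))   # each day opens at the prior close

ohlc <- data.frame(
  date  = as.Date("2016-07-01") + seq_len(n_days) - 1,
  open  = open,
  high  = pmax(open, close) + 0.5,
  low   = pmin(open, close) - 0.5,
  close = close
)

p <- ggplot(ohlc, aes(x = date, y = close)) +
  # The candlestick geom needs all four price aesthetics
  geom_candlestick(aes(open = open, high = high, low = low, close = close)) +
  # Zoom in on the last two months without dropping the earlier rows
  coord_x_date(xlim = c(as.Date("2016-09-01"), as.Date("2016-10-28"))) +
  labs(title = "Candlestick chart (synthetic data)", x = "", y = "Price")

p
```

Swapping geom_candlestick for geom_barchart should give the barchart variant with the same aesthetics.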
Pretty sweet! Let’s take this a step further with moving averages. The moving average geom,
geom_ma, is used to quickly draw moving average lines using a moving average function,
ma_fun, that is one of seven from the
TTR package. We can use these to “rapid prototype” moving averages, enabling us to quickly identify changes in trends. Let’s add 15 and 50-day moving averages. Note that
geom_ma takes arguments to control the moving average function (
ma_fun = SMA and
n = 15) and arguments to control the line such as
color = "red" or
linetype = 4.
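To make concrete what a moving average layer draws, here is a small sketch that calls SMA() from TTR directly on a made-up price vector. The numbers are illustrative only, and a window of 3 is used instead of 15 or 50 so the arithmetic is easy to check by hand.

```r
library(TTR)

prices <- c(10, 11, 12, 13, 14, 15)  # made-up closing prices
sma3   <- SMA(prices, n = 3)         # 3-period simple moving average

# The first n - 1 values are NA because the window is not yet full.
# After that, each value is the mean of the current and two previous prices.
sma3
```

The leading NA values are the reason a moving average line starts later than the price series it is drawn over.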
We can also use Bollinger Bands to help visualize volatility. BBands take a moving average, such as
ma_fun = SMA from
TTR, and a standard deviation,
sd = 2 by default. Because BBands depend on the high, low and close prices, we need to add these as aesthetic arguments. Let’s use a 20-day simple moving average with two standard deviations. We can see that there were two periods, one in October and one in November, that had higher volatility.
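The quantities behind a Bollinger Bands layer can be inspected by calling BBands() from TTR on its own. This is a sketch on synthetic high, low and close columns (a seeded random walk, not real prices); the object names hlc and bb are placeholders.

```r
library(TTR)

# Made-up high/low/close columns (illustrative only)
set.seed(1)
close <- 100 + cumsum(rnorm(60))
hlc   <- cbind(high = close + 1, low = close - 1, close = close)

# 20-period simple moving average with bands two standard deviations either side
bb <- BBands(hlc, n = 20, maType = SMA, sd = 2)

# Columns include dn (lower band), mavg (moving average) and up (upper band)
tail(bb, 3)
```

The bands sit symmetrically around the moving average, so a widening gap between dn and up is what shows up visually as a period of higher volatility.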
Last, we can visualize multiple stocks at once by adding a
group aesthetic and tacking on a
facet_wrap at the end of the
ggplot workflow. Note that the out-of-bounds data becomes important to the scale of the facet: too much data and the y-axis is off scale, too little data and the moving average is thrown off. An easy way to adjust is to use
filter() to subtract double the moving average number of periods (
2 * n) from the start date of the data. This reduces the out-of-bounds data without eliminating data that the moving average function needs for calculations.
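The trimming step can be sketched with dplyr alone. The data below is synthetic, the symbols "AAA" and "BBB" are hypothetical, and the buffer is counted in calendar days for simplicity; n_periods, start_date and end_date are placeholder names.

```r
library(dplyr)

n_periods  <- 20                      # moving average window
start_date <- as.Date("2016-10-01")   # first date we want visible on the chart
end_date   <- as.Date("2016-12-31")

# Synthetic daily data for two hypothetical symbols
prices <- expand.grid(
  symbol = c("AAA", "BBB"),
  date   = seq(as.Date("2016-01-01"), end_date, by = "day"),
  stringsAsFactors = FALSE
)
prices$close <- 100 + seq_len(nrow(prices)) * 0.01

# Keep a buffer of 2 * n_periods days before the visible window so the
# moving average has data to warm up on, but drop everything older
trimmed <- prices %>%
  filter(date >= start_date - 2 * n_periods, date <= end_date)

range(trimmed$date)
```

The trimmed data can then be passed to ggplot with facet_wrap(~ symbol), zooming to start_date with coord_x_date.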
New to tq_get is the get option
get = "key.stats". So, what are key stats? Yahoo Finance has an amazing list of real-time statistics such as bid price, ask price, day’s high, day’s low, change, and many more features that change throughout the day. Key stats are our access to live data, the most current features of a stock / company, many of which are accurate to the second that they are retrieved. Pretty neat!
Getting Key Stats
Let’s get some key stats, and see what’s inside. We get key stats using the
tq_get function, setting
get = "key.stats". When we show the data, it’s kind of messy (there’s a reason) so I’ve just listed the first ten column names. It comes in the form of a one row tibble (tidy data frame) that has 55 columns, one for each key stat.
The reason that the data comes this way is because, using the new scaling capability, we can get key stats for multiple stocks, and the rows get stacked on top of each other. This makes comparing key stats very easy!
Retrieve Real-Time Data at Periodic Intervals
Something great about real-time data is that it can be collected at periodic intervals when trading is in-session! The following code chunk when run will retrieve stock prices at a periodic interval:
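A minimal sketch of such a polling loop follows. It is an illustration under assumptions rather than a definitive implementation: collect_at_intervals() and fake_fetch() are hypothetical helpers, and the offline stand-in returns random numbers so the sketch runs without a data connection.

```r
# Generic polling helper: call `fetch_fun` every `interval_sec` seconds,
# stamp each result with the retrieval time, and stack the rows.
collect_at_intervals <- function(fetch_fun, n_polls, interval_sec) {
  results <- vector("list", n_polls)
  for (i in seq_len(n_polls)) {
    snapshot <- fetch_fun()
    snapshot$retrieved_at <- Sys.time()
    results[[i]] <- snapshot
    if (i < n_polls) Sys.sleep(interval_sec)  # no wait after the final poll
  }
  do.call(rbind, results)
}

# Offline stand-in for a real-time source (made-up values, hypothetical columns)
fake_fetch <- function() {
  data.frame(symbol = "AAPL", last_price = 100 + runif(1))
}

quotes <- collect_at_intervals(fake_fetch, n_polls = 3, interval_sec = 0.1)
quotes

# With tidyquant v0.3.0 the fetcher would be something like:
# collect_at_intervals(function() tq_get("AAPL", get = "key.stats"),
#                      n_polls = 10, interval_sec = 60)
```

Passing the fetcher in as a function keeps the loop independent of the data source, so the same helper works for one symbol or a vector of symbols.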
Comparing Historical Data to Current Data
We now have get = "key.stats" for current stats, and with v0.2.0 we got get = "key.ratios" for 10 years of historical ratios. When combined, we can now compare current attributes to historical trends. To put this into perspective, we will compare the historical trend of the P/E ratio for AAPL against its current value. The P/E ratio is a measure of stock valuation. Stocks are considered “expensive” when they trade above historical averages or above industry averages.
We already have the key stats from AAPL, so getting the current P/E Ratio is very easy.
Due to the amount of data and time-series nature, the key ratios come as a nested tibble, grouped by section type.
We need to get the historical P/E Ratios, which are in the “Valuation Ratios” section. We will do a series of filtering and unnesting to peel away the layers and isolate the “Price to Earnings” time-series data.
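The filter-then-unnest pattern can be sketched on a small stand-in tibble. Everything here is an assumption for illustration: the column names (section, data, category, date, value) and the numbers are made up to resemble a nested tibble of ratios, not taken from a real download.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical stand-in: one row per section, with that section's
# time series stored in a list-column
key_ratios <- tibble(
  section = c("Financials", "Valuation Ratios"),
  data = list(
    tibble(category = "Revenue",
           date     = as.Date(c("2015-12-31", "2016-12-31")),
           value    = c(1, 2)),
    tibble(category = c("Price to Earnings", "Price to Earnings", "Price to Sales"),
           date     = as.Date(c("2015-12-31", "2016-12-31", "2016-12-31")),
           value    = c(10, 12, 3))
  )
)

pe_history <- key_ratios %>%
  filter(section == "Valuation Ratios") %>%     # pick the section
  unnest(data) %>%                              # expand its nested time series
  filter(category == "Price to Earnings") %>%   # isolate the ratio
  select(date, value)

pe_history
```

Each filter peels away one layer, so the result is a plain two-column time series ready for charting.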
Now, we are ready to visualize the P/E ratio for AAPL, comparing the historical trend with the current value. The visualization below is inspired by r-statistics.co, an awesome resource for ggplot2 and R analysis. We add the following:
- geom_point() to chart the historical data
- geom_ma() to chart the three-period simple moving average (the three-period average helps identify the trend through the noise)
- geom_hline() to add a horizontal line at the current P/E Ratio obtained from key stats
- Legend: We manipulate the colors with scale_color_manual() and the position with theme().
- Logo: A logo is generated as a grob (grid graphical object) using the grid and png packages. The function annotation_custom() allows us to simply add it to the ggplot workflow. See Add an Image to Background for a tutorial.
The chart shows that the current valuation is slightly above the recent historical valuation, indicating that the stock price is slightly “expensive”. However, given that the P/E ratio is below the current SP500 average of 25, courtesy of www.multpl.com, one could also consider this stock “inexpensive”. It just depends on your perspective. :)
Probably the single most important benefit of performing financial analysis in the
tidyverse is the ability to scale. Based on some excellent feedback from @KanAugust, I have made scaling even easier. There are two new options for scaling:
New Option 1: Passing a character vector of symbols:
Send a character vector in the form
c("X", "Y", "Z") to
tq_get. A new column is generated,
symbol.x, with the symbols that were passed to tq_get.
New Option 2: Passing a tibble with symbols in the first column:
We can combine
tq_get calls using
get = "stock.index" and
get = "stock.prices" to pass a stock index to get stock prices. I’ve added
slice(1:3) to get the first three stocks from the index, which reduces the download time. If you remove
slice(1:3), you will get the historical prices for all stocks in an index in the next step!
First, get stocks from an index.
Then get stock prices. Note that symbols must be in the first column.
We can also use
dplyr::group_by to scale analyses! Thanks to some great feedback from @dvaughan32, the
col_rename argument is available to conveniently rename the newly transformed / mutated columns.
Here’s a powerful example: We can use
tq_transform to collect annual returns for a tibble of stock prices for multiple stocks. The result can be piped to
ggplot for charting.
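The grouped workflow can be sketched without any data download. The block below is a plain dplyr stand-in, not the tq_transform call itself: it uses synthetic prices for two hypothetical symbols and computes a simplified first-to-last return within each calendar year, which may differ slightly from what a tq_transform-based calculation reports. The column name annual.return plays the role of a name you would supply via col_rename.

```r
library(dplyr)
library(tibble)

# Synthetic adjusted prices for two hypothetical symbols over two years
set.seed(7)
dates <- seq(as.Date("2015-01-01"), as.Date("2016-12-31"), by = "day")
make_prices <- function(symbol, start_price) {
  tibble(
    symbol   = symbol,
    date     = dates,
    adjusted = start_price * cumprod(1 + rnorm(length(dates), 0, 0.01))
  )
}
prices <- bind_rows(make_prices("AAA", 50), make_prices("BBB", 120))

# One return per symbol per year: grouping does the scaling for us
annual_returns <- prices %>%
  mutate(year = as.integer(format(date, "%Y"))) %>%
  group_by(symbol, year) %>%
  arrange(date, .by_group = TRUE) %>%
  summarise(annual.return = last(adjusted) / first(adjusted) - 1,
            .groups = "drop")

annual_returns
```

Because the result is a tidy tibble with one row per symbol and year, it can be piped straight into ggplot, for example with geom_col and facet_wrap(~ symbol).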
The tidyquant package has several enhancements for financial analysis:
- ggplot2 geoms for candlestick charts, barcharts, moving averages, and Bollinger Bands, and a brand new vignette to help guide users on charting capabilities.
- get = "key.stats" for current stats on stocks: 55 total are available. The key stats complement the key ratios (get = "key.ratios"), which contain 10 years of historical information on various key ratios and financial information.
- New capabilities for scaling financial analyses to many stocks:
  - tq_get with character vectors or tibbles of stocks
With these updates, we can really do full financial analyses without ever leaving the tidyverse.
We went over a few examples to illustrate the main updates to tidyquant:
- The first example showed an implementation of several new tidyquant geoms that work with ggplot2.
- The second example showed use of the new get = "key.stats". The key stats provide real-time data from Yahoo Finance, and are a handy complement to the historical data provided by other get options such as get = "key.ratios".
- The third and final example showed some of the improvements in scaling analysis with the tidyverse. You can now pipe multiple symbols into tq_get to scale any of the get options, and you can use dplyr::group_by with tq_transform to extend an analysis to many stocks.
I hope you enjoy the new features as much as I enjoyed creating them. As always, there’s more to come! :)
Further Reading
- r-statistics.co: You need to check out this website, which contains a wealth of quality, up-to-date R information. The Top 50 ggplot2 visualizations is amazing. This is now my go-to reference on ggplot2.
- Tidyquant Vignettes: This tutorial just scratches the surface of tidyquant. The vignettes explain much, much more!
- R for Data Science: A free book that thoroughly covers the tidyverse.
- TTR on CRAN: The reference manual covers each of the TTR functions.
- Zoo Vignettes: Covers the zoo rollapply functions as well as other usage.