`tidyquant`, version 0.2.0, is now available on CRAN. If you’re not already familiar, `tidyquant` integrates the best resources for collecting and analyzing quantitative financial data, `xts`, `zoo`, `quantmod` and `TTR`, with the tidy data infrastructure of the `tidyverse`, allowing for seamless interaction between each. I’ll briefly touch on some of the updates. The package is open source, and you can view the code on the tidyquant GitHub page.
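
For anyone new to the package, here is a minimal sketch of the core workflow. The ticker, date range, and moving-average window are illustrative, and the `tq_mutate()` call uses the current argument names:

```r
library(tidyquant)
library(dplyr)

# Collect daily stock prices as a tidy tibble rather than an xts object
# (ticker and dates are illustrative)
aapl <- tq_get("AAPL", get = "stock.prices",
               from = "2016-01-01", to = "2016-12-31")

# Apply a TTR function (here, a 50-day simple moving average) the tidy way
aapl %>%
  tq_mutate(select = close, mutate_fun = SMA, n = 50)
```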

### tidyquant: Bringing Quantitative Financial Analysis to the tidyverse

My new package, `tidyquant`, is now available on CRAN. `tidyquant` integrates the best resources for collecting and analyzing quantitative financial data, `xts`, `quantmod` and `TTR`, with the tidy data infrastructure of the `tidyverse`, allowing for seamless interaction between each. While this post aims to introduce `tidyquant` to the *R community*, it just scratches the surface of the features and benefits. We’ll go through a simple stock visualization using `ggplot2`, which shows off the integration. The package is open source, and you can view the code on the tidyquant GitHub page.
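
As a taste of that integration, a hedged sketch of the kind of chart the post builds; the ticker and date range are stand-ins for the post’s actual example:

```r
library(tidyquant)
library(ggplot2)

# tq_get() returns a tidy tibble, so prices flow straight into ggplot2
prices <- tq_get("AAPL", get = "stock.prices",
                 from = "2016-01-01", to = "2016-12-31")

ggplot(prices, aes(x = date, y = close)) +
  geom_line() +
  labs(title = "AAPL Closing Prices, 2016", x = NULL, y = "Closing Price")
```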

### Speed Up Your Code: Parallel Processing with multidplyr

There’s nothing more frustrating than waiting for long-running *R* scripts to iteratively run. I’ve recently come across a new-ish package for parallel processing that plays nicely with the tidyverse: `multidplyr`. The package has saved me countless hours when applied to long-running, iterative scripts. In this post, I’ll discuss the workflow to parallelize your code, and I’ll go through a real-world example of collecting stock prices, where it improves speed by over 5X for a process that normally takes around two minutes. Once you grasp the workflow, the parallelization can be applied to almost any iterative script, regardless of application.
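
The workflow boils down to partitioning a grouped data frame across worker processes and collecting the results. A minimal sketch using `multidplyr`’s current partition/collect interface; the cluster size and toy data are illustrative:

```r
library(dplyr)
library(multidplyr)

# Toy data standing in for a long-running grouped computation
df <- tibble(symbol = rep(letters[1:8], each = 100000),
             x      = rnorm(800000))

cluster <- new_cluster(4)           # one worker process per core (assumed 4)
cluster_library(cluster, "dplyr")   # each worker needs dplyr loaded

df %>%
  group_by(symbol) %>%
  partition(cluster) %>%            # shard the groups across the workers
  summarise(mean_x = mean(x)) %>%   # runs in parallel on each shard
  collect()                         # gather the results back together
```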

### Russell 2000 Quantitative Stock Analysis in R: Six Stocks with Amazing, Consistent Growth

The Russell 2000 Small-Cap Index (ticker symbol: ^RUT) is the hottest index of 2016 with **YTD gains of over 18%**. The index components are interesting not only because of recent performance, but because the top performers either grow to become mid-cap stocks or are bought by large-cap companies at premium prices. This means **selecting the best components can result in large gains**. In this post, I’ll perform a quantitative stock analysis on the entire list of Russell 2000 stock components using the *R programming language*. Building on the methodology from my S&P Analysis Post, I develop screening and ranking metrics to identify the **top stocks with amazing growth and the most consistency**. I use *R* for the analysis, including the `rvest` library for web scraping the list of Russell 2000 stocks, `quantmod` to collect historical prices for all 2000+ stock components, `purrr` to map modeling functions, and various other `tidyverse` libraries such as `ggplot2`, `dplyr`, and `tidyr` to visualize and manage the data workflow. Last, I use `plotly` to create an interactive visualization used in the screening process. Whether you are familiar with quantitative stock analysis, just beginning, or just interested in the *R programming language*, you’ll gain both knowledge of data science in *R* and immediate insights into the best Russell 2000 stocks, quantitatively selected for future returns!
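
The heart of the workflow is mapping a download function over a tibble of tickers so each stock’s price history nests in its own row. A hedged sketch, with a few well-known tickers standing in for the scraped Russell 2000 list:

```r
library(quantmod)
library(purrr)
library(dplyr)

# Stand-in tickers; the post web-scrapes the full Russell 2000 component list
symbols <- tibble(symbol = c("AAPL", "MSFT", "GOOG"))

# possibly() returns NULL instead of erroring when a download fails,
# which matters when mapping over 2000+ symbols
get_prices <- possibly(
  function(sym) getSymbols(sym, from = "2007-01-01", auto.assign = FALSE),
  otherwise = NULL
)

prices_nested <- symbols %>%
  mutate(prices = map(symbol, get_prices)) %>%  # one price series per row
  filter(!map_lgl(prices, is.null))             # drop symbols that failed
```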

### Quantitative Stock Analysis Tutorial: Screening the Returns for Every S&P500 Stock in Less than 5 Minutes

Quantitative trading strategies are easy to develop in **R** if you can manage the data workflow. In this post, I analyze every stock in the S&P500, screening in terms of **risk versus reward**. I’ll show you how to use `quantmod` to collect daily stock prices and calculate log returns, `rvest` to web scrape the S&P500 list of stocks from *Wikipedia*, `purrr` to map functions and perform calculations on nested tibbles (`tidyverse` data frames), and `plotly` to visualize risk versus reward and extract actionable information for use in your trading strategies. At the end, you will have a visualization that compares the entire set of S&P500 stocks, and we’ll screen them to find those with the best future prospects on a quantitative basis. As a bonus, we’ll investigate correlations to add diversification to your portfolio. Finally, the code that generates the `plotly` and `corrplot` visualizations is made available on GitHub for future stock screening. Whether you are a veteran trader or a newbie stock enthusiast, you’ll learn a useful workflow for modeling and managing massive data sets using the `tidyverse` packages. And, the entire script runs in **less than five minutes**, so you can begin screening stocks quickly.
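
To give a flavor of the two key steps, a hedged sketch of scraping the ticker list and computing log returns for a single symbol. Assumptions: the ticker list is the first table on the Wikipedia page, and the screen is reduced here to a mean/standard-deviation pair:

```r
library(rvest)
library(quantmod)

# Scrape the S&P500 component table from Wikipedia
# (assumes the ticker list is the first table on the page)
sp500 <- read_html("https://en.wikipedia.org/wiki/List_of_S%26P_500_companies") %>%
  html_node("table") %>%
  html_table()

# Daily log returns for one symbol; the mean (reward) and standard
# deviation (risk) of these drive the risk-versus-reward screen
log_returns <- function(sym) {
  prices <- getSymbols(sym, from = "2007-01-01", auto.assign = FALSE)
  periodReturn(Ad(prices), period = "daily", type = "log")
}

ret <- log_returns("AAPL")
c(reward = mean(ret), risk = sd(ret))
```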

### Customer Segmentation Part 3: Network Visualization

This post is the third and final part in the customer segmentation analysis. The first post focused on *k*-means clustering to segment customers into distinct groups based on purchasing habits. The second post takes a different approach, using Principal Component Analysis (PCA) to visualize customer groups. This third and final post performs Network Visualization (Graph Drawing) using the `igraph` and `networkD3` libraries as a method to visualize the customer connections and relationship strengths.
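
As a flavor of the approach, a minimal sketch with a toy edge list standing in for the customer relationships developed in the post:

```r
library(igraph)
library(networkD3)

# Toy edge list; in the post the edges come from customer purchase similarity
edges <- data.frame(
  from   = c("cust_1", "cust_1", "cust_2", "cust_3"),
  to     = c("cust_2", "cust_3", "cust_4", "cust_4"),
  weight = c(0.8, 0.5, 0.7, 0.9)
)

# Static graph drawing with igraph; edge width encodes relationship strength
g <- graph_from_data_frame(edges, directed = FALSE)
plot(g, edge.width = E(g)$weight * 5)

# Interactive D3 version of the same network (uses columns 1 and 2 by default)
simpleNetwork(edges)
```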

### Customer Segmentation Part 2: PCA for Segment Visualization

This post is the second part in the customer segmentation analysis. The first post focused on *k*-means clustering in `R` to segment customers into distinct groups based on purchasing habits. This post takes a different approach, using Principal Component Analysis (PCA) in `R` as a tool to view customer groups. Because PCA attacks the problem from a different angle than *k*-means, we can get different insights. We’ll compare the *k*-means results with the PCA visualization. Let’s see what happens when we apply PCA.
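
For a sense of the mechanics, a minimal sketch of the PCA step; the random matrix is a stand-in for the customer-product purchase table prepared in Part 1:

```r
# Stand-in customers-by-products matrix of purchase quantities
set.seed(123)
purchases <- matrix(rpois(30 * 10, lambda = 2), nrow = 30,
                    dimnames = list(paste0("cust_", 1:30),
                                    paste0("model_", 1:10)))

# PCA on the scaled data; the first two components give the 2-D view
pca <- prcomp(purchases, scale. = TRUE)

summary(pca)   # proportion of variance captured by each component
biplot(pca)    # customers plotted against the first two components
```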

### Customer Segmentation Part 1: K-Means Clustering

In this post, we’ll be using *k*-means clustering in `R` to segment customers into distinct groups based on purchasing habits. *k*-means clustering is an unsupervised learning technique, which means we don’t need to have a target for clustering. All we need is to format the data in a way the algorithm can process, and we’ll let it determine the customer segments or clusters. This makes *k*-means clustering great for exploratory analysis as well as a jumping-off point for more detailed analysis. We’ll walk through a relevant example using the Cannondale `bikes data set` from the `orderSimulatoR` project GitHub repository.
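
A minimal sketch of the clustering step; the random matrix below is a stand-in for the customer-product purchase table built from the bikes data set:

```r
# Stand-in customers-by-products matrix of purchase quantities
set.seed(123)
purchases <- matrix(rpois(30 * 10, lambda = 2), nrow = 30)

# Scale so products on different volume scales contribute equally,
# then segment the customers into k = 4 clusters
km <- kmeans(scale(purchases), centers = 4, nstart = 25)

km$cluster       # cluster assignment for each customer
km$tot.withinss  # total within-cluster sum of squares (compare across k)
```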