Speed Up Your Code: Parallel Processing with multidplyr

    Written on December 18, 2016

    There’s nothing more frustrating than waiting on long-running, iterative R scripts. I’ve recently come across a new-ish package for parallel processing that plays nicely with the tidyverse: multidplyr. It has saved me countless hours on long-running, iterative scripts. In this post, I’ll discuss the workflow for parallelizing your code, and I’ll walk through a real-world example of collecting stock prices, where parallelization speeds up a process that normally takes about two minutes by more than 5X. Once you grasp the workflow, you can apply parallelization to almost any iterative script, regardless of application.


    Russell 2000 Quantitative Stock Analysis in R: Six Stocks with Amazing, Consistent Growth

    Written on November 30, 2016

    The Russell 2000 Small-Cap Index (ticker symbol: ^RUT) is the hottest index of 2016, with YTD gains of over 18%. Its components are interesting not only because of recent performance, but also because the top performers either grow to become mid-cap stocks or are bought by large-cap companies at premium prices. This means selecting the best components can result in large gains. In this post, I’ll perform a quantitative stock analysis on the entire list of Russell 2000 components using the R programming language. Building on the methodology from my S&P Analysis Post, I’ll develop screening and ranking metrics to identify the top stocks with amazing growth and the greatest consistency. I use R throughout: rvest to web scrape the list of Russell 2000 stocks, quantmod to collect historical prices for all 2000+ components, purrr to map modeling functions, and other tidyverse libraries such as ggplot2, dplyr, and tidyr to visualize and manage the data workflow. Finally, I use plotly to create an interactive visualization for the screening process. Whether you are familiar with quantitative stock analysis, just beginning, or simply interested in the R programming language, you’ll gain both knowledge of data science in R and immediate insights into the best Russell 2000 stocks, quantitatively selected for future returns!


    Quantitative Stock Analysis Tutorial: Screening the Returns for Every S&P500 Stock in Less than 5 Minutes

    Written on October 23, 2016

    Quantitative trading strategies are easy to develop in R if you can manage the data workflow. In this post, I analyze every stock in the S&P500 and screen them in terms of risk versus reward. I’ll show you how to use quantmod to collect daily stock prices and calculate log returns, rvest to web scrape the list of S&P500 stocks from Wikipedia, purrr to map functions and perform calculations on nested tibbles (tidyverse data frames), and plotly to visualize risk versus reward and extract actionable information for your trading strategies. At the end, you will have a visualization that compares the entire set of S&P500 stocks, and we’ll screen them to find those with the best future prospects on a quantitative basis. As a bonus, we’ll investigate correlations to add diversification to your portfolio. The code that generates the plotly and corrplot visualizations is available on GitHub for future stock screening. Whether you are a veteran trader or a newbie stock enthusiast, you’ll learn a useful workflow for modeling and managing large data sets using the tidyverse packages. And the entire script runs in less than five minutes, so you can begin screening stocks quickly.
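
    For readers new to log returns, here is a minimal base-R sketch of the calculation, assuming you already have a vector of closing prices. The prices below are made up for illustration; the post itself downloads real prices with quantmod, and this snippet is not taken from its code.

```r
# Base R only; made-up closing prices, NOT real market data.
prices <- c(100, 102, 101, 105)

# Daily log return: r_t = log(P_t) - log(P_{t-1}) = log(P_t / P_{t-1})
log_returns <- diff(log(prices))

# Log returns are additive: their sum equals the log return over the whole period
total_log_return <- sum(log_returns)
all.equal(total_log_return, log(prices[length(prices)] / prices[1]))  # TRUE
```

    The additivity shown on the last line is one reason log returns are convenient when summarizing many days of price history for every stock.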


    Customer Segmentation Part 3: Network Visualization

    Written on October 1, 2016

    This post is the third and final part of the customer segmentation analysis. The first post focused on k-means clustering to segment customers into distinct groups based on purchasing habits. The second post took a different approach, using Principal Component Analysis (PCA) to visualize customer groups. This final post performs Network Visualization (Graph Drawing) using the igraph and networkD3 libraries to visualize customer connections and relationship strengths.


    Customer Segmentation Part 2: PCA for Segment Visualization

    Written on September 4, 2016

    This post is the second part of the customer segmentation analysis. The first post focused on k-means clustering in R to segment customers into distinct groups based on purchasing habits. This post takes a different approach, using Principal Component Analysis (PCA) in R as a tool to view customer groups. Because PCA attacks the problem from a different angle than k-means, we can gain different insights. We’ll compare the k-means results with the PCA visualization. Let’s see what happens when we apply PCA.
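
    As a rough illustration of how PCA turns a customer-by-product table into something you can plot, here is a minimal sketch using base R's prcomp(). The spending matrix is random, hypothetical data, and the column and row names are invented for this example; it is not the data or code from the post.

```r
# Hypothetical customer-by-product spending matrix (random, made-up numbers).
set.seed(42)
purchases <- matrix(runif(40), nrow = 10,
                    dimnames = list(paste0("customer_", 1:10),
                                    paste0("product_", 1:4)))

# Center and scale each product column, then rotate onto principal components
pca <- prcomp(purchases, center = TRUE, scale. = TRUE)

summary(pca)            # proportion of variance explained by each component
scores <- pca$x[, 1:2]  # each customer's position on PC1 and PC2
```

    Plotting `scores` (for example with `plot(scores)`) and coloring the points by a k-means cluster assignment is one simple way to compare the two methods side by side.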


    Customer Segmentation Part 1: K-Means Clustering

    Written on August 7, 2016

    In this post, we’ll use k-means clustering in R to segment customers into distinct groups based on purchasing habits. K-means clustering is an unsupervised learning technique, which means we don’t need a target variable for clustering. All we need is to format the data in a way the algorithm can process, and we’ll let it determine the customer segments, or clusters. This makes k-means clustering great for exploratory analysis as well as a jumping-off point for more detailed analysis. We’ll walk through a relevant example using the Cannondale bikes data set from the orderSimulatoR project GitHub repository.
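
    To make the "no target needed" point concrete, here is a minimal sketch with base R's kmeans() on two made-up groups of customers. The data and the column names (mountain, road) are hypothetical stand-ins, not the Cannondale bikes data set, and this is not the post's own code.

```r
# Hypothetical data: two made-up customer groups, NOT the Cannondale bikes data set.
set.seed(123)  # k-means picks random starting centers, so fix the seed
group_a <- cbind(mountain = rnorm(10, mean = 0.8, sd = 0.05),
                 road     = rnorm(10, mean = 0.2, sd = 0.05))
group_b <- cbind(mountain = rnorm(10, mean = 0.2, sd = 0.05),
                 road     = rnorm(10, mean = 0.8, sd = 0.05))
purchases <- rbind(group_a, group_b)  # rows = customers, columns = share of spend

# No target labels are passed in: the algorithm discovers the segments itself.
# nstart = 25 tries 25 random starts and keeps the lowest within-cluster SS.
fit <- kmeans(purchases, centers = 2, nstart = 25)

fit$size     # number of customers in each segment
fit$centers  # average purchasing profile of each segment
```

    The only real input decision here is `centers`, the number of segments; in practice that choice is usually explored rather than known in advance.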


    orderSimulatoR: Simulate Orders for Business Analytics

    Written on July 12, 2016

    In this post, we’ll discuss orderSimulatoR, which enables fast and easy order simulation in R for learning about customers and products. The basic premise is to simulate the data you’d retrieve from a SQL query of an ERP system. The data can then be merged with products and customers tables for data mining. I’ll go through the basic steps to create an order data set that combines customers and products, and I’ll wrap up with some visualizations to show how you can use order data to expose trends. You can get the scripts and the Cannondale bikes data set at the orderSimulatoR GitHub repository. If you are wondering what simulated orders look like, the final result appears at the end of the post.


    Marketing Strategy: Why MBAs Can Benefit from Learning Analytics

    Written by Matt Dancho on May 1, 2016

    Being a business professional doesn’t mean you can’t, or shouldn’t, further your skills in analytics. Businesses view strategic decision making as a competitive advantage. You should too! Learning the basics of data science not only adds value to your organization, it also increases your own value and demand.