Customer Segmentation Part 1: K-Means Clustering

    Written on August 7, 2016

    In this post, we’ll be using k-means clustering in R to segment customers into distinct groups based on their purchasing habits. K-means clustering is an unsupervised learning technique, which means we don’t need a target variable (labels) to cluster on. All we need to do is format the data so the algorithm can process it, and then let the algorithm determine the customer segments, or clusters. This makes k-means clustering great for exploratory analysis as well as a jumping-off point for more detailed analysis. We’ll walk through a relevant example using the Cannondale bikes data set from the orderSimulatoR project GitHub repository.


    orderSimulatoR: Simulate Orders for Business Analytics

    Written on July 12, 2016

    In this post, we will be discussing orderSimulatoR, which makes it fast and easy to simulate order data in R for learning about customers and products. The basic premise is to simulate the data you’d retrieve from a SQL query of an ERP system. The data can then be merged with products and customers tables for data mining. I’ll go through the basic steps to create an order data set that combines customers and products, and I’ll wrap up with some visualizations that show how you can use order data to expose trends. You can get the scripts and the Cannondale bikes data set at the orderSimulatoR GitHub repository. In case you are wondering what simulated orders look like, the end result is shown at the end of the post.


    Marketing Strategy: Why MBAs Can Benefit from Learning Analytics

    Written by Matt Dancho on May 1, 2016

    Being a business professional doesn’t mean you can’t, or shouldn’t, develop your skills in analytics. Businesses view strategic decision making as a competitive advantage. You should too! Learning the basics of data science not only adds value to your organization; it also increases your own value and, with it, the demand for your skills.


    A Data Scientist's Resources

    Written by Matt Dancho on April 9, 2016

    Getting up and running in data science is tough. It’s easy to get overwhelmed, and your biggest asset is time (don’t waste it). Here are some resources to help speed you along. I’ll keep updating this list as time allows. Feel free to comment or email me if I’m missing something.