Wrangling Big Data is one of the best features of the R programming language, which boasts a Big Data Ecosystem that contains fast in-memory tools (e.g. data.table) and distributed computational tools (e.g. sparklyr). With the NEW dtplyr package, data scientists with dplyr experience gain the benefits of a data.table backend. We saw a 3X speed boost for dplyr!
I'm pleased to announce the introduction of correlationfunnel version 0.1.0, which officially hit CRAN yesterday. The correlationfunnel package is something I've been using for a while to efficiently explore data, understand relationships, and get to business insights as fast as possible.
This article demonstrates a real-world case study for business forecasting with regression models, including artificial neural networks (ANNs) with Keras.
Model interpretability is critical to businesses. If you want to use high-performance models (GLM, RF, GBM, Deep Learning, H2O, Keras, xgboost, etc.), you need to learn how to explain them. With machine learning interpretability growing in importance, several R packages designed to provide this capability are gaining in popularity. We analyze the IML package in this article.
Real world data science - Learn how to compete in a Kaggle Competition using Machine Learning with R.
I’m pleased to announce that we released brand new content for our flagship course, Data Science For Business (DS4B 201). Over the course of 10 weeks, the DS4B 201 course teaches students an end-to-end data science project solving Employee Churn with R, H2O, & LIME. The latest content is focused on transitioning from modeling Employee Churn with H2O and LIME to evaluating our binary classification model using Return-On-Investment (ROI), thus delivering business value. We do this through application of a special tool called the Expected Value Framework. Let’s learn about the new course content available now in DS4B 201, Chapter 7, which covers the Expected Value Framework for modeling churn with H2O!
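To make the idea of an expected value calculation concrete, here is a minimal sketch of how a binary churn classifier could be scored in dollars rather than accuracy. It is not the course's implementation: the function name, the outcome probabilities, and the dollar figures are all hypothetical values chosen for illustration. It assumes the general form of expected value, where each outcome's probability is multiplied by its value and the products are summed.

```python
# Hypothetical sketch of an expected-value calculation for a churn classifier.
# Every rate and dollar figure below is a made-up illustration value.

def expected_value(probabilities, values):
    """Return the probability-weighted sum of outcome values."""
    if abs(sum(probabilities.values()) - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(probabilities[k] * values[k] for k in probabilities)

# Assumed joint probability of each (prediction, actual) outcome
# at one classification threshold.
probabilities = {"tp": 0.10, "fp": 0.05, "fn": 0.06, "tn": 0.79}

# Assumed net benefit (+) or cost (-) per employee for each outcome.
values = {"tp": 20000.0, "fp": -5000.0, "fn": -10000.0, "tn": 0.0}

ev_per_employee = expected_value(probabilities, values)
print(f"Expected value per employee: {ev_per_employee:.2f}")
```

Repeating the calculation across several thresholds, each with its own outcome probabilities, is one way to pick the threshold with the highest expected value instead of the highest accuracy.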
Keras LSTM deep learning time series analysis. Use the NASA sunspots data set to predict sunspots ten years into the future with a Keras LSTM deep learning model.
Predict employee churn with H2O machine learning and LIME. Use LIME (Local Interpretable Model-agnostic Explanations) for model explanation in data science for business.
We are pleased to announce that our Data Science For Business (#DS4B) Course (HR 201) is OFFICIALLY OPEN! This course is for intermediate to advanced data scientists looking to apply H2O and LIME to a real-world binary classification problem in an organization: Employee Attrition. If you are interested in applying data science for business in a real-world setting with advanced tools using a client-proven system that delivers ROI to the organization, then this is the course for you. For a limited time we are offering 15% off enrollment.
Last November, our data science team embarked on a journey to build the ultimate Data Science For Business (DS4B) learning platform. We saw a problem: a gap exists in organizations between the data science team and the business. To bridge this gap, we’ve created Business Science University, an online learning platform that teaches DS4B, using high-end machine learning algorithms, and organized in the fashion of an on-premises workshop but at a fraction of the price. I’m pleased to announce that, in 5 days, we will launch our first course, HR 201, as part of a 4-course Virtual Workshop. We modeled the Virtual Workshop after the data science program that we wished we had when we began data science (after we got through the basics, of course!). Now, our data science process is being opened up to you. We guide you through our process for solving high-impact business problems with data science!
Learn time series analysis with Keras LSTM deep learning. Learn to predict sunspots ten years into the future with an LSTM deep learning model.
Anomaly detection using anomalize: an open-source, tidy, time-based anomaly detection algorithm.
Tonight at 7PM EST, we will be giving a LIVE #DataTalk on Using Machine Learning to Predict Employee Turnover. Employee turnover (attrition) is a major cost to an organization, and predicting turnover is at the forefront of the needs of Human Resources (HR) in many organizations. Until now, the mainstream approach has been to use logistic regression or survival curves to model employee attrition. However, with advancements in machine learning (ML), we can now get both better predictive performance and better explanations of what critical features are linked to employee attrition. We used two cutting-edge techniques. First, we used the h2o package’s new FREE automatic machine learning algorithm, h2o.automl(), to develop a predictive model that is in the same ballpark as commercial products in terms of ML accuracy. Then we used the new lime package, which enables breakdown of complex, black-box machine learning models into variable importance plots. The talk will cover HR Analytics and how we used R, H2O, and LIME to predict employee turnover.
Predictive sales analytics to predict product backorders can increase sales and customer satisfaction. Using a Kaggle dataset, we use H2O AutoML to predict backorders.
This post is the third and final part in the customer segmentation analysis. The first post focused on K-Means Clustering to segment customers into distinct groups based on purchasing habits. The second post takes a different approach, using Principal Component Analysis (PCA) to visualize customer groups. The third and final post performs Network Visualization (Graph Drawing) using the networkD3 libraries as a method to visualize the customer connections and relationship strengths.
This post is the second part in the customer segmentation analysis. The first post focused on k-means clustering in R to segment customers into distinct groups based on purchasing habits. This post takes a different approach, using Principal Component Analysis (PCA) in R as a tool to view customer groups. Because PCA attacks the problem from a different angle than k-means, we can get different insights. We’ll compare the k-means results with the PCA visualization. Let’s see what happens when we apply PCA.
In this machine learning with R tutorial, use k-means clustering to segment customers into distinct groups based on purchasing habits.