Last November, our data science team set out to build the ultimate Data Science For Business (DS4B) learning platform. We saw a problem: in many organizations, a gap exists between the data science team and the business. To bridge this gap, we’ve created Business Science University, an online learning platform that teaches DS4B using high-end machine learning algorithms and is organized like an on-premise workshop, but at a fraction of the price.

I’m pleased to announce that, in 5 days, we will launch our first course, HR 201, as part of a 4-course Virtual Workshop. We modeled the Virtual Workshop on the data science program we wished we had when we began data science (after we got through the basics, of course!). Now we are opening up our data science process to you: we guide you through how we solve high-impact business problems with data science!

Highlights

  • A major benefit of the Virtual Workshop is that we teach our internally developed systematic process, the Business Science Problem Framework (BSPF). We use this process to solve high-impact problems, tying data science to financial benefit. Below is the BSPF, one of the tools that has been instrumental to our success. In Data Science For Business (HR 201), we follow the BSPF throughout the course, showing you how to apply the framework to a data science project.

Business Science Problem Framework

  • Another benefit is that you get to see our process for dissecting and analyzing difficult problems. We show you how to tie financial impact to the problem, which is critical in gaining organizational acceptance of a data science project.

  • Yet another benefit is that you will learn how to code within the tidyverse, specifically using Tidy Eval for programming with dplyr and other tidyverse packages.

  • And finally, you will spend a sizable chunk of time using tidyverse, h2o, lime, recipes, GGally, skimr, and more!

The Course Overview touches on the content. Take a look and let us know what you think!

Course Overview

We show you how we use data science to solve high-impact problems, applying proven methodologies and tying data science to financial benefit for the organization.

Data Science For Business (HR 201) is the first course in a 4-part Virtual Workshop that focuses on a $15M/year problem[1] that’s hidden from the organization: Employee Turnover. We use a real-world problem to show you how tools like the Business Science Problem Framework and advanced machine learning tools like H2O and LIME can solve this problem, saving the organization millions in the process. Just think, a 10% reduction in that cost could save $1.5M/year. That’s the power of data science!
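The savings figure follows directly from the $15M/year estimate. As a rough sketch, the footnote's count of 200 lost high performers also implies a per-employee cost. That per-employee number is our own back-of-the-envelope assumption: it treats the count as exactly 200 and the cost as evenly spread, so with "200+" employees it is an upper bound.

$$
\text{Savings from a 10\% reduction} = 0.10 \times \$15\text{M/year} = \$1.5\text{M/year}
$$

$$
\text{Cost per lost high performer} \approx \frac{\$15\text{M/year}}{200\ \text{employees/year}} = \$75{,}000\ \text{per employee}
$$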

Data Science For Business, HR 201

Chapter 0: Getting Started

  • Data Science Project Setup
  • The True Cost of Employee Attrition
  • What Tools Are in Our Toolbox?
  • Frameworks

In this chapter, we introduce you to our systematic process using the Business Science Problem Framework (BSPF), which augments CRISP-DM. The BSPF focuses on problem understanding and business outcomes at a detailed level, whereas CRISP-DM contains the tools necessary for high-level data science project management. Combined, they form one of the tools that has been instrumental to our success.

Business Science Problem Framework

Chapter 1: Business Understanding

  • Problem Understanding With BSPF
  • Streamlining The Attrition Code Workflow
  • Visualizing Attrition with ggplot2
  • Making A Custom Plotting Function: plot_attrition()
  • Challenge 1: Cost Of Attrition

This chapter kicks off CRISP-DM Stage 1 along with BSPF Stages 1-4. You will come to understand the business problem by assigning a financial cost to employee turnover. We develop custom functions that visualize attrition cost by department and job role. These functions are later developed into an R package, tidyattrition, as part of HR 303. We cap it off by developing a custom plotting function, plot_attrition(), that generates an impactful visualization so executives can see the value of your data science project.

Visualizing Attrition Cost

Chapter 2: Data Understanding

  • EDA Part 1: Exploring Data By Data Type With skimr
  • EDA Part 2: Visualizing Feature-Target Interactions with GGally
  • Challenge 2: Assessing Feature Pairs

In this chapter, we focus on two methods of exploratory data analysis (EDA) to gain a thorough understanding of the features. First, we tackle our problem by data type with skimr, separating categorical data from numeric. Second, we visualize interactions using GGally.

Chapter 3: Data Preparation

  • Data Preparation For People (Humans)
  • Data Preparation For Machines With recipes

Next, we process the data for both people and machines. We make extensive use of the recipes package to properly transform data for a pre-modeling Correlation Analysis.

Chapter 4: Automated Machine Learning With H2O

  • Building A Classifier With h2o Automated Machine Learning
  • Inspecting the H2O Leaderboard
  • Building A Custom Leaderboard Plotting Function: plot_h2o_leaderboard()
  • Extracting Models
  • Making Predictions

Building a high-accuracy model is the goal of this stage. We show how to run h2o automated machine learning. We also detail how to build a custom plotting function, plot_h2o_leaderboard(), to visualize the best models and select them for evaluation on a hold-out (testing) set.

Custom H2O Leaderboard Visualization

Chapter 5: Assessing H2O Performance

  • Classifier Summary Metrics
  • Precision & Recall: Adjusting The Classifier Threshold
  • Classifier Gain and Lift: Charts For Execs
  • Visualizing Performance
  • Making A Custom H2O Performance Plot: plot_h2o_performance()

In this chapter, we show you how to assess performance and visualize model quality in a way that executives and other business decision makers understand.

Chapter 6: Explaining Black-Box Models With LIME

  • Using lime For Local Model Explanations
  • Making An Explainer
  • Explaining Multiple Cases

We use lime to explain the black-box classification model, showing which features drive whether an employee stays or leaves.

LIME Feature Explanation Visualization

Chapter 7: Recommendation Algorithm

Finally, we put our data science investigative skills to use by developing a recommendation algorithm that helps managers and executives make better decisions to prevent employee turnover. This recommendation algorithm is used in HR 301 to build a machine-learning-powered Shiny web application that can be deployed to executives and managers.

HR 301 Shiny App: Management Strategies

HR 301 Shiny App: Attrition Risk

Timing

The HR 201 course will open on Monday (4/30). A special offer will be provided to those who enroll in BSU early. The course will not be visible until it is released on Monday.

What You Need

All you need is basic (novice) proficiency in R programming, including dplyr and ggplot2. We’ll take care of the rest. If you are unsure, there is a proficiency quiz to check your baseline. Also, there’s a 30-day money-back guarantee if the course is too difficult or if you are not completely satisfied.

Education Assistance

Many employers offer education assistance to cover the cost of courses. If this is available to you and you are interested in this course, begin discussions with your employer immediately. They will benefit BIG TIME from you taking this course. The special offer we send out is available for a limited time only!

Enroll Now

Enrollment in BSU is already open. Enroll now to take advantage of a special offer. The course will open on Monday, and I will send an announcement, along with the special offer, to those who are enrolled in BSU. Time is limited.

About Business Science

Business Science specializes in “ROI-driven data science”. We offer training, education, coding expertise, and data science consulting related to business and finance. Our latest creation is Business Science University, a self-paced Virtual Workshop that teaches you our data science process! In addition, we devote about 80% of our effort to the open source data science community in the form of software and our Business Science blog. Visit Business Science on the web or contact us to learn more!

Connect With Business Science

If you like our software (anomalize, tidyquant, tibbletime, timetk, and sweep), our courses, and our company, connect with us!

Footnotes

  1. An organization that loses 200+ high performers per year can lose an estimated $15M/year in hidden costs, primarily associated with lost productivity. We show you how to calculate this cost in Chapter 1: Business Understanding.