Data scientists want to run successful projects. However, the sad fact is that most data science projects in organizations fail. It’s not because of a lack of skill or knowledge. Data science projects need a clear and effective plan of attack to be successful. As data scientists, we study a wide array of tools: advanced algorithms, statistics, and even programming. However, if you’re like us, you’ve had to learn how to successfully manage a project through trial and error. Fortunately, we’ve learned a lot over the past several years working with clients, and we’ve integrated the best resources into one streamlined framework to make your life easier: The Business Science Problem Framework (BSPF)! In this article, we’ll cover the basics, showing how the BSPF serves as a guide for successful data science projects by following a customer churn example. Download the BSPF for FREE here.
A successful data science project doesn’t happen by accident. It takes:
- Communication to effectively pitch the benefits to executives, showing results that relate to organizational goals
- Business understanding, which only happens through interaction with the business stakeholders that are closest to the process or problem
- Planning to align everyone involved with the project scope and plan
- A checklist of proven actions that must be considered
The single most effective resource in our arsenal is a special project framework that we’ve designed, drawing on our consulting experience and combining it with proven project management resources and philosophies. It’s called the Business Science Problem Framework (BSPF).
Business Science Problem Framework (BSPF)
We think the BSPF is great as a systematic plan of attack, but it’s not just our opinion. It has been validated by client and, now, student feedback.
A Client And Student Proven System
At Business Science, we’ve been using the BSPF in the wild with clients for some time. We noticed that we were repeating many of the same activities as we dove into various clients’ data science projects. We began keeping track of the steps we were following as we completed projects. Eventually, we formalized our process, calling it the Business Science Problem Framework. Clients loved the BSPF because it laid out a clear path forward. We loved it because it systematized our problem-solving method, making results more repeatable. Win-Win!
We recently began teaching the BSPF in our Data Science For Business (DS4B 201 / HR 201) Course that is part of Business Science University. To summarize the students’ feedback in one word: AMAZING! The BSPF is one of the most loved aspects of the course because students “finally get a framework that they can follow tying data science to the business”. Here’s specifically what two of the students have said:
“Data Science For Business (DS4B 201 / HR 201) is the first course that gives me a CLEAR FRAMEWORK to apply data science to Business Intelligence!”
~ Renaud Liber, Business/Data Analyst, Napoleon Games NV
“I took away a repeatable methodology and project structure that can be used to solve future business problems using data science”
~ David Curry, CTO, Africa Talent Management
If you want to solve a real-world churn problem applying the Business Science Problem Framework, take Data Science For Business (DS4B 201 / HR 201). You’ll gain experience implementing the framework under my guidance (Matt Dancho, Instructor and Founder of Business Science).
So, why use the Business Science Problem Framework to manage data science projects in businesses?
The Goal: Systematic Decision Making
The goal is simple: to implement data science in a way that enables decision making to follow a systematic process. We do this through the following equation relating measurement and analysis to improvement within a business context:
Equation for organizational improvement via systematic decision making
The combination of measurement and analysis is critical for businesses that want to improve. Measurement, or collecting information typically in the form of data, combined with analysis, or digesting the information into usable insights, will lead to improvement. This improvement is driven by Systematic Decision Making, or converting the learning that we achieve through measurement and analysis into processes that improve results.
The reality is that this equation is oversimplified. Before we can implement Systematic Decision Making, we need to understand the business. And, before we can understand the business, we need to identify the business problem. Thinking about this further, achieving Systematic Decision Making follows a path that can be visualized as a pyramid built on identifying drivers and understanding the business.
Systematic Decision Making Pyramid
This process of identifying problems, then understanding the business, and then converting the learning into systematic decision making is what the BSPF helps us do!
The BSPF allows us to go from identifying business problems to making systematic decisions. You can download the BSPF for FREE here (under the “Resources” tab on the Business Science website).
Business Science Problem Framework (BSPF)
Combining Decision Making And Project Management Tools
The BSPF combines three tools making it both high level and detailed while being built on experience:
- Business Science Experience: Our own internal learnings, which have been incorporated into a course for students who want to learn Data Science For Business (DS4B). These range from how to set up a data science project to how to show financial impact and size the business problem in terms executives need to see.
- CRISP-DM: A high-level data mining project framework that generalizes well to any data science project, but lacks critical details for business problems.
- Principles by Ray Dalio: A great book that shares what Ray Dalio, founder of Bridgewater Associates, has learned through successes and failures. Its learning and business analysis philosophy has been incorporated into the BSPF.
In our Data Science For Business (DS4B 201 / HR 201) course, we show you how to implement the Business Science Problem Framework process, and you will leave with a template and the knowledge to make an impact on your organization by making the best use of your organization’s data.
~ Matt Dancho, Founder of Business Science // Instructor Of DS4B 201
The BSPF is split into a top and bottom section. The top half contains details of what to investigate while the bottom half contains high level stages of the project. The two sections are integrated, meaning they work together to provide a complete program for managing a data science project in a business context. Finally, the BSPF is built on experience, which means it’s validated.
BSPF Top Half
The BSPF has seven phases that are detailed with specific actions focused on understanding the problem and tying the results to Return On Investment (ROI), which is what the organization is keenly focused on:
- View The Business As A Machine
- Understand The Drivers
- Measure The Drivers
- Uncover Problems and Opportunities
- Encode Algorithms
- Measure Results
- Report Financial Impact
Top Half of BSPF
BSPF Bottom Half
The seven BSPF phases flow along the six phases of CRISP-DM that are high-level steps for any data science problem (beyond just business):
- Business Understanding
- Data Understanding
- Data Preparation
- Modeling
- Evaluation
- Deployment
Bottom Half of BSPF
The beauty of the framework is that we get both the high-level stages and the detailed actions in one package!
Built On Experience
Further, it’s built on experience and best practices in business analysis. Many of the philosophies come from the writings of Ray Dalio (refer to Principles) along with our experience using the BSPF with clients. Beyond being high level and detailed, it’s proven!
Principles by Ray Dalio
Let’s go through an example: Customer Churn!
Problem: Customers Are Leaving
Customer churn refers to the act of customers leaving. These could be subscribers to a software product or service, or physical customers that shop at a store but elect to go somewhere else. Customer churn is a big problem! Often it goes undiagnosed because, individually, customers can be small, but when aggregated the effect of churn can be LARGE!
Phase 1: View The Business As A Machine
The first phase is viewing the business as a machine. This involves:
- Isolating business units
- Defining objectives
- Collecting outcomes
This involves breaking the business into internal parts (Sales, Manufacturing, Accounting, etc.) and external parts (customers, suppliers), and visualizing the connections between them.
Segmenting the business into components of the machine
We then need to visualize this interaction as a machine. The machine has goals and outcomes. The goals relate to business objectives. The outcomes are what actually happens. The machine has inner workings, which are driven by people and processes. The process defines the setup, and the people execute the plan.
Visualizing The Business As A Machine
For the example customer churn problem, we make the following assessment:
- Isolating business units: The interaction occurs between Sales and the Customer
- Defining objectives: Make customers happy
- Collecting outcomes: We are slowly losing customers. It’s lowering revenue for the organization by $500K per year.
A key aspect of this stage is understanding the size of the problem. If we are slowly losing customers, how is this impacting revenue? Is the problem a $100 problem, a $100,000 problem, or a $1,000,000 problem? If it’s less than $100K, it may not be worth your time. Further, if it’s over $1M, executives need to know this. Get them involved quickly!
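The sizing rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the function name `triage_problem` and the labels it returns are our own assumptions, while the $100K and $1M cut-offs and the $500K churn figure come from the text.

```python
def triage_problem(annual_impact_usd: float) -> str:
    """Classify a business problem by its estimated annual financial impact.

    The $100K and $1M cut-offs follow the rule of thumb in the text.
    The returned labels are illustrative and not part of the BSPF itself.
    """
    if annual_impact_usd < 100_000:
        return "may not be worth your time"
    if annual_impact_usd > 1_000_000:
        return "get executives involved quickly"
    return "worth pursuing"


# The churn example: revenue is dropping by roughly $500K per year.
print(triage_problem(500_000))
```

In practice the estimate itself is the hard part. The point of the sketch is simply that the size of the problem should be written down as a number before any modeling starts.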
Phase 2: Understand The Drivers
Next, we begin the process of understanding the drivers. The key steps are:
- Investigate if objectives are being met
- Synthesize outcomes
- Hypothesize drivers
The key in this phase is starting with the business objectives: Customer Satisfaction. When customers are happy, they keep coming back. Loss of customers generally indicates low satisfaction. This could be related to availability of products, poor customer service, or competition offering lower prices and/or better service or quality.
We need to synthesize outcomes. In our hypothetical example, customers are leaving for a competitor. In speaking with Sales, several customers have stated “Competition has faster delivery”. This is an indicator that lead time, or the ability to quickly service customers, is not competitive.
The final step is to hypothesize drivers. At this stage, it’s critical to meet with subject-matter experts (SMEs). These are people in the organization that are close to the process and the customers. We need to understand what the potential drivers of lead time are. With their help, form a general equation that relates lead time to its drivers.
Developing a hypothesis with Subject Matter Experts (SMEs)
For the example customer churn problem, we make the following assessment:
- Investigate if objectives are being met: No, customers are unhappy
- Synthesize outcomes: Competitor has a faster lead time
- Hypothesize drivers: Lead time is related to supplier delivery, inventory availability, personnel, and the scheduling process
A key in this stage is communication. As data scientists, we know the tools really well. But tools are only useful when we understand the drivers and the business problem fully. We need to educate ourselves by listening to SMEs.
Phase 3: Measure The Drivers
Now we begin the process of measuring the drivers. The key steps are:
- Collect Data
- Develop KPIs
First, we need to collect data related to the high level drivers. This data could be stored in databases or it may need to be collected. We could collect competitor data, supplier data, sales data (Enterprise Resource Planning or ERP data), personnel data, and more.
Collecting Data From Internal and External Sources
After the data is collected, we need to develop key performance indicators (KPIs), which are quantifiable measures that the organization uses to gauge performance. For our customer churn example, the KPIs are:
- Average Lead Time: The level is 2 weeks, which is based on customer feedback on competitors.
- Supplier Average Lead Time: The level is 3 weeks, which is based on feedback related to our competitor’s suppliers.
- Inventory Availability Percentage: The level of 90% is based on where customers are experiencing unmet demand. This data comes from the ERP data comparing sale requests to product availability.
- Personnel Turnover: The level of 15% is based on the industry averages.
Developing Key Performance Indicators (KPIs)
Two key points in this step:
- Collecting data takes time, but don’t let it stop you. It may require effort to set up processes to collect it, but developing strategic data sources becomes a competitive advantage over time.
- Notice that the KPIs require knowledge of customers and the industry for supplier, inventory, and turnover metrics. Realize that a wealth of data is available outside of your organization. Learn where this data resides, and it becomes a tremendous asset.
Phase 4: Uncover Problems And Opportunities
It’s time to uncover problems and opportunities. We need to:
- Evaluate performance vs KPIs
- Highlight potential problem areas
- Review our project for what could have been missed
For our customer churn example, we review the results from organizational findings against the KPIs to determine where the problem areas may exist. We extended the KPI table to include an Actual Value and Conclusion vs the KPI Level:
- Our average lead time is 6 weeks compared to the competitor average lead time of 2 weeks, which is the first-order cause of the customer churn.
- Our supplier average lead time is on par with our competitor’s, which does not necessitate a concern.
- Our inventory percentage availability is 80%, which is too low to maintain a high customer satisfaction level. This could be a reason that churn is increasing.
- Our personnel turnover in key areas is zero over the past 12 months, so no cause for concern.
Performance Vs KPIs
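The comparison of actual performance against KPI levels can be sketched as a small Python script. Treat it as an illustration under stated assumptions: the `Kpi` class, the `meets_target` helper, and the `lower_is_better` flag are hypothetical names of our own, and the supplier lead time actual of 3 weeks is our reading of "on par with our competitor’s". The remaining numbers come from the example above.

```python
from dataclasses import dataclass


@dataclass
class Kpi:
    name: str
    target: float          # KPI level developed in Phase 3
    actual: float          # value measured inside the organization
    lower_is_better: bool  # True for lead times and turnover


def meets_target(kpi: Kpi) -> bool:
    """Return True when the actual value is at least as good as the KPI level."""
    if kpi.lower_is_better:
        return kpi.actual <= kpi.target
    return kpi.actual >= kpi.target


kpis = [
    Kpi("Average lead time (weeks)", target=2, actual=6, lower_is_better=True),
    # Assumption: "on par with our competitor's" is taken to mean 3 weeks.
    Kpi("Supplier average lead time (weeks)", target=3, actual=3, lower_is_better=True),
    Kpi("Inventory availability (%)", target=90, actual=80, lower_is_better=False),
    Kpi("Personnel turnover (%)", target=15, actual=0, lower_is_better=True),
]

# Collect the KPIs that miss their level; these are the potential problem areas.
problem_areas = [k.name for k in kpis if not meets_target(k)]

for k in kpis:
    status = "OK" if meets_target(k) else "PROBLEM AREA"
    print(f"{k.name}: target {k.target}, actual {k.actual} -> {status}")
```

With these inputs, average lead time and inventory availability are flagged, which matches the conclusions in the list above.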
This is a good spot to pause for a quote from Thomas Edison:
“When you have exhausted all possibilities, remember this - you haven’t”
~ Thomas Edison
Remember to ask questions and constantly test your assumptions. Talk with SMEs to make sure they agree with your findings so far.
Phase 5: Encode Decision Making Algorithms
The key steps in this phase are:
- Develop algorithms to predict and explain the problem
- Optimize decisions to maximize profit
- Use recommendation algorithms to improve decision making
First, develop algorithms using advanced tools like H2O Automated Machine Learning and LIME for black-box model explanations.
H2O is a great option because of Automated Machine Learning (AutoML), which we teach in Data Science For Business (DS4B 201 / HR 201). Automated machine learning is fast and develops highly accurate models, saving the data scientist time.
LIME is used to explain deep learning, random forest, and stacked ensemble models, which are traditionally difficult to explain. We also teach LIME as part of Data Science For Business (DS4B 201 / HR 201).
Sample H2O + LIME Algorithm, Taught in DS4B 201
Next, optimize decision selections to maximize profit. Investigate threshold optimization for binary classification problems. Also, try sensitivity analysis to gauge which features have the largest effect on the profitability of the decisions.
Sample Threshold Optimization Visualization, Taught in DS4B 201
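Threshold optimization for a binary churn classifier can be sketched in plain Python. This is a minimal illustration, not the method taught in the course: the scores, outcomes, benefit, and cost below are made-up numbers, the function names are hypothetical, and we assume an intervention always retains a customer who would otherwise have churned.

```python
def expected_profit(probs, churned, threshold, benefit_saved, cost_of_action):
    """Net profit from intervening on every customer scored at or above threshold.

    Simplifying assumption: an intervention always retains a customer
    who would otherwise have churned.
    """
    profit = 0.0
    for p, did_churn in zip(probs, churned):
        if p >= threshold:
            profit -= cost_of_action      # every intervention costs money
            if did_churn:
                profit += benefit_saved   # a true positive keeps the customer
    return profit


def best_threshold(probs, churned, benefit_saved, cost_of_action, steps=100):
    """Scan evenly spaced thresholds in [0, 1] and return (threshold, profit) with the highest profit."""
    candidates = [i / steps for i in range(steps + 1)]
    return max(
        ((t, expected_profit(probs, churned, t, benefit_saved, cost_of_action))
         for t in candidates),
        key=lambda pair: pair[1],
    )


# Made-up validation scores and outcomes, purely for illustration.
probs = [0.95, 0.80, 0.65, 0.40, 0.30, 0.10]
churned = [True, True, False, True, False, False]

threshold, profit = best_threshold(probs, churned, benefit_saved=1000, cost_of_action=200)
print(f"best threshold = {threshold:.2f}, profit = {profit:.0f}")
```

The design choice worth noticing is that the threshold is chosen by profit rather than by accuracy. When a missed churner costs far more than an unnecessary intervention, the profit-maximizing threshold tends to sit well below 0.5.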
Last, build recommendation algorithms that incorporate feedback from SMEs along with the feature explanations from LIME (or similar feature explanation procedures).
Sample Recommendation Algorithm, Taught in DS4B 201
Once a systematic decision making algorithm is developed, it’s time to deploy into the wild and measure results. Here’s an example of a web application built with Shiny that is taught in our forthcoming course on building ML-powered web applications.
Shiny App That Predicts Attrition and Recommends Management Strategies, Taught in DS4B 301
Phase 6: Measure The Results
Once a model has been developed, evaluated, and pushed to production (i.e. deployed), it’s time to measure the results. This requires you to:
- Capture outcomes
- Synthesize results
- Visualize outcomes over time
Once the algorithm is implemented via a web application or other decision making tool, the results must be measured to show progress. This requires more analysis. We capture outcomes over time and synthesize results. We are looking for progress. If we have experienced good outcomes, then we need to recognize what contributed to those good outcomes.
- Were the decision makers using the tools?
- Did they follow the systematic recommendation?
- Did the model accurately predict risk?
If the results were poor, the same questions apply.
For our customer churn example, we can make charts like these that expose the inventory availability and customer churn rate. We are seeing the inventory availability rise and the customer churn go down. These are good results!
Visualizing Results Over Time
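Capturing outcomes over time can start with something as simple as a churn rate per period. The sketch below uses hypothetical quarterly figures that we made up for illustration, and the helper names `churn_rate` and `is_improving` are our own.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Share of customers lost during one period."""
    return customers_lost / customers_at_start


def is_improving(rates: list[float]) -> bool:
    """True when churn never rises from one period to the next and ends below where it started."""
    never_rises = all(later <= earlier for earlier, later in zip(rates, rates[1:]))
    return never_rises and rates[-1] < rates[0]


# Hypothetical quarterly figures: (customers at start of quarter, customers lost).
quarters = [(1000, 80), (950, 57), (940, 47), (960, 24)]
rates = [churn_rate(start, lost) for start, lost in quarters]

for i, rate in enumerate(rates, start=1):
    print(f"Q{i}: churn rate {rate:.1%}")
print("Improving:", is_improving(rates))
```

The same per-period series is what feeds a chart like the one above, so it is worth storing alongside the dates the algorithm was deployed and changed.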
Phase 7: Report Financial Impact
We’re now in the last phase, report financial impact. If we’ve done good data science, implemented systematic decision making, and iterated through problems, correcting along the way, we should now see positive results. Here are the steps:
- Measure actual results
- Tie to financial benefits
- Report financial benefit to key stakeholders
Once results are understood, we need to show the results as financial benefits. This not only justifies our existence, but shows the organization that it is improving. The key here is that results must be conveyed in terms of financial impact. It’s insufficient to say that we saved 75 employees or 75 customers. Rather, we need to say that the average cost of a lost employee or lost customer is $100,000 per year, so we just saved the organization $7.5M/year. Always report as a financial value.
Here’s an example of charts that now show the net profit and cumulative net profit over time. These are great charts to show executives because they convey the success of the project and its return on investment (ROI)!
Measuring Return On Investment (ROI)
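The arithmetic behind the $7.5M figure, and the net profit series behind charts like these, can be sketched as follows. The 75 customers and $100,000 per year come from the text. The monthly benefit and cost figures are hypothetical, as are the function and variable names.

```python
from itertools import accumulate


def annual_benefit(units_saved: int, annual_cost_per_unit: float) -> float:
    """Financial benefit of retaining customers (or employees) for one year."""
    return units_saved * annual_cost_per_unit


# Figures from the text: 75 customers saved at $100,000 per year each.
benefit = annual_benefit(75, 100_000)
print(f"Annual benefit: ${benefit:,.0f}")

# Hypothetical monthly benefits and project costs in USD, for illustration only.
monthly_benefit = [0, 100_000, 300_000, 625_000]
monthly_cost = [150_000, 50_000, 50_000, 50_000]

# Net profit per month, then the running total that executives care about.
net_profit = [b - c for b, c in zip(monthly_benefit, monthly_cost)]
cumulative_net_profit = list(accumulate(net_profit))

# ROI here is cumulative net profit divided by total cost to date.
roi = cumulative_net_profit[-1] / sum(monthly_cost)
print("Cumulative net profit:", cumulative_net_profit)
print(f"ROI to date: {roi:.0%}")
```

Reporting the cumulative series matters because early months are often negative while the project is being built, and the running total shows when the project pays for itself.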
If you’ve enjoyed learning about the Business Science Problem Framework and are ready to take the next step with a real-world employee attrition problem, then take our Data Science For Business (DS4B 201 / HR 201) Course! It has a student satisfaction rating of 9.1/10, and students are learning how to apply data science to business using R code, the Business Science Problem Framework, and more! Check out the trailer below.
Matt was recently on Episode 165 of the SuperDataScience Podcast. In his second appearance (he was also on Episode 109, where he spoke about the transition to data science), he talks about giving back to the data science community in the form of education, open source software, and blogging!
If you are looking to take the next step and learn Data Science For Business (DS4B), Business Science University is for you! Our goal is to empower data scientists through teaching the tools and techniques we implement every day. You’ll learn:
- Data Science Framework: Business Science Problem Framework
- Tidy Eval
- H2O Automated Machine Learning
- LIME Feature Explanations
- Sensitivity Analysis
- Tying data science to financial improvement
All while solving a REAL WORLD CHURN PROBLEM: Employee Turnover!
Special Autographed “Deep Learning With R” Giveaway!!!
One lucky student that enrolls in June will receive an autographed copy of Deep Learning With R, signed by JJ Allaire, Founder of RStudio and DLwR co-author.
Did you know that an organization that loses 200 high performing employees per year is essentially losing $15M/year in lost productivity? Many organizations don’t realize this because it’s an indirect cost. It goes unnoticed. What if you could use data science to predict and explain turnover in a way that managers could make better decisions and executives would see results? You will learn the tools to do so in our Virtual Workshop. Here’s an example of a Shiny app you will create.
Shiny App That Predicts Attrition and Recommends Management Strategies, Taught in HR 301
Our first Data Science For Business Virtual Workshop teaches you how to solve this employee attrition problem in four courses that are fully integrated:
- HR 201: Predicting Employee Attrition with
- HR 301 (Coming Soon): Building A
- HR 302 (EST Q4): Data Story Telling With RMarkdown Reports and Presentations
- HR 303 (EST Q4): Building An R Package For Your Organization,
The Virtual Workshop is intended for intermediate and advanced R users. It’s code intensive (like these articles), but also teaches you fundamentals of data science consulting including CRISP-DM and the Business Science Problem Framework. The content bridges the gap between data science and the business, making you even more effective and improving your organization in the process.
Don’t Miss A Beat
- Sign up for the Business Science blog to stay updated
- Enroll in Business Science University to learn how to solve real-world data science problems from Business Science
- Check out our Open Source Software
If you like our software (sweep), our courses, and our company, you can connect with us: