How To Successfully Manage A Data Science Project: The Business Science Problem Framework

Written by Matt Dancho on June 19, 2018

Data scientists want to run successful projects. However, the sad fact is that most data science projects in organizations fail. It’s not because of a lack of skill or knowledge: as data scientists, we study a wide array of tools, including advanced algorithms, statistics, and programming. Rather, data science projects need a clear and effective plan of attack to be successful.

If you’re like us, you’ve had to learn how to successfully manage a project through trial and error. Fortunately, we’ve learned a lot over the past several years working with clients, and we’ve integrated the best resources into one streamlined framework to make your life easier: The Business Science Problem Framework (BSPF)! In this article, we’ll cover the basics, showing you how the BSPF serves as a guide for successful data science projects by following a Customer Churn Problem example. Download the BSPF for FREE here.

Learning Trajectory

In this article, you’ll learn:

What It Takes To Succeed

A successful data science project doesn’t happen by accident. It takes:

  • Communication to effectively pitch the benefits to executives, showing results that relate to organizational goals

  • Business understanding, which only happens through interaction with the business stakeholders that are closest to the process or problem

  • Planning to align everyone involved with the project scope and plan

  • A checklist of proven actions that must be considered

The single most effective resource in our arsenal is a special project framework that we’ve designed, pooling from our consulting experience and combining it with proven project management resources and philosophies. It’s called the Business Science Problem Framework (BSPF).


Business Science Problem Framework (BSPF)

We think the BSPF is great as a systematic plan of attack, but it’s more than just us. It’s been validated based on client and, now, student feedback.

The Goal: Systematic Decision Making

The goal is simple: to implement data science in a way that enables decision making to follow a systematic process. We do this through the following equation relating measurement and analysis to improvement within a business context:

\[Measurement + Analysis = Improvement\]

Equation for organizational improvement via systematic decision making

The combination of measurement and analysis is critical for businesses that want to improve. Measurement, or collecting information typically in the form of data, combined with analysis, or digesting the information into usable insights, will lead to improvement. This improvement is driven by Systematic Decision Making, or converting the learning that we achieve through measurement and analysis into processes that improve results.

The reality is that this equation is oversimplified. Before we can implement Systematic Decision Making, we need to understand the business. And, before we can understand the business, we need to identify the business problem. Thinking about this further, achieving Systematic Decision Making follows a path that can be visualized as a pyramid built on identifying drivers and understanding the business.


Systematic Decision Making Pyramid

This process of identifying problems, then understanding the business, and then converting the learning into systematic decision making is what the BSPF helps us do!

Get The BSPF

The BSPF allows us to go from identifying business problems to making systematic decisions. You can download the BSPF for FREE here (under the “Resources” tab on the Business Science website).


Business Science Problem Framework (BSPF)

Combining Decision Making And Project Management Tools

The BSPF combines three tools making it both high level and detailed while being built on experience:

  • Business Science Experience: Our own internal learnings, which have been incorporated into a course for students who want to learn Data Science For Business (DS4B). These range from how to set up a data science project to how to show financial impact and size the business problem in terms executives need to see.

  • CRISP-DM: A high-level data mining project framework that generalizes well to any data science project, but lacks critical details for business problems.

  • Principles by Ray Dalio: A great book that covers the many lessons that Ray Dalio, Founder of Bridgewater Associates, has learned through successes and failures. Its learning and business analysis philosophy has been incorporated into the BSPF.

In our Data Science For Business (DS4B 201 / HR 201) course, we show you how to implement the Business Science Problem Framework process, and you will leave with a template and the knowledge to make an impact on your organization by making the best use of your organization’s data.

  • Matt Dancho, Founder of Business Science // Instructor Of DS4B 201

How The BSPF Works

The BSPF is split into a top and bottom section. The top half contains details of what to investigate while the bottom half contains high level stages of the project. The two sections are integrated, meaning they work together to provide a complete program for managing a data science project in a business context. Finally, the BSPF is built on experience, which means it’s validated.

BSPF Top Half

The BSPF has seven phases that are detailed with specific actions focused on understanding the problem and tying the results to Return On Investment (ROI), which is what the organization is keenly focused on:

  1. View The Business As A Machine
  2. Understand The Drivers
  3. Measure The Drivers
  4. Uncover Problems and Opportunities
  5. Encode Algorithms
  6. Measure Results
  7. Report Financial Impact


Top Half of BSPF

BSPF Bottom Half

The seven BSPF phases flow along the six phases of CRISP-DM that are high-level steps for any data science problem (beyond just business):

  1. Business Understanding
  2. Data Understanding
  3. Data Preparation
  4. Modeling
  5. Evaluation
  6. Deployment


Bottom Half of BSPF

The beauty of the framework is that we get both the high-level view and the detail in one package!

Built On Experience

Further, it’s built on experience and best practices of business analysis. Many of the philosophies come from the writings of Ray Dalio (refer to Principles) along with our experience using the BSPF with clients. Beyond being high level and detailed, it’s proven!


Principles by Ray Dalio

How To Use The BSPF: A Customer Churn Example

Let’s go through an example: Customer Churn!

Problem: Customers Are Leaving

Customer churn refers to the act of customers leaving. These could be subscribers to a software product or service, or customers who physically shop at a store but elect to go somewhere else. Customer churn is a big problem! Often it goes undiagnosed because, individually, customers can be small, but when aggregated the effect of churn can be LARGE!

Phase 1: View The Business As A Machine

The first phase is viewing the business as a machine. This involves:

  1. Isolating business units
  2. Defining objectives
  3. Collecting outcomes

This involves breaking the business into internal parts (Sales, Manufacturing, Accounting, etc.) and external parts (customers, suppliers), and visualizing the connections between them.

Business Components

Segmenting the business into components of the machine

We then need to visualize this interaction as a machine. The machine has goals and outcomes. The goals relate to business objectives. The outcomes are what actually happens. The machine has inner workings, which are driven by people and processes. The process defines the setup, and the people execute the plan.

Business Machine

Visualizing The Business As A Machine

For the example customer churn problem, we make the following assessment:

  1. Isolating business units: The interaction occurs between Sales and the Customer
  2. Defining objectives: Make customers happy
  3. Collecting outcomes: We are slowly losing customers. It’s lowering revenue for the organization by $500K per year.

A key aspect of this stage is understanding the size of the problem. If we are slowly losing customers, how is this impacting revenue? Is the problem a $100 problem, a $100,000 problem, or a $1,000,000 problem? If it’s less than $100K, it may not be worth your time. Further, if it’s over $1M, executives need to know this. Get them involved quickly!
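To make the sizing question concrete, here is a minimal sketch in Python. The function names and the input figures (50 customers lost at $10,000 of yearly revenue each) are hypothetical, chosen only so the result matches the $500K per year example; the thresholds are the rough $100K and $1M cutoffs mentioned above.

```python
def annual_churn_cost(customers_lost_per_year: int, revenue_per_customer_per_year: float) -> float:
    """Estimate the yearly revenue lost to churn."""
    return customers_lost_per_year * revenue_per_customer_per_year


def triage(annual_cost: float) -> str:
    """Apply the rough cutoffs from the text: under $100K and over $1M."""
    if annual_cost < 100_000:
        return "may not be worth your time"
    if annual_cost > 1_000_000:
        return "involve executives quickly"
    return "worth investigating"


# Hypothetical inputs chosen to reproduce the $500K per year example.
cost = annual_churn_cost(customers_lost_per_year=50, revenue_per_customer_per_year=10_000)
print(f"${cost:,.0f} per year: {triage(cost)}")
```

In practice the inputs would come from Sales and Finance rather than being typed in, but even a back-of-the-envelope estimate like this tells you whether the problem deserves a project.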

Phase 2: Understand The Drivers

Next, we begin the process of understanding the drivers. The key steps are:

  1. Investigate if objectives are being met
  2. Synthesize outcomes
  3. Hypothesize drivers

The key in this phase is starting with the business objectives: Customer Satisfaction. When customers are happy, they keep coming back. Loss of customers generally indicates low satisfaction. This could be related to availability of products, poor customer service, or competition offering lower prices and/or better service or quality.

We need to synthesize outcomes. In our hypothetical example, customers are leaving for a competitor. In speaking with Sales, several customers have stated “Competition has faster delivery”. This is an indicator that lead time, or the ability to quickly service customers, is not competitive.

The final step is to hypothesize drivers. At this stage, it’s critical to meet with subject-matter experts (SMEs). These are people in the organization who are close to the process and the customers. We need to understand the potential drivers of lead time. Form a general equation with their help.

\[LeadTime = f(SupplierDelivery, InventoryAvailability, Personnel, SchedulingProcess, ...)\]

Developing a hypothesis with Subject Matter Experts (SMEs)

For the example customer churn problem, we make the following assessment:

  1. Investigate if objectives are being met: No, customers are unhappy
  2. Synthesize outcomes: Competitor has a faster lead time
  3. Hypothesize drivers: Lead time is related to supplier delivery, inventory availability, personnel, and the scheduling process
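Once data on the hypothesized drivers becomes available, one simple way to start testing the equation is to check how strongly each driver moves with lead time. The sketch below is only an illustration: the column names and all values are made up, and a plain Pearson correlation is our stand-in for a first screening step, not a method prescribed by the BSPF.

```python
from math import sqrt
from typing import Dict, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation; returns 0.0 if either series has no variation."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_drivers(target: List[float], drivers: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Sort hypothesized drivers by the strength (absolute value) of their correlation."""
    scores = [(name, pearson(values, target)) for name, values in drivers.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical observations, one entry per order.
lead_time_weeks = [6, 5, 7, 4, 6, 8]
hypothesized_drivers = {
    "supplier_delivery_weeks": [3, 3, 4, 3, 2, 3],
    "inventory_availability_pct": [80, 86, 74, 92, 79, 70],
}

for name, r in rank_drivers(lead_time_weeks, hypothesized_drivers):
    print(f"{name}: r = {r:+.2f}")
```

Correlation is not causation, so treat the ranking as a list of questions to bring back to the SMEs rather than as an answer.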

A key in this stage is communication. As data scientists, we know the tools really well. But tools are only useful when we understand the drivers and the business problem fully. We need to educate ourselves by listening to SMEs.

Phase 3: Measure Drivers

Now we begin the process of measuring the drivers. The key steps are:

  1. Collect Data
  2. Develop KPIs

First, we need to collect data related to the high level drivers. This data could be stored in databases or it may need to be collected. We could collect competitor data, supplier data, sales data (Enterprise Resource Planning or ERP data), personnel data, and more.

Collect Data

Collecting Data From Internal and External Sources

After the data is collected, we need to develop key performance indicators (KPIs), which are quantifiable measures that the organization uses to gauge performance. For our customer churn example:

  • Average Lead Time: The level is 2 weeks, which is based on customer feedback about competitors.
  • Supplier Average Lead Time: The level is 3 weeks, which is based on feedback related to our competitor’s suppliers.
  • Inventory Availability Percentage: The level of 90% is based on where customers are experiencing unmet demand. This data comes from the ERP data comparing sales requests to product availability.
  • Personnel Turnover: The level of 15% is based on the industry averages.

Developing KPIs

Developing Key Performance Indicators (KPIs)

Two key points in this step:

  1. Collecting data takes time, but don’t let it stop you. It may require effort to set up processes to collect it, but developing strategic data sources becomes a competitive advantage over time.

  2. Notice that KPIs require knowledge of customers and industry for supplier, inventory, and turnover metrics. Realize that a wealth of data is available outside of your organization. Learn where this data resides, and it becomes a tremendous asset.

Phase 4: Uncover Problems And Opportunities

It’s time to uncover problems and opportunities. We need to:

  1. Evaluate performance vs KPIs
  2. Highlight potential problem areas
  3. Review our project for what could have been missed

For our customer churn example, we review the organization’s actual results against the KPIs to determine where the problem areas may exist. We extend the KPI table to include an Actual Value and a Conclusion vs the KPI Level:

  • Our average lead time is 6 weeks compared to the competitor average lead time of 2 weeks, which is the first order cause for the customer churn
  • Our supplier average lead time is on par with our competitor’s, so it is not a cause for concern.
  • Our inventory percentage availability is 80%, which is too low to maintain a high customer satisfaction level. This could be a reason that churn is increasing.
  • Our personnel turnover in key areas is zero over the past 12 months, so no cause for concern.


Performance Vs KPIs
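The comparison above can be sketched as a small table in code. This is a minimal illustration, not the actual table from the course: the class and field names are hypothetical, the KPI levels and actual values are taken from the text, and the supplier actual of 3 weeks is our assumption based on the statement that it is "on par" with the KPI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kpi:
    name: str
    level: float            # KPI level from the text
    actual: float           # what the organization actually achieves
    higher_is_better: bool  # direction in which "good" lies

    def meets_level(self) -> bool:
        if self.higher_is_better:
            return self.actual >= self.level
        return self.actual <= self.level


kpis = [
    Kpi("Average lead time (weeks)", level=2, actual=6, higher_is_better=False),
    # Assumption: "on par" is represented here as an actual equal to the KPI level.
    Kpi("Supplier average lead time (weeks)", level=3, actual=3, higher_is_better=False),
    Kpi("Inventory availability (%)", level=90, actual=80, higher_is_better=True),
    Kpi("Personnel turnover (%)", level=15, actual=0, higher_is_better=False),
]

problem_areas = [kpi.name for kpi in kpis if not kpi.meets_level()]
for kpi in kpis:
    status = "OK" if kpi.meets_level() else "PROBLEM"
    print(f"{kpi.name}: level {kpi.level}, actual {kpi.actual} -> {status}")
```

Making the direction of each KPI explicit matters: lead time and turnover are better when lower, while inventory availability is better when higher.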

This is a good time to read a quote from Thomas Edison:

“When you have exhausted all possibilities, remember this - you haven’t”

~ Thomas Edison

Remember to ask questions and constantly test your assumptions. Talk with SMEs to make sure they agree with your findings so far.

Phase 5: Encode Decision Making Algorithms

The key steps in this phase are:

  1. Develop algorithms to predict and explain the problem
  2. Optimize decisions to maximize profit
  3. Use recommendation algorithms to improve decision making

First, develop algorithms using advanced tools like H2O Automated Machine Learning and LIME for black-box model explanations.

Algorithm Development

Sample H2O + LIME Algorithm, Taught in DS4B 201

Next, optimize decision selections to maximize profit. Investigate threshold optimization for binary classification problems. Also, try sensitivity analysis to gauge which features have the largest effect on the profitability of the decisions.

Threshold Optimization

Sample Threshold Optimization Visualization, Taught in DS4B 201
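To show what threshold optimization means in practice, here is a minimal, dependency-free sketch. It is not the implementation taught in the course. All names and numbers are hypothetical, and it makes a strong simplifying assumption: every customer we act on costs a fixed amount, and acting on a customer who would have churned always retains them.

```python
from typing import List


def profit_at_threshold(probs: List[float], churned: List[int], threshold: float,
                        value_saved: float, cost_of_action: float) -> float:
    """Profit if we act on every customer whose predicted churn risk >= threshold."""
    profit = 0.0
    for p, did_churn in zip(probs, churned):
        if p >= threshold:
            profit -= cost_of_action      # we pay for every intervention
            if did_churn:
                profit += value_saved     # assumption: intervention retains the customer
    return profit


def best_threshold(probs: List[float], churned: List[int], thresholds: List[float],
                   value_saved: float, cost_of_action: float) -> float:
    """Return the candidate threshold with the highest profit."""
    return max(thresholds,
               key=lambda t: profit_at_threshold(probs, churned, t, value_saved, cost_of_action))


# Hypothetical model output and outcomes (1 = customer churned).
probs = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
churned = [1, 1, 0, 1, 0, 0]
candidates = [i / 10 for i in range(1, 10)]

best = best_threshold(probs, churned, candidates, value_saved=1000, cost_of_action=300)
print(best, profit_at_threshold(probs, churned, best, value_saved=1000, cost_of_action=300))
```

The point of the exercise is that the most profitable threshold is rarely the default 0.5: it depends on how the value of a saved customer compares with the cost of intervening.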

Last, build recommendation algorithms that incorporate feedback from SMEs along with the feature explanations from LIME (or similar feature explanation procedures).

Recommendation Algorithm

Sample Recommendation Algorithm, Taught in DS4B 201
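One simple form a recommendation algorithm can take is a rule table: SMEs supply an action for each driver, and the feature explanations decide which action applies to a given customer. The sketch below assumes the explanation step has already produced a weight per feature (positive meaning "pushes toward churn"); the feature names, the playbook entries, and the function are all hypothetical.

```python
from typing import Dict, List

# Hypothetical playbook agreed with SMEs: one action per driver.
SME_PLAYBOOK: Dict[str, str] = {
    "lead_time_weeks": "Prioritize this customer's orders in the scheduling process",
    "inventory_availability_pct": "Increase stock of the products this customer orders",
}

FALLBACK = "No rule defined: review this customer with SMEs"


def recommend(feature_weights: Dict[str, float], playbook: Dict[str, str],
              top_n: int = 1) -> List[str]:
    """Map the strongest churn-increasing features to SME-defined actions."""
    churn_drivers = [(name, w) for name, w in feature_weights.items() if w > 0]
    churn_drivers.sort(key=lambda item: item[1], reverse=True)
    return [playbook.get(name, FALLBACK) for name, _ in churn_drivers[:top_n]]


# Hypothetical explanation for one customer.
explanation = {
    "lead_time_weeks": 0.40,
    "inventory_availability_pct": 0.10,
    "personnel_turnover_pct": -0.20,
}
print(recommend(explanation, SME_PLAYBOOK))
```

Keeping the rules in a plain table makes them easy for SMEs to review and change without touching the model.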

Once a systematic decision-making algorithm is developed, it’s time to deploy it into the wild and measure results. Here’s an example of a web application built with Shiny that is taught in our forthcoming course on building ML-powered web applications.

HR 202 Shiny Application: Employee Prediction

Shiny App That Predicts Attrition and Recommends Management Strategies, Taught in DS4B 202

Phase 6: Measure The Results

Once a model has been developed, evaluated, and pushed to production (i.e. deployed), it’s time to measure the results. This requires you to:

  1. Capture outcomes
  2. Synthesize results
  3. Visualize outcomes over time

Once the algorithm is implemented via a web application or other decision making tool, the results must be measured to show progress. This requires more analysis. We capture outcomes over time and synthesize results. We are looking for progress. If we have experienced good outcomes, then we need to recognize what contributed to those good outcomes.

  • Were the decision makers using the tools?
  • Did they follow the systematic recommendation?
  • Did the model accurately predict risk?
  • Were the results poor? Same questions apply.

For our customer churn example, we can make charts like these that show inventory availability and the customer churn rate over time. We are seeing inventory availability rise and customer churn go down. These are good results!

Visualize Results

Visualizing Results Over Time
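Behind a chart like this sits a simple calculation: a churn rate per period and a check on its direction. The sketch below uses made-up monthly figures and hypothetical function names; it only illustrates the idea of capturing outcomes over time.

```python
from typing import List


def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Share of customers lost during a period."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return customers_lost / customers_at_start


def is_improving(rates: List[float]) -> bool:
    """True if churn never rises from one period to the next and ends lower than it started."""
    if len(rates) < 2:
        return False
    never_rises = all(later <= earlier for earlier, later in zip(rates, rates[1:]))
    return never_rises and rates[-1] < rates[0]


# Hypothetical monthly figures: (month, customers at start, customers lost).
monthly = [("Jan", 1000, 50), ("Feb", 980, 39), ("Mar", 975, 29)]
rates = [churn_rate(start, lost) for _, start, lost in monthly]

for (month, _, _), rate in zip(monthly, rates):
    print(f"{month}: {rate:.1%}")
print("Improving:", is_improving(rates))
```

A strict "never rises" rule is deliberately simple; with noisy real data you would more likely compare averages over longer windows.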

Phase 7: Report Financial Impact

We’re now in the last phase, report financial impact. If we’ve done good data science, implemented systematic decision making, and iterated through problems, correcting along the way, we should now see positive results. Here are the steps:

  1. Measure actual results
  2. Tie to financial benefits
  3. Report financial benefit to key stakeholders

Once results are understood, we need to show the results as financial benefits. This not only justifies our existence, but shows the organization that it is improving. The key here is that results must be conveyed in terms of financial impact. It’s insufficient to say that we saved 75 employees or 75 customers. Rather, we need to say that the average cost of a lost employee or lost customer is $100,000 per year, so we just saved the organization $7.5M/year. Always report as a financial value.
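Written out, the arithmetic behind that figure is simply the number saved multiplied by the average yearly cost of a loss (a restatement of the numbers in the example, not additional data):

\[75 \times \$100{,}000 \text{ per year} = \$7{,}500{,}000 \text{ per year} = \$7.5\text{M per year}\]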

Here’s an example of charts that show the net profit and cumulative net profit over time. These are great charts to show executives because they convey the success of the project and its return on investment (ROI)!

Visualize ROI

Measuring Return On Investment (ROI)
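The numbers behind such charts can be computed in a few lines. The sketch below uses hypothetical monthly benefits and costs, and it defines ROI as (total benefit minus total cost) divided by total cost, which is a common definition but an assumption on our part about how you would report it.

```python
from itertools import accumulate
from typing import List


def net_profit(benefits: List[int], costs: List[int]) -> List[int]:
    """Net profit per period: benefit minus cost."""
    return [b - c for b, c in zip(benefits, costs)]


def cumulative(values: List[int]) -> List[int]:
    """Running total, which is what a cumulative net profit chart plots."""
    return list(accumulate(values))


def roi(total_benefit: float, total_cost: float) -> float:
    """Return on investment as a ratio: 1.0 means the project earned back its cost twice over."""
    return (total_benefit - total_cost) / total_cost


# Hypothetical monthly figures in dollars.
benefits = [0, 20_000, 40_000, 60_000]
costs = [30_000, 10_000, 10_000, 10_000]

monthly_net = net_profit(benefits, costs)
print("Net profit:           ", monthly_net)
print("Cumulative net profit:", cumulative(monthly_net))
print("ROI:", roi(sum(benefits), sum(costs)))
```

The cumulative series is the one to watch: the month it crosses zero is the project's break-even point.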

A Proven System

At Business Science, we’ve been using the BSPF in the wild with clients for some time. We were repeating many of the same activities as we were diving into data science projects. We began keeping track of the steps we were following as we completed projects. Eventually, we formalized our process, calling it the Business Science Problem Framework.

Clients love the BSPF because it puts a clear path forward - they see all of the steps required, where their input will be needed, and understand why the project will take several weeks to complete.

We love the BSPF because it systemized our problem-solving method, making results more repeatable. Win-Win!

We’ve recently begun teaching the BSPF in our Data Science For Business (DS4B 201-R) Course that is part of Business Science University.

To summarize the students’ feedback in one word: AMAZING! The BSPF is one of the most loved aspects of the course because they “finally get a framework that they can follow tying data science to the business”. Here’s specifically what two of the students have said:

“Data Science For Business (DS4B 201-R) is the first course that gives me a CLEAR FRAMEWORK to apply data science to Business Intelligence!”

Renaud Liber, Business/Data Analyst, Napoleon Games NV

“I took away a repeatable methodology and project structure that can be used to solve future business problems using data science”

David Curry, CTO, Africa Talent Management

Why do the students love it? One word…


Beyond student feedback, let us explain the results that this methodology has generated. At the end of the day, results are what the organization cares about.

A recent success story is that of Rodrigo Prado. Rodrigo is a high-end data science consultant and graduate of the prestigious Columbia University Master of Science in Applied Analytics. While the program was very good, Rodrigo left with a knowledge gap: he was not yet fully able to connect data science to the business.


He read an article about the Business Science Problem Framework, and immediately signed up for our Data Science For Business With R (DS4B 201-R) course.

Through his company, Genesis Partners, Rodrigo has since implemented the BSPF on 3 projects. According to Rodrigo, the BSPF has cut his time-to-deliver data science projects in half!

Let’s think about this for a minute. Half. That’s 50% of the time it used to take to complete a project. This means Rodrigo just doubled his efficiency. If he was generating 10X ROI as a consultant, he’s now generating 20X ROI just by implementing the BSPF!

If interested, you can listen to his 2-minute testimonial.

Experience The Data Science Course That Cut Rodrigo's Time-To-Deliver In Half

If you're interested in learning how to apply critical thinking and BSPF while solving a real-world business problem following an end-to-end data science project, check out Data Science For Business With R (DS4B 201-R). Over the course of 10 weeks you will solve an end-to-end Employee Churn data science project following our systematic Business Science Problem Framework.

Data Science For Business With R

Start Learning Today!

Next Steps

If you’ve enjoyed learning about the Business Science Problem Framework and are ready to take the next step with a real-world employee attrition problem, then take our Data Science For Business (DS4B 201-R) Course! It has a student satisfaction rating of 9.1/10, and students are learning how to apply data science to business using R code, the Business Science Problem Framework, and more! Check out the trailer below.
