-->

Agile Data Science

Data science is an exciting field under active development in both academia and industry. One of the saddest facts of the real world is that most data science projects in organizations fail. Here I’ll present a new iteration of an agile framework called the Business Science Problem Framework (BSPF). It implements data science in a way that lets decision making follow a systematic process, one that connects the models you create to Return On Investment (ROI) and shows the value your improvements bring to the business. The end result is that the BSPF is an agile framework, and we are working to develop a new visualization (BSPF 2.0) that conveys this agility.

Business Science Problem Framework

THE PROBLEM DEFINITION

Doing data science for business is not easy, for several reasons. One of them is that most people have only a partial definition, or description, of what data science actually is and of what it means to be a good data scientist who solves real problems.

DEFINING DATA SCIENCE

Because of that, I’ll start this article with my definition (or description) of what data science should be for a business:

Data science is the resolution of business problems through mathematics, programming and the scientific method, which involves the creation of hypotheses, experiments and tests through the analysis of data and the generation of predictive models. It is responsible for transforming these problems into well-posed questions that can also respond to the initial hypothesis in a creative way, finding the optimal threshold that maximizes the expected profit or savings. It must also include the effective communication of the results obtained and of how the solution adds value to the business.

I’ll explain my definition step by step below, so stick with me.

Modeling is the process of understanding “reality”, the world around us, by creating a higher-level prototype that describes the things we see, hear and feel. That prototype is a representation, not the “actual” or “real” thing. This is what we do in science, and data science is no exception.
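
To make the “optimal threshold that maximizes the expected profit” part of the definition more concrete, here is a minimal sketch in Python. It assumes a model that already outputs a score between 0 and 1 for each case, plus a labeled validation set. The `Payoff` values and the function names are hypothetical, chosen only for illustration; they are not part of the BSPF. Profit here is simply the total payoff on the validation data, used as an estimate of expected profit.

```python
from dataclasses import dataclass


@dataclass
class Payoff:
    """Hypothetical business value of each outcome (assumed numbers)."""
    true_positive: float = 100.0   # we act and the case was real
    false_positive: float = -20.0  # we act and the case was not real
    false_negative: float = 0.0    # we do not act and the case was real
    true_negative: float = 0.0     # we do not act and the case was not real


def profit_at(scores, labels, threshold, payoff):
    """Total payoff on validation data if we act on every score >= threshold."""
    total = 0.0
    for score, label in zip(scores, labels):
        act = score >= threshold
        if act and label:
            total += payoff.true_positive
        elif act:
            total += payoff.false_positive
        elif label:
            total += payoff.false_negative
        else:
            total += payoff.true_negative
    return total


def best_threshold(scores, labels, payoff, candidates=None):
    """Return the candidate threshold with the highest total payoff."""
    if candidates is None:
        # Simple grid search over 0.00, 0.01, ..., 1.00.
        candidates = [i / 100 for i in range(101)]
    return max(candidates, key=lambda t: profit_at(scores, labels, t, payoff))
```

The point of the sketch is that the threshold is chosen by its business payoff rather than by a generic accuracy metric, which is how the model gets connected to ROI. In a real project the payoff values would come from the business, not from the data scientist.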

What I’m saying here is that data science is closely linked to the business, but in the end it is a science. Many people may disagree with me on that last point, but keep an open mind and read this carefully. I think it is very useful to define data science as a science because, if we do, every data science project should be at least:

  • Reproducible: Necessary to make it easy to test other people’s work and analysis.
  • Fallible: Data science and science do not look for the truth; they look for knowledge. Every project can be replaced or improved in the future, and no solution is the ultimate solution.
  • Collaborative: The data scientist doesn’t exist alone. They need a team (even a virtual one, such as the collaborators on an open source project), and this team makes it possible to create intelligence and solutions. Collaboration is a big part of science, and data science should not be an exception.
  • Creative: Most of what data scientists do is new research, new approaches or new takes on existing solutions, so their environment should be very creative and easy to work in. Creativity is crucial in science; it is the only way we can find solutions to hard and complex problems.
  • Compliant with regulations: Right now there are many regulations on science and fewer on data science, but there will be more in the future. It is important that the projects we build take these different types of regulations into account so we can create clean and acceptable solutions to the problems.

If we don’t follow these basic principles, it will be very hard to conduct a proper data science practice. Data science should be implemented in a way that enables decision making to follow a systematic process.
