The goal is simple: to educate and empower future data scientists so they can help organizations achieve data-driven results. That's why it was a no-brainer when the opportunity came up for Business Science to partner with Method Data Science, the go-to data science accelerator for aspiring data scientists. Method Data Scientists will now get exclusive lectures and instructor trainings from Business Science instructors as part of the Method Data Science accelerator program, along with discounted access to Business Science University, the revolutionary online education platform for learning data science for business. This is big news for current and future data scientists seeking real-world experience while learning how to deliver results to organizations!
I’m pleased to announce that we’ve released brand-new content for our flagship course, Data Science For Business (DS4B 201). Over the course of 10 weeks, DS4B 201 teaches students an end-to-end data science project: solving Employee Churn with R, H2O, & LIME. The latest content focuses on the transition from modeling Employee Churn with H2O and LIME to evaluating our binary classification model by its Return-On-Investment (ROI), which is how the model delivers business value. We do this by applying a special tool called the Expected Value Framework. Let’s look at the new course content, available now in DS4B 201, Chapter 7, which covers the Expected Value Framework for modeling churn with H2O!
One of the most difficult, and most critical, parts of implementing data science in business is quantifying the return on investment (ROI). As a data scientist in an organization, you must be able to show the value your improvements bring. In this article, we highlight three reasons you need to learn the Expected Value Framework, a framework that connects a machine learning classification model to ROI. Further, we’ll point you to a new video, Expected Value Framework: Modeling Employee Churn With H2O, which was recently taught as part of our flagship course, Data Science For Business (DS4B 201). The video gives an overview of the steps involved in calculating the ROI of reducing employee churn with H2O, tying the key H2O functions to the process. Last, we’ll go over some frequently asked questions about applying the Expected Value Framework to machine learning classification problems in business.
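To make the link between a classifier and ROI concrete, here is a minimal sketch of one common expected-value formulation: every prediction lands in a cell of the confusion matrix (true/false positive/negative), each cell carries a business cost or benefit, and the expected value is the average of those amounts across employees. This is a generic illustration in Python, not the DS4B 201 or H2O implementation; the CostBenefit class, the dollar figures, the toy probabilities, and the assumption that a retention offer always keeps the employee are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CostBenefit:
    """Hypothetical per-employee business value (in dollars) of each outcome."""
    tp: float  # predicted to leave and would have left: retention offer made
    fp: float  # predicted to leave but would have stayed: offer wasted
    tn: float  # predicted to stay and stayed: no cost
    fn: float  # predicted to stay but left: full turnover cost


def expected_value(y_true, y_prob, threshold, cb):
    """Average value per employee when flagging everyone with prob >= threshold."""
    total = 0.0
    for actual, prob in zip(y_true, y_prob):
        flagged = prob >= threshold
        if flagged and actual:
            total += cb.tp
        elif flagged:
            total += cb.fp
        elif actual:
            total += cb.fn
        else:
            total += cb.tn
    return total / len(y_true)


def best_threshold(y_true, y_prob, cb, steps=101):
    """Scan thresholds 0.00, 0.01, ..., 1.00 and keep the first one with the highest EV."""
    thresholds = [i / (steps - 1) for i in range(steps)]
    return max(thresholds, key=lambda t: expected_value(y_true, y_prob, t, cb))


# Toy data: 1 = employee left; probabilities come from some churn classifier.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.7, 0.05]
# Assumed: a $10k retention offer always keeps the employee; turnover costs $60k.
cb = CostBenefit(tp=-10_000, fp=-10_000, tn=0, fn=-60_000)

baseline = expected_value(y_true, y_prob, 1.01, cb)  # flag nobody (no policy)
t_best = best_threshold(y_true, y_prob, cb)
ev_best = expected_value(y_true, y_prob, t_best, cb)
print(f"threshold={t_best:.2f}  EV={ev_best:,.0f}  savings/employee={ev_best - baseline:,.0f}")
```

Comparing the best threshold's expected value against the "do nothing" baseline turns model output into a dollar savings estimate per employee, which is the kind of ROI figure a business stakeholder can act on. Note that the best threshold here is chosen by expected value rather than by accuracy or F1.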
One of the ways Deep Learning can be used in business is to improve the accuracy of time series forecasts (predictions). We recently showed how a Long Short-Term Memory (LSTM) model developed with the Keras library in R could take advantage of autocorrelation to predict the next 10 years of monthly Sunspots (a solar phenomenon tracked by NASA). For this article, we teamed up with RStudio to take another look at the Sunspots data set, this time implementing some really advanced Deep Learning functionality available with TensorFlow for R. Sigrid Keydana, TF Developer Advocate at RStudio, put together an amazing Deep Learning tutorial using the keras package for implementing Keras in R and tfruns, a suite of tools for tracking, visualizing, and managing TensorFlow training runs and experiments from R. Sounds amazing, right? It is! Let’s get started with this Deep Learning Tutorial!
Data science tools are getting better and better, which is improving the predictive performance of machine learning models in business. With new, high-performance tools like H2O for automated machine learning and Keras for deep learning, model performance is increasing tremendously. There’s one catch: complex models are unexplainable… that is, until LIME came along! LIME, which stands for Local Interpretable Model-agnostic Explanations, has opened the door to using black-box (complex, high-performance, but unexplainable) models in business applications! Explanations are MORE CRITICAL to the business than PERFORMANCE. Think about it: what good is a high-performance model that predicts employee attrition if we can’t tell which features are causing people to quit? We need explanations, not just performance, to improve business decision making.
How To Successfully Manage A Data Science Project For Businesses: The Business Science Problem Framework
Data scientists want to run successful projects. However, the sad fact is that most data science projects in organizations fail. It’s not for lack of skill or knowledge: data science projects need a clear and effective plan of attack to succeed. As data scientists, we study a wide array of tools: advanced algorithms, statistics, and programming. However, if you’re like us, you’ve had to learn how to manage a project successfully through trial and error. Fortunately, we’ve learned a lot over the past several years working with clients, and we’ve integrated the best resources into one streamlined framework to make your life easier: The Business Science Problem Framework (BSPF)! In this article, we’ll cover the basics, showing you how the BSPF guides successful data science projects through a Customer Churn Problem example. Download the BSPF for FREE.
Information Security (InfoSec) is critical to a business. For those new to InfoSec, it is the state of being protected against the unauthorized use of information, especially electronic data. A single malicious threat can cause massive damage to a firm, large or small. It’s for this reason that, when I (Matt Dancho) saw Russ McRee’s article, “Anomaly Detection & Threat Hunting with Anomalize”, I asked him to repost it on the Business Science blog. In his article, Russ discusses the use of our new R package, anomalize, as a way to detect threats (aka “threat hunting”). Russ is Group Program Manager of the Blue Team (the internal security team that defends against real attackers) for Microsoft’s Windows and Devices Group (WDG), now part of the Cloud and AI (C+AI) organization. He writes toolsmith, a monthly column for information security practitioners, and has written for other publications including Information Security, (IN)SECURE, SysAdmin, and Linux Magazine. The data Russ routinely deals with is massive in scale: he processes security event telemetry of all types (operating systems, network, applications, service layer) for all of Windows, Xbox, the Universal Store (transactions/purchases), and a few others. Billions of events in short order.
Algorithmic Trading: Using Quantopian's Zipline Python Library In R And Backtest Optimizations By Grid Search And Parallel Processing
We are ready to demo our new experimental package for Algorithmic Trading, flyingfox, which uses reticulate to bring Quantopian’s open source algorithmic trading Python library, Zipline, to R. The flyingfox library is part of our NEW Business Science Labs innovation lab, which is dedicated to bringing experimental packages to our followers early so they can test them out and let us know what they think before the packages make their way to CRAN. This article includes a long-form code tutorial on performing backtest optimizations of trading algorithms via grid search and parallel processing. We’ll show you how to combine tibbletime (a time-based extension of tibble), furrr (a parallel-processing complement to purrr), and flyingfox (Zipline in R) to develop a backtested trading algorithm that can be optimized via grid search and parallel processing. We are releasing this article as a complement to the R/Finance Conference presentation “A Time Series Platform For The Tidyverse”, which Matt will present on Saturday (June 2nd, 2018). Enjoy!
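The grid-search-plus-parallel-processing pattern behind this kind of backtest optimization can be sketched in a few lines. The Python below is a toy illustration only and does not use the flyingfox, Zipline, tibbletime, or furrr APIs: the moving-average crossover strategy, the backtest and grid_search helpers, and the synthetic sine-wave prices are all hypothetical. It maps one backtest per parameter pair over a worker pool (the same idea as a parallel map with furrr in R) and ranks the pairs by final equity.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def moving_average(prices, window, i):
    """Simple moving average of the `window` prices ending at index i."""
    return sum(prices[i - window + 1 : i + 1]) / window


def backtest(prices, short_window, long_window):
    """Toy long/flat crossover strategy; returns the final value of $1 invested."""
    equity = 1.0
    # Decide on day i using data up to day i, then earn day i+1's return.
    for i in range(long_window - 1, len(prices) - 1):
        if moving_average(prices, short_window, i) > moving_average(prices, long_window, i):
            equity *= prices[i + 1] / prices[i]
    return equity


def grid_search(prices, short_windows, long_windows, max_workers=4):
    """Backtest every valid (short, long) pair concurrently, best result first."""
    grid = [(sw, lw) for sw, lw in product(short_windows, long_windows) if sw < lw]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(lambda params: backtest(prices, *params), grid))
    return sorted(zip(grid, scores), key=lambda r: r[1], reverse=True)


# Deterministic synthetic price series (a trending sine wave), not market data.
prices = [100 + 10 * math.sin(i / 5) + 0.1 * i for i in range(250)]
results = grid_search(prices, [5, 10, 20], [30, 50, 100])
(best_short, best_long), best_equity = results[0]
print(f"best: short={best_short}, long={best_long}, growth of $1 = {best_equity:.3f}")
```

Threads keep the sketch portable, but Python threads do not speed up CPU-bound work; for real backtests, swapping in ProcessPoolExecutor (with module-level functions instead of the lambda and an if __name__ == "__main__": guard) is what spreads the grid across cores.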