How To Successfully Manage A Data Science Project For Businesses: The Business Science Problem Framework

    Written by Matt Dancho on June 19, 2018

    Data scientists want to run successful projects. However, the sad fact is that most data science projects in organizations fail. It’s not because of a lack of skill or knowledge. Data science projects need a clear and effective plan of attack to be successful. As data scientists, we study a wide array of tools: advanced algorithms, statistics, and even programming. However, if you’re like us, you’ve had to learn how to successfully manage a project through trial and error. Fortunately, we’ve learned a lot over the past several years working with clients, and we’ve integrated the best resources into one streamlined framework to make your life easier: The Business Science Problem Framework (BSPF)! In this article, we’ll cover the basics, showing you how the BSPF serves as a guide for successful data science projects by following a Customer Churn Problem example. Download the BSPF for FREE here.

    Information Security: Anomaly Detection and Threat Hunting with Anomalize

    Written by Russ McRee on June 10, 2018

    Information Security (InfoSec) is critical to a business. For those new to InfoSec, it is the state of being protected against the unauthorized use of information, especially electronic data. A single malicious threat can cause massive damage to a firm, large or small. For this reason, when I (Matt Dancho) saw Russ McRee’s article, “Anomaly Detection & Threat Hunting with Anomalize”, I asked him to repost it on the Business Science blog. In his article, Russ speaks to the use of our new R package, anomalize, as a way to detect threats (aka “threat hunting”). Russ is Group Program Manager of the Blue Team (the internal security team that defends against real attackers) for Microsoft’s Windows and Devices Group (WDG), now part of the Cloud and AI (C+AI) organization. He writes toolsmith, a monthly column for information security practitioners, and has written for other publications including Information Security, (IN)SECURE, SysAdmin, and Linux Magazine. The data Russ routinely deals with is massive in scale: he processes security event telemetry of all types (operating systems, network, applications, service layer) for all of Windows, Xbox, the Universal Store (transactions/purchases), and a few others. That amounts to billions of events in short order.

    Algorithmic Trading: Using Quantopian's Zipline Python Library In R And Backtest Optimizations By Grid Search And Parallel Processing

    Written by Davis Vaughan and Matt Dancho on May 31, 2018

    We are ready to demo our new experimental package for Algorithmic Trading, flyingfox, which uses reticulate to bring Quantopian’s open source algorithmic trading Python library, Zipline, to R. The flyingfox library is part of our NEW Business Science Labs innovation lab, which is dedicated to bringing experimental packages to our followers early on so they can test them out and let us know what they think before they make their way to CRAN. This article includes a long-form code tutorial on how to perform backtest optimizations of trading algorithms via grid search and parallel processing. We’ll show you how to combine tibbletime (a time-based extension of tibble) + furrr (a parallel-processing complement to purrr) + flyingfox (Zipline in R) to develop a backtested trading algorithm and optimize it. We are releasing this article as a complement to the R/Finance Conference presentation “A Time Series Platform For The Tidyverse”, which Matt will present on Saturday (June 2nd, 2018). Enjoy!

    Data Science For Business: Course Now Open!

    Written by Matt Dancho on April 30, 2018

    We are pleased to announce that our Data Science For Business (#DS4B) Course (HR 201) is OFFICIALLY OPEN! This course is for intermediate to advanced data scientists looking to apply H2O and LIME to a real-world binary classification problem in an organization: Employee Attrition. If you are interested in applying data science for business in a real-world setting, with advanced tools and a client-proven system that delivers ROI to the organization, then this is the course for you. For a limited time we are offering 15% off enrollment.

    Data Science For Business: Course Launch In 5 Days!!!

    Written by Matt Dancho on April 25, 2018

    Last November, our data science team embarked on a journey to build the ultimate Data Science For Business (DS4B) learning platform. We saw a problem: a gap exists in organizations between the data science team and the business. To bridge this gap, we’ve created Business Science University, an online learning platform that teaches DS4B using high-end machine learning algorithms, organized in the fashion of an on-premise workshop but at a fraction of the price. I’m pleased to announce that, in 5 days, we will launch our first course, HR 201, as part of a 4-course Virtual Workshop. We modeled the Virtual Workshop on the data science program that we wished we had when we began data science (after we got through the basics, of course!). Now, our data science process is being opened up to you. We guide you through our process for solving high-impact business problems with data science!

    How To Learn R, Part 1: Learn From A Master Data Scientist's Code

    Written by Matt Dancho on March 3, 2018

    The R programming language is a powerful tool used in data science for business (DS4B), but R can be unnecessarily challenging to learn. We believe you can learn R quickly by taking an 80/20 approach to learning the most in-demand functions and packages. In this article, we seek to understand which techniques are most critical to a beginner’s success by analyzing a master data scientist’s code base. The first half of this article covers the web scraping procedure (using rvest and purrr) we used to collect our data (if you are new to R, you can skip this). The second half covers the insights gained from analyzing a master’s code base. In the next article in our series, we’ll develop a strategic learning plan built on our knowledge of the master. Last, there’s a bonus at the end of the article that shows how you can analyze your own code base using the new fs package. Enjoy.

    The Tidy Time Series Platform: tibbletime 0.1.0

    Written by Davis Vaughan on January 4, 2018

    We’re happy to announce the third release of the tibbletime package. This is a huge update, mainly due to a complete rewrite of the package. It contains a ton of new functionality and a number of breaking changes that existing users need to be aware of. All of the changes are documented in the NEWS file, but it’s worthwhile to touch on a few of them here and discuss the future of the package. We’re super excited, so let’s check out the vision for tibbletime and its new functionality!

    EARL Presentation on HR Analytics: Using ML to Predict Employee Turnover

    Written by Matt Dancho on November 6, 2017

    The EARL Boston 2017 conference was held November 1 - 3 in Boston, Mass. There were some excellent presentations illustrating how R is being embraced in enterprises, especially in the financial and pharmaceutical industries. Matt Dancho, founder of Business Science, presented on using machine learning to predict and explain employee turnover, a hot topic in HR! We’ve uploaded the HR Analytics presentation to YouTube. Check out the presentation, and don’t forget to follow us on social media to keep up with the latest Business Science news, events, and information!
