Predict employee churn with H2O machine learning and LIME. Use LIME (local Interpretable Model-agnostic Explanations) for model explanation in data science for business.
Data Scientists want to run successful projects. However, the sad fact is that most data science projects in organizations fail. It’s not because of lack of skill or knowledge. Data science projects need a clear and effective plan of attack to be successful. As data scientists, we study a wide array of tools: advanced algorithms, knowledge of statistics, and even programming skills.
Information Security (InfoSec) is critical to a business. For those new to InfoSec, it is the state of being protected against the unauthorized use of information, especially electronic data. A single malicious threat can cause massive damage to a firm, large or small. It’s this reason when I (Matt Dancho) saw Russ McRee’s article, “Anomaly Detection & Threat Hunting with Anomalize”, that I asked him to repost on the Business Science blog. In his article, Russ speaks to use of our new R package,
anomalize, as a way to detect threats (aka “threat hunting”). Russ is Group Program Manager of the Blue Team (the internal security team that defends against real attackers) for Microsoft’s Windows and Devices Group (WDG), now part of the Cloud and AI (C+AI) organization. He writes toolsmith, a monthly column for information security practitioners, and has written for other publications including Information Security, (IN)SECURE, SysAdmin, and Linux Magazine. The data Russ routinely deals with is massive in scale: He processes security event telemetry of all types (operating systems, network, applications, service layer) for all of Windows, Xbox, the Universal Store (transactions/purchases), and a few others. Billions of events in short order.
Algorithmic Trading: Using Quantopian's Zipline Python Library In R And Backtest Optimizations By Grid Search And Parallel Processing
We are ready to demo our new experimental package for Algorithmic Trading,
flyingfox, which uses
reticulate to to bring Quantopian’s open source algorithmic trading Python library,
Zipline, to R. The
flyingfox library is part of our NEW Business Science Labs innovation lab, which is dedicated to bringing experimental packages to our followers early on so they can test them out and let us know what they think before they make their way to CRAN. This article includes a long-form code tutorial on how to perform backtest optimizations of trading algorithms via grid search and parallel processing. In this article, we’ll show you how to use the combination of
tibbletime (time-based extension of
furrr (a parallel-processing compliment to
Zipline in R) to develop a backtested trading algorithm that can be optimized via grid search and parallel processing. We are releasing this article as a compliment to the R/Finance Conference presentation “A Time Series Platform For The Tidyverse”, which Matt will present on Saturday (June 2nd, 2018). Enjoy!
We are pleased to announce that our Data Science For Business (#DS4B) Course (HR 201) is OFFICIALLY OPEN! This course is for intermediate to advanced data scientists looking to apply H2O and LIME to a real-world binary classification problem in an organization: Employee Attrition. If you are interested applying data science for business in a real-world setting with advanced tools using a client-proven system that delivers ROI to the organization, then this is the course for you. For a limited time we are offering 15% off enrollment.
Last November, our data science team embarked on a journey to build the ultimate Data Science For Business (DS4B) learning platform. We saw a problem: A gap exists in organizations between the data science team and the business. To bridge this gap, we’ve created Business Science University, an online learning platform that teaches DS4B, using high-end machine learning algorithms, and organized in the fashion of an on-premise workshop but at a fraction of the price. I’m pleased to announce that, in 5 days, we will launch our first course, HR 201, as part of a 4-course Virtual Workshop. We crafted the Virtual Workshop after the data science program that we wished we had when we began data science (after we got through the basics of course!). Now, our data science process is being opened up to you. We guide you through our process for solving high impact business problems with data science!
Learn time series analysis with Keras LSTM deep learning. Learn to predict sunspots ten years into the future with an LSTM deep learning model.
Anomaly detection algorithm using Anomolize: an open-source tidy anomaly detection algorithm that’s time-based.
The R programming language is a powerful tool used in data science for business (DS4B), but R can be unnecessarily challenging to learn. We believe you can learn R quickly by taking an 80/20 approach to learning the most in-demand functions and packages. In this article, we seek to ultimately understand what techniques are most critical to a beginners success through analyzing a master data scientist’s code base. Half of this article covers the web scraping procedure (using
purrr) we used to collect our data (if new to R, you can skip this). The second half covers the insights gained from analyzing a master’s code base. In the next article in our series, we’ll develop a strategic learning plan built on our knowledge of the master. Last, there’s a bonus at the end of the article that shows how you can analyze your own code base using the new
fs package. Enjoy.
We’re happy to announce the third release of the
tibbletime package. This is a huge update, mainly due to a complete rewrite of the package. It contains a ton of new functionality and a number of breaking changes that existing users need to be aware of. All of the changes have been well documented in the NEWS file, but it’s worthwhile to touch on a few of them here and discuss the future of the package. We’re super excited so let’s check out the vision for
tibbletime and its new functionality!
Learn R for business - Data science for business is the future of business analytics. Here are 6 reasons why R is the right choice.