How To Learn R, Part 1: Learn From A Master Data Scientist's Code

    Written by Matt Dancho on March 3, 2018

    The R programming language is a powerful tool used in data science for business (DS4B), but R can be unnecessarily challenging to learn. We believe you can learn R quickly by taking an 80/20 approach to learning the most in-demand functions and packages. In this article, we seek to ultimately understand what techniques are most critical to a beginners success through analyzing a master data scientist’s code base. Half of this article covers the web scraping procedure (using rvest and purrr) we used to collect our data (if new to R, you can skip this). The second half covers the insights gained from analyzing a master’s code base. In the next article in our series, we’ll develop a strategic learning plan built on our knowledge of the master. Last, there’s a bonus at the end of the article that shows how you can analyze your own code base using the new fs package. Enjoy.

    Read More...

    The Tidy Time Series Platform: tibbletime 0.1.0

    Written by Davis Vaughan on January 4, 2018

    We’re happy to announce the third release of the tibbletime package. This is a huge update, mainly due to a complete rewrite of the package. It contains a ton of new functionality and a number of breaking changes that existing users need to be aware of. All of the changes have been well documented in the NEWS file, but it’s worthwhile to touch on a few of them here and discuss the future of the package. We’re super excited so let’s check out the vision for tibbletime and its new functionality!

    Read More...

    EARL Presentation on HR Analytics: Using ML to Predict Employee Turnover

    Written by Matt Dancho on November 6, 2017

    The EARL Boston 2017 conference was held November 1 - 3 in Boston, Mass. There were some excellent presentations illustrating how R is being embraced in enterprises, especially in the financial and pharmaceutical industries. Matt Dancho, founder of Business Science, presented on using machine learning to predict and explain employee turnover, a hot topic in HR! We’ve uploaded the HR Analytics presentation to YouTube. Check out the presentation, and don’t forget to follow us on social media to stay up on the latest Business Science news, events and information!

    Read More...

    Demo Week: Tidy Time Series Analysis with tibbletime

    Written by Matt Dancho on October 26, 2017

    We’re into the fourth day of Business Science Demo Week. We have a really cool one in store today: tibbletime, which uses a new tbl_time class that is time-aware!! For those that may have missed it, every day this week we are demo-ing an R package: tidyquant (Monday), timetk (Tuesday), sweep (Wednesday), tibbletime (Thursday) and h2o (Friday)! That’s five packages in five days! We’ll give you intel on what you need to know about these packages to go from zero to hero. Let’s take tibbletime for a spin!

    Read More...

    LIVE DataTalk on HR Analytics Tonight: Using Machine Learning to Predict Employee Turnover

    Written by Matt Dancho on October 26, 2017

    Tonight at 7PM EST, we will be giving a LIVE #DataTalk on Using Machine Learning to Predict Employee Turnover. Employee turnover (attrition) is a major cost to an organization, and predicting turnover is at the forefront of needs of Human Resources (HR) in many organizations. Until now the mainstream approach has been to use logistic regression or survival curves to model employee attrition. However, with advancements in machine learning (ML), we can now get both better predictive performance and better explanations of what critical features are linked to employee attrition. We used two cutting edge techniques: the h2o package’s new FREE automatic machine learning algorithm, h2o.automl(), to develop a predictive model that is in the same ballpark as commercial products in terms of ML accuracy. Then we used the new lime package that enables breakdown of complex, black-box machine learning models into variable importance plots. The talk will cover HR Analytics and how we used R, H2O, and LIME to predict employee turnover.

    Read More...

    Demo Week: Tidy Forecasting with sweep

    Written by Matt Dancho on October 25, 2017

    We’re into the third day of Business Science Demo Week. Hopefully by now you’re getting a taste of some interesting and useful packages. For those that may have missed it, every day this week we are demo-ing an R package: tidyquant (Monday), timetk (Tuesday), sweep (Wednesday), tibbletime (Thursday) and h2o (Friday)! That’s five packages in five days! We’ll give you intel on what you need to know about these packages to go from zero to hero. Today is sweep, which has broom-style tidiers for forecasting. Let’s get going!

    Read More...

    Demo Week: Time Series Machine Learning with timetk

    Written by Matt Dancho on October 24, 2017

    We’re into the second day of Business Science Demo Week. What’s demo week? Every day this week we are demoing an R package: tidyquant (Monday), timetk (Tuesday), sweep (Wednesday), tibbletime (Thursday) and h2o (Friday)! That’s five packages in five days! We’ll give you intel on what you need to know about these packages to go from zero to hero. Second up is timetk, your toolkit for time series in R. Here we go!

    Read More...