Part 5 - Five Reasons to Learn H2O for High-Performance Machine Learning

Written by Matt Dancho on January 13, 2020



H2O is a scalable, open-source machine learning library that features AutoML. Here are 5 reasons why it's an essential library for creating production data science code.

Articles in Series

  1. Part 1 - Five Full-Stack Data Science Technologies for 2020 (and Beyond)
  2. Part 2 - AWS Cloud
  3. Part 3 - Docker
  4. Part 4 - Git Version Control
  5. Part 5 - H2O Automated Machine Learning (AutoML) (You Are Here)
  6. Part 6 - R Shiny vs Tableau (3 Business Application Examples)
  7. [NEW BOOK] - The Shiny Production with AWS Book

Machine Learning
Up 440% vs 5 Years Ago

Before I jump into H2O, let’s first understand the demand for ML. The 5-year trends in technology job postings show a 440% increase in requests for “Machine Learning” skills, which now capture a 7% share of all technology-related job postings.

Not just “Data Scientist” Jobs… ALL Technology Jobs.

Top 20 Tech Skills 2014-2019
Source: Indeed Hiring Lab.

My point: Learning ML is essential

We can safely say that if you are in a technology job (or seeking one) then you need to learn how to apply AI and Machine Learning to solve business problems.

The problem: There are a dozen machine learning and deep learning frameworks (TensorFlow, scikit-learn, H2O, mlr3, PyTorch, …), and each takes time and effort to learn. So, which framework should you learn for business?

Why I use and recommend H2O: H2O has singlehandedly produced results in hours that would otherwise have taken days or weeks. I recommend learning H2O for applying Machine Learning to business data. I’ve been using H2O for several years now, both on consulting projects and in teaching it to clients. Five reasons explain the productivity gain I get from H2O on my business projects.

5 Reasons Why I Use and Teach H2O

Here are my top 5 reasons for using H2O and recommending that you learn it.

1. AutoML
Massive Productivity Booster

H2O AutoML automates the machine learning workflow, which includes automatic training and tuning of many models. This allows you to spend your time on more important tasks like feature engineering and understanding the problem.
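To make the workflow above concrete, here is a minimal sketch of an AutoML run using H2O's Python API (the same steps exist in the R package). It assumes the `h2o` package and Java are installed. The 80/20 split, the 60-second budget, and the helper names `predictor_columns` and `run_automl` are illustrative choices, not part of H2O or of the original article.

```python
def predictor_columns(columns, target):
    """Return every column name except the target (the features AutoML may use)."""
    return [name for name in columns if name != target]


def run_automl(csv_path, target, max_runtime_secs=60, seed=42):
    """Train an H2O AutoML run on a CSV file and return the AutoML object.

    Assumes a classification problem, so the target is converted to a factor.
    """
    # Imported here so predictor_columns() works even without h2o installed.
    import h2o
    from h2o.automl import H2OAutoML

    h2o.init()  # starts (or connects to) a local H2O cluster; requires Java
    frame = h2o.import_file(csv_path)
    frame[target] = frame[target].asfactor()  # treat the target as a class label

    # Hold out 20% of the rows to rank the candidate models on the leaderboard.
    train, holdout = frame.split_frame(ratios=[0.8], seed=seed)

    automl = H2OAutoML(max_runtime_secs=max_runtime_secs, seed=seed)
    automl.train(
        x=predictor_columns(frame.columns, target),
        y=target,
        training_frame=train,
        leaderboard_frame=holdout,
    )
    return automl
```

With a hypothetical file and column name, `run_automl("employee_attrition.csv", target="Attrition")` returns an object whose `leaderboard` ranks the trained models and whose `leader` is the top-ranked model, which can score new data with `leader.predict(...)`.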

Me holding my H2O AutoML Hex Sticker
H2O is my go-to for production ML

2. Scalable on Local Compute
Distributed, In-Memory Processing speeds up computations

H2O processes data in memory, with fast serialization between nodes and clusters to support massive datasets. As a result, problems that traditionally need bigger tools can be solved in memory on your local computer.
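As a rough illustration of running H2O on local compute, the sketch below starts a local cluster with an explicit memory cap and thread count through the Python API's `h2o.init()`. The helper names and the 4 GB default are assumptions made for this example; size the memory to your own machine and dataset.

```python
def local_cluster_settings(memory_gb=4, threads=-1):
    """Build keyword arguments for h2o.init().

    threads=-1 asks H2O to use all available CPU cores.
    """
    if memory_gb <= 0:
        raise ValueError("memory_gb must be positive")
    return {"nthreads": threads, "max_mem_size": f"{memory_gb}G"}


def start_local_cluster(memory_gb=4, threads=-1):
    """Start (or connect to) a local H2O cluster and return its cluster handle."""
    # Imported here so local_cluster_settings() works without h2o installed.
    import h2o

    h2o.init(**local_cluster_settings(memory_gb, threads))
    return h2o.cluster()
```

Because the data lives in the cluster's memory, `max_mem_size` is the setting that decides how large a dataset fits on a single machine.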

3. Spark Integration & GPU Support
Big Data

H2O integrates with Spark for big data and supports GPUs. The result can be as much as 100x faster training than traditional ML.

Sparkling Water

rsparkling - The Spark + H2O Big Data Solution

4. Best Algorithms, Optimized and Ensembled
Superior Performance

H2O’s algorithms are developed from the ground up for distributed computing. H2O incorporates the most popular algorithms, including:

  • XGBoost
  • GBM
  • GLM
  • Random Forest
  • and more.

AutoML ensembles (combines) these models to provide superior performance.
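To show why combining models can help, here is a deliberately simplified sketch in plain Python. H2O AutoML builds stacked ensembles, which learn how to weight the base models. The code below swaps in plain averaging instead, and the model names and predictions are made-up toy values, not H2O output.

```python
from statistics import mean


def average_ensemble(predictions_by_model):
    """Average each row's predicted probability across all models."""
    model_predictions = list(predictions_by_model.values())
    if not model_predictions:
        raise ValueError("at least one model is required")
    row_count = len(model_predictions[0])
    if any(len(preds) != row_count for preds in model_predictions):
        raise ValueError("every model must predict the same number of rows")
    # zip(*...) groups the predictions row by row across models.
    return [mean(row) for row in zip(*model_predictions)]


def mean_squared_error(actual, predicted):
    """Mean squared difference between true labels and predicted probabilities."""
    return mean((a - p) ** 2 for a, p in zip(actual, predicted))


# Toy data: true labels and two models that err on different rows.
actual = [1, 0, 1, 0]
predictions = {
    "gbm": [0.9, 0.4, 0.6, 0.1],
    "glm": [0.6, 0.1, 0.9, 0.4],
}
ensemble = average_ensemble(predictions)
```

On this toy data the averaged predictions have a lower squared error than either model alone, because the two models make their mistakes on different rows. That is the intuition behind ensembling, although a stacked ensemble learns the combination rather than weighting every model equally.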

5. Production Ready
Docker Containers

I love using Docker (learn why) + H2O to integrate AutoML models into Shiny Web Applications. H2O is built on (and depends on) Java, which traditionally creates overhead. But H2O Docker images make deploying H2O models super easy, because all the necessary software is inside the pre-built Docker image.

H2O in Production

H2O can be integrated into Shiny Applications like this one - an Employee Attrition Prediction & Prevention App.

Employee Attrition Prevention App
(Course coming to BSU soon)

H2O is the underlying prediction technology

You need to learn H2O AutoML to build the Employee Attrition Shiny App. H2O AutoML generates the “Employee Attrition Machine Learning Model” that scores employees based on features like tenure, overtime, stock option level, etc.

H2O AutoML - Employee Attrition Machine Learning Model
Built in DS4B 201-R Course

The H2O Course

If you are ready to learn H2O AutoML along with critical supporting technologies and data science workflow processes that follow an enterprise-grade system, then look no further: DS4B 201-R (Advanced Machine Learning & Business Consulting Course).

You follow a 10-week program for solving Business Problems with Data Science that teaches each of the tools needed to solve a $15M/year employee attrition problem using Machine Learning (H2O), Explainable ML (LIME), and Optimization (purrr).

10-Week System for Solving Business Problems with Machine Learning
DS4B 201-R Course

In weeks 5 & 6, you learn H2O AutoML in-depth as part of your learning journey.

Learn H2O AutoML - Weeks 5 and 6
DS4B 201-R Course

No Machine Learning Experience?
Don't worry. You're covered.

You are probably thinking, “How do I learn H2O if I have no Machine Learning background or coding experience?”

That’s why I created the 4-Course R-Track Program.

Go from beginner to expert in 6 months or less, with no prior experience required.

You learn:

  • Data Science Foundations
  • Advanced Machine Learning & Business Consulting - H2O AutoML
  • Shiny Dashboards
  • Shiny Developer with AWS (NEW)



I look forward to providing you the best data science for business education.

Matt Dancho

Founder, Business Science

Lead Data Science Instructor, Business Science University