Part 5 - Five Reasons to Learn H2O for High-Performance Machine Learning
Written by Matt Dancho
H2O is a scalable, open-source Machine Learning library that features AutoML. Here are 5 reasons why it's an essential library for creating production data science code.
Articles in Series
- Part 1 - Five Full-Stack Data Science Technologies for 2020 (and Beyond)
- Part 2 - AWS Cloud
- Part 3 - Docker
- Part 4 - Git Version Control
- Part 5 - H2O Automated Machine Learning (AutoML) (You Are Here)
- Part 6 - R Shiny vs Tableau (3 Business Application Examples)
- [NEW BOOK] - The Shiny Production with AWS Book
Up 440% vs. 5 Years Ago
Before I jump into H2O, let’s first understand the demand for ML. Over the past 5 years, technology job postings requesting “Machine Learning” skills have increased 440%, and these skills now capture a 7% share of all technology-related job postings.
Not just “Data Scientist” Jobs… ALL Technology Jobs.
Top 20 Tech Skills 2014-2019
Source: Indeed Hiring Lab.
My point: Learning ML is essential
We can safely say that if you are in a technology job (or seeking one), then you need to learn how to apply AI and Machine Learning to solve business problems.
The problem: There are a dozen machine learning and deep learning frameworks (PyTorch, …), and they all take time and effort to learn. So, which framework should you learn for business?
Why I use and recommend H2O: H2O has single-handedly produced results in hours that would otherwise have taken days or weeks. I recommend learning H2O for applying Machine Learning to business data. I’ve been using H2O for several years, both on consulting projects and in teaching it to clients. Here are 5 reasons that explain the productivity gains I’ve gotten from H2O on my business projects.
5 Reasons Why I Use and Teach H2O
My top 5 reasons why I use and recommend learning H2O
1. Massive Productivity Booster
H2O AutoML automates the machine learning workflow, which includes automatic training and tuning of many models. This allows you to spend your time on more important tasks like feature engineering and understanding the problem.
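To make the AutoML idea concrete, here is a toy, pure-Python sketch. It illustrates the concept only and is not H2O's implementation or API: it trains a few candidate models (including several values of one hyperparameter), scores each on held-out data, and ranks them on a leaderboard, which is also the form in which H2O AutoML reports its results. The data, the candidate models, and the function names (`auto_ml`, `fit_line`, `make_knn`) are hypothetical choices for illustration.

```python
# Toy sketch of the AutoML idea (NOT H2O's implementation): train several
# candidate models, score each on held-out data, and rank them on a leaderboard.
# All function names and the data below are hypothetical.
from statistics import mean


def fit_mean(train):
    """Baseline model: always predict the training mean of y."""
    y_bar = mean(y for _, y in train)
    return lambda x: y_bar


def fit_line(train):
    """Ordinary least-squares fit of y = a + b * x."""
    x_bar = mean(x for x, _ in train)
    y_bar = mean(y for _, y in train)
    b = sum((x - x_bar) * (y - y_bar) for x, y in train) / sum(
        (x - x_bar) ** 2 for x, _ in train
    )
    a = y_bar - b * x_bar
    return lambda x: a + b * x


def make_knn(k):
    """Return a trainer for k-nearest-neighbour regression; k is the tuned hyperparameter."""
    def fit_knn(train):
        def predict(x):
            nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
            return mean(y for _, y in nearest)
        return predict
    return fit_knn


def mse(model, data):
    """Mean squared error of a fitted model on (x, y) pairs."""
    return mean((y - model(x)) ** 2 for x, y in data)


def auto_ml(train, valid):
    """Fit every candidate on `train`, score on `valid`, return a leaderboard (best first)."""
    candidates = {"mean_baseline": fit_mean, "linear": fit_line}
    for k in (1, 3, 5):  # a tiny "grid search" over one hyperparameter
        candidates[f"knn_k{k}"] = make_knn(k)
    scores = [(name, mse(trainer(train), valid)) for name, trainer in candidates.items()]
    return sorted(scores, key=lambda row: row[1])


# Hypothetical data: y = 2x + 1, validated at points between the training xs.
train = [(float(x), 2.0 * x + 1.0) for x in range(10)]
valid = [(x + 0.5, 2.0 * (x + 0.5) + 1.0) for x in range(9)]

leaderboard = auto_ml(train, valid)
for name, score in leaderboard:
    print(f"{name:>14}  MSE={score:.4f}")
```

A real AutoML run searches far larger model families and hyperparameter spaces, but the loop is the same shape: train many candidates, score them consistently, and let the leaderboard pick the winner while you focus on the features.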
Me holding my H2O AutoML Hex Sticker
H2O is my go-to for production ML
2. Scalable on Local Compute
Distributed, In-Memory Processing speeds up computations
H2O processes data in memory and uses fast serialization between nodes and clusters to support massive datasets. This lets you solve problems that traditionally need bigger tools in memory on your local computer.
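The general pattern behind distributed, in-memory processing can be sketched as map/reduce: split the data into chunks, compute partial results on each chunk in parallel, then combine the partials. The sketch below is a simplified illustration under that assumption, not H2O's Java engine; threads stand in for cluster nodes, and the function names (`map_chunk`, `combine`, `distributed_mean_var`) are hypothetical.

```python
# Minimal map/reduce sketch (NOT H2O's engine): split in-memory data into
# chunks, compute partial statistics per chunk in parallel, then combine them.
# Threads stand in for cluster nodes; all names here are hypothetical.
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_chunks(values, n_chunks):
    """Split a non-empty list into at most n_chunks contiguous chunks."""
    size = -(-len(values) // n_chunks)  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def map_chunk(chunk):
    """Map step: (count, sum, sum of squares) for one chunk."""
    return (len(chunk), sum(chunk), sum(v * v for v in chunk))


def combine(a, b):
    """Reduce step: partial statistics add component-wise."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def distributed_mean_var(values, n_workers=4):
    """Mean and population variance computed from per-chunk partials."""
    chunks = split_into_chunks(values, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    count, total, total_sq = reduce(combine, partials)
    mu = total / count
    # E[x^2] - E[x]^2 keeps the sketch short but can lose precision on large values.
    return mu, total_sq / count - mu ** 2


data = list(range(1, 101))
mu, var = distributed_mean_var(data)
print(mu, var)
```

The key design point is that each chunk only returns a tiny summary (three numbers), so very little data has to move between workers no matter how large the chunks are.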
3. Spark Integration & GPU Support
- H2O’s Spark integration (Sparkling Water) enables distributed processing on Big Data.
- H2O4GPU provides GPU-accelerated solvers with Python and R APIs.
The result can be up to 100x faster training than traditional, single-machine ML, depending on the algorithm, data size, and hardware.
4. Best Algorithms, Optimized and Ensembled
H2O’s algorithms are developed from the ground up for distributed computing. The most popular algorithms are included, such as:
- Random Forest
- and more.
AutoML then ensembles (combines) these models, which often yields better performance than any single model.
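Why does combining models help? When two models make errors in different directions, a combination can cancel part of those errors. The toy sketch below shows this with simple averaging and with a least-squares blend weight. It is a simplified stand-in, not H2O's Stacked Ensemble, which trains a metalearner on the base models' cross-validated predictions; the predictions and function names here are hypothetical.

```python
# Toy sketch of ensembling (NOT H2O's Stacked Ensemble): combine two models'
# predictions by simple averaging and by a least-squares blend weight.
from statistics import mean


def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def average_ensemble(*predictions):
    """Average several models' predictions point by point."""
    return [mean(col) for col in zip(*predictions)]


def fit_blend_weight(y_true, pred_a, pred_b):
    """Weight w minimising the squared error of w*a + (1 - w)*b (closed form)."""
    num = sum((y - b) * (a - b) for y, a, b in zip(y_true, pred_a, pred_b))
    den = sum((a - b) ** 2 for a, b in zip(pred_a, pred_b))
    return num / den


y = [1.0, 2.0, 3.0, 4.0]
pred_a = [1.5, 2.5, 2.5, 4.5]  # hypothetical model A: mostly over-predicts
pred_b = [0.7, 1.6, 3.4, 3.8]  # hypothetical model B: mostly under-predicts

avg = average_ensemble(pred_a, pred_b)
# In real stacking the weights are fit on held-out (cross-validated) predictions;
# fitting and scoring on the same four points here only keeps the example short.
w = fit_blend_weight(y, pred_a, pred_b)
blend = [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]

for name, pred in [("A", pred_a), ("B", pred_b), ("average", avg), ("blend", blend)]:
    print(f"{name:>7}  MSE={mse(y, pred):.4f}")
```

Here both combinations beat either model alone, and the learned weight beats plain averaging. A metalearner generalizes this idea to many models and learns the combination from out-of-fold predictions so the ensemble is not scored on data it was fit to.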
5. Production Ready
I love using Docker + H2O to integrate AutoML models into Shiny web applications. H2O is built on (and depends on) Java, which traditionally adds deployment overhead. But H2O Docker images make deploying H2O models super easy, because all the necessary software is inside the pre-built image.
H2O in Production
H2O can be integrated into Shiny applications like this one: an Employee Attrition Prediction & Prevention App.
Employee Attrition Prevention App
(Course coming to BSU soon)
H2O is the underlying prediction technology
You need to learn H2O AutoML to build the Employee Attrition Shiny App.
H2O AutoML generates the “Employee Attrition Machine Learning Model” that scores employees based on features like tenure, overtime, and stock option level.
H2O AutoML - Employee Attrition Machine Learning Model
Built in DS4B 201-R Course
The H2O Course
If you are ready to learn H2O AutoML along with the critical supporting technologies and data science workflow processes of an enterprise-grade system, then look no further: DS4B 201-R (Advanced Machine Learning & Business Consulting Course).
You follow a 10-week program for solving business problems with data science that teaches each of the tools needed to solve a $15M/year employee attrition problem using Machine Learning (H2O), Explainable ML (LIME), and Optimization.
10-Week System for Solving Business Problems with Machine Learning
DS4B 201-R Course
In weeks 5 and 6, you learn H2O AutoML in depth as part of your learning journey.
Learn H2O AutoML - Weeks 5 and 6
DS4B 201-R Course
No Machine Learning Experience?
Don't worry. You're covered.
You are probably thinking, “How do I learn H2O if I have no Machine Learning background or coding experience?”
That’s why I created the 4-Course R-Track Program.
Go from beginner to expert in 6-months or less with no prior experience required.
- Data Science Foundations
- Advanced Machine Learning & Business Consulting
- Shiny Dashboards
- Shiny Developer with AWS (NEW)
I look forward to providing you the best data science for business education.
Founder, Business Science
Lead Data Science Instructor, Business Science University