Part 1 - Five Full Stack Data Science Technologies for 2020 (and Beyond)
Written by Matt Dancho
Moving into 2020, three things are clear: organizations want Data Science, Cloud, and Apps. Here are the top 5 essential skills for Data Scientists who need to build and deploy applications in 2020 and beyond.
Articles in Series
- Part 1 - Five Full-Stack Data Science Technologies for 2020 (and Beyond) (You Are Here)
- Part 2 - AWS Cloud
- Part 3 - Docker
- Part 4 - Git Version Control
- Part 5 - H2O Automated Machine Learning (AutoML)
- Part 6 - R Shiny vs Tableau (3 Business Application Examples)
- [NEW BOOK] - The Shiny Production with AWS Book
Top 20 Tech Skills 2014-2019
Indeed, the popular employment-related search engine, released an article showing changing trends in “Technology-Related Job Postings” from 2014 to 2019, examining the 5-year change in the most requested technology skills.
Top 20 Tech Skills 2014-2019
Source: Indeed Hiring Lab.
I’m generally not a big fan of these reports because the technology landscape changes so quickly. But I was pleasantly surprised by the time span of the analysis: Indeed looked at changes over a 5-year period, which gives a much better sense of the long-term trends.
Why No R, Shiny, Tableau, PowerBI, Alteryx?
The skills reported are not “Data Science”-specific (which is why you don’t see R, Tableau, PowerBI, or Alteryx on the list).
However, we can glean insights based on the technologies present…
Cloud, Machine Learning, Apps Driving Growth
From the technology growth, it’s clear that Businesses need Cloud + ML + Apps.
Technologies Driving Tech Skill Growth
My Takeaway
This assessment has led me to my key technologies for Data Scientists heading into 2020. I focus on key technologies related to Cloud + ML + Apps.
Top 5 Data Science Technologies for Cloud + ML + Apps
These are the technologies Data Scientists should learn for 2020 and beyond. They are geared towards the business demands: Cloud + ML + Apps. In other words, businesses need data science and machine learning-powered web applications deployed into production via the Cloud.
Here's what you need to learn to build ML-Powered Web Applications and deploy in the Cloud.
*Note that R and Python are skills that you should be learning before you jump into these.
5 Key Data Science Technologies for Cloud + Machine Learning + Applications
1. AWS Cloud Services
AWS is the most popular cloud service provider. EC2 is a staple for hosting apps, running Jupyter or RStudio in the cloud, and leveraging cloud resources rather than investing in expensive computers and servers.
Learn More: Data Science with AWS (A Top Skill for 2020)
2. Docker for Web Apps
Creating Docker environments drastically reduces the risk of software incompatibility in production. Docker Hub makes it easy to share your environment with other Data Scientists or DevOps. Further, Docker and Docker Hub make it easy to deploy applications into production.
Learn More: Docker for Data Scientists (A Top Skill for 2020)
3. Git Version Control
Git and GitHub are staples for reproducible research and web application development. Git tracks past versions and enables software upgrades to be performed on branches. GitHub makes it easy to share your research and/or web applications with other Data Scientists, DevOps, or Data Engineering. Further, Git and GitHub make it easy to deploy changes to apps in production.
Learn More: Git for Data Science Applications (A Top Skill for 2020)
4. H2O Machine Learning
H2O is an automated machine learning library available in Python and R. It works well on structured data (the format for 95% of business problems). Automation drastically increases productivity in machine learning.
Learn More: 5 Reasons to Learn H2O Machine Learning
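To make the "automated" part concrete, here is a minimal sketch of an H2O AutoML run in Python. It assumes the h2o package and a Java runtime are installed, that the data lives in a CSV file, and that the target column is a categorical label. The function names, the file name, and the column name used in the usage note are hypothetical and chosen only for illustration.

```python
from typing import List


def predictor_columns(columns: List[str], target: str) -> List[str]:
    """Return every column except the target, preserving order."""
    if target not in columns:
        raise ValueError(f"target column {target!r} not found")
    return [c for c in columns if c != target]


def run_automl(csv_path: str, target: str, max_models: int = 10):
    """Train an H2O AutoML run on a CSV file and return the leaderboard.

    Assumes the h2o package and a Java runtime are installed, and that
    `target` is a categorical label (a classification problem).
    """
    # Imported lazily so the helper above works without h2o installed.
    import h2o
    from h2o.automl import H2OAutoML

    h2o.init()  # starts (or connects to) a local H2O cluster
    frame = h2o.import_file(csv_path)
    frame[target] = frame[target].asfactor()  # treat the label as categorical

    # AutoML trains and cross-validates up to `max_models` candidate models.
    aml = H2OAutoML(max_models=max_models, seed=1)
    aml.train(
        x=predictor_columns(frame.columns, target),
        y=target,
        training_frame=frame,
    )
    return aml.leaderboard  # models ranked by the default metric
```

A call such as run_automl("customers.csv", "churn") would return a ranked leaderboard of models. Most of the productivity gain comes from not having to hand-tune each candidate model yourself.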
5. Shiny Web Apps
A comprehensive web framework designed for data scientists with a rich ecosystem of extension libraries (dubbed the “shinyverse”).
Learn More: Shiny vs Tableau (3 Business Application Examples)
Other Technologies Worth Mentioning
- SQL - For data scientists who need to create complex SQL queries but don’t have time to deal with messy SQL, dbplyr is a massive productivity booster. It converts R (dplyr) code to SQL, and you can use it for 95% of SQL queries.
- Bootstrap - For data scientists who build apps, Bootstrap is the front-end web framework that Shiny is built on top of, and it powers much of the web (e.g. Twitter’s app). Bootstrap makes it easy to control the User Interface (UI) of your application.
- MongoDB - For data scientists who build apps, MongoDB is a NoSQL database that is useful for storing complex user information for your application in one collection (MongoDB’s equivalent of a table). This is much easier than creating a multi-table SQL database.
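To illustrate the MongoDB point, here is a minimal Python sketch of what "complex user information in one place" can look like. It uses only the standard library: the nested dictionary stands in for a single MongoDB document. The field names and the add_favorite helper are hypothetical. With the pymongo driver, a document like this could be stored with a call such as collection.insert_one(user_document), which is not run here.

```python
import json

# Hypothetical user record for a stock-analysis app. In a relational design
# this might be split across users, favorites, and settings tables.
user_document = {
    "username": "jane_doe",
    "favorites": ["AAPL", "GOOG", "NFLX"],
    "settings": {"moving_avg_short": 20, "moving_avg_long": 50},
    "last_symbol": "GOOG",
}


def add_favorite(document: dict, symbol: str) -> dict:
    """Return a copy of the document with `symbol` added to favorites once."""
    updated = json.loads(json.dumps(document))  # deep copy via JSON round trip
    if symbol not in updated["favorites"]:
        updated["favorites"].append(symbol)
    return updated


updated = add_favorite(user_document, "MSFT")
print(json.dumps(updated, indent=2))
```

Because the list of favorites and the nested settings live inside one record, reading or updating a user takes no joins. That is the convenience the MongoDB bullet above refers to.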
Real Shiny App + AWS + Docker Case Example
In my Shiny Developer with AWS Course (NEW), you use the following application architecture, which uses AWS EC2 to create an Ubuntu Linux Server that hosts a Shiny App in the cloud called the Stock Analyzer.
Data Science Web Application Architecture
From Shiny Developer with AWS Course
You use AWS EC2 to build a server to run your Stock Analyzer application along with several other web apps.
AWS EC2 Instance used for Cloud Deployment
From Shiny Developer with AWS Course
Next, you use a Dockerfile to containerize the application’s software environment.
Dockerfile for Stock Analyzer App
From Shiny Developer with AWS Course
You then deploy your “Stock Analyzer” application so it’s accessible anywhere via the AWS Cloud.
Dockerfile for Stock Analyzer App
From Shiny Developer with AWS Course
If you are ready to learn how to build and deploy Shiny Applications in the cloud using AWS, then I recommend my NEW 4-Course R-Track System.
I look forward to providing you the best data science for business education.
Matt Dancho
Founder, Business Science
Lead Data Science Instructor, Business Science University