Part 2 - Data Science with AWS (A Top Skill for 2020)

Written by Matt Dancho



Moving into 2020, three things are clear: organizations want Data Science, Cloud, and Apps. Here's what's happening and how AWS Cloud Services fit into the essential skills of 2020.

Articles in Series

  1. Part 1 - Five Full-Stack Data Science Technologies for 2020 (and Beyond)
  2. Part 2 - AWS Cloud (You Are Here)
  3. Part 3 - Docker
  4. Part 4 - Git Version Control
  5. Part 5 - H2O Automated Machine Learning (AutoML)
  6. Part 6 - R Shiny vs Tableau (3 Business Application Examples)
  7. [NEW BOOK] - The Shiny Production with AWS Book

JEDI Contract - Cloud Wars

If you follow the news, you’ve seen articles like “Microsoft snags hotly contested $10 billion defense contract, beating out Amazon.”

The CNBC article describes the hotly contested “JEDI” (Joint Enterprise Defense Infrastructure) contract, the largest Cloud Contract ever.

According to The Verge, “the contract will provide the Pentagon with cloud services for basic storage and power all the way up to artificial intelligence processing, machine learning, and the ability to process mission-critical workloads.”

Don’t feel bad about Amazon losing the contract to Microsoft. Amazon already counts the CIA as a customer and holds the majority of enterprise cloud service contracts.

The point I’m making is not that AWS is the loser and you should learn Microsoft Azure. Rather, no matter where you go, government or enterprise, cloud infrastructure is going to be an important part of what we do as Data Scientists, and you will need to know AWS, Azure, and possibly other cloud service providers too.

Indeed, the popular employment-related search engine, released an article showing changing trends from 2015 to 2019 in “Technology-Related Job Postings”. We can see a number of changes in key technologies. The one we are particularly interested in is AWS, with a 14.2% share and a 400% increase, which makes AWS my top skill to learn for 2020.

Today's Top Tech Skills

Source: Indeed Hiring Lab.

Azure (No. 17, 1107% growth) is in the same boat, as is Google Cloud Platform for Data Scientists in Digital Marketing.

Here’s what you need to know about AWS (and Cloud in general).

The main cloud players

The 3 main cloud players are:

  • Amazon Web Services (AWS) - The market leader in enterprise & beyond; Tools have grown exponentially; Full-featured & popular with coders, app developers, and IT professionals

  • Microsoft Azure - 2nd in Popularity; Popular with Enterprise, offers “hybrid” cloud that interoperates with customer data centers

  • Google Cloud Platform (GCP) - Popular with Digital Marketing because of integration with Google Analytics

AWS vs Azure Comparison

The smart choice is to learn one of the cloud service providers because switching is relatively simple - when you learn one cloud solution, you learn them all.

Here’s a switching guide from “Microsoft Azure vs AWS Cloud Comparison”. The major services in AWS are also available in Azure. The same goes for GCP if Google is your preference.

Key Point - Learn how to use one cloud service, and you’ll be able to switch back and forth no matter what cloud service provider your organization standardizes on - AWS, Azure, or GCP.

Switching Guide - Amazon AWS to Microsoft Azure

Which services are important to data scientists?

AWS has a lot of services to choose from. Here’s what I recommend (I teach several of these in my NEW Shiny Developer with AWS Course).

AWS ToolKit - Overwhelming to say the least!

Here are the key tools to have on your Data Science radar:

Amazon EC2 - Elastic Compute Cloud - Taught in 202A Course

EC2 stands for “Elastic Compute Cloud” - these are virtual servers that you can spin up and rent. You can set the servers up however you want, and you can scale them up or down (for more or less computing power) as needed.
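
To make "spin up" and "scale up or down" concrete, here is a minimal sketch using boto3, the AWS SDK for Python. It assumes boto3 is installed and AWS credentials are already configured. The function names, the default t3.micro instance type, and the AMI ID you pass in are illustrative assumptions, not part of the course material.

```python
def launch_params(ami_id: str, instance_type: str = "t3.micro") -> dict:
    """Build the arguments for launching exactly one EC2 server."""
    return {
        "ImageId": ami_id,              # machine image to boot (you supply this)
        "InstanceType": instance_type,  # server size: CPU and memory
        "MinCount": 1,
        "MaxCount": 1,
    }


def launch_server(ami_id: str, instance_type: str = "t3.micro") -> str:
    """Spin up one server and return its instance ID."""
    import boto3  # imported here so launch_params works without boto3

    ec2 = boto3.client("ec2")  # credentials and region come from your AWS config
    response = ec2.run_instances(**launch_params(ami_id, instance_type))
    return response["Instances"][0]["InstanceId"]


def resize_server(instance_id: str, new_type: str) -> None:
    """Scale a server up or down by changing its instance type.

    Assumes an EBS-backed instance; it must be stopped before the change.
    """
    import boto3

    ec2 = boto3.client("ec2")
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    ec2.modify_instance_attribute(
        InstanceId=instance_id, InstanceType={"Value": new_type}
    )
    ec2.start_instances(InstanceIds=[instance_id])
```

Running servers are billed while they are up, so stop or terminate anything you launch while experimenting.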

Amazon S3 - Simple Storage Service

S3 is like Dropbox or Google Drive, but at scale and designed to work with applications rather than people. You store files in S3 buckets (a bucket acts like a root folder). You interface with S3 from R (aws.s3) or Python (boto3).
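
As a rough illustration of working with S3 from Python, here is a minimal boto3 sketch. It assumes boto3 is installed, AWS credentials are configured, and the bucket already exists. The helper names and the default "data" prefix are hypothetical choices made for this example.

```python
from pathlib import Path


def s3_key_for(local_path: str, prefix: str = "data") -> str:
    """Build the object key (the file's 'path' inside the bucket)."""
    name = Path(local_path).name
    prefix = prefix.strip("/")
    return f"{prefix}/{name}" if prefix else name


def upload_to_s3(local_path: str, bucket: str, prefix: str = "data") -> str:
    """Upload a local file to an S3 bucket and return its s3:// URI."""
    import boto3  # imported here so s3_key_for works without boto3

    key = s3_key_for(local_path, prefix)
    s3 = boto3.client("s3")  # credentials come from your AWS config
    s3.upload_file(local_path, bucket, key)  # Filename, Bucket, Key
    return f"s3://{bucket}/{key}"


def download_from_s3(bucket: str, key: str, local_path: str) -> None:
    """Download an object from S3 to a local file."""
    import boto3

    boto3.client("s3").download_file(bucket, key, local_path)  # Bucket, Key, Filename
```

S3 has no real folders. The "prefix" is simply the leading part of the object key, which is why the bucket behaves like a single root folder.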

Databases

  • Amazon RDS - Managed relational (SQL) databases that come pre-configured, so you don't set up the database servers yourself
  • Redshift - Petabyte-scale data warehouse
  • DynamoDB - NoSQL database (similar to MongoDB Atlas, which is taught in 202A Course)
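
For a feel of what NoSQL storage looks like in practice, here is a small, hedged sketch of writing one record to DynamoDB with boto3. It assumes boto3 is installed, credentials are configured, and a table with a matching key already exists. The function names and the sample record are made up for illustration.

```python
from decimal import Decimal


def to_dynamodb_item(record: dict) -> dict:
    """Convert floats to Decimal, because boto3's DynamoDB resource rejects Python floats."""
    return {
        key: Decimal(str(value)) if isinstance(value, float) else value
        for key, value in record.items()
    }


def save_record(table_name: str, record: dict) -> None:
    """Write one record (a plain dict) to an existing DynamoDB table."""
    import boto3  # imported here so to_dynamodb_item works without boto3

    table = boto3.resource("dynamodb").Table(table_name)
    table.put_item(Item=to_dynamodb_item(record))
```

Unlike a SQL table, each item is a free-form set of attributes. Only the table's key attributes are required, so a record such as {"id": "a1", "price": 9.99} can be stored without defining a schema first.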

Building Apps for Data Scientists

Organizations depend on the Data Science team to build distributed applications that solve business needs.

With enterprises shifting towards cloud services, I created the Shiny Developer with AWS Course (NEW) to teach data scientists how to build scalable web applications hosted on Amazon EC2 - Elastic Compute.


The Shiny Developer with AWS Course uses an end-to-end web app project to teach the core skills of app development for data scientists. The final application architecture includes:

  • Amazon EC2 - Used to deploy the Web Application
  • Shiny Server - Runs Shiny on Amazon EC2
  • MongoDB Atlas - A cloud NoSQL database comparable to DynamoDB, but free of cost up to 512 MB of storage




If you are ready to learn data science, web application development, and cloud computing, then I recommend my NEW 4-Course R-Track System, which includes:

  • Business Analysis with R (Beginner)
  • Data Science for Business (Advanced)
  • Shiny Web Applications (Intermediate)
  • Expert Shiny Developer with AWS (Advanced) - NEW COURSE!!



I look forward to providing you with the best data science for business education.

Matt Dancho

Founder, Business Science

Lead Data Science Instructor, Business Science University