Git for Data Science Applications (A Top Skill for 2020)

Written by Matt Dancho on December 9, 2019



Moving into 2020, three things are clear: organizations want Data Science, Cloud, and Apps. A key skill that companies need is Git for application development (I call this Full Stack Data Science). Here's what is driving Git's growth, and why you should learn Git for data science application development.

Full-Stack Data Science Series

This is part of a series of articles on essential Data Science and Web Application skills for 2020 and beyond:

  1. Part 1 - 5 Full-Stack Data Science Technologies for 2020 (and Beyond)
  2. Part 2 - AWS Cloud
  3. Part 3 - Docker
  4. Part 4 - Git Version Control
  5. Part 5 - H2O Automated Machine Learning (AutoML)
  6. Part 6 - Shiny Web Applications (Coming Soon)
  7. [NEW BOOK] - Shiny Production with AWS, Docker, Git Book

Top 20 Tech Skills 2014-2019

Indeed, the popular employment-related search engine, released an article on trends in “Technology-Related Job Postings” from 2014 to 2019, examining the 5-Year Change in the most requested technology skills.

Top 20 Tech Skills 2014-2019
Source: Indeed Hiring Lab.

I’m generally not a big fan of these reports because the technology landscape changes so quickly. But I was pleasantly surprised by the length of time the analysis covers: Indeed looked at changes over a 5-year period, which gives a much better sense of the long-term trends.

Cloud, Machine Learning, Apps Driving Growth

3 Technology Trends show that organizations are transitioning from Business Reporting to Application Development (Read 5 Data Science Technologies for 2020 (and Beyond) for more insights on Key Skills for Data Science and App Development):

  1. Cloud - AWS (14% Share, 400% Growth) and Azure (1100% Growth)

  2. Machine Learning - Machine Learning (400% Growth), Python (18% Share, 123% Growth)

  3. Applications - Git (8% Share, 150% Growth), Docker (4000% Growth)

Changing business needs are challenging Data Scientists to learn new technologies for Data Science Application Development. And Git and Docker are the future of app development.

We can see that both Git and Docker are experiencing explosive, multi-year growth trends in “Google Search Interest”, further supporting the need to learn these key technologies that drive application development. (Read Docker for Data Science Applications (4000% Growth) to learn about how Docker helps facilitate data science applications.)

What Is Git?

Let’s look at a (Shiny) web application to see what Git does and how it helps.

Git Workflow
From Shiny Developer with AWS Course

Git and GitHub facilitate a workflow for developing and deploying applications:

  1. Application Development begins locally (Local Repository) on your computer. Changes are tracked with Git.

  2. Code is pushed to GitHub, a Remote Repository designed for sharing version controlled files.

  3. The remote repository can be cloned to an AWS EC2 Instance, which is a Host for the production application.
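The three steps above can be sketched with plain Git commands. This is a minimal sketch under stated assumptions, not the course's actual setup: a bare repository in a temporary folder stands in for GitHub, a second folder stands in for the EC2 host, and the file contents and user identity are placeholders.

```shell
set -e
work=$(mktemp -d)                        # scratch area for the demo

# 1. Local repository: development starts on your computer.
git init -q "$work/local"
cd "$work/local"
git config user.name "Demo User"         # placeholder identity (assumption)
git config user.email "demo@example.com"
echo 'library(shiny)' > app.R            # placeholder app file
git add app.R
git commit -q -m "Add app.R"
branch=$(git symbolic-ref --short HEAD)  # whatever the default branch is called

# 2. Remote repository: a bare repo stands in for GitHub here.
git init -q --bare "$work/remote.git"
git remote add origin "$work/remote.git"
git push -q origin "$branch"

# 3. Production host: this clone is what you would run on the server.
git clone -q "$work/remote.git" "$work/server"
ls "$work/server"
```

With a real GitHub repository, the only change would be the remote URL passed to `git remote add` and `git clone`.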

Git Version Control

The most important concept in Git is version control. Let’s dive into the application to see how Git helps.

AWS Application Development with Git and Docker

We can see that the application consists of 2 things:

  • Files (Git Control) - The set of instructions for the app. For a Shiny App, this includes an app.R file that contains layout instructions, server control instructions, database instructions, etc.

  • Software (Docker Control) - The code external to your files that your application files depend on. For a Shiny App, this is R, Shiny Server, and any libraries your app uses.

Git applies version control to the files. This is a lifeline in case you make a change that adversely impacts production. You can always go backwards.
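Here is one way "going backwards" can look in practice. It is a hedged sketch: the file contents and commit messages are made up, and I use `git revert`, which adds a new commit that undoes the bad one, as one of several ways to roll back.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"         # placeholder identity (assumption)
git config user.email "demo@example.com"

# A working version of the app file (contents are illustrative).
echo 'title <- "Stock Analyzer"' > app.R
git add app.R
git commit -q -m "Working version"

# A change that breaks production.
echo 'title <- stop("broken")' > app.R
git commit -q -am "Change that breaks the app"

# Go backwards: add a new commit that undoes the last one.
git revert --no-edit HEAD > /dev/null
cat app.R
git log --oneline
```

Because the revert is itself a commit, the history keeps a record of both the mistake and the fix.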

Git Commands

Version Control Status & Git Command Workflow. When Git is first initialized in a codebase, the files in your Working Directory are untracked. As you make changes, you will want to track them, and you do that using Git commands.
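To see the status change as a file moves through the workflow, you can ask `git status` at each step. A small sketch, assuming a fresh temporary folder and a placeholder app.R:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"         # placeholder identity (assumption)
git config user.email "demo@example.com"

echo 'library(shiny)' > app.R            # placeholder app file
untracked=$(git status --short)          # "?? app.R": Git sees it but does not track it
git add app.R
staged=$(git status --short)             # "A  app.R": added, waiting to be committed
git commit -q -m "Track app.R"
clean=$(git status --short)              # empty: nothing left to commit

printf '%s\n' "$untracked" "$staged" "[$clean]"
```

In the short format, `??` marks an untracked file and `A` marks a newly added file that is staged for the next commit.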

Git commands change the status by moving files through the version control workflow. The most important commands are:

  • commit - Adds a snapshot of your files to your local repository. You can always go back to this version.

  • push - Pushes any committed files from a local repo (e.g. your computer) to a remote repo (e.g. GitHub)

  • pull - Pulls down files from a remote repository to your local computer

  • reset - Undoes a change to a committed file
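The four commands can be tried end to end without a GitHub account. In this sketch a bare repository in a temporary folder plays the remote, and the folder names "laptop" and "server" are only labels I made up for the two working copies.

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/remote.git"    # stand-in for a GitHub repo

# First working copy (think: your laptop).
git init -q "$work/laptop"
cd "$work/laptop"
git config user.name "Demo User"         # placeholder identity (assumption)
git config user.email "demo@example.com"
git remote add origin "$work/remote.git"
branch=$(git symbolic-ref --short HEAD)

echo 'version 1' > app.R
git add app.R
git commit -q -m "Version 1"             # commit: snapshot in the local repo
git push -q origin "$branch"             # push: send commits to the remote

# Second working copy (think: the server), cloned at version 1.
git clone -q "$work/remote.git" "$work/server"

echo 'version 2' > app.R
git commit -q -am "Version 2"
git push -q origin "$branch"
git -C "$work/server" pull -q --ff-only  # pull: bring the new commit down

# reset: drop a local commit that has not been pushed.
echo 'version 3 (mistake)' > app.R
git commit -q -am "Version 3"
git reset -q --hard HEAD~1               # --hard also discards the file changes
cat app.R
```

Note that `git reset --hard` throws work away, so it is best kept for commits that have not been pushed yet.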

Real Shiny App + AWS + Git Example

In my Shiny Developer with AWS Course (NEW), you work with the following application architecture, which uses AWS EC2 to create an Ubuntu Linux Server that hosts a Shiny App called the Stock Analyzer in the cloud.

Data Science Web Application Architecture
From Shiny Developer with AWS Course

We use Git to track our files as we move into Production. Here’s an example of the files stored on GitHub in a Private Repo.

GitHub Repository for Stock Analyzer
From Shiny Developer with AWS Course

You then deploy your “Stock Analyzer” application into Production so it’s accessible from anywhere through an AWS EC2 Instance in the AWS Cloud.

Stock Analyzer App
From Shiny Developer with AWS Course

If you are ready to learn how to build and deploy Shiny Applications in the cloud using AWS, then I recommend my NEW 4-Course R-Track System.



I look forward to providing you the best data science for business education.

Matt Dancho

Founder, Business Science

Lead Data Science Instructor, Business Science University