Part 4 - Git for Data Science Applications (A Top Skill for 2020)
Written by Matt Dancho
Moving into 2020, three things are clear: organizations want Data Science, Cloud, and Apps. A key skill that companies need is Git for application development (I call this Full Stack Data Science). Here's what is driving Git's growth, and why you should learn Git for data science application development.
Articles in Series
- Part 1 - Five Full-Stack Data Science Technologies for 2020 (and Beyond)
- Part 2 - AWS Cloud
- Part 3 - Docker
- Part 4 - Git Version Control (You Are Here)
- Part 5 - H2O Automated Machine Learning (AutoML)
- Part 6 - R Shiny vs Tableau (3 Business Application Examples)
- [NEW BOOK] - The Shiny Production with AWS Book
Top 20 Tech Skills 2014-2019
Indeed, the popular employment-related search engine, released an article on “Technology-Related Job Postings” that examines the 5-Year Change, from 2014 to 2019, in the most requested technology skills.
Figure: Top 20 Tech Skills 2014-2019 (Source: Indeed Hiring Lab)
I’m generally not a big fan of these reports because the technology landscape changes so quickly. But I was pleasantly surprised by the time span of the analysis: Indeed looked at changes over a 5-year period, which gives a much better sense of the long-term trends.
Cloud, Machine Learning, Apps Driving Growth
3 Technology Trends show that organizations are transitioning from Business Reporting to Application Development (Read 5 Data Science Technologies for 2020 (and Beyond) for more insights on Key Skills for Data Science and App Development):
- Cloud - AWS (14% Share, 400% Growth) and Azure (1100% Growth)
- Machine Learning - Machine Learning (400% Growth), Python (18% Share, 123% Growth)
- Applications - Git (8% Share, 150% Growth), Docker (4000% Growth)
Changing business needs are challenging Data Scientists to learn new technologies for Data Science Application Development. And Git and Docker are the future for app development.
Git & Docker Trends
We can see that both Git and Docker are experiencing explosive, multi-year growth trends in “Google Search Interest”, further supporting the need to learn these key technologies that drive application development. (Read Docker for Data Science Applications (4000% Growth) to learn about how Docker helps facilitate data science applications.)
What Is Git?
Let’s look at a (Shiny) web application to see what Git does and how it helps.
Git Workflow
From Shiny Developer with AWS Course
Git and GitHub facilitate a workflow for developing and deploying applications:
- Application Development begins locally (Local Repository) on your computer. Changes are tracked with Git.
- Code is pushed to GitHub, a Remote Repository designed for sharing version-controlled files.
- The remote repository can be cloned to an AWS EC2 Instance, which is a Host for the production application.
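Here's a minimal sketch of that three-step workflow that you can run on one machine. Everything in it is a stand-in I made up for illustration: a bare repository in a scratch directory plays the role of GitHub, a second directory plays the role of the EC2 host, and the one-line `app.R` and the "Demo User" identity are placeholders. On a real project the remote would be your GitHub repository URL and the final clone would run on the server.

```shell
set -eu

# Scratch area; every path below is a hypothetical stand-in.
work="$(mktemp -d)"
git init --quiet --bare "$work/remote.git"   # plays the role of GitHub

# 1. Develop locally and track changes with Git.
git init --quiet "$work/local"
cd "$work/local"
git config user.name "Demo User"             # placeholder identity
git config user.email "demo@example.com"
echo 'library(shiny)' > app.R                # placeholder app file
git add app.R
git commit --quiet -m "Add app skeleton"

# 2. Push the committed code to the remote repository.
git remote add origin "$work/remote.git"
git push --quiet origin HEAD

# 3. Clone the remote onto the host (here, just another directory).
git clone --quiet "$work/remote.git" "$work/server"
ls "$work/server"                            # lists app.R
```

The point to notice is that the host never receives files directly from your computer. It gets whatever was committed and pushed to the remote.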
Git Version Control
The most important concept of Git is version control. Let’s dive into the application to see how Git helps.
We can see that the application consists of 2 things:
- Files (Git Control) - The set of instructions for the app. For a Shiny App, this includes an app.R file that contains layout instructions, server control instructions, database instructions, etc.
- Software (Docker Control) - The code external to your files that your application files depend on. For a Shiny App, this is R, Shiny Server, and any libraries your app uses.
Git applies version control to the files. This is a lifeline in case you make a change that adversely impacts production. You can always go backwards.
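To make "going backwards" concrete, here's a small sketch in a throwaway repository. The file contents, commit messages, and "Demo User" identity are made up for illustration. I use `git revert` here, which is one of several ways to go back: it adds a new commit that undoes the bad one, so the history of what happened is kept.

```shell
set -eu

repo="$(mktemp -d)"                          # throwaway repository
cd "$repo"
git init --quiet
git config user.name "Demo User"             # placeholder identity
git config user.email "demo@example.com"

# A version of the app that works.
echo 'title <- "Stock Analyzer"' > app.R
git add app.R
git commit --quiet -m "Working version"

# A change that breaks the app (incomplete R expression).
echo 'title <- ' > app.R
git commit --quiet -am "Broken change"

# Go backwards: create a new commit that undoes the broken one.
git revert --no-edit HEAD > /dev/null
cat app.R                                    # title <- "Stock Analyzer"
```

After the revert, `git log` shows three commits: the working version, the broken change, and the revert.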
Git Commands
Version Control Status & Git Command Workflow. When Git is first initialized in a codebase, the files start out untracked in your Working Directory. As changes are made, the user wants to track these changes. We track them using Git commands.
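You can watch a file move through these statuses with `git status`. This is a small sketch in a scratch repository, and the file contents and "Demo User" identity are placeholders. The `--short` flag prints a two-character status code in front of each file name.

```shell
set -eu

repo="$(mktemp -d)"                          # scratch repository
cd "$repo"
git init --quiet
git config user.name "Demo User"             # placeholder identity
git config user.email "demo@example.com"
echo 'library(shiny)' > app.R                # placeholder app file

git status --short                           # ?? app.R  (untracked)
git add app.R
git status --short                           # A  app.R  (staged for the next commit)
git commit --quiet -m "Track app.R"
git status --short                           # prints nothing: working directory is clean
```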
Git commands change the status by moving files through the version control workflow. The most important commands are:
- commit - This is when a snapshot of the file is added to your local repository. You can always go back to this version.
- push - To push any committed files from a local repo (e.g. your computer) to a remote repo (e.g. GitHub)
- pull - To pull down files on a remote repository to your local computer
- reset - To undo a change to a committed file
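Here are the four commands in one runnable sketch. The setup is assumed for illustration: a bare repository in a temp directory stands in for GitHub, `laptop` and `server` are two hypothetical copies of the project, and the `v1`/`v2`/`v3` file contents are placeholders. Note that `git reset --hard` throws away the commit and the file changes with it, so treat it as something for commits you have not pushed yet.

```shell
set -eu

work="$(mktemp -d)"
git init --quiet --bare "$work/remote.git"   # stands in for GitHub
git init --quiet "$work/laptop"
cd "$work/laptop"
git config user.name "Demo User"             # placeholder identity
git config user.email "demo@example.com"
git remote add origin "$work/remote.git"

echo 'v1' > app.R
git add app.R
git commit --quiet -m "First version"        # commit: snapshot in the local repo
git push --quiet origin HEAD                 # push: local repo -> remote repo

# A second copy of the project, e.g. on a server.
git clone --quiet "$work/remote.git" "$work/server"

echo 'v2' > app.R
git commit --quiet -am "Second version"
git push --quiet origin HEAD

cd "$work/server"
git pull --quiet --ff-only                   # pull: remote repo -> this copy
cat app.R                                    # v2

cd "$work/laptop"
echo 'v3' > app.R
git commit --quiet -am "Unwanted change"
git reset --quiet --hard HEAD~1              # reset: drop the last local commit
cat app.R                                    # v2
```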
Real Shiny App + AWS + Git Example
In my Shiny Developer with AWS Course (NEW), you use the following application architecture that uses AWS EC2 to create an Ubuntu Linux Server that hosts a Shiny App in the cloud called the Stock Analyzer.
Data Science Web Application Architecture
From Shiny Developer with AWS Course
We use Git to track our files as we move into Production. Here’s an example of the files stored on GitHub in a Private Repo.
GitHub Repository for Stock Analyzer
From Shiny Developer with AWS Course
You then deploy your “Stock Analyzer” application into Production on an AWS EC2 Instance, so it’s accessible anywhere via the AWS Cloud.
Stock Analyzer App
From Shiny Developer with AWS Course
If you are ready to learn how to build and deploy Shiny Applications in the cloud using AWS, then I recommend my NEW 4-Course R-Track System.
I look forward to providing you the best data science for business education.
Matt Dancho
Founder, Business Science
Lead Data Science Instructor, Business Science University