R is for Research, Python is for Production

Written by Matt Dancho and Jarrell Chalmers on July 12, 2021




Updated July 2021

Both R and Python are great. In this article, we’ll highlight the strengths of each language by looking at where the major development efforts are within each ecosystem.

R is for Research


If I had to describe R in one word, it would be: tidyverse. It has made research tasks - wrangling data, visualizing outcomes, iterating from idea to code - painless. In fact, it’s a joy. I’ll explain why R is for Research using the Ultimate R Cheat Sheet, a one-stop shop for the R ecosystem.


When starting with R, the tidyverse is an ideal place to begin your journey. It is a formalized set of packages and tools with a consistently structured programming interface, as opposed to base R, which is notably more complex and less user friendly.

We see many smaller packages that tackle specific problems. The following are the most important packages:

Dplyr & ggplot2

Two R packages that you’ll rely on daily are dplyr and ggplot2, which are great for data manipulation and visualization, respectively. These are the two most important skills a data scientist or data analyst can have.

Rmarkdown

One of the most exceptional aspects of R is without a doubt Rmarkdown, which is a framework for creating reproducible reports, presentations, blogs, journals and more! Imagine having a report that runs itself, and creates an easily shareable HTML page or PDF to share with your team. Definitely a more streamlined approach than hundreds of clicks in Excel every Monday morning.

Shiny

Shiny is a framework within R that is used to create interactive web applications. One of the best features of Shiny is that it gives the non-data-focused members of your team the data science tools they need for decision making through an easy-to-use GUI (graphical user interface). Imagine your team getting together for a Monday afternoon planning session, having already reviewed the previous week’s report created in Rmarkdown, and running simulations using your collaborative Shiny web application to determine where the data is guiding you next.

Where is R Growing?

Next, if we scroll through to the “Special Topics Page”, we can see the R ecosystem is growing. This is a key feature that distinguishes the R Ecosystem from the Python Ecosystem.

We can see that R has expanded into:

  • Time Series and Forecasting: Modeltime and Timetk
  • Financial Analysis (and other domains): Tidyquant, Quantmod
  • Network Analysis and Visualization: Tidygraph and ggraph
  • Text Analysis: Tidytext and Text Recipes
  • Geospatial Analysis and Visualization: Thematic Maps
  • Machine Learning: H2O, Tidymodels, and MLR3

What is R missing?

There is a noticeable gap in Production. R has Shiny (Apps) and Plumber (APIs, not shown), but Automation Tools like Airflow and Cloud Software Development Kits (SDKs) are primarily available in Python.

R Overall

R is really something special when doing research because of the tidyverse, which streamlines data wrangling and visualization. Honestly, you’ll be 3-5X more productive doing data wrangling in R once you become proficient with the tidyverse.

Why is Python Great?

Python is amazing too, but for different reasons. Let’s take a Python Package like OpenCV - for Computer Vision.

This is a real strength for the Python language because we can do crazy cool things like Object Detection with OpenCV.


But how much does this apply to my daily life? Around zero. Why? Because I’m a business analyst and data scientist who works with SQL databases. I’m more interested in how Python will help me better mine for information and productionize the results.


Let’s check out the Python Ecosystem using the Ultimate Python Cheat Sheet (note that this is different from the R cheat sheet shown earlier).


We see that there’s Pandas for essentially everything related to import, tidying and data wrangling. So what is Pandas? Pandas is an object-oriented tool for data wrangling in Python.

Pandas vs Tidyverse

While programmers love pandas, business analysts may initially struggle with the object-oriented (pythonic) way of having Data Frames with methods.

customer_counts_df = df.groupby('customer_id').size().reset_index(name='count')

Everything in Python is an object, and we call methods (e.g. df.groupby() and .size()) on that object. This call doesn’t seem too bad. But we are normally trying to do many more wrangling operations, and the code then becomes more challenging to write, less readable, and more complex.

Conversely, in R using the tidyverse we use a different syntax with a pipe (%>%). This reads much like SQL and follows the way a user thinks through a data wrangling task.

customer_counts_tbl <- df %>%
    group_by(customer_id) %>%
    summarize(count = n())
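
To make the SQL comparison concrete, here is a minimal sketch of the same group-and-count written as a SQL query and run from Python with the built-in sqlite3 module. The orders table, its columns, and the sample rows are hypothetical, made up purely for illustration.

```python
import sqlite3

# In-memory database with a small, made-up orders table (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (1, 20.0), (2, 5.0), (3, 7.5), (3, 2.5), (3, 1.0)],
)

# Group by customer and count the rows in each group.
customer_counts = conn.execute(
    """
    SELECT customer_id, COUNT(*) AS order_count
    FROM orders
    GROUP BY customer_id
    ORDER BY customer_id
    """
).fetchall()
conn.close()

print(customer_counts)
```

The GROUP BY and COUNT(*) steps line up with group_by() and summarize(count = n()) in the tidyverse version.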

This tidyverse data wrangling workflow often makes it much easier for analysts to expand the set of operations into 10 or more data wrangling commands. Remember, the challenge isn’t typing code, it’s turning your thoughts into code. This is where the tidyverse is really powerful.
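
For comparison on the Python side, here is a rough sketch of a multi-step wrangling sequence written as a pandas method chain. The orders data and its column names are hypothetical, and this is only one of several ways to structure such a chain.

```python
import pandas as pd

# Hypothetical order data; the column names are illustrative assumptions.
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2, 3],
    "price": [10.0, 20.0, 5.0, 15.0, 25.0, 50.0],
    "quantity": [1, 2, 4, 1, 1, 1],
})

customer_summary_df = (
    orders
    # Add a revenue column computed from price and quantity.
    .assign(revenue=lambda d: d["price"] * d["quantity"])
    # Keep only rows with revenue above 10.
    .query("revenue > 10")
    # Group by customer and summarize each group.
    .groupby("customer_id", as_index=False)
    .agg(order_count=("revenue", "count"),
         total_revenue=("revenue", "sum"))
    # Sort so the highest-revenue customers come first.
    .sort_values("total_revenue", ascending=False)
    .reset_index(drop=True)
)

print(customer_summary_df)
```

Wrapping the chain in parentheses lets each step sit on its own line, which is about as close as pandas gets to the top-to-bottom reading order of a tidyverse pipe.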

Key Strengths of Python lie in Production ML

OK, so why is Python great for business? It turns out that its strengths lie in Machine Learning and Production!

We can see that Python has well-developed Production ML-oriented tools:

  • Automation - Airflow, Luigi
  • Cloud - AWS, Google Cloud, and Azure software development kits
  • Machine Learning - scikit-learn
  • Deep Learning and Computer Vision - PyTorch, TensorFlow, MXNet, OpenCV
  • NLP - spaCy, NLTK

These production-oriented tools make it easier to work with others that interact with cloud and operations as part of a larger IT team because they are already in Python. No need to include R and any extra dependencies into a production system.

Python Overall

If you can get over the Pandas learning curve, then Python becomes a great tool. Most IT teams know Python, so your code will fit right into their workflow. Just realize that you may be 3X to 5X less productive at Research than your R counterparts due to the tidyverse boost.

Which Language Should You Learn?

The decision can be challenging because both Python and R have clear strengths.

  • R is exceptional for Research: Making visualizations, telling the story, producing reports, and making MVP apps with Shiny. From concept (idea) to execution (code), R users tend to be able to accomplish these tasks 3X to 5X faster than Python users, making them very productive for research.
  • Python is exceptional for Production ML: Integrating machine learning models into production systems where your IT infrastructure relies on automation tools like Airflow or Luigi.

Why Not Learn Both R and Python?

Both R and Python are amazing with different strengths. If you know both, you become more valuable to a team. And, with the development of two key technologies, it is now possible to use both languages together. What technologies am I talking about? reticulate and rpy2.

What is reticulate? Reticulate is an R package that makes it easy to connect to Python libraries. For example, the Google Adwords API is written in Python, but your research is in R. Now you can use reticulate to connect to adwords using the Python API right from R.

What is rpy2? Rpy2 is a Python package that makes it easy to connect to R libraries. For example, if you need the modeltime forecasting library in R, you can connect to it and run Panel Data forecasts from inside your Python workflow.

Learning to leverage both R and Python means you are immediately valuable to a data science team.

So how should you go about learning to integrate R and Python? Well, it starts by learning both R and Python.

Learning Both R and Python

To leverage both R and Python together, you need to know both R and Python. That’s why over the past 3 years, we have been developing a full system of courses. The Course Development Roadmap looks like this. Both the Python and R tracks follow parallel paths that lead to using both together with reticulate and rpy2.

R Python Tracks

Course Status:

Join the Waitlist: Machine Learning for Business with Python (201-P)

The next course in our system teaches Python with scikit-learn and a host of powerful tools for Production Machine Learning. Join the Machine Learning for Business with Python Course Waitlist.

This waitlist is for:

  • People that want to learn Machine Learning with the Python Ecosystem
  • R users that want to learn Python
  • Python users that want to learn data science for business

The course pre-requisite is:
