6 Reasons To Learn R For Business [2021]
Written by Matt Dancho
Updated December 2020
Data science for business (DS4B) is the future of business analytics, yet it is really difficult to figure out where to start. The last thing you want to do is waste time with the wrong tool. Making effective use of your time involves two pieces: (1) selecting the right tool for the job, and (2) efficiently learning how to use the tool to return business value. This article focuses on the first part, explaining why R is the right choice in six points.
If you'd like to tackle learning R efficiently, we have another article that covers the 80/20 Rule for Learning R.
Reason 1: R Has The Best Overall Qualities For Business
There are a number of tools available for business analysis/intelligence (with DS4B being a subset of this area). Each tool has its pros and cons, many of which are important in the business context. We can use these attributes to compare how each tool stacks up against the others! We did a qualitative assessment using several criteria:
- Business Capability (1 = Low, 10 = High)
- Ease of Learning (1 = Difficult, 10 = Easy)
- Cost (Free/Minimal, Low, High)
- Trend (0 = Fast Decline, 5 = Stable, 10 = Fast Growth)
Further discussion on the assessment is included in the Appendix at the end of the article.
What we saw was particularly interesting. A trendline emerged, exposing a tradeoff between learning curve and DS4B capability rating. The most flexible tools are more difficult to learn but tend to have higher business capability. Conversely, the "easy-to-learn" tools are often not the best long-term tools for business or data science capability. Our opinion: go for capability over ease of use.
Of the top tools in capability, R has the best mix of desirable attributes: high data science for business capability, low cost, strong growth, and a massive ecosystem of powerful R libraries. The only downside is the learning curve. The Cheat Sheet below showcases the powerful libraries that are at your fingertips - Download our Ultimate R Cheat Sheet to see what libraries are available to solve specific needs.
The Ultimate R Cheat Sheet showcases the massive ecosystem of powerful R packages (Free Download)
Reason 2: R Is Data Science For Non-Computer Scientists
If you are seeking high-performance data science tools, you really have two options: R or Python. When starting out, you should pick one. It's a mistake to try to learn both at the same time. Your choice comes down to what's right for you. The difference between R and Python has been described in numerous infographics and debates online, but the most overlooked factor is the fit between the person and the programming language. Don't understand what we mean? Let's break it down.
Fact 1: Most people interested in learning data science for business are not computer scientists. They are business professionals, non-software engineers (e.g. mechanical, chemical), and other technical-to-business converts. This is important because of where each language excels.
Fact 2: Most activities in business and finance involve communication. This comes in the form of reports, dashboards, and interactive web applications that allow decision makers to recognize when things are not going well and to make well-informed decisions that improve the business.
Now that we recognize what's important, let's learn about the two major players in data science.
About Python
Python is a general-purpose programming language developed by software engineers that has solid programming libraries for math, statistics and machine learning. Python has best-in-class tools for pure machine learning and deep learning, but lacks much of the infrastructure for subjects like econometrics and communication tools such as reporting. Because of this, Python is well-suited for computer scientists and software engineers.
About R
R is a statistical programming language developed by scientists that has open source libraries for statistics, machine learning, and data science. R lends itself well to business because of its depth of topic-specific packages and its communication infrastructure. R has packages covering a wide range of topics such as econometrics, finance, and time series. R has best-in-class tools for visualization, reporting, and interactivity, which are as important to business as they are to science. Because of this, R is well-suited for scientists, engineers and business professionals.
Which Should You Learn?
Don't make the decision tougher than it is. Think about where you are coming from:
- Are you a computer scientist or software engineer? If yes, learn Python.
- Are you an analytics professional or mechanical/industrial/chemical engineer looking to get into data science? If yes, learn R.
Think about what you are trying to do:
- Are you trying to build a self-driving car? If yes, learn Python.
- Are you trying to communicate business analytics throughout your organization? If yes, learn R.
Reason 3: Learning R Is Easy With The Tidyverse
Learning R used to be a major challenge. Base R was a complex and inconsistent programming language. Structure and formality were not the top priority as in other programming languages. This all changed with the "tidyverse", a set of packages and tools that have a consistently structured programming interface.
When tools such as dplyr and ggplot2 came to fruition, they made the learning curve much easier by providing a consistent and structured approach to working with data. As Hadley Wickham and many others continued to evolve R, the tidyverse came to be, which includes a series of commonly used packages for data manipulation, visualization, iteration, modeling, and communication. The end result is that R is now much easier to learn - Learn R From A Master Data Scientist's Code.
Source: tidyverse.org
R continues to evolve in a structured manner, with advanced packages that are built on top of the tidyverse infrastructure. A new focus is being placed on modeling and algorithms, which we are excited to see. Further, the tidyverse is being extended to cover topical areas such as text (tidytext) and finance (tidyquant). For newcomers, this should give you confidence in selecting this language. R has a bright future.
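To make the "consistent and structured approach" concrete, here is a minimal sketch of a dplyr pipeline. The `orders` data frame and its column names are hypothetical and invented purely for illustration; only the dplyr verbs (`filter`, `group_by`, `summarise`, `arrange`) come from the package itself.

```r
library(dplyr)

# Hypothetical order data, invented for illustration only
orders <- data.frame(
  region  = c("East", "East", "West", "West", "West"),
  product = c("A", "B", "A", "A", "B"),
  revenue = c(100, 250, 80, 120, 300)
)

# Each verb takes a data frame and returns a data frame,
# so the steps chain together and read top to bottom
revenue_by_region <- orders %>%
  filter(revenue >= 100) %>%                 # keep orders of at least 100
  group_by(region) %>%                       # one group per region
  summarise(
    total_revenue = sum(revenue),            # total revenue per region
    n_orders      = n()                      # number of orders per region
  ) %>%
  arrange(desc(total_revenue))               # largest region first

print(revenue_by_region)
```

Because every step has the same shape (data in, data out), new verbs can be added or removed without restructuring the rest of the analysis.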
Reason 4: R Has Brains, Muscle, And Heart
Saying R is powerful is actually an understatement. From the business context, R is like Excel on steroids! But more important than just muscle is the combination of what R offers: brains, muscle, and heart. The 2nd page of the R Cheat Sheet (FREE DOWNLOAD) links to all of the tools discussed next (and more tools beyond)!
An expanded set of tools has been added to the R Cheat Sheet (Free Download)
🧠 R has brains
R implements cutting-edge algorithms including:
- H2O (h2o) - High-end machine learning package
- Keras/TensorFlow (keras, tensorflow) - Go-to deep learning packages
- xgboost - Top Kaggle algorithm
- Modeltime - Time Series forecasting
- And many more!
These tools are used everywhere from AI products to Kaggle Competitions, and you can use them in your business analyses.
💪 R has muscle
R has powerful tools for:
- Vectorized Operations - R uses vectorized operations to make math computations lightning fast right out of the box
- Loops (purrr)
- Parallelizing operations (parallel, future)
- Speeding up code using C++ (Rcpp)
- Connecting to other languages (rJava, reticulate)
- Working With Databases - Connecting to databases (dbplyr, odbc, bigrquery)
- Handling Big Data - Connecting to Apache Spark (sparklyr)
- And many more!
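As a small illustration of the first item in the list, the sketch below compares a vectorized calculation with the equivalent explicit loop, using base R only. The prices and quantities are made-up numbers, not data from the article.

```r
# Hypothetical unit prices and quantities, invented for illustration
prices     <- c(19.99, 5.49, 102.00, 7.25)
quantities <- c(3, 10, 1, 4)

# Vectorized: one expression multiplies the vectors element by element
line_totals <- prices * quantities

# Equivalent explicit loop, shown only for comparison
loop_totals <- numeric(length(prices))
for (i in seq_along(prices)) {
  loop_totals[i] <- prices[i] * quantities[i]
}

# Both approaches give the same result
stopifnot(isTRUE(all.equal(line_totals, loop_totals)))

# Functional iteration in base R; purrr offers a similar map-style interface
discounted <- vapply(line_totals, function(x) x * 0.9, numeric(1))

order_total <- sum(line_totals)
```

The vectorized form is shorter, and on large vectors it typically runs much faster because the loop happens in compiled code rather than in R.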
❤️ R has heart
We already talked about the infrastructure, the tidyverse, that enables the ecosystem of applications to be built using a consistent approach. It's this infrastructure that brings life into your data analysis. The tidyverse enables:
- Data manipulation (dplyr, tidyr)
- Working with data types (stringr for strings, lubridate for date/datetime, forcats for categorical/factors)
- Visualization (ggplot2)
- Programming (purrr, tidyeval)
- Communication (Rmarkdown, shiny)
Reason 5: R Is Built For Business
Two major advantages of learning R versus every other programming language are that it can produce business-ready reports and machine learning-powered web applications. Neither Python nor Tableau nor any other tool can currently do this as efficiently as R can. The two capabilities we refer to are rmarkdown for report generation and shiny for interactive web applications.
Rmarkdown
Rmarkdown is a framework for creating reproducible reports that has since been extended to building blogs, presentations, websites, books, journals, and more. It's the technology that's behind this blog, and it allows us to include the code with the text so that anyone can follow the analysis and see the output right with the explanation. What's really cool is that the technology has evolved so much. Here are a few examples of its capability:
Shiny
Shiny is a framework for creating interactive web applications that are powered by R. Shiny is a major consulting area for us, as four of five assignments involve building a web application using shiny. It's not only powerful, it enables non-data scientists to gain the benefit of data science via interactive decision making tools. Here's an example of a Google Trend app built with shiny.
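To give a sense of how little code a basic Shiny app needs, here is a minimal sketch. It is not the Google Trend app mentioned above; the `sales` data, input IDs, and labels are hypothetical and invented for illustration. A Shiny app pairs a user interface definition (`ui`) with a `server` function that reacts to inputs.

```r
library(shiny)

# Hypothetical monthly revenue figures, invented for illustration only
sales <- data.frame(
  month   = month.abb[1:6],
  revenue = c(120, 135, 128, 150, 162, 171)
)

# User interface: a slider and a plot placeholder
ui <- fluidPage(
  titlePanel("Revenue explorer (illustrative)"),
  sliderInput("n_months", "Months to show:",
              min = 2, max = 6, value = 6, step = 1),
  plotOutput("revenue_plot")
)

# Server logic: the plot re-renders whenever the slider changes
server <- function(input, output, session) {
  output$revenue_plot <- renderPlot({
    shown <- head(sales, input$n_months)
    barplot(shown$revenue, names.arg = shown$month, ylab = "Revenue")
  })
}

# Build the app object; in an interactive session, launch it with runApp(app)
app <- shinyApp(ui = ui, server = server)
```

The decision maker only sees the slider and the chart; the R code behind it could just as well call a forecasting or machine learning model.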
Explore the Web App Gallery of predictive business apps
Reason 6: R Has Community Support
Being a powerful language alone is not enough. To be successful, a language needs community support. We'll hit on two ways that R excels in this respect: CRAN and the R Community.
CRAN is like the Apple App store, except everything is free, super useful, and built for R. With over 17,000 packages, it has almost everything you can possibly want from machine learning to high-performance computing to finance and econometrics! The task views cover specific areas and are one way to explore R's offerings. CRAN is community-driven, with top open source authors such as Hadley Wickham and Dirk Eddelbuettel leading the way. Package development is a great way to contribute to the community, especially for those looking to showcase their coding skills and give back!
You begin learning R because of its capability; you stay with R because of its community. The R Community is the coolest part. It's tight-knit, opinionated, fun, silly, and highly knowledgeable... all of the things you want in a high performing team.
Social/Web
R users can be found all over the web. A few of the popular hangouts are:
Conferences
R-focused business conferences are gaining traction in a big way. Here are a few that we attend and/or will be attending in the future:
- EARL - Mango Solutions' conference on enterprise and business applications of R
- R/Finance - Community-hosted conference on financial asset and portfolio analytics and applied finance
- RStudio Conf - RStudio's technology conference
- New York R - Business and technology-focused R conference
A full list of R conferences can be found here.
Meetups
A really cool thing about R is that many major cities have a meetup nearby. Meetups are exactly what you think: a group of R users getting together to talk R. They are usually funded by the R Consortium. You can get a full list of meetups here.
Conclusion
R has a wide range of benefits, making it our obvious choice for Data Science for Business (DS4B). That's not to say that Python isn't a good choice as well, but, for the wide range of needs for business, there's nothing that compares to R. In this article we saw why learning R is a great choice. In the next article we'll show you how to learn R using the 80/20 Rule.
Appendix
Here's some additional information on the tool assessment. We have provided the code used to make the visualization, the criteria explanation, and the tool assessment.
Criteria Explanation
Our assessment of the most powerful DS4B tools was based on four criteria:
- Business Capability (1 = Low, 10 = High): How well-suited is the tool for use in the business? Does it include features needed for the business including advanced analytics, interactivity, communication, and web apps?
- Ease of Learning (1 = Difficult, 10 = Easy): How easy is it to pick up? Can you learn it in a week of short courses or will it take a longer time horizon to become proficient?
- Cost (Free/Minimal, Low, High): Cost has two undesirable effects. From a first-order perspective, the organization has to spend money. This is not in-and-of-itself undesirable because the software companies can theoretically spend on R&D and other efforts to advance the product. The second-order effect of lowering adoption is much more concerning. High-cost tools tend to have much less discussion in the online world, whereas open source or low-cost tools have great trends.
- Trend (0 = Fast Decline, 5 = Stable, 10 = Fast Growth): We used Stack Overflow Insights question counts as a proxy for the trend of usage over time. A major assumption is that a growing number of Stack Overflow questions means that usage is also increasing in a similar trend.
Source: Stack Overflow Trends
R:
- DS4B Capability = 10: Has it all. Great data science capability, great visualization libraries, Shiny for interactive web apps, rmarkdown for professional reporting.
- Learning Curve = 4: A lot to learn, but learning is getting easier with the tidyverse.
- Trend = 10: Stack Overflow questions are growing at a very fast pace.
- Cost = Low: Free and open source
Python:
- DS4B Capability = 7: Has great machine learning and deep learning libraries. Can connect to any major database. Communication is limited by flask / Django web applications, which can be difficult to build. Does not have a business reporting infrastructure comparable to rmarkdown.
- Learning Curve = 4: A lot to learn, but learning is relatively easy compared to other object oriented programming languages like Java.
- Trend = 10: Stack Overflow questions are growing at a very fast pace.
- Cost = Low: Free and open source
Excel:
- DS4B Capability = 4: Mainly a spreadsheet software but has programming built in with VBA. Difficult to integrate R, but is possible. No data science libraries.
- Learning Curve = 10: Relatively easy to learn and become an advanced user.
- Trend = 7: Stack Overflow questions are growing at a relatively fast pace.
- Cost = Low: Comes with Microsoft Office, which most organizations use.
Tableau:
- DS4B Capability = 6: Has R integrated, but is very difficult to implement advanced algorithms and not as flexible as R+shiny.
- Learning Curve = 7: Very easy to learn.
- Trend = 6: Stack Overflow questions are growing at a relatively fast pace.
- Cost = Low: Free public version. Enterprise licenses are relatively affordable.
PowerBI:
- DS4B Capability = 5: Similar to Tableau, but not quite as feature-rich. Can integrate R to some extent.
- Learning Curve = 8: Very easy to learn.
- Trend = 6: Expected to have same trend as Tableau.
- Cost = Low: Free public version. Licenses are very affordable.
Matlab:
- DS4B Capability = 6: Can do a lot with it, but lacks the infrastructure to use for business.
- Learning Curve = 2: Matlab is quite difficult to learn.
- Trend = 1: Stack Overflow question volume is declining at a rapid pace.
- Cost = High: Matlab licenses are very expensive. Licensing structure does not scale well.
SAS:
- DS4B Capability = 8: Has data science, database connection, business reporting and visualization capabilities. Can also build applications. However, limited by closed-source nature. Does not get latest technologies like tensorflow and H2O.
- Learning Curve = 4: Similar to most data science programming languages for the tough stuff. Has a GUI for the easy stuff.
- Trend = 3: Stack Overflow growth is declining.
- Cost = High: Expensive for licenses. Licensing structure does not scale well.
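The ratings listed above can be collected into a data frame to check the tradeoff between ease of learning and capability described in Reason 1. The following is a sketch reconstructed from the listed scores, not the original code used for the article's visualization. The column names and the choice to map trend to point size and cost to colour are assumptions made for illustration, and the plot is only built if ggplot2 is installed.

```r
# Ratings transcribed from the tool assessment above
tools <- data.frame(
  tool             = c("R", "Python", "Excel", "Tableau", "PowerBI", "Matlab", "SAS"),
  capability       = c(10, 7, 4, 6, 5, 6, 8),   # DS4B Capability (1 = Low, 10 = High)
  ease_of_learning = c(4, 4, 10, 7, 8, 2, 4),   # Learning Curve (1 = Difficult, 10 = Easy)
  trend            = c(10, 10, 7, 6, 6, 1, 3),  # Trend (0 = Fast Decline, 10 = Fast Growth)
  cost             = c("Low", "Low", "Low", "Low", "Low", "High", "High")
)

# Simple linear fit: a negative slope means easier tools tend to be less capable
fit   <- lm(capability ~ ease_of_learning, data = tools)
slope <- unname(coef(fit)["ease_of_learning"])

# Optional scatter plot with the fitted trendline (assumed layout, not the original figure)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(tools, aes(x = ease_of_learning, y = capability)) +
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "grey50") +
    geom_point(aes(size = trend, colour = cost)) +
    geom_text(aes(label = tool), vjust = -1) +
    labs(x = "Ease of Learning", y = "DS4B Capability",
         size = "Trend", colour = "Cost")
  # Call print(p) to draw the plot
}
```

With these seven tools the fitted slope is negative, which matches the trendline described in Reason 1: tools that are easier to learn tend to score lower on business capability.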