The Most Overlooked R Package (That Can Get You Through A Data Science Job Interview)
Written by Matt Dancho
If you are looking to learn about the most useful R package that can help you get through a data science job interview AND you probably don’t know it yet, you’ve come to the right place, my friend! Here’s what’s in store for you today:
- If you want a job in data science, I’m going to show you how THIS R package can help you get through an interview with 5 lines of code (A SECRET HACK).
- As a BONUS I’m going to share 3 other R packages that can help get you the skills to get a data science job in lucrative fields like finance and time series.
- SURPRISE AT THE END. Yes, for everyone who reads through the end I have a special surprise!!
You and I are going to have some fun today too (I’ll tell you why in a second)!
So buckle up and get ready for some fun along the way.
What’s this fun that Matt speaks of?
I’ve been learning R for the better part of a decade, and for 7 of those years I’ve been building R packages, many of which are open source (free for everyone to use).
Here’s a short list of my open source R packages, AND I want you to guess which one I’m covering today. I’ve sorted by GitHub stars (We’ll use this as a rough measure of popularity).
Table 1: Matt's R Packages
I have a question…
Guess which R package I’m going to cover
Simple Question: Which R package do you think I’m about to cover?
Let me help you…
Here are 3 hints:
- It can help you get through a Job Interview
- It’s not super popular (meaning it’s overlooked by many)
- It’s listed in Table 1: Matt’s R Packages
What do you think it is?
OK, need some more help? Well here’s the bonus first…
Here’s what it’s not (your 3 BONUS R Packages)
If you don’t know these R packages yet, then I highly suggest learning these first (instead of the one that gets you through the job interview).
These 3 BONUS R packages will help you get the skills needed to get a data science job in lucrative fields like finance and time series.
So let me first show you what it’s not…
1. It’s not tidyquant
Tidyquant is an incredibly useful R package for downloading and working with Financial Data (stocks, investments, and investment portfolios). You can use it to make data science interview portfolio projects AND you can even do your own stock investment portfolios with it (yes - these are 2 different meanings of the term “portfolio”).
It's not tidyquant
But, with 750 GitHub stars and almost 800,000 downloads, there’s a good chance you’ve heard of it or possibly even used it. Definitely NOT overlooked.
Want to get started with Financial Analysis in R for FREE?
Like I said, if you want a job in a lucrative field like finance, you should first learn tidyquant or some of the other Financial R packages (before you learn my secret interview HACK coming shortly).
To make learning financial analysis in R easier and faster, I have a special R Cheat Sheet that links to all of the important documentation (Spoiler - It is FREE).
You can download the R cheat sheet for FREE here. My R cheat sheet consolidates 20,000 R packages into the 100 best. So when you want to work in domains like finance, it has all of the R packages you need to get started.
Free R Cheat Sheet (Download Here)
How to Learn Financial Analysis in R for FREE
To start learning Financial Analysis in R, head to Page 3 of the cheat sheet, and you can check out the Financial Analysis Section.
- You’ll see Tidyquant and several other Financial R Packages.
- The links will take you to the tidyquant and financial software documentation.
Getting Started with Financial Analysis in R
But, I’m NOT covering tidyquant today. It’s just too popular. Check out the Cheat Sheet for the tidyquant docs and tutorials.
On to the next BONUS R package that it’s not…
2. It’s not timetk
Timetk is a fantastic R package (yes, I’m proud of my baby timetk) that helps bring time series analysis to the tidyverse.
If you love ggplot2, imagine being able to wrangle and visualize time series data the SAME way. That’s what timetk does.
It's not timetk
But timetk is again too popular. Timetk has been downloaded 1,400,000 times and is a staple of time series analysis using R. Timetk is mainstream… definitely not overlooked.
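To give you a taste of that tidyverse-style workflow, here's a minimal sketch. It assumes timetk and dplyr are installed and uses the m4_daily sample dataset that ships with timetk. The monthly aggregation and the plot settings are my illustrative choices, not a recommended recipe.

```r
# Minimal sketch: wrangle and plot time series the tidyverse way.
# Assumes the timetk and dplyr packages are installed.
library(timetk)
library(dplyr)

# m4_daily ships with timetk (columns: id, date, value)
monthly_tbl <- m4_daily %>%
  group_by(id) %>%
  # Roll each daily series up to a monthly average
  summarise_by_time(.date_var = date, .by = "month", value = mean(value)) %>%
  # Re-group explicitly so the plot gets one facet per series
  group_by(id)

# Static (ggplot2) version of the plot; set .interactive = TRUE for plotly
monthly_tbl %>%
  plot_time_series(date, value, .facet_ncol = 2, .interactive = FALSE)
```

If you've used dplyr's summarise() and ggplot2 facets, this should feel familiar: the grouping drives both the aggregation and the facets.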
Want to Learn Time Series Analysis in R for FREE?
Check out the Time Series Analysis section on Page 3 of my R Cheat Sheet.
Getting Started with Time Series Analysis in R
Alright, onto the next one…
3. It’s not modeltime
I have put 2+ years of blood, sweat, and tears into developing modeltime, an ecosystem of time series forecasting tools that leverages the amazing Tidymodels ecosystem.
It's not modeltime
How to Get a Time Series Forecasting Job
And, of course, I’d love to talk about modeltime because, well, modeltime has gotten a bunch of my students data science jobs, senior data science jobs, and promotions (20%+ raises).
Amit got a Data Science job
Matt got a Lead Data Science job
But, it’s not it. Again, way too popular. Not overlooked.
Want to learn Time Series Forecasting for FREE?
I can’t do a full time series forecast analysis in this post, so I’m going to do the next best thing! I’ll point you to some free resources for time series forecasting inside my R Cheat Sheet.
Head to Page 3 of my R Cheat Sheet. Under Forecasting you’ll see links to all of the modeltime ecosystem documentation (I’ve put 100s of hours into it so it’s understandable and full-featured).
Getting Started with Forecasting in R
Alright, so what IS it?
Well it’s also not…
- Sweep: Sweep is great if you want to “tidy” the forecast R package, but Timetk and Modeltime are the new time series and forecasting tools that I use.
- Portfoliodown: A great package of course! But, it’s for making professional data science interview portfolios, not for actually completing a data science interview.
- Anomalize: Another great choice, but timetk actually has most of the time series anomaly detection functionality ported over, and anomalize depends on tibbletime, an older system for time series analysis that has been superseded.
So what could IT be?
(I can hear it now, “C’mon Matt, tell us already!!!”)
Here IT is… The MOST overlooked R package (that can get you through a data science job interview)!!
The secret of the PROS!
It's Correlation Funnel!
Why Correlation Funnel?
Imagine a world where you’re given a dataset that you’ve never seen before (such as a data science interview), and the interviewer says something like this:
“You have 2 hours to give me some insights.”
Well, this actually happened to my friend Danny Ma in one of his first data science job interviews. Here’s his interview story…
The big interview mistake most data scientists make
What happened is super common (in fact I’ve done it too).
Danny was overthinking the problem in a time-constrained interview. And the result was ZERO insights (in 2 hours).
Danny felt so embarrassed. BUT, I’ve been there too (and I’m sure a ton of other people have as well).
- Overthinking the problem.
- Using the wrong tools (Excel and VBA).
- Pressured for time
It happens. But it didn’t need to.
What if Danny had correlationfunnel? I built correlationfunnel to help people like Danny (and myself) find insights super fast!
correlationfunnel IS A SIMPLE HACK to pull out business insights from data automatically.
How to HACK exploratory data analysis
So let’s say you’re given a new data set like this Customer Churn dataset that comes with correlationfunnel.
The interview dataset
We’re trying to identify relationships between:
- Customer Churn (Yes or No)
- Important Customer Features (e.g. what product they are enrolled in, how much they have purchased over their lifetime, their tenure, gender, etc)
Quickly go from raw data to insights!
The beauty is how fast you can go from raw data to insights. First, here’s the code.
I’m not gonna spend a lot of time on this other than just comments on the picture. I’ll show you where to find the documentation which explains much more than I can in this post.
Using Correlation Funnel (R Code)
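In case the code image doesn't load for you, here's a minimal sketch of what the workflow looks like. It assumes correlationfunnel and dplyr are installed and uses the customer_churn_tbl dataset that ships with correlationfunnel. The specific settings (5 bins, a 1% infrequent-category threshold, filling missing TotalCharges with MonthlyCharges) are my assumptions for illustration, so treat this as a starting point rather than the exact code from the picture.

```r
# Minimal sketch of the correlationfunnel workflow.
# Assumes the correlationfunnel and dplyr packages are installed.
library(correlationfunnel)
library(dplyr)

# customer_churn_tbl ships with correlationfunnel
data("customer_churn_tbl")

churn_correlated_tbl <- customer_churn_tbl %>%
  # Drop the ID column: it identifies customers, it doesn't describe them
  select(-customerID) %>%
  # binarize() needs complete data, so fill the few missing TotalCharges
  # (illustrative choice: use MonthlyCharges as the fill value)
  mutate(TotalCharges = ifelse(is.na(TotalCharges), MonthlyCharges, TotalCharges)) %>%
  # Step 1: turn every feature into 0/1 columns (numeric features get binned)
  binarize(n_bins = 5, thresh_infreq = 0.01, name_infreq = "OTHER", one_hot = TRUE) %>%
  # Step 2: correlate every binary column with the target, Churn = Yes
  correlate(target = Churn__Yes)

# Step 3: plot the funnel (strongest relationships rise to the top)
churn_correlated_tbl %>%
  plot_correlation_funnel()
```

The whole thing is three correlationfunnel verbs: binarize(), correlate(), and plot_correlation_funnel(). Everything else is light dplyr cleanup.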
HACKING the Business Insights
But the beauty is in the plot. Here’s how quickly I can generate business insights from the data. Correlation Funnel is THE HACK!
Hacking Business Insights (with the Correlation Funnel Plot)
With one visualization, I can quickly diagnose potential causes of Customer Churn AND I can come up with potential solutions like:
- Month-to-Month Contracts have high churn. Offer upgrades to long-term contracts.
- Customers that opt out of Online Internet Security have high churn. Inform customers without online security of the dangers of getting hacked and how our solution provides a superior defense. Offer one-time upgrades.
- Customers with <6 month tenure have high churn. Incentivize a 6-month bill. “Stay for 6 months and the 7th month is on us!”
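If you'd rather read insights like these off a table than a plot, here's a hedged sketch that ranks the feature bins by the strength of their correlation with churn. It assumes the same setup as before (correlationfunnel and dplyr installed, the bundled customer_churn_tbl dataset, and my illustrative binning settings). Keeping the top 10 is an arbitrary cutoff.

```r
# Minimal sketch: rank feature bins by their correlation with churn.
# Assumes the correlationfunnel and dplyr packages are installed.
library(correlationfunnel)
library(dplyr)

# customer_churn_tbl ships with correlationfunnel
data("customer_churn_tbl")

top_drivers_tbl <- customer_churn_tbl %>%
  select(-customerID) %>%
  # Illustrative fill for the few missing TotalCharges values
  mutate(TotalCharges = ifelse(is.na(TotalCharges), MonthlyCharges, TotalCharges)) %>%
  binarize(n_bins = 5, thresh_infreq = 0.01, name_infreq = "OTHER", one_hot = TRUE) %>%
  correlate(target = Churn__Yes) %>%
  # The target correlates perfectly with itself, so drop it
  filter(feature != "Churn") %>%
  # Strongest relationships first, whether positive or negative
  arrange(desc(abs(correlation))) %>%
  head(10)

print(top_drivers_tbl)
```

Positive correlations point to bins associated with more churn, and negative ones with less. Remember these are correlations, so treat them as leads to investigate, not proven causes.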
How to learn exploratory data analysis (EDA) for FREE?
You guessed it. I have a full Exploratory Data Analysis (EDA) section in my R Cheat Sheet.
Head to Page 3 of my R Cheat Sheet. You’ll see correlationfunnel and a bunch of other go-to packages for hacking exploratory data analysis.
Recap (Plus a Surprise!)
We learned how to use the correlationfunnel library to hack exploratory data analysis, a secret technique that’s useful in job interviews, when you are under time constraints or simply don’t know much about your data. Great work! But, there’s a lot more to becoming a data scientist.
If you’d like to become a data scientist (and have an awesome career, improve your quality of life, enjoy your job, and all the fun that comes along), then I can help with that.
What’s the surprise?! 🥳
I’m super stoked to announce MY NEW COURSE that’s coming soon. It’s crafted to get you a data scientist job in 30 days (once you’ve developed your skills through my R-Track program).
I’m teaching you the tools, tricks, and hacks to accelerate getting a data science job (in as little as 30 days). Check out the 30-Day Data Scientist Page to get a sneak peek at what’s inside!