ggside: A new R package for plotting distributions in side-plots
Written by Matt Dancho
I fell in love with a new ggplot2 extension. It makes it much simpler to uncover relationships in my complex business data.
ggside is a new R package for “marginal distribution plots”, which are the density side-plot panels to the top and right of a scatter plot (made popular by the Python Seaborn package). Let’s get you up and running with ggside in under 5 minutes with this quick R-Tip.
R-Tips Weekly Newsletter
This article is part of R-Tips Weekly, a weekly video tutorial that shows you step-by-step how to do common R coding tasks one R-tip at a time.
This Tutorial Is Available In Video
I have a companion video tutorial that shows even more secrets (plus mistakes to avoid). And, I’m finding that a lot of my students prefer the dialogue that goes along with coding. So check out this video to see me running the code in this tutorial. 👇
Watch my 5-minute tutorial on YouTube
What are Marginal Distributions?
Marginal Distribution (Density) plots extend a plot of numeric data with side panels that show how each variable is distributed on its own, usually as a density (histograms or box plots work too).
Marginal Distribution Plots were made popular by the side panels of the seaborn jointplot() in Python.
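To make the idea concrete, here is a minimal base-R sketch of what a marginal density is: the distribution of one variable on its own, ignoring the other axis. Using the hwy and cty columns of ggplot2's mpg data here is an assumption made for illustration.

```r
library(ggplot2)  # loaded only for the mpg dataset

# Kernel density estimate of each variable by itself.
# These curves are what a top (x) and right (y) side panel would draw.
dens_x <- density(mpg$hwy)  # marginal distribution of highway MPG
dens_y <- density(mpg$cty)  # marginal distribution of city MPG

# A density curve should integrate to roughly 1 (simple Riemann sum check).
area_x <- sum(dens_x$y) * diff(dens_x$x[1:2])
print(area_x)
```

A side plot is simply this kind of curve attached to the matching axis of the main scatter plot.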
Side-Plot Tutorial with ggside
Marginal distributions can now be made in R using ggside, a new ggplot2 extension. You can make linear regression plots with marginal distributions using histograms, densities, box plots, and more. Bonus - The side panels are super customizable for uncovering complex relationships.
Here are two examples of what you will do in this tutorial! 👇
Plot 1: Linear Regression with Marginal Distribution (Density) Side-Plots (Top and Right)
The first plot you’ll make…
Plot 2: Facet-Plot with Marginal Box Plots (Top)
The second plot you’ll make…
Thank You Developers
I want to thank jtlandis for his amazing software contribution. JT is a data scientist at the University of North Carolina at Chapel Hill and an R developer who created ggside. Thank you for all you do!
Before we get started, get the Cheat Sheet
ggside is great for making marginal distribution side plots. But, you'll still need to learn how to visualize data with ggplot2. For those topics, I'll use the Ultimate R Cheat Sheet to refer to ggplot2 code in my workflow.
Download the Ultimate R Cheat Sheet. Then click the "CS" next to "ggplot2", which opens the Data Visualization with ggplot2 Cheat Sheet.
Now you're ready to quickly reference ggplot2 functions.
Start By Loading The Libraries & Data
The libraries we'll need today are patchwork, ggridges, ggrepel, maps, tidyverse, and lubridate. All packages are available on CRAN and can be installed with install.packages(). Note - I'm using the development version of ggside, which is what I recommend in the YouTube video.
Get the Code
The dataset is the mpg data that comes with ggplot2.
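A minimal setup sketch follows. It loads only ggplot2 and ggside, which is an assumption about the smallest set of packages the plots in this tutorial need, not the full library list above. The commented install lines are optional, and the GitHub path for the development version is taken from the developer's username mentioned earlier.

```r
# One-time installs (uncomment as needed):
# install.packages(c("ggplot2", "ggside"))        # CRAN releases
# remotes::install_github("jtlandis/ggside")      # development version (assumed repo path)

library(ggplot2)  # plotting grammar; also ships the mpg dataset
library(ggside)   # side-panel geoms such as geom_xsidedensity()

# Make the mpg data explicit in the workspace and inspect the columns used later.
data("mpg", package = "ggplot2")
str(mpg[, c("hwy", "cty", "class", "cyl")])
```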
Plot 1: Linear Regression with Marginal Distribution Plot
We'll start by replicating what you can do with Python's Seaborn jointplot(). We'll accomplish this with ggside.
We set up the plot just like a normal ggplot. Refer to the Ultimate R Cheat Sheet for the ggplot2 functions used in the setup.
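As a rough sketch of that setup step, the base plot could look like the code below. The aesthetic mapping (hwy on x, cty on y, color by class) and the object name p_base are assumptions for illustration; the original code may differ.

```r
library(ggplot2)

# Plain ggplot2: scatter of highway vs. city MPG, colored by vehicle class.
p_base <- ggplot(mpg, aes(x = hwy, y = cty, color = class)) +
  geom_point(size = 2, alpha = 0.3) +
  # color = NULL drops the per-class grouping so a single trend line is drawn.
  # With the default method and a dataset this small, ggplot2 fits a loess curve.
  geom_smooth(aes(color = NULL), se = TRUE)

p_base
```

Nothing here is ggside-specific yet, which is the point: the side panels get layered onto an ordinary ggplot.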
Next we add from ggside:
geom_xsidedensity() - Adds a side density panel (top panel).
geom_ysidedensity() - Adds a side density panel (right panel).
The trick is using after_stat(density), which makes an awesome looking marginal density side panel plot. I also increased the size of the marginal density panels.
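Putting those pieces together, a hedged sketch of the full plot might look like this. The mapping, the stacked position, the labels, and the 0.4 panel scale are all assumptions chosen for illustration. ggside.panel.scale.x and ggside.panel.scale.y are the ggside theme elements that control how much room each side panel gets.

```r
library(ggplot2)
library(ggside)

p1 <- ggplot(mpg, aes(x = hwy, y = cty, color = class)) +
  geom_point(size = 2, alpha = 0.3) +
  geom_smooth(aes(color = NULL), se = TRUE) +
  # Top panel: density of the x variable (hwy), one filled curve per class.
  geom_xsidedensity(
    aes(y = after_stat(density), fill = class),
    alpha = 0.5, position = "stack"
  ) +
  # Right panel: density of the y variable (cty); the density goes on x here.
  geom_ysidedensity(
    aes(x = after_stat(density), fill = class),
    alpha = 0.5, position = "stack"
  ) +
  labs(
    title = "Fuel Economy by Vehicle Type",
    x = "Highway MPG", y = "City MPG"
  ) +
  # Give each side panel 40% of the main panel's size (illustrative value).
  theme(
    ggside.panel.scale.x = 0.4,
    ggside.panel.scale.y = 0.4
  )

p1
```

Note how the side geoms mirror each other: the x-side density maps the computed density to y, and the y-side density maps it to x.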
Loess Regression w/ Marginal Density
We generate the regression plot with marginal distributions (density) to highlight key differences between the automobile classes. We can see:
- Pickup, SUV - Have the lowest Highway Fuel Economy (MPG)
- 2seater, Compact, Midsize, Subcompact - Have the highest Highway Fuel Economy
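The ranking read off the density panels can be cross-checked numerically. This is a small base-R sketch (the object name mean_hwy is hypothetical) that averages highway MPG per class and sorts from lowest to highest.

```r
library(ggplot2)  # loaded only for the mpg dataset

# Mean highway MPG for each vehicle class.
mean_hwy <- aggregate(hwy ~ class, data = mpg, FUN = mean)

# Sort ascending so the least fuel-efficient classes come first.
mean_hwy <- mean_hwy[order(mean_hwy$hwy), ]
print(mean_hwy)
```

The first two rows should be the pickup and SUV classes, matching what the side panels show.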
Plot 2: Faceted Side-Panels
Next, let's try out some advanced functionality. I want to see how ggside handles faceted plots, which are subplots that vary based on a categorical feature. We'll use the "cyl" column to facet, which is for engine size (number of cylinders).
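Here is a hedged sketch of a faceted plot with a box plot side panel on top. The horizontal box plot per class follows the pattern shown in the ggside documentation (aes(y = class) with orientation = "y"). The remaining choices, including the free x scale and the 0.4 panel scale, are assumptions and may not match the original code. Depending on your ggside version, you may also need to add scale_xsidey_discrete() for the discrete side-panel axis.

```r
library(ggplot2)
library(ggside)

p2 <- ggplot(mpg, aes(x = cty, y = hwy, color = class)) +
  geom_point() +
  geom_smooth(aes(color = NULL)) +
  # Top panel: one horizontal box plot of cty per vehicle class.
  geom_xsideboxplot(aes(y = class), orientation = "y", alpha = 0.5) +
  # One column of panels per cylinder count; each facet gets its own side panel.
  facet_grid(cols = vars(cyl), scales = "free_x") +
  labs(title = "Fuel Economy by Engine Size (Cylinders)") +
  theme(ggside.panel.scale.x = 0.4)

p2
```

Because the side geom is just another layer, the facet specification applies to it the same way it applies to the points and the smoother.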
Faceted Side Panels? No problem.
Awesome! I have included facets by "cyl", which creates four plots based on the engine size. ggside picked up on the facets and has made 4 side-panel plots.
You learned how to use ggside. Great work! But, there’s a lot more to becoming a Business Scientist (my term for an incredibly valuable data scientist that has business problem-solving skills).