# ggdensity: A new R package for plotting high-density regions

Written by Matt Dancho

As data scientists, we can find it downright impossible to drill into messy data. Fortunately, there’s a new R package that helps us focus on a “high-density region”, which is simply the smallest area in a scatter plot that contains a given percentage of the data points. It’s called ggdensity.

High Density Regions on a Scatter Plot

In this R-tip, I’m going to show you how to home in on high-density regions in under 5 minutes:

1. Learn how to make high-density scatter plots with ggdensity
2. BONUS: Make faceted density plots to drill into over-plotted high-density region data

By the end of this tutorial, you’ll be able to use high-density regions to draw insights from groups within your data. For example, here we can see how each vehicle class compares in terms of engine displacement (displ) and highway fuel economy (hwy), answering questions like:

• Is vehicle class a good way to describe vehicle clusters?
• Which vehicle classes have the greatest variation in highway fuel economy versus displacement?
• Which vehicle classes have the highest / lowest highway fuel economy?

Do you see how powerful ggdensity is?

Uncover insights with ggdensity

# Thank You Developers.

Before we move on, please recognize that ggdensity was developed by James Otto, Doctoral Candidate at the Department of Statistical Science, Baylor University. Thank you for everything you do! Also, the full documentation for ggdensity can be accessed here.

# ggdensity Tutorial

Let’s dive into using ggdensity so we can show you how to make high-density regions on your scatter plots.

## 💡 Step 1: Load the Libraries and Data

### First, run this code to load the R libraries:

Load tidyverse, tidyquant, and ggdensity.
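Here’s a minimal sketch of what that loading step can look like. It assumes all three packages are already installed from CRAN; the commented-out install line is only a reminder and is not part of the original tutorial.

```r
# Assumption: packages are installed. If not, run this once:
# install.packages(c("tidyverse", "tidyquant", "ggdensity"))

library(tidyverse)  # ggplot2, dplyr, and friends
library(tidyquant)  # used here for its plot theme and color scales
library(ggdensity)  # provides geom_hdr() for high-density regions
```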

### Next, run this code to pull in the data.

We’ll read in the mpg data set that comes with ggplot2.

We want to understand how highway fuel economy relates to engine size (displacement) and to see if there are clusters by vehicle class.
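As a quick sketch, here’s one way to pull in the data and take a first look at the three columns we care about. The name `mpg_tbl` is just an illustrative variable name, not something from the original post.

```r
library(ggplot2)  # ships the mpg data set
library(dplyr)    # glimpse() and count()

# mpg_tbl is a hypothetical name used for illustration
mpg_tbl <- ggplot2::mpg

# Peek at the structure (234 rows, 11 columns)
glimpse(mpg_tbl)

# How many vehicles fall into each class?
mpg_tbl %>%
    count(class, sort = TRUE)

# The columns used in this tutorial
mpg_tbl %>%
    select(displ, hwy, class)
```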

## 💡 Step 2: Make a basic ggplot

Next, make a basic ggplot using the following code. This creates a scatter plot with the colors that change by vehicle class. I won’t go into all of the mechanics, but you can download my R cheat sheet to learn more about ggplot and the grammar of graphics.
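A rough sketch of such a scatter plot is below. The variable name `g_scatter` is hypothetical, and using the tidyquant theme and color scales is my assumption about the styling; any ggplot2 theme would work just as well.

```r
library(tidyverse)
library(tidyquant)

# g_scatter is a hypothetical name used for illustration
g_scatter <- mpg %>%
    ggplot(aes(x = displ, y = hwy, color = class, fill = class)) +
    geom_point() +
    # Styling below is an assumption, not required for the technique
    scale_color_tq() +
    scale_fill_tq() +
    theme_tq() +
    labs(
        x = "Engine displacement (displ)",
        y = "Highway fuel economy (hwy)"
    )

g_scatter
```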

Here’s what the plot looks like. Do you see how it’s really tough to pull out the clusters in there? The points overlap, which makes the group structure in the data very hard to see.

## 💡 Step 3: Add High Density Regions

Ok, now that we have a basic scatter plot, we can make a quick alteration by adding high density regions that capture 90% and 50% of the data. We add geom_hdr(probs = c(0.9, 0.5), alpha = 0.35) to produce the next plot.
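Here’s a self-contained sketch of that step. The name `g_hdr` is hypothetical, and I’ve kept the styling to plain ggplot2 so the only new piece is the `geom_hdr()` layer.

```r
library(ggplot2)
library(ggdensity)

# g_hdr is a hypothetical name used for illustration
g_hdr <- ggplot(mpg, aes(x = displ, y = hwy, color = class, fill = class)) +
    # Shade the regions estimated to hold 90% and 50% of each class
    geom_hdr(probs = c(0.9, 0.5), alpha = 0.35) +
    # Draw the points on top so they stay visible
    geom_point()

g_hdr
```

Because the 50% region sits inside the 90% region, the two semi-transparent layers stack and the inner region shows up darker.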

Let’s see what we have here.

We can now see where the clusters have the highest density. But there’s still a problem called “overplotting”, which is when too many graphical elements get plotted on top of each other.

# 💡 BONUS: Overplotting solved!

Here’s the problem we’re facing: overplotting. We simply have too many groups that are too close together. Let’s see how to fix this.

The fix is pretty simple. Just use faceting from ggplot2.
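A minimal sketch of the faceted version is below. The name `g_facet` is hypothetical, and `facet_wrap(vars(class))` is one reasonable way to split the panels; the original code may have used a different facet call.

```r
library(ggplot2)
library(ggdensity)

# g_facet is a hypothetical name used for illustration
g_facet <- ggplot(mpg, aes(x = displ, y = hwy, color = class, fill = class)) +
    geom_hdr(probs = c(0.9, 0.5), alpha = 0.35) +
    geom_point() +
    # One panel per vehicle class, all sharing the same axes
    facet_wrap(vars(class))

g_facet
```

Keeping the axes shared across panels (the `facet_wrap()` default) makes it easy to compare where each class sits on displacement and highway fuel economy.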

And, voila! We can easily inspect the clusters by vehicle class.

# 💡 Conclusions

You learned how to use the ggdensity library to create high-density regions that help us understand the clusters within our data. Great work! But, there’s a lot more to becoming a Business Scientist.

