# ggdist: Make a Raincloud Plot to Visualize Distribution in ggplot2

Written by Matt Dancho

The `ggdist` package is a `ggplot2` extension made for visualizing distributions and uncertainty. We’ll show how `ggdist` can be used to make a **raincloud plot**.

# What is a Raincloud Plot?

**The Raincloud Plot** is a visualization that pairs a half-density plot with other distribution plots, such as a boxplot and a dot plot. It gets its name because the half-density plot is shaped like a “raincloud”. The raincloud (half-density) enhances the traditional boxplot by revealing multiple modes (peaks), which can indicate that subgroups exist. A boxplot does not show where values cluster, but a raincloud plot does!

Raincloud Plot (the plot we’ll make in this tutorial)

We’ll go through a short tutorial to get you up and running with `ggdist` to make a raincloud plot.

# Raincloud Plots with `ggdist`


This tutorial showcases the awesome power of `ggdist` for visualizing distributions.

## Load the Libraries and Data

First, we:

- **Load Libraries:** Load `ggdist`, `tidyquant`, and `tidyverse`.
- **Import Data:** We’re using the `mpg` dataset that comes with `ggplot2`.
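A minimal sketch of this setup step, assuming all three packages are already installed from CRAN. `tidyverse` attaches `ggplot2`, which ships the `mpg` dataset, so no file needs to be read.

```r
# Load the libraries used in this tutorial (assumes they are installed)
library(ggdist)     # stats and geoms for distributions and uncertainty
library(tidyquant)  # theme_tq() and scale_fill_tq()
library(tidyverse)  # attaches ggplot2, dplyr, and the %>% pipe

# Import data: mpg comes with ggplot2, so printing it is enough to inspect it
mpg
```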

## Raincloud Plot: Using ggplot

Next, we’ll make a Raincloud plot that highlights the distribution of Vehicle Fuel Economy (MPG) by Engine Size (Number of Cylinders). It helps if you have `ggplot2` visualization experience. If you are interested in learning `ggplot2` in depth, check out our R for Business Analysis Course (DS4B 101-R), which contains over 30 hours of video lessons on learning R for data analysis.

### Make the ggplot2 canvas

The first step is to make the `ggplot2` canvas. We:

- **Prep the Data:** Use `filter()` to isolate the most common (frequent) vehicle engine sizes.
- **Map the Columns:** Using `ggplot()`, we map the `cyl` and `hwy` columns. We also convert the numeric `cyl` column to a discrete column with `factor()`.
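The canvas step might look like the sketch below. The filter values are an assumption: it keeps the 4-, 6-, and 8-cylinder vehicles, because `mpg` contains only four 5-cylinder cars. Mapping `fill` to the cylinder count is also an assumption, included so that later layers can color each group separately.

```r
library(ggplot2)
library(dplyr)

# Prep the data: keep the most common engine sizes
# (assumption: 4, 6, and 8 cylinders)
mpg_common <- mpg %>%
  filter(cyl %in% c(4, 6, 8))

# Map the columns: factor() turns the numeric cyl column into a discrete one
p <- ggplot(mpg_common, aes(x = factor(cyl), y = hwy, fill = factor(cyl)))

p  # no layers yet, so only the axes "factor(cyl)" and "hwy" are drawn
```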

This produces a blank plot, which is the first layer. You can see that the x-axis is labeled “factor(cyl)” and the y-axis “hwy”, indicating that the data has been mapped to the visualization.

### Add the Rainclouds with `stat_halfeye()`

Next, we add our first geometry layer using `ggdist::stat_halfeye()`. This produces a half-eye visualization, which combines a half-density (the “slab”) with a point and interval. We hide the point and interval by setting `.width = 0` and `point_colour = NA`, so only the half-density remains.
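Here is one way to write this layer, as a sketch. The `adjust` and `justification` values are assumptions chosen for illustration: `adjust` scales the density bandwidth, and `justification` shifts the slab sideways so it will not overlap a boxplot added later. The filter and `fill` mapping repeat the assumed canvas setup so the block runs on its own.

```r
library(ggplot2)
library(dplyr)
library(ggdist)

p <- mpg %>%
  filter(cyl %in% c(4, 6, 8)) %>%  # assumed: most common engine sizes
  ggplot(aes(x = factor(cyl), y = hwy, fill = factor(cyl))) +
  ggdist::stat_halfeye(
    adjust = 0.5,          # narrower bandwidth than the default (assumed value)
    justification = -0.2,  # shift the slab sideways (assumed value)
    .width = 0,            # collapse the interval to zero width
    point_colour = NA      # hide the point estimate
  )

p
```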

In the resulting plot, we can see the half-density distributions for fuel economy (`hwy`) by engine size (`cyl`).

### Add the Boxplot with `geom_boxplot()`

Next, add the second geometry layer using `ggplot2::geom_boxplot()`. This produces a narrow boxplot: we reduce the `width` and lower the opacity.
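A sketch of the boxplot layer on top of the half-density. The specific `width` and `alpha` values are assumptions, and hiding outliers with `outlier.color = NA` is an extra assumption (the dot plot added later shows every observation anyway).

```r
library(ggplot2)
library(dplyr)
library(ggdist)

p <- mpg %>%
  filter(cyl %in% c(4, 6, 8)) %>%  # assumed: most common engine sizes
  ggplot(aes(x = factor(cyl), y = hwy, fill = factor(cyl))) +
  ggdist::stat_halfeye(adjust = 0.5, justification = -0.2,
                       .width = 0, point_colour = NA) +
  geom_boxplot(
    width = 0.12,          # narrow box beside the half-density (assumed value)
    outlier.color = NA,    # hide outlier points (assumption)
    alpha = 0.5            # semi-transparent fill (assumed value)
  )

p
```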

We now have a boxplot and a half-density. We can see how each distribution varies relative to its median and interquartile range.

### Add the Dot Plots with `stat_dots()`

Next, add the third geometry layer using `ggdist::stat_dots()`. This produces a half-dotplot, which is similar to a histogram: the number of dots in each bin indicates the number of observations. We set `side = "left"` to place the dots on the left-hand side.
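Adding the dot layer could look like this sketch. Only `side = "left"` comes from the text above; the `justification` and `binwidth` values are assumptions, where `binwidth` sets the bin size in data units (MPG here).

```r
library(ggplot2)
library(dplyr)
library(ggdist)

p <- mpg %>%
  filter(cyl %in% c(4, 6, 8)) %>%  # assumed: most common engine sizes
  ggplot(aes(x = factor(cyl), y = hwy, fill = factor(cyl))) +
  ggdist::stat_halfeye(adjust = 0.5, justification = -0.2,
                       .width = 0, point_colour = NA) +
  geom_boxplot(width = 0.12, outlier.color = NA, alpha = 0.5) +
  ggdist::stat_dots(
    side = "left",         # draw the dots on the left-hand side
    justification = 1.1,   # push the dots clear of the boxplot (assumed value)
    binwidth = 0.25        # bin size for stacking observations (assumed value)
  )

p
```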

With this layer, the three main geometries are complete.

### Making the plot look professional

We can clean up our plot with a professional-looking theme using `tidyquant::theme_tq()`. We’ll also rotate it with `coord_flip()` to give it the raincloud appearance.
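Putting it together, a sketch of the finished plot. `theme_tq()` and `coord_flip()` come from the text above; `scale_fill_tq()` (tidyquant’s matching fill palette), the label text, and the earlier layer parameter values are assumptions. After `coord_flip()`, the half-densities sit above the boxes and the dots fall below them, like rain.

```r
library(ggplot2)
library(dplyr)
library(ggdist)
library(tidyquant)

p <- mpg %>%
  filter(cyl %in% c(4, 6, 8)) %>%  # assumed: most common engine sizes
  ggplot(aes(x = factor(cyl), y = hwy, fill = factor(cyl))) +
  ggdist::stat_halfeye(adjust = 0.5, justification = -0.2,
                       .width = 0, point_colour = NA) +
  geom_boxplot(width = 0.12, outlier.color = NA, alpha = 0.5) +
  ggdist::stat_dots(side = "left", justification = 1.1, binwidth = 0.25) +
  scale_fill_tq() +  # tidyquant fill palette (assumption)
  theme_tq() +       # tidyquant's clean theme
  labs(
    title = "Raincloud Plot",  # label text is illustrative
    x = "Engine Size (No. of Cylinders)",
    y = "Highway Fuel Economy (MPG)",
    fill = "Cylinders"
  ) +
  coord_flip()       # rotate: clouds on top, dots below

p
```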

We’ve just finalized our plot. We can clearly see that the distribution for 6-cylinder vehicles is bimodal, something you can’t tell from an ordinary boxplot. It’s worth investigating why so many 6-cylinder vehicles have low highway fuel economy. We’ll save that for another R-Tip.

# Summary

We learned how to make Raincloud Plots with `ggdist`. **But there’s a lot more to visualization.**

It’s critical to **learn how to visualize** with `ggplot2`, which is the premier framework for data visualization in R.

If you’d like to learn `ggplot2`, data visualizations, and data science for business with R, then read on. 👇

