Super-FAST EDA in R with DataExplorer

Written by Matt Dancho on March 2, 2021

This article is part of R-Tips Weekly, a weekly video tutorial that shows you step-by-step how to do common R coding tasks.

Did you know it's often said that data scientists spend 80% of their time just trying to understand and prepare data for analysis?! The "understanding" part of that process is called Exploratory Data Analysis (EDA). R has an insane EDA productivity-enhancer. It’s called DataExplorer.

Use DataExplorer for EDA
Exploratory Data Analysis

You're making this DataExplorer EDA Report!

Super-FAST Exploratory Data Analysis (EDA) in R

In this weekly R-Tip, we're making an "EDA Report" with the DataExplorer R package. DataExplorer is an excellent package for Exploratory Data Analysis. In fact, it's one of my top 3 EDA packages.

PRO TIP: I've added EDA on Page 3 of my Ultimate R Cheatsheet. 👇

As you follow along, you can use my Ultimate R Cheatsheet. It consolidates the most important R packages (ones I use every day) into one cheatsheet.

EDA Report with DataExplorer
Automatic Exploratory Reporting

One of the coolest features of DataExplorer is the ability to create an EDA Report in 1 line of code with create_report(). This automates:

  • Basic Statistics
  • Data Structure
  • Missing Data Profiling
  • Continuous and Categorical Distribution Profiling (Histograms, Bar Charts)
  • Relationships (Correlation)

Ultimately, this saves the analyst/data scientist SO MUCH TIME. 🚀
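
Here's a minimal sketch of that one-line report. The dataset (airquality, which ships with base R), the output file name, and the report title are my assumptions for illustration, not the data used in the video. Rendering the report also needs rmarkdown and pandoc available on your machine.

```r
# install.packages("DataExplorer")  # one-time setup
library(DataExplorer)

# Assumption: airquality is used only as a small built-in example.
# It contains missing values, so the missing-data section has something to show.
out_dir <- tempdir()

create_report(
  data         = airquality,
  output_file  = "eda_report.html",       # hypothetical file name
  output_dir   = out_dir,
  report_title = "Air Quality EDA Report"  # hypothetical title
  # y = "Ozone"  # optional: name a response column to add plots broken out by it
)

# Path of the rendered HTML report
report_path <- file.path(out_dir, "eda_report.html")
```

Open the file at report_path in a browser to see the full report.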

DataExplorer EDA Plots
Add the important DataExplorer report plots to your R-Code

DataExplorer just makes EVERYTHING SO EASY. Here's an example of the output of plot_correlation(). In one line of code, we get a correlation heatmap with categorical data dummied (one-hot encoded).
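
To see the "dummied" part for yourself, here's a rough sketch using the built-in iris dataset (my choice for illustration, not the data from the video). It has four numeric columns and one categorical column, Species.

```r
library(DataExplorer)

# One line: correlation heatmap across numeric columns and the
# one-hot encoded levels of Species.
plot_correlation(iris)

# Limit the heatmap to the numeric columns only.
plot_correlation(iris, type = "continuous")

# dummify() performs the same kind of one-hot encoding on its own,
# so you can inspect the columns that feed the heatmap.
iris_dummied <- dummify(iris)
names(iris_dummied)
```

If a categorical column has a lot of levels, the maxcat argument of plot_correlation() controls how many categories a feature may have before it is left out.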

It gets better. Everything is one line of code:

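  • plot_intro(): Plots basic information about the dataset (share of discrete and continuous columns, complete rows, missing observations)
  • plot_missing(): Plots the percentage of missing data for each feature
  • plot_density() and plot_histogram(): Plot the continuous feature distributions
  • plot_bar(): Plots bar charts for categorical distributions
  • plot_correlation(): Plots relationships between features as a correlation heatmap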
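
Here's a quick sketch of those one-liners together. I'm assuming the built-in airquality dataset as a stand-in (it is not the dataset from the video), and I convert Month to a factor so plot_bar() has a categorical column to chart.

```r
library(DataExplorer)

# Assumption: airquality is just a convenient built-in example with missing values.
aq <- airquality
aq$Month <- factor(aq$Month)  # treat Month as categorical

plot_intro(aq)      # dataset overview: column types, complete rows, missing data
plot_missing(aq)    # percent missing per feature
plot_histogram(aq)  # histograms of the continuous features
plot_density(aq)    # density plots of the continuous features
plot_bar(aq)        # bar charts of the categorical features (Month)

# The data has NAs, so ask cor() to use pairwise complete observations.
plot_correlation(aq, cor_args = list(use = "pairwise.complete.obs"))
```

Each call returns a ggplot-based chart, so you can drop just the ones you need into your own scripts.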

Here's the output of plot_bar(). Wow - DataExplorer makes it that easy to make TIME-SAVING EDA VISUALIZATIONS.

You don't need to be Bruce Almighty to do EDA fast anymore.

👇 Top R-Tips Tutorials you might like:

  1. mmtable2: ggplot2 for tables
  2. ggdist: Make a Raincloud Plot to Visualize Distribution in ggplot2
  3. ggside: Plot linear regression with marginal distributions
  4. DataEditR: Interactive Data Editing in R
  5. openxlsx: How to Automate Excel in R
  6. officer: How to Automate PowerPoint in R
  7. DataExplorer: Fast EDA in R
  8. esquisse: Interactive ggplot2 builder
  9. gghalves: Half-plots with ggplot2
  10. rmarkdown: How to Automate PDF Reporting
  11. patchwork: How to combine multiple ggplots
  12. Geospatial Map Visualizations in R

Want these tips every week? Join R-Tips Weekly.