Interactive Principal Component Analysis in R

Written by Matt Dancho



This article is part of R-Tips Weekly, a weekly video tutorial that shows you step-by-step how to do common R coding tasks.


Identify Clusters in your Data:

We’ll make an interactive PCA visualization to investigate clusters and learn why observations are similar to each other. Here are the links to get set up. 👇



PCA is all about data wrangling

PCA is a great tool for exploring your data for clusters. But most beginners overlook a few requirements:

  • PCA only works with numeric data
  • Categorical data must be encoded as numeric data (e.g. one-hot)
  • Numeric data must be scaled (otherwise features with large variances dominate the components and your PCA will be misleading)
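
To see why scaling matters, here is a minimal sketch. It assumes the built-in mtcars dataset as a stand-in (the video may use different data) and fits a PCA with and without scaling using base R's prcomp().

```r
# Illustrative only: mtcars is a stand-in dataset, not necessarily the one used in the video.
# Every mtcars column is numeric, but the columns are measured on very different scales.
pca_raw    <- prcomp(mtcars, center = TRUE, scale. = FALSE)
pca_scaled <- prcomp(mtcars, center = TRUE, scale. = TRUE)

# Without scaling, the variable with the largest variance tends to dominate PC1
raw_top <- names(which.max(abs(pca_raw$rotation[, "PC1"])))
print(raw_top)

# Compare how much variance each fit attributes to the first two components
print(summary(pca_raw)$importance[, 1:2])
print(summary(pca_scaled)$importance[, 1:2])
```

If one variable accounts for nearly all of PC1 in the unscaled fit, that usually reflects its units rather than real structure in the data.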


Data Wrangling is SUPER Critical
We need to use dplyr to encode categorical features as numeric.

(Image: PCA Data Wrangling)

Full code in the video GitHub Repository
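
The repository has the author's actual wrangling code. As a rough sketch of the same idea, the example below one-hot encodes a categorical column with dplyr. It assumes the built-in iris dataset, and the new column names (species_setosa and so on) are hypothetical, chosen only for illustration.

```r
library(dplyr)

# Illustrative only: iris is a stand-in dataset; Species is its one categorical column.
iris_encoded <- iris %>%
  mutate(
    # One 0/1 indicator column per category (one-hot encoding)
    species_setosa     = as.integer(Species == "setosa"),
    species_versicolor = as.integer(Species == "versicolor"),
    species_virginica  = as.integer(Species == "virginica")
  ) %>%
  # Drop the original categorical column so only numeric columns remain
  select(-Species)

glimpse(iris_encoded)
```

Every column in iris_encoded is now numeric, which is the format prcomp() expects.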


Before Encoding
PCA will not work with Categorical Data
(You'll get a nice error message)
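
A quick way to see this for yourself, assuming the built-in iris dataset (which includes the factor column Species) as a stand-in for your own data:

```r
# Illustrative only: iris still contains the categorical Species column here.
result <- tryCatch(
  prcomp(iris, scale. = TRUE),
  error = function(e) e   # capture the error instead of stopping the script
)

if (inherits(result, "error")) {
  message("PCA failed: ", conditionMessage(result))
}
```

The exact wording of the message can vary by R version, but the fit does not succeed until the categorical column is encoded or removed.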

(Image: PCA Category Data Conversion)


After Encoding
PCA likes data in this format 😊

(Image: PCA Encoding Numerical)

Visualizing Clusters

What can we do with PCA + ggplot2? Let’s visualize clusters in our data!


First, fit a PCA using prcomp().
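
A minimal sketch of this step, assuming the built-in iris dataset as a stand-in and keeping only its numeric columns:

```r
library(dplyr)

# Illustrative only: iris is a stand-in for your own (already encoded) data.
fit_pca <- iris %>%
  select(where(is.numeric)) %>%           # prcomp() needs all-numeric input
  prcomp(center = TRUE, scale. = TRUE)    # center and scale each column first

# Variance explained by each principal component
summary(fit_pca)

# Coordinates of the first few observations in PC space
head(fit_pca$x)
```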

Next, use autoplot() from the ggfortify package.
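
Here is a sketch of what that call can look like, again assuming iris as a stand-in dataset. The colour, loadings, and loadings.label arguments are options ggfortify provides for PCA plots; which ones the video uses is not confirmed here.

```r
library(ggfortify)  # attaches ggplot2 as well

# Illustrative only: fit a PCA on the four numeric iris columns.
fit_pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

g <- autoplot(
  fit_pca,
  data           = iris,       # original data, so points can be coloured by a non-PCA column
  colour         = "Species",  # colour points by category
  loadings       = TRUE,       # draw arrows for each variable's loading
  loadings.label = TRUE        # label the arrows with variable names
)

g
```

The result is a regular ggplot object, so you can keep adding ggplot2 layers, themes, and labels to it.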


Then visualize. As an added bonus, we can make it interactive with ggplotly() from the plotly package!
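
As a sketch, assuming the same stand-in iris data, passing the ggplot object to ggplotly() is all it takes:

```r
library(ggfortify)
library(plotly)

# Illustrative only: rebuild a small PCA plot so this example runs on its own.
fit_pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
g <- autoplot(fit_pca, data = iris, colour = "Species")

# Convert the static ggplot into an interactive plotly widget
p <- ggplotly(g)

p
```

Hovering over a point in the resulting widget shows its values, which helps when investigating why observations sit close together.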


That's some XMAS magic. Santa approves. 👇


But if you really want to improve your productivity…

Here’s how to master R programming and become powered by R. 👇

What happens after you learn R for Business.

Your Job Performance Review after you’ve launched your first Shiny App. 👇

This is career acceleration.


SETUP R-TIPS WEEKLY PROJECT

  1. Get the Code

  2. Check out the R-Tips Setup Video.

Once you take these actions, you’ll be set up to receive R-Tips with Code every week. =)



👇 Top R-Tips Tutorials you might like:

  1. mmtable2: ggplot2 for tables
  2. ggdist: Make a Raincloud Plot to Visualize Distribution in ggplot2
  3. ggside: Plot linear regression with marginal distributions
  4. DataEditR: Interactive Data Editing in R
  5. openxlsx: How to Automate Excel in R
  6. officer: How to Automate PowerPoint in R
  7. DataExplorer: Fast EDA in R
  8. esquisse: Interactive ggplot2 builder
  9. gghalves: Half-plots with ggplot2
  10. rmarkdown: How to Automate PDF Reporting
  11. patchwork: How to combine multiple ggplots
  12. Geospatial Map Visualizations in R

Want these tips every week? Join R-Tips Weekly.