How to Scrape Word Documents with R

Written by Matt Dancho on September 16, 2020

This article is part of R-Tips Weekly, a weekly video tutorial that shows you step-by-step how to do common R coding tasks.

Today we discuss an awesome skill for automating data collection from Word documents.


Here’s a common situation: your company has LOTS OF WORD FILES.

They contain tables of information that look like this:

Word Tables

Thinking like a programmer, you can extract this data using officer:
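
Here’s a minimal sketch of what that extraction could look like. It is not the exact code from the video: the demo document, its table values, and the names `demo_tbl`, `demo_path`, and `table_cells` are made up so the example runs on its own. It leans on officer’s `read_docx()` and `docx_summary()`, which list a document’s content with one row per paragraph or table cell.

```r
library(officer)

# Build a small demo document so the example is self-contained.
# These values are invented for illustration; they are not the article's data.
demo_tbl <- data.frame(
  Course   = c("A", "B", "C"),
  Students = c(10, 20, 5),
  Lessons  = c(30, 40, 25)
)
demo_path <- tempfile(fileext = ".docx")

doc <- read_docx()                       # empty document from the default template
doc <- body_add_par(doc, "Demo report")  # one ordinary paragraph
doc <- body_add_table(doc, demo_tbl)     # the table we want to get back out
print(doc, target = demo_path)           # write the .docx to disk

# Read the file and summarise its content:
# one row per paragraph or table cell
content <- docx_summary(read_docx(demo_path))

# Keep only the table cells; row_id and cell_id give each cell's position
table_cells <- content[content$content_type == "table cell", ]
table_cells[, c("doc_index", "row_id", "cell_id", "is_header", "text")]
```

With your own files, you would swap `demo_path` for the path to a real Word document and skip the part that builds the demo file.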

With a little bit of data wrangling with the tidyverse, you’ve got your table extracted & formatted:
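
The wrangling step might look something like the sketch below. It assumes the extracted cells arrive in long format (one row per cell, with `row_id`, `cell_id`, `is_header`, and `text` columns, as `docx_summary()` provides). The `cells` data and the column names `Course`, `Students`, and `Lessons` are hypothetical stand-ins, and the code loads dplyr, tidyr, and tibble individually rather than the whole tidyverse.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Stand-in for the table-cell rows returned by officer::docx_summary().
# All values are invented for illustration.
cells <- tribble(
  ~row_id, ~cell_id, ~is_header, ~text,
  1, 1, TRUE,  "Course",
  1, 2, TRUE,  "Students",
  1, 3, TRUE,  "Lessons",
  2, 1, FALSE, "A",
  2, 2, FALSE, "10",
  2, 3, FALSE, "30",
  3, 1, FALSE, "B",
  3, 2, FALSE, "20",
  3, 3, FALSE, "40",
  4, 1, FALSE, "C",
  4, 2, FALSE, "5",
  4, 3, FALSE, "25"
)

# The header row supplies the column names, keyed by cell position
header <- cells %>%
  filter(is_header) %>%
  select(cell_id, column = text)

tidy_tbl <- cells %>%
  filter(!is_header) %>%                       # drop the header row
  left_join(header, by = "cell_id") %>%        # attach a column name to each cell
  select(row_id, column, text) %>%
  pivot_wider(names_from = column, values_from = text) %>%  # long to wide
  select(-row_id) %>%
  mutate(across(c(Students, Lessons), as.numeric))          # text to numbers

tidy_tbl
```

Everything comes out of Word as text, so the final `mutate()` is what turns the counts into numbers you can calculate with.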

Then you use ggplot2 to make a sweet plot:
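
As a rough sketch of the plotting step, the code below computes an activity ratio (lessons completed divided by students enrolled) and draws it as a bar chart. The data frame `course_tbl`, its column names, and its values are assumptions made up for illustration, and the chart styling is a guess rather than a copy of the plot in the video.

```r
library(ggplot2)

# Hypothetical cleaned table; values are invented for illustration
course_tbl <- data.frame(
  course   = c("A", "B", "C"),
  students = c(10, 20, 5),
  lessons  = c(30, 40, 25)
)

# Activity ratio = lessons completed / students enrolled
course_tbl$activity_ratio <- course_tbl$lessons / course_tbl$students

# Horizontal bars, sorted so the highest ratio sits at the top
p <- ggplot(course_tbl, aes(x = reorder(course, activity_ratio), y = activity_ratio)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Activity Ratio by Course",
    x     = "Course",
    y     = "Lessons completed per student enrolled"
  )

p
```

Sorting the bars with `reorder()` makes the standout group easy to spot at a glance.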

Whoa, look at 201! It has a high “Activity Ratio,” the ratio of lessons completed to number of students enrolled.

You’ve just automated extracting Word tables in R. BOOM! 💥💥💥


  1. Sign Up to Get the R-Tips Weekly (You’ll get email notifications of NEW R-Tips as they are released):

  2. Set Up the GitHub Repo:

  3. Check out the setup video. Or, hit Pull in the Git Menu to get the R-Tips Code.

Once you take these actions, you’ll be set up to receive R-Tips with Code every week. =)