Here are the R libraries we will need to reproduce the plots for this week’s #TidyTuesday:

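library(tidyverse) # dplyr, tidyr, ggplot2 and forcats for wrangling and plotting
library(readxl)    # read_xlsx() for the raw Excel file
library(albersusa) # usa_composite() US map; not on CRAN, install from GitHub (hrbrmstr/albersusa)
library(ggthemes)  # theme_map()

# here, pander, broom and scales are also used below via pkg::fun(),
# so they must be installed but do not need to be attached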

Background

The online community R for Data Science has started a weekly project called #TidyTuesday. Every week they post a new dataset along with an original graphic and challenge people to either 1) recreate the plot or 2) create their own take on the data. For the very first challenge, they posted a dataset on average student tuition across the United States. The data and original graphic can be found here.

Data Cleaning

Let’s begin by loading in our data and taking a quick peek at a few columns:

tuition.raw <- read_xlsx(here::here("data/us_avg_tuition.xlsx")) %>% 
  rename(state = State) # read in the raw data and rename State to lowercase state

pander::pander(head(tuition.raw[,1:6]))
state        2004-05  2005-06  2006-07  2007-08  2008-09
Alabama         5683     5841     5753     6008     6475
Alaska          4328     4633     4919     5070     5075
Arizona         5138     5416     5481     5682     6058
Arkansas        5772     6082     6232     6415     6417
California      5286     5528     5335     5672     5898
Colorado        4704     5407     5596     6227     6284

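It looks like our data is in wide format: each row is a state and each academic year is its own column. Wide data can be handy for display, but dplyr and ggplot2 work best with long (tidy) data, where each row is a single state-year observation. To reshape it, we’ll use tidyr’s gather() function (superseded by pivot_longer() in tidyr 1.0.0 and later, although gather() still works).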
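For comparison, here is a minimal sketch of the same reshape written with pivot_longer(), run on a small tibble built from the first rows printed above (the two-state, three-year slice is only for illustration). In the full pipeline, the equivalent call would be pivot_longer(`2004-05`:`2015-16`, names_to = "year", values_to = "tuition").

```r
library(tibble)
library(tidyr)

# Two-state, three-year slice of the wide table printed above
wide <- tibble(
  state     = c("Alabama", "Alaska"),
  `2004-05` = c(5683, 4328),
  `2005-06` = c(5841, 4633),
  `2006-07` = c(5753, 4919)
)

# Every column except `state` is a year: its name goes into `year`,
# its value into `tuition`, giving one row per state-year
long <- pivot_longer(wide, cols = -state,
                     names_to = "year", values_to = "tuition")

long
```

One difference worth knowing: pivot_longer() returns the rows state by state (all of Alabama’s years, then Alaska’s), while gather() stacks one year column after another. Either way, each state’s years stay in chronological order, which the lag() step relies on.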

tuition.clean <- tuition.raw %>% 
  gather(year, tuition, `2004-05`:`2015-16`) %>% # reshape to tidy long format: one row per state-year
  group_by(state) %>% 
  mutate(lag = lag(tuition, 5),                  # same state's tuition five years earlier
         pct.change = (tuition - lag) / lag) %>% # trailing 5-year percent change
  na.omit()                                      # drop the first five years, which have no lagged value

pander::pander(head(tuition.clean))
state           year  tuition    lag  pct.change
Alabama      2009-10     7189   5683       0.265
Alaska       2009-10     5455   4328      0.2602
Arizona      2009-10     7263   5138      0.4135
Arkansas     2009-10     6627   5772      0.1481
California   2009-10     7259   5286      0.3732
Colorado     2009-10     6948   4704      0.4772
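To check the lag arithmetic by hand, here is a sketch that reruns the same mutate() on Alabama alone, using the six Alabama values shown in the two tables above. lag(tuition, 5) simply looks five rows back, so it only means "five years earlier" when the rows are sorted by year; the arrange() call makes that explicit here, whereas the original pipeline gets the ordering for free from gather(). The filter() plays the role of na.omit().

```r
library(tibble)
library(dplyr)

# Alabama's tuition for 2004-05 through 2009-10, taken from the tables above
alabama <- tibble(
  year    = c("2004-05", "2005-06", "2006-07", "2007-08", "2008-09", "2009-10"),
  tuition = c(5683, 5841, 5753, 6008, 6475, 7189)
)

alabama_change <- alabama %>%
  arrange(year) %>%                          # lag() uses row order, so sort by year first
  mutate(lag = lag(tuition, 5),              # value from five rows (years) earlier
         pct.change = (tuition - lag) / lag) %>%
  filter(!is.na(lag))                        # only 2009-10 has a value five years back

alabama_change
# pct.change is (7189 - 5683) / 5683, about 0.265, matching the table above
```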

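This long format will make plotting our data much easier. Next, we need map data so that we can draw each state’s shape. The usa_composite() function from albersusa returns US state boundaries with Alaska and Hawaii repositioned as insets near the lower 48, and broom::tidy() flattens those polygons into a data frame (one row per boundary point) whose id column holds the state names, so it can be matched to the state column of our tuition data.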

# create our map data
us <- usa_composite()
us_map <- broom::tidy(us, region = "name")
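The fill layer in the choropleth code that follows works by matching each map_id value (a state name from tuition.clean) against the id column that broom::tidy(region = "name") creates. Here is a minimal, self-contained sketch of that matching on a hypothetical two-square "map"; toy_map, toy_values and the region names are made up for illustration. geom_map() only draws rows whose map_id appears in the map, so a name with no matching polygon is simply left unfilled rather than raising an error, which is why a quick setdiff() check is worthwhile.

```r
library(ggplot2)

# Hypothetical two-region map: each region is a unit square keyed by `id`
toy_map <- data.frame(
  id   = rep(c("East", "West"), each = 4),
  long = c(1, 2, 2, 1,  0, 1, 1, 0),
  lat  = c(0, 0, 1, 1,  0, 0, 1, 1)
)

# Hypothetical values to shade by; `region` must match toy_map$id exactly
toy_values <- data.frame(region     = c("East", "West", "North"),
                         pct.change = c(0.10, 0.40, 0.25))

# Names with no polygon in the map ("North" here) would not be drawn
unmatched <- setdiff(toy_values$region, toy_map$id)

p <- ggplot(toy_values) +
  geom_map(aes(map_id = region, fill = pct.change), map = toy_map) +
  # no layer maps x/y here, so supply the axis ranges explicitly
  expand_limits(x = toy_map$long, y = toy_map$lat)
```

In the real choropleth, the outline layer’s long/lat aesthetics play the role of expand_limits(), and the equivalent name check would be setdiff(unique(tuition.clean$state), us_map$id).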

At this point we’ve cleaned our data and have it in long format, so we can go ahead and plot it: first a choropleth map of each state’s five-year percent change in tuition, then a barbell chart comparing tuition in 2010-11 and 2015-16.

# Plot choropleth map
ggplot() +
  # base layer: every state outline (its long/lat also set the plot's x/y ranges)
  geom_map(data = us_map, map = us_map,
           aes(x = long, y = lat, map_id = id),
           color="#2b2b2b", linewidth=0.05, fill=NA) +
  # fill layer: 2015-16 rows, matched to map polygons by state name
  geom_map(data = filter(tuition.clean, year == "2015-16"), 
           map = us_map,
           aes(fill = pct.change, map_id = state),
           color = "white", linewidth = .1) + # ggplot2 < 3.4.0 uses size = instead of linewidth =
  scale_fill_viridis_c("", labels = scales::percent, option = "A") +
  labs(title = "5 Year Tuition Growth by State 2011-2016") +
  theme_map() + 
  theme(legend.position = "bottom", 
        legend.justification = "center",
        legend.key.width = unit(1.3, "inches"),
        legend.background = element_blank(),
        plot.title = element_text(hjust=0.5, face="bold"),
        plot.background = element_rect(fill="#f7f7f7", color = "transparent"),
        panel.background = element_rect(fill="#f7f7f7", color = "transparent"))

# Plot barbell graphic showing changes in tuition prices
tuition.clean %>% 
  filter(year %in% c("2010-11", "2015-16")) %>% 
  ggplot(aes(x = tuition, 
             y = fct_reorder(state, tuition, min), 
             color = year, 
             group = state)) + 
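  geom_line() +  # the "bar": joins each state's 2010-11 and 2015-16 values
  geom_point() + # the "bells": one point per year
  scale_x_continuous(labels = scales::dollar) +
  scale_color_manual(values = c("#a3c4dc","#0e668b")) + # light = 2010-11, dark = 2015-16
  guides(color = "none") +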
  labs(title = "Tuition Growth from 2010-2016", x = "Tuition", y = "State") + 
  theme_bw() +
  theme(plot.title = element_text(hjust=0.5, face="bold"),
        plot.background = element_rect(fill="#f7f7f7", color = "transparent"),
        panel.background = element_rect(fill="#f7f7f7", color = "transparent"))
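In the barbell chart, fct_reorder(state, tuition, min) sorts the states by the smaller of their two plotted tuition values, and a discrete y axis draws the first factor level at the bottom, so the state with the lowest minimum tuition ends up at the bottom of the chart. A quick sketch with hypothetical states "A", "B" and "C" and made-up values shows the ordering rule:

```r
library(forcats)

# Hypothetical data: three states with two tuition values each
state   <- c("A", "A", "B", "B", "C", "C")
tuition <- c(5, 9, 3, 4, 7, 8)

# Reorder levels by each state's minimum tuition, smallest first
ordered_states <- fct_reorder(state, tuition, min)

levels(ordered_states)
# [1] "B" "A" "C"
```

Passing .desc = TRUE to fct_reorder() would reverse this order, putting the lowest-tuition state at the top of the chart instead.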