Parks and Recreation Text Analysis

Text Analysis, R

Using {tidytext} to analyze the Parks and Rec script

Roupen Khanjian
02-25-2021
Code
library(plotly) # Create Interactive Web Graphics via 'plotly.js', CRAN v4.9.3
library(tidyverse) # Easily Install and Load the 'Tidyverse', CRAN v1.3.0
library(tidytext) # Text Mining using 'dplyr', 'ggplot2', and Other Tidy Tools, CRAN v0.3.0
library(textdata) # Download and Load Various Text Datasets, CRAN v0.4.1
library(ggwordcloud) # A Word Cloud Geom for 'ggplot2', CRAN v0.5.0
library(glue) # Interpreted String Literals, CRAN v1.4.2
library(here) # A Simpler Way to Find Your Files, CRAN v1.0.1
library(janitor) # Simple Tools for Examining and Cleaning Dirty Data, CRAN v2.1.0
library(tvthemes) # TV Show Themes and Color Palettes for 'ggplot2' Graphics, CRAN v1.1.1
library(ggimage) # Use Image in 'ggplot2', CRAN v0.2.8
library(ggpubr) # 'ggplot2' Based Publication Ready Plots, CRAN v0.4.0
library(patchwork) # The Composer of Plots, CRAN v1.1.1
library(kableExtra) # Construct Complex Table with 'kable' and Pipe Syntax, CRAN v1.3.2
library(knitr) # A General-Purpose Package for Dynamic Report Generation in R, CRAN v1.31
library(slider) # Sliding Window Functions, CRAN v0.1.5
library(rcartocolor) # 'CARTOColors' Palettes, CRAN v2.0.0

Data Introduction

Parks and Recreation was a television comedy that aired on NBC from 2009 until 2015. I obtained the complete transcripts and performed a text analysis of the show’s dialogue.

Citation for dataset: He, Luke. (2019, November 23). Parks and Recreation Scripts. Link to data.

Code
file_names <- list.files(here("_texts", 
                              "parks-and-recreation-text-analysis", 
                              "scripts")) # file names for each episode

parks <- str_glue("scripts/{file_names}") %>% 
  map_dfr(read_csv) # read in all the episodes into one data frame!

# Tokenize lines to one word in each row
parks_token <- parks %>% 
  clean_names() %>% 
  unnest_tokens(word, line) %>% # tokenize
  anti_join(stop_words) %>% # remove stop words
  mutate(word = str_extract(word, "[a-z']+")) %>% # extract words only
  drop_na(word) # take out missing values
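
# A minimal toy example of what the tokenization above does
# (a sketch; unnest_tokens() lowercases and splits the line, anti_join()
# drops stop words such as "give", "me", "the", "you", "have"):
# tibble(character = "Ron Swanson",
#        line = "Give me all the bacon and eggs you have.") %>%
#   unnest_tokens(word, line) %>%
#   anti_join(stop_words, by = "word")
# #> leaves two rows: "bacon" and "eggs"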

# Filter for the top 10 characters with the most words
top_characters <- parks_token %>%
  dplyr::filter(character != "Extra") %>% 
  count(character, sort = TRUE) %>%
  slice_max(n, n = 10) 

# Obtain words only from the top 10 characters
parks_words <- parks_token %>% 
  inner_join(top_characters) %>% 
  filter(!word %in% c("hey", "yeah", "gonna")) %>% 
  select(-n) %>% 
  count(word, character, sort = TRUE) %>% 
  ungroup() %>% 
  group_by(character) %>% 
  slice_max(n, n = 8, with_ties = FALSE) # top 8
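
# Note: since only membership in top_characters matters above, an
# equivalent, slightly cleaner filter is
# semi_join(top_characters, by = "character"),
# which keeps the matching rows without adding the n column (so no select(-n)).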

# Sample of a few lines from the show
parks %>% 
  filter(!str_detect(Line, "^\\d+$")) %>% # remove lines that are only digits
  filter(!str_detect(Line, "^#")) %>% # remove '#NAME?' spreadsheet artifacts
  slice_sample(n = 20) %>% 
  kbl(caption = "<b style = 'color:white;'>
       Sample of a few randomly chosen lines from Parks and Recreation.") %>%
  kable_material_dark(bootstrap_options = c("striped", "hover")) %>%
  row_spec(0, color = "white", background = "#222222") %>%
  scroll_box(width = "100%", height = "300px", 
             fixed_thead = list(enabled = T, background = "#222222"))
Table 1: Sample of a few randomly chosen lines from Parks and Recreation.
Character Line
April Ludgate And quiet.
Leslie Knope “It’s hilarious.”
Ron Swanson Death is natural, Andrew.
Andy Dwyer Ew, Starlight Express, the original cast recording, act 1.
Extra No. 
Leslie Knope Although she felt the law unjust, she acknowledged that she had broken it, and she nobly accepted her punishment to be set adrift on Lake Michigan like a human popsicle.
Ben Wyatt Oh, thank God you’re still here.
Leslie Knope Rebecca Varuvian.
Leslie Knope Drilling holes, painting, removing wainscoting, she’s tearing down the gazebo.
Andy Dwyer And it is my very favorite non-alcoholic hot drink, except for hot tea.
April Ludgate We should just directly apply the food to your clothes.
Dave Sanderson So, yeah, I guess I’m in love with the Army.
Leslie Knope Thank you.
Leslie Knope My campaign manager and I are in love.
Donna Meagle It’s great for your back, and your rear.
Chief Trumple But the Newports run this town.
Leslie Knope So, if I’m hearing you correctly, you’re telling me you’re not thinking about leaving Pawnee.
Leslie Knope It’s evidence.
All Recall Knope!
Tom Haverford Really?

Word Count of Major Characters

It’s difficult to choose a favorite character from Parks and Rec, so instead I plotted the eight most frequently used words for each of the ten characters with the most dialogue. Some words that will resonate with fans of the show are Chris Traeger’s literally, Jerry (Garry) Gergich’s geez, and Ben Wyatt’s uh.

Code
ggplot(data = parks_words, 
       aes(x = n, y = word, fill = n)) +
  geom_col() +
  scale_fill_viridis_c(option = "plasma") +
  facet_wrap(~character, scales = "free") +
  theme_brooklyn99() +
  theme(panel.grid.major.y = element_blank(),
        axis.text.x = element_text(size = 8.5),
        axis.text.y = element_text(size = 6.5),
        axis.title = element_blank(),
        panel.grid.minor = element_blank(),
        strip.text = element_text(color = "white",
                                  face = "bold",
                                  size = 9),
        legend.background = element_rect(colour = "#0053CD"),
        legend.title = element_blank())
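
A caveat on the plot above: facet_wrap() with free scales does not sort the bars within each panel by count. {tidytext} ships reorder_within() and scale_y_reordered() for exactly this situation; a minimal sketch of the reordered version:

Code
parks_words %>% 
  ungroup() %>% 
  mutate(word = reorder_within(word, n, character)) %>% # order words within each facet
  ggplot(aes(x = n, y = word, fill = n)) +
  geom_col() +
  scale_y_reordered() + # strip the within-facet ordering suffix from labels
  scale_fill_viridis_c(option = "plasma") +
  facet_wrap(~character, scales = "free")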

Wordclouds

Below are four wordclouds of the 25 most frequently used words by each of the following characters, starting from the upper left hand corner and going clockwise: Andy Dwyer, April Ludgate, Ron Swanson, and Leslie Knope. We can see Andy Dwyer’s enthusiasm for karate and band, Leslie Knope’s love for pawnee, city, and parks, and Ron Swanson’s contempt for government and his two ex-wives, both named tammy.

Code
# Ron Swanson

swanson_words <- parks_token %>% 
  filter(character == "Ron Swanson") %>% # filter for character
  filter(!word %in% c("hey", "yeah", "gonna")) %>% # remove some more stopwords
  count(word) %>% 
  slice_max(n, n = 25) # choose top 25 words
  
swanson_pic <- jpeg::readJPEG(here("_texts",
                                   "parks-and-recreation-text-analysis",
                                   "images",
                                   "ron_swanson.jpg")) 

swanson_cloud <- ggplot(data = swanson_words,
                        aes(label = word)) +
  background_image(swanson_pic) + # add image of character
  geom_text_wordcloud(aes(size = n), 
                      color = "turquoise1",
                      shape = "circle") +
  scale_size_area(max_size = 6) +
  theme_void()

# Leslie Knope

knope_words <- parks_token %>% 
  filter(character == "Leslie Knope") %>% 
  filter(!word %in% c("hey", "yeah", "gonna")) %>% # remove some more stopwords
  count(word) %>% 
  slice_max(n, n = 25)
  
knope_pic <- jpeg::readJPEG(here("_texts",
                                 "parks-and-recreation-text-analysis",
                                 "images", 
                                 "knope.jpg"))

knope_cloud <- ggplot(data = knope_words,
                        aes(label = word)) +
  background_image(knope_pic) +
  geom_text_wordcloud(aes(size = n), 
                      color = "turquoise1",
                      shape = "star") +
  scale_size_area(max_size = 6) +
  theme_void()

# April Ludgate

april_words <- parks_token %>% 
  filter(character == "April Ludgate") %>% 
  filter(!word %in% c("hey", "yeah", "gonna")) %>% # remove some more stopwords
  count(word) %>% 
  slice_max(n, n = 25)
  
april_pic <- jpeg::readJPEG(here("_texts",
                                 "parks-and-recreation-text-analysis",
                                 "images", 
                                 "april.jpeg"))

april_cloud <- ggplot(data = april_words,
                        aes(label = word)) +
  background_image(april_pic) +
  geom_text_wordcloud(aes(size = n), 
                      color = "turquoise1",
                      shape = "triangle-upright") +
  scale_size_area(max_size = 6) +
  theme_void()

# Andy Dwyer

andy_words <- parks_token %>% 
  filter(character == "Andy Dwyer") %>% 
  filter(!word %in% c("hey", "yeah", "gonna")) %>% # remove some more stopwords
  count(word) %>% 
  slice_max(n, n = 25)
  
andy_pic <- jpeg::readJPEG(here("_texts",
                                "parks-and-recreation-text-analysis",
                                "images", 
                                "andy.jpg"))

andy_cloud <- ggplot(data = andy_words,
                        aes(label = word)) +
  background_image(andy_pic) +
  geom_text_wordcloud(aes(size = n), 
                      color = "turquoise1",
                      shape = "diamond") +
  scale_size_area(max_size = 6) +
  theme_void()

# Final patchwork wordcloud

patchwork <- (andy_cloud + april_cloud) / (knope_cloud + swanson_cloud) 

patchwork & theme(plot.background = element_rect(fill = "#222222",
                                                 color = "#222222"),
                  strip.background = element_rect(fill = "#222222",
                                                 color = "#222222"))

Character Sentiment Analysis

Using the nrc lexicon, which maps 13,901 words to eight basic emotions along with a positive or negative rating, I plotted the counts of each sentiment for the same ten characters. All of the characters shown use more positive words than negative ones, and all of them use words associated with trust and anticipation.

Citation for NRC lexicon: Crowdsourcing a Word-Emotion Association Lexicon, Saif Mohammad and Peter Turney, Computational Intelligence, 29 (3), 436-465, 2013. nrc lexicon
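
Note that get_sentiments("nrc") returns a two-column tibble of word–sentiment pairs, so a single word can carry several sentiments at once and is counted once per sentiment below. An illustrative peek (the exact rows depend on the lexicon version):

Code
get_sentiments("nrc") %>% 
  filter(word == "abandon")
# e.g. abandon -> fear, negative, sadness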

Code
characters_sent <-  parks_token %>%
  inner_join(top_characters) %>%
  filter(!word %in% c("hey", "yeah", "gonna")) %>%
  select(-n) %>% 
  inner_join(get_sentiments("nrc")) %>% 
  count(sentiment, character, sort = TRUE)

ggplot(data = characters_sent, 
       aes(x = n, y = sentiment, fill = n)) +
  geom_col() +
  scale_fill_viridis_c(breaks = seq(1000, 5000, 2000),
                       option = "plasma") +
  facet_wrap(~character, scales = "free") +
  theme_brooklyn99() +
  theme(panel.grid.major.y = element_blank(),
        axis.text.x = element_text(size = 6.5),
        axis.text.y = element_text(size = 6),
        axis.title = element_blank(),
        panel.grid.minor = element_blank(),
        strip.text = element_text(color = "white",
                                  face = "bold",
                                  size = 8.5),
        legend.background = element_rect(colour = "#0053CD"),
        legend.title = element_blank(),
        legend.text = element_text(size = 7))
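
One thing to keep in mind when reading these panels: characters with more dialogue dominate the raw counts, so the scales differ widely across facets. A variant worth trying (a sketch, not plotted here) normalizes each character’s counts to proportions before plotting:

Code
characters_sent %>% 
  group_by(character) %>% 
  mutate(prop = n / sum(n)) %>% # share of each sentiment within a character
  ungroup()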

Trajectory of Sentiment

Parks and Recreation is a hilarious comedy with many enjoyable characters, so it’s no surprise that the average sentiment is positive for most of the show. Using the AFINN lexicon, which assigns words an integer score between -5 (most negative) and 5 (most positive), I computed a moving average with a window size of 151 words and plotted it over the entire run of the show.

Citation for AFINN lexicon: Nielsen, Finn Årup. Informatics and Mathematical Modelling, Technical University of Denmark, March 2011. AFINN lexicon
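
A window size of 151 means each word’s score is averaged with the 75 words before it and the 75 after it, since (151 - 1) / 2 = 75; {slider} simply shortens the window at the start and end of the series rather than dropping values. A tiny worked example of such a centered window:

Code
slide_dbl(1:5, mean, .before = 1, .after = 1)
# [1] 1.5 2.0 3.0 4.0 4.5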

Code
parks_afinn <- parks_token %>% 
  inner_join(get_sentiments("afinn")) %>%
  drop_na(value) %>% 
  mutate(index = row_number()) %>% # make an index for the x-axis
  mutate(moving_avg = slide_dbl(value, # centered moving average, window = 151
                                mean, 
                                .before = (151 - 1) / 2, 
                                .after = (151 - 1) / 2)) %>% 
  mutate(neg_pos = factor(case_when(
    moving_avg > 0 ~ "Positive",
    moving_avg <= 0 ~ "Negative"
  ),levels = c("Positive", "Negative"),
  labels = c("Positive", "Negative"), ordered = TRUE))

sent_plot <- ggplot(data = parks_afinn, aes(x = index, y = moving_avg)) +
  geom_col(aes(fill = neg_pos)) +
  scale_fill_manual(values = c("Positive" = "springgreen2",
                               "Negative" = "darkred"))+
  theme_minimal() +
  labs(x = "Index",
       y = "Moving Average AFINN Sentiment",
       fill = "") +
  theme(panel.grid.minor.y = element_blank(),
        panel.grid.major.x = element_blank(),
        axis.text.x = element_blank(),
        axis.text.y = element_text(size = 11,
                                   face = "bold",
                                   color = "white"),
        axis.title.y = element_text(color = "white",
                                  size = 12,
                                  face = "bold"),
        axis.title.x = element_blank(),
        panel.grid.minor = element_blank(),
        plot.background = element_rect(fill = "#222222", 
                                       color = "#222222"),
        strip.background = element_rect(fill = "#222222", 
                                        color = "#222222"),
        legend.text = element_text(color = "white",
                                  size = 11,
                                  face = "bold"))

sent_plot

Sentiment Analysis of Season 4

I decided to take a closer look at the sentiment throughout season 4, one of the show’s more popular seasons, in which Leslie Knope campaigns for a seat on the city council of Pawnee, Indiana. Here I used a moving-average window of 51 words for the AFINN sentiment values. For most of the season the average sentiment is positive, except for a noticeable drop near the end, where the score falls to around -1.

Code
file_names_season <- str_sub(file_names, start = 3L)

# used this line to find the index of each season's first episode
# which(file_names_season == "e01.csv")

season_4 <- str_glue("scripts/{file_names[47:68]}") %>% 
  map_dfr(read_csv)
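
# A sketch of an alternative to the hard-coded 47:68, assuming the files
# are named "s<season>e<episode>.csv" (which is why str_sub() above keeps
# everything from the 3rd character on):
# season_4 <- str_glue("scripts/{str_subset(file_names, '^s4')}") %>%
#   map_dfr(read_csv)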

# Tokenize lines to one word in each row
season_token <-  season_4 %>% 
  clean_names() %>% 
  unnest_tokens(word, line) %>% # tokenize
  anti_join(stop_words) %>% # remove stop words
  mutate(word = str_extract(word, "[a-z']+")) %>% # extract words only
  drop_na(word) # take out missing values

season_afinn <- season_token %>% 
  inner_join(get_sentiments("afinn")) %>%
  drop_na(value) %>% 
  mutate(index = row_number()) %>% # make an index
  mutate(moving_avg = slide_dbl(value, # centered moving average, window = 51
                                mean, 
                                .before = (51 - 1) / 2, 
                                .after = (51 - 1) / 2)) 


season_plot <- ggplot(data = season_afinn, aes(x = index, y = moving_avg)) +
  geom_col(aes(fill = moving_avg)) +
  # scale_fill_distiller(type = "div",
  #                      palette = "GnPR")+
  scale_fill_carto_c(type = "diverging",
                     palette = "Earth") +
  theme_minimal() +
  labs(x = "Index",
       y = "Moving Average AFINN Sentiment",
       fill = "") +
  theme(panel.grid.minor.y = element_blank(),
        panel.grid.major.x = element_blank(),
        axis.text.x = element_blank(),
        axis.text.y = element_text(size = 11,
                                   face = "bold",
                                   color = "white"),
        axis.title.y = element_text(color = "white",
                                  size = 12,
                                  face = "bold"),
        axis.title.x = element_blank(),
        panel.grid.minor = element_blank(),
        plot.background = element_rect(fill = "#222222", 
                                       color = "#222222"),
        strip.background = element_rect(fill = "#222222", 
                                        color = "#222222"),
        legend.text = element_text(color = "white",
                                  size = 11,
                                  face = "bold"))

season_plot

Digging into the data, I found that this dip occurs during the penultimate episode of the season, “Bus Tour”. The episode starts with Leslie Knope trailing her opponent in the city council race, Bobby Newport, in the polls. During one of her campaign stops, in response to a reporter’s question, Leslie starts saying disparaging things about Bobby’s father. After she finishes, the reporter informs Leslie that the question was whether she had any comment on his death earlier that day. Meanwhile, in order to get people to the polls, Leslie’s team tries to secure vans to transport potential voters, but Bobby Newport’s team has reserved every van in the city. Thus, most of the episode is spent doing damage control for the mishaps of Leslie and her campaign team. Below are the words with AFINN ratings during this dip in sentiment in season 4.

Code
# Investigate the negative dip of the plot
season_afinn_neg <- season_afinn %>% 
  filter(moving_avg < -0.75) %>% 
  slice(-c(1:2)) %>% 
  select(-index) %>% 
  rename('moving average' = moving_avg)

# How I figured out which episode it was: 'Bill' speaks during the dip
# (see the table below), so his lines locate the episode
season_4_subset <- season_4 %>% 
  filter(Character == "Bill") 

# Table of words
season_afinn_neg %>% 
  kbl(caption = "<b style = 'color:white;'>
       What was happening towards the end of season 4 of Parks and Recreation when things went south?") %>%
  kable_material_dark(bootstrap_options = c("striped", "hover")) %>%
  row_spec(0, color = "white", background = "#222222") %>%
  scroll_box(width = "100%", height = "300px", 
             fixed_thead = list(enabled = T, background = "#222222"))
Table 2: What was happening towards the end of season 4 of Parks and Recreation when things went south?
character word value moving average
Bill grand 3 -0.7647059
Tom Haverford demands -1 -0.7647059
Tom Haverford crying -2 -0.8431373
Leslie Knope promise 1 -0.8235294
Leslie Knope stop -1 -0.8823529
Leslie Knope intimidating -2 -0.8823529
Leslie Knope bullying -2 -0.9019608
Leslie Knope jerk -3 -0.9019608
Leslie Knope wrong -2 -0.9019608
Leslie Knope died -3 -0.8627451
Leslie Knope sad -2 -0.9215686
Extra sad -2 -0.9215686
Leslie Knope bummer -2 -0.8039216
Leslie Knope jerk -3 -0.7647059
Perd Hapley love 3 -0.7647059
Jennifer Barkley cancel -1 -0.7843137
Leslie Knope emergency -2 -0.8235294
Leslie Knope trust 1 -0.8431373
Leslie Knope died -3 -0.9019608
Leslie Knope awful -3 -0.8823529
Leslie Knope died -3 -0.7843137
Ann Perkins dead -3 -0.7843137
Ann Perkins jerk -3 -0.8823529
Leslie Knope jerk -3 -0.9411765
Leslie Knope polluted -2 -0.9803922
Ben Wyatt stop -1 -1.0980392
Ben Wyatt stop -1 -1.1372549
Ann Perkins fine 2 -1.1568627
Ann Perkins stop -1 -1.1960784
Ann Perkins apologize -1 -1.1960784
Chris Traeger worst -3 -1.1764706
Chris Traeger stop -1 -1.0980392
Chris Traeger stops -1 -1.0588235
Chris Traeger stop -1 -1.0392157
Chris Traeger stopping -1 -0.9607843
Chris Traeger death -2 -0.8627451
Leslie Knope beautiful 3 -0.8431373
Leslie Knope classy 3 -0.8235294
Donna Meagle free 1 -0.8235294
Donna Meagle huge 1 -0.7647059
Bill yeah 1 -0.8627451
Bill hell -4 -0.8039216
Bill free 1 -0.8039216
Bill pay -1 -0.7843137

Citation

For attribution, please cite this work as

Khanjian (2021, Feb. 25). Roupen Khanjian: Parks and Recreation Text Analysis. Retrieved from https://khanjian.github.io/roupen-website/texts/parks-and-recreation-text-analysis/

BibTeX citation

@misc{khanjian2021parks,
  author = {Khanjian, Roupen},
  title = {Roupen Khanjian: Parks and Recreation Text Analysis},
  url = {https://khanjian.github.io/roupen-website/texts/parks-and-recreation-text-analysis/},
  year = {2021}
}