Using {tidytext} to analyze the Parks and Rec script
library(plotly) # Create Interactive Web Graphics via 'plotly.js', CRAN v4.9.3
library(tidyverse) # Easily Install and Load the 'Tidyverse', CRAN v1.3.0
library(tidytext) # Text Mining using 'dplyr', 'ggplot2', and Other Tidy Tools, CRAN v0.3.0
library(textdata) # Download and Load Various Text Datasets, CRAN v0.4.1
library(ggwordcloud) # A Word Cloud Geom for 'ggplot2', CRAN v0.5.0
library(glue) # Interpreted String Literals, CRAN v1.4.2
library(here) # A Simpler Way to Find Your Files, CRAN v1.0.1
library(janitor) # Simple Tools for Examining and Cleaning Dirty Data, CRAN v2.1.0
library(tvthemes) # TV Show Themes and Color Palettes for 'ggplot2' Graphics, CRAN v1.1.1
library(ggimage) # Use Image in 'ggplot2', CRAN v0.2.8
library(ggpubr) # 'ggplot2' Based Publication Ready Plots, CRAN v0.4.0
library(patchwork) # The Composer of Plots, CRAN v1.1.1
library(kableExtra) # Construct Complex Table with 'kable' and Pipe Syntax, CRAN v1.3.2
library(knitr) # A General-Purpose Package for Dynamic Report Generation in R, CRAN v1.31
library(slider) # Sliding Window Functions, CRAN v0.1.5
library(rcartocolor) # 'CARTOColors' Palettes, CRAN v2.0.0
Parks and Recreation was a television comedy that aired on NBC from 2009 until 2015. I obtained the complete transcripts and performed text analysis on the dialogue of the show.
Citation for dataset: He, Luke. (2019, November 23). Parks and Recreation Scripts. Link to data.
file_names <- list.files(here("_texts",
"parks-and-recreation-text-analysis",
"scripts")) # file names for each episode
parks <- str_glue("scripts/{file_names}") %>%
map_dfr(read_csv) # read in all the episodes into one data frame!
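As a quick sanity check (a small sketch, output not shown), glimpse() confirms that the combined data frame has one row per spoken line, with a Character column and a Line column:
# Peek at the combined data: one row per spoken line
glimpse(parks)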
# Tokenize lines to one word in each row
parks_token <- parks %>%
clean_names() %>%
unnest_tokens(word, line) %>% # tokenize
anti_join(stop_words) %>% # remove stop words
mutate(word = str_extract(word, "[a-z']+")) %>% # extract words only
drop_na(word) # take out missing values
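To see what the tokenizing step does, here is a minimal sketch on a single made-up quote (hypothetical, not taken from the scripts): unnest_tokens() lowercases the text, strips punctuation, and returns one word per row.
# Hypothetical one-line example of unnest_tokens()
tibble(character = "Leslie Knope",
line = "Waffles are the best food") %>%
unnest_tokens(word, line)
# returns one row per lowercase word: "waffles", "are", "the", "best", "food"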
# Filter the top 10 characters with the most words
top_characters <- parks_token %>%
dplyr::filter(character != "Extra") %>%
count(character, sort = TRUE) %>%
slice_max(n, n = 10)
# Obtain words only from the top 10 characters
parks_words <- parks_token %>%
inner_join(top_characters) %>%
filter(!word %in% c("hey", "yeah", "gonna")) %>%
select(-n) %>%
count(word, character, sort = TRUE) %>%
ungroup() %>%
group_by(character) %>%
slice_max(n, n = 8, with_ties = FALSE) # top 8
# Sample of a few lines from the show
parks %>%
filter(!str_detect(Line, "^\\d+$")) %>% # remove lines that are only digits
filter(!str_detect(Line, "^#")) %>% # remove lines starting with '#' (e.g. '#NAME?')
slice_sample(n = 20) %>% # randomly sample 20 lines
kbl(caption = "<b style = 'color:white;'>
Sample of a few randomly chosen lines from Parks and Recreation.</b>") %>%
kable_material_dark(bootstrap_options = c("striped", "hover")) %>%
row_spec(0, color = "white", background = "#222222") %>%
scroll_box(width = "100%", height = "300px",
fixed_thead = list(enabled = T, background = "#222222"))
Character | Line |
---|---|
April Ludgate | And quiet. |
Leslie Knope | “It’s hilarious.” |
Ron Swanson | Death is natural, Andrew. |
Andy Dwyer | Ew, Starlight Express, the original cast recording, act 1. |
Extra | No. |
Leslie Knope | Although she felt the law unjust, she acknowledged that she had broken it, and she nobly accepted her punishment to be set adrift on Lake Michigan like a human popsicle. |
Ben Wyatt | Oh, thank God you’re still here. |
Leslie Knope | Rebecca Varuvian. |
Leslie Knope | Drilling holes, painting, removing wainscoting, she’s tearing down the gazebo. |
Andy Dwyer | And it is my very favorite non-alcoholic hot drink, except for hot tea. |
April Ludgate | We should just directly apply the food to your clothes. |
Dave Sanderson | So, yeah, I guess I’m in love with the Army. |
Leslie Knope | Thank you. |
Leslie Knope | My campaign manager and I are in love. |
Donna Meagle | It’s great for your back, and your rear. |
Chief Trumple | But the Newports run this town. |
Leslie Knope | So, if I’m hearing you correctly, you’re telling me you’re not thinking about leaving Pawnee. |
Leslie Knope | It’s evidence. |
All | Recall Knope! |
Tom Haverford | Really? |
It’s difficult to choose a favorite character from Parks and Rec, so I plotted the 8 most frequently used words for each of the ten characters with the most dialogue. Some examples of words that will resonate with fans of the show are Chris Traeger’s “literally”, Jerry (Gary) Gergich’s “geez”, and Ben Wyatt’s “uh”.
ggplot(data = parks_words,
aes(x = n, y = word, fill = n)) +
geom_col() +
scale_fill_viridis_c(option = "plasma") +
facet_wrap(~character, scales = "free") +
theme_brooklyn99() +
theme(panel.grid.major.y = element_blank(),
axis.text.x = element_text(size = 8.5),
axis.text.y = element_text(size = 6.5),
axis.title = element_blank(),
panel.grid.minor = element_blank(),
strip.text = element_text(color = "white",
face = "bold",
size = 9),
legend.background = element_rect(colour = "#0053CD"),
legend.title = element_blank())
Below are four word clouds of the 25 most frequently used words by the following characters, starting from the upper left-hand corner and going clockwise: Andy Dwyer, April Ludgate, Ron Swanson, and Leslie Knope. We can see Andy Dwyer’s enthusiasm for karate and band, Leslie Knope’s love for pawnee, city, and parks, but also Ron Swanson’s contempt for government and for his two ex-wives, both named tammy.
# Ron Swanson
swanson_words <- parks_token %>%
filter(character == "Ron Swanson") %>% # filter for character
filter(!word %in% c("hey", "yeah", "gonna")) %>% # remove some more stopwords
count(word) %>%
slice_max(n, n = 25) # choose top 25 words
swanson_pic <- jpeg::readJPEG(here("_texts",
"parks-and-recreation-text-analysis",
"images",
"ron_swanson.jpg"))
swanson_cloud <- ggplot(data = swanson_words,
aes(label = word)) +
background_image(swanson_pic) + # add image of character
geom_text_wordcloud(aes(size = n),
color = "turquoise1",
shape = "circle") +
scale_size_area(max_size = 6) +
theme_void()
# Leslie Knope
knope_words <- parks_token %>%
filter(character == "Leslie Knope") %>%
filter(!word %in% c("hey", "yeah", "gonna")) %>% # remove some more stopwords
count(word) %>%
slice_max(n, n = 25)
knope_pic <- jpeg::readJPEG(here("_texts",
"parks-and-recreation-text-analysis",
"images",
"knope.jpg"))
knope_cloud <- ggplot(data = knope_words,
aes(label = word)) +
background_image(knope_pic) +
geom_text_wordcloud(aes(size = n),
color = "turquoise1",
shape = "star") +
scale_size_area(max_size = 6) +
theme_void()
# April Ludgate
april_words <- parks_token %>%
filter(character == "April Ludgate") %>%
filter(!word %in% c("hey", "yeah", "gonna")) %>% # remove some more stopwords
count(word) %>%
slice_max(n, n = 25)
april_pic <- jpeg::readJPEG(here("_texts",
"parks-and-recreation-text-analysis",
"images",
"april.jpeg"))
april_cloud <- ggplot(data = april_words,
aes(label = word)) +
background_image(april_pic) +
geom_text_wordcloud(aes(size = n),
color = "turquoise1",
shape = "triangle-upright") +
scale_size_area(max_size = 6) +
theme_void()
# Andy Dwyer
andy_words <- parks_token %>%
filter(character == "Andy Dwyer") %>%
filter(!word %in% c("hey", "yeah", "gonna")) %>% # remove some more stopwords
count(word) %>%
slice_max(n, n = 25)
andy_pic <- jpeg::readJPEG(here("_texts",
"parks-and-recreation-text-analysis",
"images",
"andy.jpg"))
andy_cloud <- ggplot(data = andy_words,
aes(label = word)) +
background_image(andy_pic) +
geom_text_wordcloud(aes(size = n),
color = "turquoise1",
shape = "diamond") +
scale_size_area(max_size = 6) +
theme_void()
# Final patchwork of word clouds
patchwork <- (andy_cloud + april_cloud) / (knope_cloud + swanson_cloud)
patchwork & theme(plot.background = element_rect(fill = "#222222",
color = "#222222"),
strip.background = element_rect(fill = "#222222",
color = "#222222"))
Using the nrc lexicon, which maps 13,901 words to eight basic emotions as well as a positive or negative rating, I plotted the counts of each sentiment for ten characters. We see that all the characters shown here use more positive words than negative ones, and they all use words associated with trust and anticipation.
Citation for NRC lexicon: Crowdsourcing a Word-Emotion Association Lexicon, Saif Mohammad and Peter Turney, Computational Intelligence, 29 (3), 436-465, 2013. nrc lexicon
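To see which categories the lexicon contains, one quick way (a small sketch, output not shown) is to count the words in each sentiment bin:
# Count how many words fall into each NRC sentiment category
get_sentiments("nrc") %>%
count(sentiment, sort = TRUE)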
characters_sent <- parks_token %>%
inner_join(top_characters) %>%
filter(!word %in% c("hey", "yeah", "gonna")) %>%
select(-n) %>%
inner_join(get_sentiments("nrc")) %>%
count(sentiment, character, sort = TRUE)
ggplot(data = characters_sent,
aes(x = n, y = sentiment, fill = n)) +
geom_col() +
scale_fill_viridis_c(breaks = seq(1000, 5000, 2000),
option = "plasma") +
facet_wrap(~character, scales = "free") +
theme_brooklyn99() +
theme(panel.grid.major.y = element_blank(),
axis.text.x = element_text(size = 6.5),
axis.text.y = element_text(size = 6),
axis.title = element_blank(),
panel.grid.minor = element_blank(),
strip.text = element_text(color = "white",
face = "bold",
size = 8.5),
legend.background = element_rect(colour = "#0053CD"),
legend.title = element_blank(),
legend.text = element_text(size = 7))
Parks and Recreation is a hilarious comedy with many enjoyable characters, so it’s no surprise that for most of the show the average sentiment is positive. Using the AFINN lexicon, which assigns each word a score between -5 (most negative) and 5 (most positive), I computed a moving average of the sentiment with a window size of 151 and plotted it across the entire run of the show.
Citation for AFINN lexicon: AFINN, Nielsen, Finn Årup. Informatics and Mathematical Modelling, Technical University of Denmark. March 2011. AFINN lexicon
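As a minimal sketch of how the centered window works (the numbers below are made up, not from the scripts): a window of 151 corresponds to .before = 75 and .after = 75, so each value is averaged with the 75 values on either side of it. The toy example below uses a window of 3.
# Toy example of a centered moving average with {slider}
x <- c(2, -1, 0, 3, -2, 1, 4)
slide_dbl(x, mean, .before = 1, .after = 1) # each value averaged with one neighbor on each side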
parks_afinn <- parks_token %>%
inner_join(get_sentiments("afinn")) %>%
drop_na(value) %>%
mutate(index = row_number()) %>% # make an index
mutate(moving_avg = slide_dbl(value, # centered moving average
mean,
.before = (151 - 1)/2,
.after = (151 - 1)/2)) %>%
mutate(neg_pos = factor(case_when(
moving_avg > 0 ~ "Positive",
moving_avg <= 0 ~ "Negative"
),levels = c("Positive", "Negative"),
labels = c("Positive", "Negative"), ordered = TRUE))
sent_plot <- ggplot(data = parks_afinn, aes(x = index, y = moving_avg)) +
geom_col(aes(fill = neg_pos)) +
scale_fill_manual(values = c("Positive" = "springgreen2",
"Negative" = "darkred"))+
theme_minimal() +
labs(x = "Index",
y = "Moving Average AFINN Sentiment",
fill = "") +
theme(panel.grid.minor.y = element_blank(),
panel.grid.major.x = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_text(size = 11,
face = "bold",
color = "white"),
axis.title.y = element_text(color = "white",
size = 12,
face = "bold"),
axis.title.x = element_blank(),
panel.grid.minor = element_blank(),
plot.background = element_rect(fill = "#222222",
color = "#222222"),
strip.background = element_rect(fill = "#222222",
color = "#222222"),
legend.text = element_text(color = "white",
size = 11,
face = "bold"))
sent_plot
I decided to take a closer look at the sentiment throughout season 4, one of the more popular seasons, in which Leslie Knope campaigns for a seat on the city council of Pawnee, Indiana. Here I used a moving-average window of 51 to plot the AFINN sentiment value. For most of the season the average sentiment is positive, except for a noticeable drop near the end of the season, where the score falls to around -1.
file_names_season <- str_sub(file_names, start = 3L)
# used this line of code to easily find the episode number of each season
# which(file_names_season == "e01.csv")
season_4 <- str_glue("scripts/{file_names[47:68]}") %>%
map_dfr(read_csv)
# Tokenize lines to one word in each row
season_token <- season_4 %>%
clean_names() %>%
unnest_tokens(word, line) %>% # tokenize
anti_join(stop_words) %>% # remove stop words
mutate(word = str_extract(word, "[a-z']+")) %>% # extract words only
drop_na(word) # take out missing values
season_afinn <- season_token %>%
inner_join(get_sentiments("afinn")) %>%
drop_na(value) %>%
mutate(index = row_number()) %>%
mutate(moving_avg = slide_dbl(value, # centered moving average
mean,
.before = (51 - 1)/2,
.after = (51 - 1)/2))
season_plot <- ggplot(data = season_afinn, aes(x = index, y = moving_avg)) +
geom_col(aes(fill = moving_avg)) +
# scale_fill_distiller(type = "div",
# palette = "GnPR")+
scale_fill_carto_c(type = "diverging",
palette = "Earth") +
theme_minimal() +
labs(x = "Index",
y = "Moving Average AFINN Sentiment",
fill = "") +
theme(panel.grid.minor.y = element_blank(),
panel.grid.major.x = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_text(size = 11,
face = "bold",
color = "white"),
axis.title.y = element_text(color = "white",
size = 12,
face = "bold"),
axis.title.x = element_blank(),
panel.grid.minor = element_blank(),
plot.background = element_rect(fill = "#222222",
color = "#222222"),
strip.background = element_rect(fill = "#222222",
color = "#222222"),
legend.text = element_text(color = "white",
size = 11,
face = "bold"))
season_plot
Digging into the data, I found that this dip occurs during the penultimate episode of the season, “Bus Tour”. The episode starts with Leslie Knope behind in the polls to her opponent in the city council race, Bobby Newport. During one of her campaign stops, in response to a question from a reporter, Leslie starts saying disparaging things about Bobby’s father. After she is finished, the reporter informs Leslie that the question was about whether she had any comment on his death earlier that day. Meanwhile, in order to get people to the polls, Leslie’s team tries to secure vans to transport possible voters, but Bobby Newport’s team has already reserved all the vans in the city. Thus, most of the episode is spent doing damage control for Leslie and her campaign team’s mishaps. Below are the words with AFINN ratings during this dip in sentiment in season 4.
# Investigate the negative dip of the plot
season_afinn_neg <- season_afinn %>%
filter(moving_avg < -0.75) %>%
slice(-c(1:2)) %>%
select(-index) %>%
rename('moving average' = moving_avg)
# How I figured out which episode it was
season_4_subset <- season_4 %>%
filter(Character == "Bill")
# Table of words
season_afinn_neg %>%
kbl(caption = "<b style = 'color:white;'>
What was happening towards the end of season 4 of Parks and Recreation when things went south?</b>") %>%
kable_material_dark(bootstrap_options = c("striped", "hover")) %>%
row_spec(0, color = "white", background = "#222222") %>%
scroll_box(width = "100%", height = "300px",
fixed_thead = list(enabled = T, background = "#222222"))
character | word | value | moving average |
---|---|---|---|
Bill | grand | 3 | -0.7647059 |
Tom Haverford | demands | -1 | -0.7647059 |
Tom Haverford | crying | -2 | -0.8431373 |
Leslie Knope | promise | 1 | -0.8235294 |
Leslie Knope | stop | -1 | -0.8823529 |
Leslie Knope | intimidating | -2 | -0.8823529 |
Leslie Knope | bullying | -2 | -0.9019608 |
Leslie Knope | jerk | -3 | -0.9019608 |
Leslie Knope | wrong | -2 | -0.9019608 |
Leslie Knope | died | -3 | -0.8627451 |
Leslie Knope | sad | -2 | -0.9215686 |
Extra | sad | -2 | -0.9215686 |
Leslie Knope | bummer | -2 | -0.8039216 |
Leslie Knope | jerk | -3 | -0.7647059 |
Perd Hapley | love | 3 | -0.7647059 |
Jennifer Barkley | cancel | -1 | -0.7843137 |
Leslie Knope | emergency | -2 | -0.8235294 |
Leslie Knope | trust | 1 | -0.8431373 |
Leslie Knope | died | -3 | -0.9019608 |
Leslie Knope | awful | -3 | -0.8823529 |
Leslie Knope | died | -3 | -0.7843137 |
Ann Perkins | dead | -3 | -0.7843137 |
Ann Perkins | jerk | -3 | -0.8823529 |
Leslie Knope | jerk | -3 | -0.9411765 |
Leslie Knope | polluted | -2 | -0.9803922 |
Ben Wyatt | stop | -1 | -1.0980392 |
Ben Wyatt | stop | -1 | -1.1372549 |
Ann Perkins | fine | 2 | -1.1568627 |
Ann Perkins | stop | -1 | -1.1960784 |
Ann Perkins | apologize | -1 | -1.1960784 |
Chris Traeger | worst | -3 | -1.1764706 |
Chris Traeger | stop | -1 | -1.0980392 |
Chris Traeger | stops | -1 | -1.0588235 |
Chris Traeger | stop | -1 | -1.0392157 |
Chris Traeger | stopping | -1 | -0.9607843 |
Chris Traeger | death | -2 | -0.8627451 |
Leslie Knope | beautiful | 3 | -0.8431373 |
Leslie Knope | classy | 3 | -0.8235294 |
Donna Meagle | free | 1 | -0.8235294 |
Donna Meagle | huge | 1 | -0.7647059 |
Bill | yeah | 1 | -0.8627451 |
Bill | hell | -4 | -0.8039216 |
Bill | free | 1 | -0.8039216 |
Bill | pay | -1 | -0.7843137 |
For attribution, please cite this work as
Khanjian (2021, Feb. 25). Roupen Khanjian: Parks and Recreation Text Analysis. Retrieved from https://khanjian.github.io/roupen-website/texts/parks-and-recreation-text-analysis/
BibTeX citation
@misc{khanjian2021parks,
  author = {Khanjian, Roupen},
  title = {Roupen Khanjian: Parks and Recreation Text Analysis},
  url = {https://khanjian.github.io/roupen-website/texts/parks-and-recreation-text-analysis/},
  year = {2021}
}