Roupen Khanjian: Stop and Frisk in New York City

Roupen Khanjian

Code

# Packages 
library(tidyverse) # Easily Install and Load the 'Tidyverse', CRAN v1.3.0
library(patchwork) # The Composer of Plots, CRAN v1.1.1
library(here) # A Simpler Way to Find Your Files, CRAN v1.0.1
library(janitor) # Simple Tools for Examining and Cleaning Dirty Data, CRAN v2.1.0
library(tidytext) # Text Mining using 'dplyr', 'ggplot2', and Other Tidy Tools, CRAN v0.3.0
library(textdata) # Download and Load Various Text Datasets, CRAN v0.4.1
library(vader) # Valence Aware Dictionary and sEntiment Reasoner (VADER), CRAN v0.2.1
library(readxl) # Read Excel Files, CRAN v1.3.1
library(lubridate) # Make Dealing with Dates a Little Easier, CRAN v1.7.10
library(gt) # Easily Create Presentation-Ready Display Tables, CRAN v0.2.2
library(kableExtra) # Construct Complex Table with 'kable' and Pipe Syntax, CRAN v1.3.4
library(knitr) # A General-Purpose Package for Dynamic Report Generation in R, CRAN v1.31
library(plotly) # Create Interactive Web Graphics via 'plotly.js', CRAN v4.9.3
library(sf) # Simple Features for R, CRAN v0.9-7
library(tigris) # Load Census TIGER/Line Shapefiles, CRAN v1.0

Introduction

This article was originally completed in python and can be found here: link

In mid 2020, protests took place all over the USA and the world in response to the murder of George Floyd by the Minneapolis Police. Since the advent of video recordings via cell phones, there have been many instances of police brutality captured against Black people. The relationship between law enforcement and the Black community has always been tenuous and now it has been brought to the attention of the rest of the world. In the landmark Supreme court case of Terry v. Ohio (1968) it was ruled that police could stop, question, and frisk a person if they have reasonable suspicion that the person had committed a crime (Brandes, S.A. et al., 2019).

In the 2000’s to the early 2010’s the New York City (NYC) stop and frisk policy garnered national attention due to the high number of stops and profiling of Black people. At the height of the policy, in 2011 there were 658,724 stops recorded with over 50% of the stops targeting Black people (NYCLU 2019). Since then, the number of stops per year has substantially decreased to 13,459 stop in 2019. Opponents of this policy argue this is still too many stops, especially since in 2019 about 66% of the people stopped were innocent. It has been also shown that the stopping of white people more likely led to an arrest in comparison to Black and Hispanic people, implying the police may be targeting minorities and being more mindful of stopping white people (Gelman, A., et. al., 2007). There has been substantial research conducted showing the psychological distress of a stop and frisk policy on communities of color in NYC (Sewell, A. et al., 2016).

In this project I analyze the Stop, Question and Frisk Data from the New York Police Department (NYPD) from the most current three years: 2017, 2018, and 2019 (NYC Stop and Frisk Data). I chose these years for the following reasons: The years 2018-2019 was not included in the most recent NYCLU report. In 2017 the NYPD moved to an electronic form, as opposed to manually writing down a response for each question in the handwritten forms used prior to 2017. Lastly 2017 was the first year of the Trump presidency and I was curious to investigate if his rhetoric on race may have affected law enforcement’s behavior toward minorities.

Data was downloaded from the NYPD website link. In each dataset each row is a stop of a specific person, and each column is a variable. There are a total of 83 different variables in each dataset.Below is a table of the first 10 stops from this dataset.

Code

# Read in data
saf_17 <- read_excel(here("_texts", "stop_frisk_R", "data",
                          "sqf_2017.xlsx"))

saf_18 <- read_excel(here("_texts", "stop_frisk_R", "data",
                          "sqf_2018.xlsx"))

saf_19 <- read_excel(here("_texts", "stop_frisk_R", "data",
                          "sqf_2019.xlsx"))

# fix date column and select columns to be used for analysis
saf_17 <- saf_17 %>%
  mutate(STOP_FRISK_DATE = ymd(STOP_FRISK_DATE)) %>%
  select(
    c(
      "STOP_FRISK_DATE",
      "STOP_FRISK_TIME",
      "YEAR2",
      "MONTH2",
      "DAY2",
      "ISSUING_OFFICER_RANK",
      "SUSPECTED_CRIME_DESCRIPTION",
      "SUSPECT_ARRESTED_FLAG",
      "SUSPECT_ARREST_OFFENSE",
      "OFFICER_IN_UNIFORM_FLAG",
      "FRISKED_FLAG",
      "SEARCHED_FLAG",
      "OTHER_CONTRABAND_FLAG",
      "FIREARM_FLAG",
      "KNIFE_CUTTER_FLAG",
      "OTHER_WEAPON_FLAG",
      "WEAPON_FOUND_FLAG",
      "PHYSICAL_FORCE_CEW_FLAG",
      "PHYSICAL_FORCE_DRAW_POINT_FIREARM_FLAG",
      "PHYSICAL_FORCE_HANDCUFF_SUSPECT_FLAG",
      "PHYSICAL_FORCE_OC_SPRAY_USED_FLAG",
      "PHYSICAL_FORCE_OTHER_FLAG",
      "PHYSICAL_FORCE_RESTRAINT_USED_FLAG",
      "PHYSICAL_FORCE_VERBAL_INSTRUCTION_FLAG",
      "PHYSICAL_FORCE_WEAPON_IMPACT_FLAG",
      "SUSPECTS_ACTIONS_CASING_FLAG",
      "SUSPECTS_ACTIONS_PROXIMITY_TO_SCENE_FLAG",
      "DEMEANOR_OF_PERSON_STOPPED",
      "SUSPECT_REPORTED_AGE",
      "SUSPECT_SEX",
      "SUSPECT_RACE_DESCRIPTION",
      "SUSPECT_BODY_BUILD_TYPE",
      "SUSPECT_OTHER_DESCRIPTION",
      "STOP_LOCATION_PRECINCT",
      "STOP_LOCATION_FULL_ADDRESS",
      "STOP_LOCATION_STREET_NAME",
      "STOP_LOCATION_PATROL_BORO_NAME",
      "STOP_LOCATION_BORO_NAME",
      "SUSPECTS_ACTIONS_DRUG_TRANSACTIONS_FLAG"
    )
  )

saf_18 <- saf_18 %>%
  mutate(STOP_FRISK_DATE = ymd(STOP_FRISK_DATE)) %>%
  select(
    c(
      "STOP_FRISK_DATE",
      "STOP_FRISK_TIME",
      "YEAR2",
      "MONTH2",
      "DAY2",
      "ISSUING_OFFICER_RANK",
      "SUSPECTED_CRIME_DESCRIPTION",
      "SUSPECT_ARRESTED_FLAG",
      "SUSPECT_ARREST_OFFENSE",
      "OFFICER_IN_UNIFORM_FLAG",
      "FRISKED_FLAG",
      "SEARCHED_FLAG",
      "OTHER_CONTRABAND_FLAG",
      "FIREARM_FLAG",
      "KNIFE_CUTTER_FLAG",
      "OTHER_WEAPON_FLAG",
      "WEAPON_FOUND_FLAG",
      "PHYSICAL_FORCE_CEW_FLAG",
      "PHYSICAL_FORCE_DRAW_POINT_FIREARM_FLAG",
      "PHYSICAL_FORCE_HANDCUFF_SUSPECT_FLAG",
      "PHYSICAL_FORCE_OC_SPRAY_USED_FLAG",
      "PHYSICAL_FORCE_OTHER_FLAG",
      "PHYSICAL_FORCE_RESTRAINT_USED_FLAG",
      "PHYSICAL_FORCE_VERBAL_INSTRUCTION_FLAG",
      "PHYSICAL_FORCE_WEAPON_IMPACT_FLAG",
      "SUSPECTS_ACTIONS_CASING_FLAG",
      "SUSPECTS_ACTIONS_PROXIMITY_TO_SCENE_FLAG",
      "DEMEANOR_OF_PERSON_STOPPED",
      "SUSPECT_REPORTED_AGE",
      "SUSPECT_SEX",
      "SUSPECT_RACE_DESCRIPTION",
      "SUSPECT_BODY_BUILD_TYPE",
      "SUSPECT_OTHER_DESCRIPTION",
      "STOP_LOCATION_PRECINCT",
      "STOP_LOCATION_FULL_ADDRESS",
      "STOP_LOCATION_STREET_NAME",
      "STOP_LOCATION_PATROL_BORO_NAME",
      "STOP_LOCATION_BORO_NAME",
      "SUSPECTS_ACTIONS_DRUG_TRANSACTIONS_FLAG"
    )
  )

saf_19 <- saf_19 %>%
  mutate(STOP_FRISK_DATE = ymd(STOP_FRISK_DATE)) %>%
  select( # select for the years 
    c(
      "STOP_FRISK_DATE",
      "STOP_FRISK_TIME",
      "YEAR2",
      "MONTH2",
      "DAY2",
      "ISSUING_OFFICER_RANK",
      "SUSPECTED_CRIME_DESCRIPTION",
      "SUSPECT_ARRESTED_FLAG",
      "SUSPECT_ARREST_OFFENSE",
      "OFFICER_IN_UNIFORM_FLAG",
      "FRISKED_FLAG",
      "SEARCHED_FLAG",
      "OTHER_CONTRABAND_FLAG",
      "FIREARM_FLAG",
      "KNIFE_CUTTER_FLAG",
      "OTHER_WEAPON_FLAG",
      "WEAPON_FOUND_FLAG",
      "PHYSICAL_FORCE_CEW_FLAG",
      "PHYSICAL_FORCE_DRAW_POINT_FIREARM_FLAG",
      "PHYSICAL_FORCE_HANDCUFF_SUSPECT_FLAG",
      "PHYSICAL_FORCE_OC_SPRAY_USED_FLAG",
      "PHYSICAL_FORCE_OTHER_FLAG",
      "PHYSICAL_FORCE_RESTRAINT_USED_FLAG",
      "PHYSICAL_FORCE_VERBAL_INSTRUCTION_FLAG",
      "PHYSICAL_FORCE_WEAPON_IMPACT_FLAG",
      "SUSPECTS_ACTIONS_CASING_FLAG",
      "SUSPECTS_ACTIONS_PROXIMITY_TO_SCENE_FLAG",
      "DEMEANOR_OF_PERSON_STOPPED",
      "SUSPECT_REPORTED_AGE",
      "SUSPECT_SEX",
      "SUSPECT_RACE_DESCRIPTION",
      "SUSPECT_BODY_BUILD_TYPE",
      "SUSPECT_OTHER_DESCRIPTION",
      "STOP_LOCATION_PRECINCT",
      "STOP_LOCATION_FULL_ADDRESS",
      "STOP_LOCATION_STREET_NAME",
      "STOP_LOCATION_PATROL_BORO_NAME",
      "STOP_LOCATION_BORO_NAME",
      "SUSPECTS_ACTIONS_DRUG_TRANSACTIONS_FLAG"
    )
  )

# row bind all 3 years into one data frame
saf <- rbind(saf_17, saf_18, saf_19) 

# clean column names
saf <- saf %>% 
  clean_names()

# view first 5 rows
head(saf, n = 10) %>% 
  kbl(caption = "<b style = 'color:black;'>
       First ten rows of stop and frisk dataset.") %>%
  kable_material_dark(bootstrap_options = c("striped", "hover")) %>%
  row_spec(0, color = "white", background = "#222222") %>%
  scroll_box(width = "100%", height = "300px", 
             fixed_thead = list(enabled = T, background = "#222222"))

Table 1: **First ten rows of stop and frisk dataset.**
stop_frisk_date	stop_frisk_time	year2	month2	day2	issuing_officer_rank	suspected_crime_description	suspect_arrested_flag	suspect_arrest_offense	officer_in_uniform_flag	frisked_flag	searched_flag	other_contraband_flag	firearm_flag	knife_cutter_flag	other_weapon_flag	weapon_found_flag	physical_force_cew_flag	physical_force_draw_point_firearm_flag	physical_force_handcuff_suspect_flag	physical_force_oc_spray_used_flag	physical_force_other_flag	physical_force_restraint_used_flag	physical_force_verbal_instruction_flag	physical_force_weapon_impact_flag	suspects_actions_casing_flag	suspects_actions_proximity_to_scene_flag	demeanor_of_person_stopped	suspect_reported_age	suspect_sex	suspect_race_description	suspect_body_build_type	suspect_other_description	stop_location_precinct	stop_location_full_address	stop_location_street_name	stop_location_patrol_boro_name	stop_location_boro_name	suspects_actions_drug_transactions_flag
2017-01-16	1899-12-31 14:26:00	2017	January	Monday	SGT	TERRORISM	N	(null)	Y	N	N	N	(null)	(null)	(null)	N	(null)	(null)	(null)	(null)	(null)	(null)	Y	(null)	(null)	(null)	TERRORISM	39	MALE	(null)	THN	(null)	1	180 GREENWICH STREET	GREENWICH STREET	PBMS	MANHATTAN	(null)
2017-01-16	1899-12-31 14:26:00	2017	January	Monday	SGT	TERRORISM	N	(null)	Y	N	N	N	(null)	(null)	(null)	N	(null)	(null)	(null)	(null)	(null)	(null)	Y	(null)	(null)	(null)	TERRORISM	37	MALE	(null)	MED	(null)	1	180 GREENWICH STREET	GREENWICH STREET	PBMS	MANHATTAN	(null)
2017-02-08	1899-12-31 11:10:00	2017	February	Wednesday	POM	OTHER	N	(null)	N	N	N	N	(null)	(null)	(null)	N	(null)	(null)	(null)	(null)	(null)	(null)	Y	(null)	(null)	(null)	OTHER	(null)	FEMALE	WHITE	THN	N/A	1	WALL STREET && BROADWAY	WALL STREET	PBMS	MANHATTAN	(null)
2017-02-20	1899-12-31 11:35:00	2017	February	Monday	POM	GRAND LARCENY AUTO	N	(null)	Y	Y	Y	N	(null)	(null)	(null)	N	(null)	(null)	Y	(null)	(null)	(null)	(null)	(null)	(null)	(null)	GRAND LARCENY AUTO	31	MALE	BLACK HISPANIC	U	UNK	1	75 GREENE STREET	GREENE STREET	PBMS	MANHATTAN	(null)
2017-02-21	1899-12-31 13:20:00	2017	February	Tuesday	POM	BURGLARY	N	(null)	Y	N	N	N	(null)	(null)	(null)	N	(null)	(null)	Y	(null)	(null)	(null)	(null)	(null)	(null)	Y	BURGLARY	(null)	FEMALE	BLACK	THN	(null)	1	429 WEST BROADWAY	WEST BROADWAY	PBMS	MANHATTAN	(null)
2017-02-17	1899-12-31 21:25:00	2017	February	Friday	POM	CPW	Y	MENACING	Y	Y	Y	N	(null)	(null)	(null)	N	(null)	(null)	(null)	(null)	(null)	(null)	Y	(null)	(null)	(null)	CPW	39	MALE	WHITE HISPANIC	MED	(null)	1	WEST STREET && CHAMBERS STREET	WEST STREET	PBMS	MANHATTAN	(null)
2017-02-25	1899-12-31 20:00:00	2017	February	Saturday	POM	CPW	N	(null)	Y	N	N	N	(null)	(null)	(null)	N	(null)	(null)	Y	(null)	(null)	Y	Y	(null)	(null)	(null)	CPW	19	FEMALE	BLACK	THN	(null)	1	CHAMBERS STREET && WEST BROADWAY	CHAMBERS STREET	PBMS	MANHATTAN	(null)
2017-02-25	1899-12-31 19:58:00	2017	February	Saturday	POM	CPW	N	(null)	Y	Y	Y	N	(null)	(null)	(null)	N	(null)	(null)	(null)	(null)	(null)	Y	Y	(null)	(null)	Y	NORMAL	15	FEMALE	BLACK	THN	(null)	1	CHAMBERS STREET && WEST BROADWAY	CHAMBERS STREET	PBMS	MANHATTAN	(null)
2017-02-21	1899-12-31 13:15:00	2017	February	Tuesday	POM	BURGLARY	N	(null)	Y	Y	Y	N	(null)	(null)	(null)	N	(null)	(null)	Y	(null)	(null)	(null)	Y	(null)	(null)	Y	PLEASANT	43	MALE	BLACK	HEA	(null)	1	429 WEST BROADWAY	WEST BROADWAY	PBMS	MANHATTAN	(null)
2017-03-03	1899-12-31 08:16:00	2017	March	Friday	POM	CRIMINAL MISCHIEF	Y	CRIMINAL POSSESSION OF CONTROLLED SUBSTANCE	Y	Y	Y	Y	(null)	(null)	(null)	N	(null)	(null)	(null)	(null)	(null)	(null)	Y	(null)	(null)	(null)	NERVOUS	33	MALE	BLACK	THN	(null)	1	CORTLANDT STREET && CHURCH STREET	CORTLANDT STREET	PBMS	MANHATTAN	(null)

Code

nrow(saf)

[1] 36096

In total there were 36,096 stops in the years 2017-2019.

Code

saf %>% 
  count(suspect_race_description) %>% 
  summarise(race = suspect_race_description,
            n = n,
            percent = n / sum(n)) %>% 
  gt() %>% 
  tab_header(
    title = "Which Race was Stop and Frisked the Most?",
    subtitle = "Data from 2017-2019"
  ) %>% 
  fmt_percent( 
    columns = vars(percent),
    decimals = 2
    ) %>% 
  data_color(
    columns = vars(n, percent),
    colors = scales::col_numeric(
      palette = c(
        "lightskyblue", "dodgerblue", "royalblue4") ,
      domain = NULL
        )
      ) %>% 
  tab_style(
    style = list(
      cell_text(size = "large",
                font = google_font('Lato'))
    ),
    locations = cells_body(
      columns = 1:3
    )
    ) %>% 
  tab_style(
    style = list(
      cell_text(size = "large",
                font = google_font('Lato'),
                weight = "bold")
    ),
    locations = cells_column_labels(
      columns = 1:3
    )
  )

Which Race was Stop and Frisked the Most?
Data from 2017-2019
race	n	percent
(null)	420	1.16%
AMER IND	9	0.02%
AMERICAN INDIAN/ALASKAN N	8	0.02%
AMERICAN INDIAN/ALASKAN NATIVE	16	0.04%
ASIAN / PACIFIC ISLANDER	522	1.45%
ASIAN/PAC.ISL	206	0.57%
BLACK	20817	57.67%
BLACK HISPANIC	3102	8.59%
MALE	7	0.02%
WHITE	3266	9.05%
WHITE HISPANIC	7723	21.40%

In the above table we see that the people stopped were mostly described as have one of the following four races: Black, White Hispanic, White, or Black Hispanic.

Data Wrangling and Cleaning

Code

saf_sub <- saf %>% 
  # filter for race in one of the four most saf 
  filter(suspect_race_description %in% c("BLACK", 
                                         "WHITE",
                                         "WHITE HISPANIC",
                                         "BLACK HISPANIC")) %>% 
  # Fill in NA values for nonsense descriptions
  # also change misspellings
  mutate(suspect_other_description = case_when(
    suspect_other_description == "(null)" ~ NA_character_,
    suspect_other_description == "NONE" ~ NA_character_,
    suspect_other_description == "UNK" ~ NA_character_,
    suspect_other_description == "UNKNOWN" ~ NA_character_,
    suspect_other_description == "UNKNOW" ~ NA_character_,
    suspect_other_description == "NO" ~ NA_character_,
    suspect_other_description == "UKNOWN" ~ NA_character_,
    suspect_other_description == "UNKOWN" ~ NA_character_,
    suspect_other_description == "NA" ~ NA_character_,
    suspect_other_description == "N/A" ~ NA_character_,
    suspect_other_description == "0" ~ NA_character_,
    suspect_other_description == 0 ~ NA_character_,
    suspect_other_description == "TATOOS" ~ "TATTOOS",
    suspect_other_description == "TATTOS" ~ "TATTOOS",
    suspect_other_description == "TATOO" ~ "TATTOOS",
    TRUE ~ as.character(suspect_other_description)
  )) %>% 
  # Change unknown sex values to NA
  mutate(suspect_sex = case_when(
    suspect_sex == "(null)" ~ NA_character_,
    suspect_sex == "19" ~ NA_character_,
    suspect_sex == "23" ~ NA_character_,
    suspect_sex == "24" ~ NA_character_,
    suspect_sex == "39" ~ NA_character_,
    TRUE ~ as.character(suspect_sex)
  )) %>% 
  # Change (null) value to No (N) (need to find a better way to do this step)
  mutate(physical_force_cew_flag = case_when(
    physical_force_cew_flag == "(null)" ~ "N",
    TRUE ~ as.character(physical_force_cew_flag)
  )) %>% 
  mutate(physical_force_draw_point_firearm_flag = case_when(
    physical_force_draw_point_firearm_flag == "(null)" ~ "N",
    TRUE ~ as.character(physical_force_draw_point_firearm_flag)
  )) %>% 
  mutate(physical_force_handcuff_suspect_flag = case_when(
    physical_force_handcuff_suspect_flag == "(null)" ~ "N",
    TRUE ~ as.character(physical_force_handcuff_suspect_flag)
  )) %>% 
  mutate(physical_force_oc_spray_used_flag = case_when(
    physical_force_oc_spray_used_flag == "(null)" ~ "N",
    TRUE ~ as.character(physical_force_oc_spray_used_flag)
  )) %>% 
  mutate(physical_force_other_flag = case_when(
    physical_force_other_flag == "(null)" ~ "N",
    TRUE ~ as.character(physical_force_other_flag)
  )) %>% 
  mutate(physical_force_restraint_used_flag = case_when(
    physical_force_restraint_used_flag == "(null)" ~ "N",
    TRUE ~ as.character(physical_force_restraint_used_flag)
  )) %>% 
  mutate(physical_force_verbal_instruction_flag = case_when(
    physical_force_verbal_instruction_flag == "(null)" ~ "N",
    TRUE ~ as.character(physical_force_verbal_instruction_flag)
  )) %>% 
  mutate(physical_force_weapon_impact_flag = case_when(
    physical_force_weapon_impact_flag == "(null)" ~ "N",
    TRUE ~ as.character(physical_force_weapon_impact_flag)
  )) %>% 
  mutate(suspects_actions_drug_transactions_flag = case_when(
    suspects_actions_drug_transactions_flag == "(null)" ~ "N",
    TRUE ~ as.character(suspects_actions_drug_transactions_flag)
  )) %>% 
  mutate(suspects_actions_proximity_to_scene_flag = case_when(
    suspects_actions_proximity_to_scene_flag == "(null)" ~ "N",
    TRUE ~ as.character(suspects_actions_proximity_to_scene_flag)
  )) %>% 
  mutate(other_contraband_flag = case_when(
    other_contraband_flag == "(null)" ~ "N",
    TRUE ~ as.character(other_contraband_flag)
  )) %>% 
  mutate(firearm_flag = case_when(
    firearm_flag == "(null)" ~ "N",
    TRUE ~ as.character(firearm_flag)
  )) %>% 
  mutate(knife_cutter_flag = case_when(
    knife_cutter_flag == "(null)" ~ "N",
    TRUE ~ as.character(knife_cutter_flag)
  )) %>% 
  mutate(weapon_found_flag = case_when(
    weapon_found_flag == "(null)" ~ "N",
    weapon_found_flag == "(" ~ "N",
    TRUE ~ as.character(weapon_found_flag)
  )) %>% 
  mutate(suspect_arrest_offense = case_when(
    suspect_arrest_offense == "(null)" ~ "NO ARREST",
    TRUE ~ as.character(suspect_arrest_offense)
  )) %>% 
  mutate(stop_location_boro_name = case_when(
    stop_location_boro_name == "(null)" ~ NA_character_,
    stop_location_boro_name == "STATEN IS" ~ "STATEN ISLAND",
    stop_location_boro_name == "PBBX" ~ "BRONX",
    stop_location_boro_name == "PBBN" ~ "BROOKLYN",
    stop_location_boro_name == "PBMN" ~ "MANHATTAN",
    stop_location_boro_name == "0208760" ~ NA_character_,
    stop_location_boro_name == "0190241" ~ NA_character_,
    stop_location_boro_name == "0986759" ~ NA_character_,
    stop_location_boro_name == "PBMS" ~ "MANHATTAN",
    stop_location_boro_name == "0210334" ~ NA_character_,
    stop_location_boro_name == "PBSI" ~ "STATEN ISLAND",
    stop_location_boro_name == "0237177" ~ NA_character_,
    stop_location_boro_name == "PBBS" ~ "BROOKLYN",
    stop_location_boro_name == "0155070" ~ NA_character_,
    stop_location_boro_name == "0208169" ~ NA_character_,
    TRUE ~ as.character(stop_location_boro_name)
  )) %>% 
  mutate(stop_location_precinct = case_when(
    stop_location_precinct == "(null)" ~ NA_integer_,
    stop_location_precinct == 208760 ~ NA_integer_,
    TRUE ~ as.integer(stop_location_precinct)
  )) %>% 
  mutate(demeanor_of_person_stopped = case_when(
    demeanor_of_person_stopped == "IRRATE" ~ "IRATE",
    demeanor_of_person_stopped == "1" ~ NA_character_,
    demeanor_of_person_stopped == "NEVEVOUS" ~ "NERVOUS",
    demeanor_of_person_stopped == 1 ~ NA_character_,
    demeanor_of_person_stopped == "N/A" ~ NA_character_,
    TRUE ~ as.character(demeanor_of_person_stopped)
  )) %>% 
  mutate(suspect_reported_age = case_when(
    suspect_reported_age == "(null)" ~ NA_character_,
    suspect_reported_age == "0" ~ NA_character_,
    suspect_reported_age == "1" ~ NA_character_,
    TRUE ~ as.character(suspect_reported_age)
  )) %>% 
  mutate(suspect_reported_age = as.integer(suspect_reported_age))

Due to the nature of the dataset, there had to be some data cleaning conducted in order to organize some of the columns:

For most of the columns that are flags, converted (null) or other values to N (No) values.
Changed non male or female values to NA in the suspect_sex column.
In the suspect_other_description, classified null of other values that were obviously null values to NA.
In the demeanor_of_person_stopped corrected some misspellings and classified some observations to NA if no demeanor was noted.
Organized the stop_location_boro_name so that it represents one of the 5 boroughs of NYC or NA if it was not clear how to classify the stop.
Changed the (null) values for the suspect_arrest_offense to “No Arrest”.
Finally, converted missing values in the stop_location_precinct column to NA since the precincts are numbered from 1 to 123.

Exploratory Data Analysis (EDA)

The cleaned and updated dataset includes 34,784 observations omitting the stops conducting on people that were not described as Black, White, Black Hispanic, or White Hispanic.

Code

# Theme for a few of the EDA plots
theme_eda <- 
  theme(panel.grid.major.y = element_blank(),
        panel.grid.minor.y = element_blank(),
        axis.text = element_text(size = 11,
                                   face = "bold"),
        axis.title.x = element_text(size = 13,
                                   face = "bold"),
        axis.title.y = element_blank(),
        panel.grid.major.x = element_line(color = "grey54"),
        panel.grid.minor.x = element_line(color = "grey27"),
        plot.background = element_rect(colour = "#e7eaea",
                                       fill = "#e7eaea"),
        plot.title = element_text(face = "bold",
                                  size = 14))
# stops by race
saf_sub %>% 
  count(suspect_race_description) %>% 
  ggplot(aes(x = n, y = fct_reorder(suspect_race_description, n))) +
  geom_col(fill = "#7dc8c4") +
  labs(x = "Number of Stops",
       y = "Race",
       title = "Number of Stops by Race") +
  theme_minimal() +
  theme_eda

In the above barplot, we see that Black people are stopped substantially more than any other group. They represent over 50% of the total stops, even though they make up 25% of the population in NYC (NYC census data 2019).

Code

# stops by sex
saf_sub %>% 
  count(suspect_sex) %>% 
  drop_na() %>% 
  ggplot(aes(x = n, y = fct_reorder(suspect_sex, n))) +
  geom_col(fill = "#7dc8c4") +
  labs(x = "Number of Stops",
       y = "Sex",
       title = "Number of Stops by Sex") +
  theme_minimal() +
  theme_eda

The majority of stops are of males.

Code

# weekday
d1 <- saf_sub %>% 
  count(day2) %>% 
  ggplot(aes(x = n, y = fct_reorder(day2, n))) +
  geom_col(fill = "#7dc8c4") +
  labs(x = "Number of Stops",
       y = "Weekday") +
  theme_minimal() +
  theme(panel.grid.major.y = element_blank(),
        panel.grid.minor.y = element_blank(),
        axis.text = element_text(size = 10,
                                   face = "bold"),
        axis.title = element_blank(),
        panel.grid.major.x = element_line(color = "grey54"),
        panel.grid.minor.x = element_line(color = "grey27"),
        plot.background = element_rect(colour = "#e7eaea",
                                       fill = "#e7eaea"))
# month
m1 <- saf_sub %>% 
  count(month2) %>% 
  ggplot(aes(x = n, y = fct_reorder(month2, n))) +
  geom_col(fill = "#7dc8c4") +
  labs(x = "Number of Stops",
       y = "Month") +
  theme_minimal() +
  theme(panel.grid.major.y = element_blank(),
        panel.grid.minor.y = element_blank(),
        axis.text = element_text(size = 10,
                                   face = "bold"),
        axis.title = element_blank(),
        panel.grid.major.x = element_line(color = "grey54"),
        panel.grid.minor.x = element_line(color = "grey27"),
        plot.background = element_rect(colour = "#e7eaea",
                                       fill = "#e7eaea"))
# year
y1 <- saf_sub %>% 
  count(year2) %>% 
  mutate(year = fct_reorder(factor(year2), n)) %>% 
  ggplot(aes(x = n, y = year)) +
  geom_col(fill = "#7dc8c4") +
  labs(x = "Number of Stops",
       y = "Month") +
  theme_minimal() +
  theme(panel.grid.major.y = element_blank(),
        panel.grid.minor.y = element_blank(),
        axis.text = element_text(size = 12,
                                   face = "bold"),
        axis.title = element_blank(),
        panel.grid.major.x = element_line(color = "grey54"),
        panel.grid.minor.x = element_line(color = "grey27"),
        plot.background = element_rect(colour = "#e7eaea",
                                       fill = "#e7eaea"))


(d1 + m1) / y1 +
  plot_annotation(title = "Number of Stops by Weekday, Month, and Year") &
  theme(plot.background = element_rect(colour = "#e7eaea",
                                       fill = "#e7eaea"),
        strip.background = element_rect(colour = "#e7eaea",
                                       fill = "#e7eaea"),
        plot.title = element_text(face = "bold",
                                  size = 14))

Over 5500 of the stops occurred on Saturdays, while less than 4000 stops occurred on Mondays.
The least number of stops were made in December, while the Spring months of March, April and May had the highest number of stops.

Code

saf_sub %>% 
  drop_na(suspect_reported_age) %>% 
  filter(suspect_reported_age >= 10 & 
           suspect_reported_age <= 80) %>% 
  ggplot(aes(x = suspect_reported_age)) +
  geom_histogram(fill = "#7dc8c4",color = "white", 
                 binwidth = 5) +
  scale_x_continuous(breaks = seq(10, 80, 10)) +
  labs(x = "Age",
       y = "Number of Stops",
       title = "Number of Stops by Age") +
  theme_minimal() +
    theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        axis.text = element_text(size = 11,
                                   face = "bold"),
        axis.title.x = element_text(size = 13,
                                   face = "bold"),
        axis.title.y = element_blank(),
        panel.grid.major.y = element_line(color = "grey54"),
        panel.grid.minor.y = element_line(color = "grey27"),
        plot.background = element_rect(colour = "#e7eaea",
                                       fill = "#e7eaea"),
        plot.title = element_text(face = "bold",
                                  size = 14))

Above is a histogram of stop organized by age. We see a right skewed distribution, where the majority of people stopped were between the ages 15-30. I omitted people that were aged 80 or more in this histogram since there were so few observations.

Code

saf_sub %>% 
  count(stop_location_boro_name) %>%
  drop_na() %>% 
  ggplot(aes(x = n, y = fct_reorder(stop_location_boro_name, n))) +
  geom_col(fill = "#7dc8c4") +
  labs(x = "Number of Stops",
       y = "Borough",
       title = "Number of Stops by Borough") +
  theme_minimal() +
  theme_eda

We see that the most number of stops is made in Brooklyn. This is expected since Brooklyn has the highest population of all the boroughs (NYC census data). It is interesting that the second most number of stops is made in Manhattan, even though Queens is the second most populous borough and has about 600,000 more residents. Many NYC residents commute to and work in Manhattan, so this could explain this discrepancy between number of stops and population of each borough.

Code

plotly1 <- saf_sub %>% 
  drop_na(suspect_reported_age, suspect_race_description) %>% 
  filter(suspect_reported_age >= 10 & 
           suspect_reported_age <= 80) %>% 
  count(suspect_race_description, suspect_reported_age) %>% 
  ggplot(aes(x = suspect_race_description, y = suspect_reported_age)) +
  geom_tile(aes(fill = n)) +
    scale_fill_gradient(low = "cadetblue",
                       high = "navyblue") +
  scale_y_continuous(breaks = seq(10, 80, 5)) +
  labs(x = "Race") +
  theme_minimal() +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.text = element_text(size = 11,
                                   face = "bold"),
        axis.title.x = element_text(size = 14,
                                   face = "bold"),
        axis.title.y = element_blank(),
        plot.background = element_rect(colour = "#e7eaea",
                                       fill = "#e7eaea"))

ggplotly(plotly1)

In the above interactive visualization we see that the highest subgroup of people stopped were Black people aged 16-17 (1107 stops). The highest subgroup of White people that were stopped were of the age 30-31 (210 stops).

Code

saf_sub %>% 
  count(suspect_race_description, suspected_crime_description) %>% 
  pivot_wider(names_from = suspect_race_description,
              values_from = n) %>% 
  gt() %>% 
  tab_header(
    title = "Race and Alleged Crime Description",
    subtitle = "Data from 2017-2019"
  )  %>% 
  data_color(
    columns = vars(BLACK,`BLACK HISPANIC`,
                   WHITE,`WHITE HISPANIC`),
    colors = scales::col_numeric(
      palette = c(
        "cadetblue1", "navyblue") ,
      domain = NULL
        )
      ) %>% 
  tab_style(
    style = list(
      cell_text(size = "large",
                font = google_font('Lato'))
    ),
    locations = cells_body(
      columns = 1:5
    )
    ) %>% 
  tab_style(
    style = list(
      cell_text(size = "large",
                font = google_font('Lato'),
                weight = "bold")
    ),
    locations = cells_column_labels(
      columns = 1:5
    )
  ) %>% 
  cols_label(
    suspected_crime_description = "Crime"
  )

Race and Alleged Crime Description
Data from 2017-2019
Crime	BLACK	BLACK HISPANIC	WHITE	WHITE HISPANIC
ASSAULT	2592	417	342	1021
AUTO STRIPPIG	67	19	17	47
BURGLARY	1213	215	458	631
CPSP	96	19	26	54
CPW	6205	872	551	1948
CRIMINAL MISCHIEF	342	45	82	150
CRIMINAL POSSESSION OF CONTROLLED SUBSTANCE	124	10	69	62
CRIMINAL POSSESSION OF FORGED INSTRUMENT	34	4	3	6
CRIMINAL POSSESSION OF MARIHUANA	365	61	24	166
CRIMINAL SALE OF CONTROLLED SUBSTANCE	115	25	47	80
CRIMINAL SALE OF MARIHUANA	57	9	6	23
CRIMINAL TRESPASS	1020	160	202	482
FORCIBLE TOUCHING	59	7	16	35
GRAND LARCENY	895	152	115	273
GRAND LARCENY AUTO	577	94	136	271
MAKING GRAFFITI	36	15	54	68
MENACING	401	67	56	163
MISD	1	2	NA	1
MISDEMEANOR	1	2	NA	1
MURDER	46	7	4	21
OTHER	871	130	152	321
PETIT LARCENY	1870	216	449	607
PROSTITUTION	13	1	3	8
RAPE	43	8	8	15
RECKLESS ENDANGERMENT	142	18	7	44
ROBBERY	3415	500	290	1104
TERRORISM	7	1	17	3
THEFT OF SERVICES	96	8	25	30
UNAUTHORIZED USE OF A VEHICLE	114	18	106	88
FELONY	NA	NA	1	NA

In the above table we see the breakdown of the alleged crime a person had committed for each stopped by each race. We see for all four races criminal possession of a weapon (CPW) as the number one reason for stopping someone.

Code

saf_sub %>% 
  filter(suspected_crime_description == "CPW") %>% 
  count(suspect_race_description, weapon_found_flag) %>% 
  pivot_wider(names_from = suspect_race_description,
              values_from = n) %>% 
  gt() %>% 
  tab_header(
    title = "Race and Weapons for CPW",
    subtitle = "Data from 2017-2019"
  )  %>% 
  data_color(
    columns = vars(BLACK,`BLACK HISPANIC`,
                   WHITE,`WHITE HISPANIC`),
    colors = scales::col_numeric(
      palette = c(
        "cadetblue1", "navyblue") ,
      domain = NULL
        )
      ) %>% 
  tab_style(
    style = list(
      cell_text(size = "large",
                font = google_font('Lato'))
    ),
    locations = cells_body(
      columns = 1:5
    )
    ) %>% 
  tab_style(
    style = list(
      cell_text(size = "large",
                font = google_font('Lato'),
                weight = "bold")
    ),
    locations = cells_column_labels(
      columns = 1:5
    )
  ) %>% 
  cols_label(
    weapon_found_flag = "Weapon Found?"
  )

Race and Weapons for CPW
Data from 2017-2019
Weapon Found?	BLACK	BLACK HISPANIC	WHITE	WHITE HISPANIC
N	5046	701	425	1506
Y	1159	171	126	442

I decided to subset the dataset to further analyze the stops that involved alleged CPW suspects. I grouped by if from these stops, did the police actually find a weapon on the suspect. In the above table, we see that for all four races, the police did not find a weapon for the majority of CPW stops. Furthermore, we see that for White suspects the “hit rate” (proportion of stops that yield a positive result (Gelman, A., et. al., 2007)) is higher than Black suspects. This could be due to implicit or explicit bias towards Black people from the police.

Code

saf_sub %>% 
  count(suspect_race_description, frisked_flag) %>% 
  pivot_wider(names_from = suspect_race_description,
              values_from = n) %>% 
  gt() %>% 
  tab_header(
    title = "Race and Frisked",
    subtitle = "Data from 2017-2019"
  )  %>% 
  data_color(
    columns = vars(BLACK,`BLACK HISPANIC`,
                   WHITE,`WHITE HISPANIC`),
    colors = scales::col_numeric(
      palette = c(
        "cadetblue1", "navyblue") ,
      domain = NULL
        )
      ) %>% 
  tab_style(
    style = list(
      cell_text(size = "large",
                font = google_font('Lato'))
    ),
    locations = cells_body(
      columns = 1:5
    )
    ) %>% 
  tab_style(
    style = list(
      cell_text(size = "large",
                font = google_font('Lato'),
                weight = "bold")
    ),
    locations = cells_column_labels(
      columns = 1:5
    )
  ) %>% 
  cols_label(
    frisked_flag = "Frisked?"
  )

Race and Frisked
Data from 2017-2019
Frisked?	BLACK	BLACK HISPANIC	WHITE	WHITE HISPANIC
N	8231	1167	1808	3137
Y	12586	1935	1458	4586

Once an officer stops a person, if they deem necessary they can frisk the person for weapons or contraband. In the above table we see how often an officer decided to frisk a person based on their race. Black and Hispanic people were much more likely to be frisked in comparison to White people. About 60% of Black suspects were frisked as opposed to about 45% of White suspects.

Code

saf_sub %>% 
  count(suspect_race_description, suspect_arrested_flag) %>% 
  pivot_wider(names_from = suspect_race_description,
              values_from = n) %>% 
  gt() %>% 
  tab_header(
    title = "Race and Arrest",
    subtitle = "Data from 2017-2019"
  )  %>% 
  data_color(
    columns = vars(BLACK,`BLACK HISPANIC`,
                   WHITE,`WHITE HISPANIC`),
    colors = scales::col_numeric(
      palette = c(
        "cadetblue1", "navyblue") ,
      domain = NULL
        )
      ) %>% 
  tab_style(
    style = list(
      cell_text(size = "large",
                font = google_font('Lato'))
    ),
    locations = cells_body(
      columns = 1:5
    )
    ) %>% 
  tab_style(
    style = list(
      cell_text(size = "large",
                font = google_font('Lato'),
                weight = "bold")
    ),
    locations = cells_column_labels(
      columns = 1:5
    )
  ) %>% 
  cols_label(
    suspect_arrested_flag = "Arrested?"
  )

Race and Arrest
Data from 2017-2019
Arrested?	BLACK	BLACK HISPANIC	WHITE	WHITE HISPANIC
N	14748	2165	2209	5179
Y	6069	937	1057	2544

Here we see what proportion of stops led to arrests. About 30% of stops involving Black or Black Hispanic people led to an arrest, while about 32% of stops involving White or White Hispanic people led to an arrest. Overall about 70% of people stopped were found to be innocent.

Spatial Data Analysis

One of the features of this dataset was the inclusion of where the stop occurred, specifically which precinct. Using the shapefile of the NYPD precincts I was able to spatially visualize where in New York City the stops occurred. Utilizing the sf package and plotly in the interactive plots below, I display some maps of stop and frisk data in New York City. There has been considerable work done to analyze precinct level stops and associated crime rates within each precinct (Levchak P.J. 2017).

Code

## nyc shape file
saf_shape <- read_sf(here("_texts", "stop_frisk_R", "data",
                          "nypp_21a", "nypp.shp")) %>% 
  clean_names()

# rename precinct in df
saf_sub_pre <- saf_sub %>% 
  rename(precinct = stop_location_precinct)

## tigris
# combine spatial df and df
saf_tigris <- geo_join(saf_shape, saf_sub_pre,
                       by = "precinct", how = "inner")

# overall precinct numbers
saf_tigris_counts <- saf_tigris %>%
  count(precinct)

theme_maps <- 
    theme(strip.text = element_text(face = "bold",
                                  size = 10),
        strip.background = element_rect(colour = "#e7eaea",
                                       fill = "#e7eaea"),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.text = element_blank(),
        axis.title = element_blank(),
        plot.background = element_rect(colour = "#e7eaea",
                                       fill = "#e7eaea"),
        legend.background = element_rect(colour = "#e7eaea",
                                       fill = "#e7eaea"),
        legend.text = element_text(size = 11,
                                   face = "bold"),
        axis.ticks = element_blank())

g1 <- ggplot(saf_tigris_counts)  +
  geom_sf(aes(fill = n),
          colour = "white",
          size = 0.1) +
  geom_sf_text(aes(label = precinct),
               size = 1.15) +
  scale_fill_gradient(low = "cadetblue1",
                       high = "navyblue") +
  theme_maps

# Make it interactive!
ggplotly(g1, tooltip = c("precinct", "n"))

In the above visualization, I created an interactive map showing the number of stops in each precinct. The darker the blue the more stops occurred within a precinct. For example we can see that in precincts 23 and 75 over 1000 people of Black, Black Hispanic, White, and White Hispanic races were stopped in the years 2017-2019.

Code

# facet by borough
saf_tigris_boro <- saf_tigris %>%
  count(precinct, stop_location_boro_name) %>%
  group_by(stop_location_boro_name)

g2 <- ggplot(saf_tigris_boro)  +
  geom_sf(aes(fill = n), color = "white", size = 0.05) +
  geom_sf_text(aes(label = precinct),
               size = 0.5) +
  facet_wrap(~stop_location_boro_name) +
  scale_fill_gradient(low = "cadetblue1",
                       high = "navyblue") +
  theme_maps

# Make it interactive!
ggplotly(g2, tooltip = c("precinct", "n"))

The interactive plot above shows the number of stops in each borough.

Code

# facet by race
saf_tigris_race <- saf_tigris %>%
  count(precinct, suspect_race_description) %>% 
  group_by(suspect_race_description)
# no missing values here

# Plot by race
g3 <- ggplot(saf_tigris_race)  +
  geom_sf(aes(fill = n), color = "white", size = 0.05) +
  geom_sf_text(aes(label = precinct),
               size = 0.5,
               color = "white") +
  facet_wrap(~suspect_race_description) +
  scale_fill_gradient(low = "cadetblue1",
                       high = "navyblue") +
  theme_maps

# Make it interactive!
ggplotly(g3, tooltip = c("precinct", "n"))

The interactive plot above shows the number of stops grouped by race. We see that for most of the precincts Black people were stopped the most in comparison to White or White Hispanic people.

Code

saf_precinct_table <- saf_sub_pre %>% 
  count(precinct, suspect_race_description) %>% 
  pivot_wider(names_from = suspect_race_description,
              values_from = n) %>% 
  drop_na(precinct) %>% 
  filter(BLACK < WHITE |
         BLACK < `WHITE HISPANIC`) %>% 
  pivot_longer(cols = c(BLACK,`BLACK HISPANIC`,
                   WHITE,`WHITE HISPANIC`),
               names_to = "race",
               values_to = "count") %>% 
  pivot_wider(names_from = precinct,
              values_from = count) 

table_names <-  names(saf_precinct_table)[-1]

table_names <- str_replace_all(table_names, 
                               pattern = " ", 
                               replacement = ",")

saf_precinct_table %>% 
  gt() %>% 
  tab_header(
    title = "Stops by Precincts",
    subtitle = "Precincts where Black people were stopped less than White and White Hispanic people"
  )  %>%
  tab_spanner(
    label = md("**precinct**"), 
    columns = vars(table_names)
  ) %>% 
  data_color(
    columns = vars(table_names),
    colors = scales::col_numeric(
      palette = c(
        "cadetblue1", "navyblue") ,
      domain = NULL
        )
      ) %>% 
  tab_style(
    style = list(
      cell_text(size = "large",
                font = google_font('Lato'))
    ),
    locations = cells_body(
      columns = 1:17
    )
    ) %>% 
  tab_style(
    style = list(
      cell_text(size = "large",
                font = google_font('Lato'),
                weight = "bold")
    ),
    locations = cells_column_labels(
      columns = 1:17
    )
  )

Stops by Precincts
Precincts where Black people were stopped less than White and White Hispanic people
race	precinct
race	33	34	50	62	66	68	72	104	108	109	110	111	112	115	122	123
BLACK	149	95	61	96	41	38	100	52	54	102	80	62	72	96	142	43
BLACK HISPANIC	120	218	45	13	35	4	42	38	29	39	39	11	19	37	14	4
WHITE	18	28	18	127	29	82	52	69	40	64	27	80	81	19	156	102
WHITE HISPANIC	180	266	67	70	46	55	269	134	99	112	278	61	66	281	53	25

The table above shows which precincts White and White Hispanic people were stopped more often than Black people. We see that this was the case in only 16 of the 77 precincts.

Sentiment Analysis

In these electronic forms there are 2 columns where we can perform sentiment analysis. Sentiment analysis could be used to understand the emotional component of a text. Here I used VADER, a model used to measure the negative, positive, neutral and overall sentiment intensity of a text (Hutto, C.J. et. al., 2014). The first column I chose to run sentiment analysis on was the demeanor_of_person_stopped. Below is a list of the top 10 responses for the demeanor of a person stopped.

Code

saf_sub %>% 
  count(demeanor_of_person_stopped, sort = TRUE) %>% 
  drop_na(demeanor_of_person_stopped) %>% 
  slice(1:10) %>% 
  gt() %>% 
  tab_header(
    title = "Top 10 Descriptions of Demeanor of Person Stopped",
    subtitle = "Data from 2017-2019"
  ) %>% 
  tab_style(
    style = list(
      cell_text(size = "large",
                font = google_font('Lato'))
    ),
    locations = cells_body(
      columns = 1:2
    )
    ) %>% 
  tab_style(
    style = list(
      cell_text(size = "large",
                font = google_font('Lato'),
                weight = "bold")
    ),
    locations = cells_column_labels(
      columns = 1:2
    )
  )  %>% 
  cols_label(
    demeanor_of_person_stopped = "Demeanor"
  )

Top 10 Descriptions of Demeanor of Person Stopped
Data from 2017-2019
Demeanor	n
CALM	7280
NERVOUS	3384
CPW	1817
UPSET	1525
ROBBERY	1480
NORMAL	1453
COOPERATIVE	793
PETIT LARCENY	727
COMPLIANT	648
ANGRY	542

Code

get_vader(("CALM"))[2:5]

compound      pos      neu      neg 
 "0.318"      "1"      "0"      "0"

Code

get_vader(("NERVOUS"))[2:5]

compound      pos      neu      neg 
"-0.273"      "0"      "0"      "1"

Code

get_vader(("NORMAL"))[2:5]

compound      pos      neu      neg 
     "0"      "0"      "1"      "0"

Above we see what the VADER scores are for the top 3 responses for demeanor_of_person_stopped. The scale for the compound (overall) score is a value between -1 to 1. The closer to -1 the more negative, while the closer to 1, the more positive the sentiment is. “CALM” had an overall score of 0.3183, “NERVOUS” had an overall score of -0.2732, and “NORMAL” had an overall score of 0.0. These scores make sense since calm is usually more positive, nervous is usually more negative, while normal is fairly neutral.

Code

# select columns for further analysis
saf_dem_cols <- saf_sub %>% 
  select(suspect_race_description,
         suspect_sex,
         frisked_flag,
         suspect_arrested_flag,
         demeanor_of_person_stopped)

# obtain vader scores
saf_dem_scores <- 
  vader_df(saf_sub$demeanor_of_person_stopped)[3:6]

# join data frames
saf_dem_join <- cbind(saf_dem_cols, saf_dem_scores)

Code

# boxplot race, arrested
boxplot1 <-  saf_dem_join %>% 
  drop_na(suspect_race_description,
          compound) %>% 
ggplot(aes(x = suspect_arrested_flag,
                           y = compound)) +
  geom_boxplot(aes(fill = suspect_arrested_flag),
               outlier.fill = "red",
               position = "dodge2") +
  facet_grid(.~suspect_race_description) +
  scale_fill_manual(values = c("#7dc8c4", "tomato1")) +
  labs(x = "Arrested",
       fill = "Arrested") +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        axis.text = element_text(size = 11,
                                   face = "bold"),
        axis.title.x = element_text(size = 10),
        panel.grid.major.y = element_line(color = "grey40"),
        panel.grid.minor.y = element_line(color = "grey40"),
        plot.background = element_rect(colour = "#e7eaea",
                                       fill = "#e7eaea"),
        strip.background = element_rect(colour = "#e7eaea",
                                       fill = "#e7eaea"),
        strip.text = element_text(size = 13,
                                  face = "bold"),
        legend.background = element_rect(colour = "#e7eaea",
                                       fill = "#e7eaea"),
        legend.title = element_blank(),
        legend.text = element_text(size = 11,
                                   face = "bold"))

ggplotly(boxplot1)

The above interactive plotly boxplots are organized on the x axis by whether a person was arrested, the y axis is the overall compound score for the demeanor of the person stopped, and it is grouped by the race of the suspect. In the red is the score of a person who was arrested and in the green is the score of a person who was not arrested. We see a similar pattern to the previous boxplots. The distribution of scores is fairly identical between the 4 races for people that were not arrested, but different for the people that were arrested. A White person who was arrested had about the same score as a White person that was not arrested. On the other hand, Black, White Hispanic, and Black Hispanic people that were stopped AND arrested had lower overall VADER scores in comparison to people of the same race that were not arrested.

Discussion and Conclusion

In the past 3 years NYPD has stopped young Black males most often. Over 50% of the total stops were of Black people and over 90% of the stops were of males. Geographically, the most stops were made in Brooklyn. The sentiment analysis using the VADER model provided some insightful results. Why is there a difference between the boxplots of arrested and not arrested people for Black and Hispanic people but no differences for White people? Further analysis of the text from the `demeanor_of_person_stopped variable would be of interest. The differences in sex for all racial groups is also evident. Females are said to have more negative demeanor in comparison to males. This could be due to the small sample size of females that are stopped.

There are numerous avenues that could be researched for further analysis from this dataset. Some possible questions: When are people of different races stopped during the day? For each race, what proportion of stops lead to physical force? In the precincts that do not stop Black people the most, what is the racial makeup of that precinct neighborhood?

There are a few questions to address from this dataset. First, these are only the recorded stops that are reported by the NYPD. There could be numerous stops that are not recorded by an officer for various reasons that could alter the data. Also, this data is inputted by the officer and thus they are essentially the data collectors. It would be important to somehow validate some of these stops, possibly by looking through police body cam footage.

This analysis further shows the explicit and implicit bias police have towards Black people. It is imperative during this time we all evaluate our bias toward people of different races. Police should be held to the highest standards since they are the ones tasked with keeping our communities safe and are in positions of authority. Unfortunately research has shown that the sometimes invasive and frequent stops have had detrimental health effects on minorities in racially diverse communities (Sewell A.A. et. al., 2016). Research to uncover biases towards minorities is extremely important and should be further investigated.

References

Bandes, SA, Pryor, M, Kerrison, EM, Goff, PA. (2019). The mismeasure of Terry stops: Assessing the psychological and emotional harms of stop and frisk to individuals and communities. Behav Sci Law; 37: 176– 194.
Gelman, A., Fagan, J., & Kiss, A. (2007). An analysis of the New York City Police Department’s “stop‐and frisk” policy in the context of claims of racial bias. Journal of the American Statistical Association, 102(479)), 813–823.
Hutto, C.J. & Gilbert, E.E. (2014). VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text. Eighth International Conference on Weblogs and Social Media (ICWSM-14). Ann Arbor, MI, June 2014.
Levchak P.J., (2017) Do Precinct Characteristics Influence Stop-and-Frisk in New York City? A Multi-Level Analysis of Post-Stop Outcomes, Justice Quarterly, 34:3, 377-406.
NYC census data (2019). Available at https://www.census.gov/quickfacts/fact/table/newyorkcitynewyork,NY/PST045219
NYC police precincts shapefile (2020). Available at https://www1.nyc.gov/site/planning/data-maps/open-data/districts-download-metadata.page
NYC crime map (2019). Available at https://maps.nyc.gov/crime/
NYC Stop and Frisk Data. Available at https://www1.nyc.gov/site/nypd/stats/reports-analysis/stopfrisk.page
NYCLU Stop‐and‐Frisk Data (2019). Available at https://www.nyclu.org/en/publications/stop-and-frisk-de-blasio-era-2019
Rengifo, A., & Fowler, K. (2016). Stop, question, and complain: Citizen grievances against the NYPD and the opacity of police stops across New York City precincts, 2007–2013. Journal of Urban Health, 93, 32–41.
Sewell, A. A., Jefferson, K. A., & Lee, H. (2016). Living under surveillance: Gender, psychological distress, and stop‐question‐ and‐frisk policing in New York City. Social Science and Medicine, 159, 1–13.
Sewell, A. A., & Jefferson, K. A. (2016). Collateral Damage: The Health Effects of Invasive Police Encounters in New York City. Journal of urban health : bulletin of the New York Academy of Medicine, 93 Suppl 1(Suppl 1), 42–67.

Stop and Frisk in New York City

Introduction

Data Wrangling and Cleaning

Exploratory Data Analysis (EDA)

Spatial Data Analysis

Sentiment Analysis

Discussion and Conclusion

References

Citation