Stop and Frisk in New York City

Text Analysis R Python Spatial Data Analysis

An analysis of the stop and frisk policy in NYC from 2017 - 2019.

Roupen Khanjian true
03-14-2021
Code
# Packages 
library(tidyverse) # Easily Install and Load the 'Tidyverse', CRAN v1.3.0
library(patchwork) # The Composer of Plots, CRAN v1.1.1
library(here) # A Simpler Way to Find Your Files, CRAN v1.0.1
library(janitor) # Simple Tools for Examining and Cleaning Dirty Data, CRAN v2.1.0
library(tidytext) # Text Mining using 'dplyr', 'ggplot2', and Other Tidy Tools, CRAN v0.3.0
library(textdata) # Download and Load Various Text Datasets, CRAN v0.4.1
library(vader) # Valence Aware Dictionary and sEntiment Reasoner (VADER), CRAN v0.2.1
library(readxl) # Read Excel Files, CRAN v1.3.1
library(lubridate) # Make Dealing with Dates a Little Easier, CRAN v1.7.10
library(gt) # Easily Create Presentation-Ready Display Tables, CRAN v0.2.2
library(kableExtra) # Construct Complex Table with 'kable' and Pipe Syntax, CRAN v1.3.4
library(knitr) # A General-Purpose Package for Dynamic Report Generation in R, CRAN v1.31
library(plotly) # Create Interactive Web Graphics via 'plotly.js', CRAN v4.9.3
library(sf) # Simple Features for R, CRAN v0.9-7
library(tigris) # Load Census TIGER/Line Shapefiles, CRAN v1.0

Introduction

This article was originally completed in python and can be found here: link

In mid 2020, protests took place all over the USA and the world in response to the murder of George Floyd by the Minneapolis Police. Since the advent of video recordings via cell phones, there have been many instances of police brutality captured against Black people. The relationship between law enforcement and the Black community has always been tenuous and now it has been brought to the attention of the rest of the world. In the landmark Supreme court case of Terry v. Ohio (1968) it was ruled that police could stop, question, and frisk a person if they have reasonable suspicion that the person had committed a crime (Brandes, S.A. et al., 2019).

In the 2000’s to the early 2010’s the New York City (NYC) stop and frisk policy garnered national attention due to the high number of stops and profiling of Black people. At the height of the policy, in 2011 there were 658,724 stops recorded with over 50% of the stops targeting Black people (NYCLU 2019). Since then, the number of stops per year has substantially decreased to 13,459 stop in 2019. Opponents of this policy argue this is still too many stops, especially since in 2019 about 66% of the people stopped were innocent. It has been also shown that the stopping of white people more likely led to an arrest in comparison to Black and Hispanic people, implying the police may be targeting minorities and being more mindful of stopping white people (Gelman, A., et. al., 2007). There has been substantial research conducted showing the psychological distress of a stop and frisk policy on communities of color in NYC (Sewell, A. et al., 2016).

In this project I analyze the Stop, Question and Frisk Data from the New York Police Department (NYPD) from the most current three years: 2017, 2018, and 2019 (NYC Stop and Frisk Data). I chose these years for the following reasons: The years 2018-2019 was not included in the most recent NYCLU report. In 2017 the NYPD moved to an electronic form, as opposed to manually writing down a response for each question in the handwritten forms used prior to 2017. Lastly 2017 was the first year of the Trump presidency and I was curious to investigate if his rhetoric on race may have affected law enforcement’s behavior toward minorities.

Data was downloaded from the NYPD website link. In each dataset each row is a stop of a specific person, and each column is a variable. There are a total of 83 different variables in each dataset.Below is a table of the first 10 stops from this dataset.

Code
# Read in data
saf_17 <- read_excel(here("_texts", "stop_frisk_R", "data",
                          "sqf_2017.xlsx"))

saf_18 <- read_excel(here("_texts", "stop_frisk_R", "data",
                          "sqf_2018.xlsx"))

saf_19 <- read_excel(here("_texts", "stop_frisk_R", "data",
                          "sqf_2019.xlsx"))

# fix date column and select columns to be used for analysis
saf_17 <- saf_17 %>%
  mutate(STOP_FRISK_DATE = ymd(STOP_FRISK_DATE)) %>%
  select(
    c(
      "STOP_FRISK_DATE",
      "STOP_FRISK_TIME",
      "YEAR2",
      "MONTH2",
      "DAY2",
      "ISSUING_OFFICER_RANK",
      "SUSPECTED_CRIME_DESCRIPTION",
      "SUSPECT_ARRESTED_FLAG",
      "SUSPECT_ARREST_OFFENSE",
      "OFFICER_IN_UNIFORM_FLAG",
      "FRISKED_FLAG",
      "SEARCHED_FLAG",
      "OTHER_CONTRABAND_FLAG",
      "FIREARM_FLAG",
      "KNIFE_CUTTER_FLAG",
      "OTHER_WEAPON_FLAG",
      "WEAPON_FOUND_FLAG",
      "PHYSICAL_FORCE_CEW_FLAG",
      "PHYSICAL_FORCE_DRAW_POINT_FIREARM_FLAG",
      "PHYSICAL_FORCE_HANDCUFF_SUSPECT_FLAG",
      "PHYSICAL_FORCE_OC_SPRAY_USED_FLAG",
      "PHYSICAL_FORCE_OTHER_FLAG",
      "PHYSICAL_FORCE_RESTRAINT_USED_FLAG",
      "PHYSICAL_FORCE_VERBAL_INSTRUCTION_FLAG",
      "PHYSICAL_FORCE_WEAPON_IMPACT_FLAG",
      "SUSPECTS_ACTIONS_CASING_FLAG",
      "SUSPECTS_ACTIONS_PROXIMITY_TO_SCENE_FLAG",
      "DEMEANOR_OF_PERSON_STOPPED",
      "SUSPECT_REPORTED_AGE",
      "SUSPECT_SEX",
      "SUSPECT_RACE_DESCRIPTION",
      "SUSPECT_BODY_BUILD_TYPE",
      "SUSPECT_OTHER_DESCRIPTION",
      "STOP_LOCATION_PRECINCT",
      "STOP_LOCATION_FULL_ADDRESS",
      "STOP_LOCATION_STREET_NAME",
      "STOP_LOCATION_PATROL_BORO_NAME",
      "STOP_LOCATION_BORO_NAME",
      "SUSPECTS_ACTIONS_DRUG_TRANSACTIONS_FLAG"
    )
  )

saf_18 <- saf_18 %>%
  mutate(STOP_FRISK_DATE = ymd(STOP_FRISK_DATE)) %>%
  select(
    c(
      "STOP_FRISK_DATE",
      "STOP_FRISK_TIME",
      "YEAR2",
      "MONTH2",
      "DAY2",
      "ISSUING_OFFICER_RANK",
      "SUSPECTED_CRIME_DESCRIPTION",
      "SUSPECT_ARRESTED_FLAG",
      "SUSPECT_ARREST_OFFENSE",
      "OFFICER_IN_UNIFORM_FLAG",
      "FRISKED_FLAG",
      "SEARCHED_FLAG",
      "OTHER_CONTRABAND_FLAG",
      "FIREARM_FLAG",
      "KNIFE_CUTTER_FLAG",
      "OTHER_WEAPON_FLAG",
      "WEAPON_FOUND_FLAG",
      "PHYSICAL_FORCE_CEW_FLAG",
      "PHYSICAL_FORCE_DRAW_POINT_FIREARM_FLAG",
      "PHYSICAL_FORCE_HANDCUFF_SUSPECT_FLAG",
      "PHYSICAL_FORCE_OC_SPRAY_USED_FLAG",
      "PHYSICAL_FORCE_OTHER_FLAG",
      "PHYSICAL_FORCE_RESTRAINT_USED_FLAG",
      "PHYSICAL_FORCE_VERBAL_INSTRUCTION_FLAG",
      "PHYSICAL_FORCE_WEAPON_IMPACT_FLAG",
      "SUSPECTS_ACTIONS_CASING_FLAG",
      "SUSPECTS_ACTIONS_PROXIMITY_TO_SCENE_FLAG",
      "DEMEANOR_OF_PERSON_STOPPED",
      "SUSPECT_REPORTED_AGE",
      "SUSPECT_SEX",
      "SUSPECT_RACE_DESCRIPTION",
      "SUSPECT_BODY_BUILD_TYPE",
      "SUSPECT_OTHER_DESCRIPTION",
      "STOP_LOCATION_PRECINCT",
      "STOP_LOCATION_FULL_ADDRESS",
      "STOP_LOCATION_STREET_NAME",
      "STOP_LOCATION_PATROL_BORO_NAME",
      "STOP_LOCATION_BORO_NAME",
      "SUSPECTS_ACTIONS_DRUG_TRANSACTIONS_FLAG"
    )
  )

saf_19 <- saf_19 %>%
  mutate(STOP_FRISK_DATE = ymd(STOP_FRISK_DATE)) %>%
  select( # select for the years 
    c(
      "STOP_FRISK_DATE",
      "STOP_FRISK_TIME",
      "YEAR2",
      "MONTH2",
      "DAY2",
      "ISSUING_OFFICER_RANK",
      "SUSPECTED_CRIME_DESCRIPTION",
      "SUSPECT_ARRESTED_FLAG",
      "SUSPECT_ARREST_OFFENSE",
      "OFFICER_IN_UNIFORM_FLAG",
      "FRISKED_FLAG",
      "SEARCHED_FLAG",
      "OTHER_CONTRABAND_FLAG",
      "FIREARM_FLAG",
      "KNIFE_CUTTER_FLAG",
      "OTHER_WEAPON_FLAG",
      "WEAPON_FOUND_FLAG",
      "PHYSICAL_FORCE_CEW_FLAG",
      "PHYSICAL_FORCE_DRAW_POINT_FIREARM_FLAG",
      "PHYSICAL_FORCE_HANDCUFF_SUSPECT_FLAG",
      "PHYSICAL_FORCE_OC_SPRAY_USED_FLAG",
      "PHYSICAL_FORCE_OTHER_FLAG",
      "PHYSICAL_FORCE_RESTRAINT_USED_FLAG",
      "PHYSICAL_FORCE_VERBAL_INSTRUCTION_FLAG",
      "PHYSICAL_FORCE_WEAPON_IMPACT_FLAG",
      "SUSPECTS_ACTIONS_CASING_FLAG",
      "SUSPECTS_ACTIONS_PROXIMITY_TO_SCENE_FLAG",
      "DEMEANOR_OF_PERSON_STOPPED",
      "SUSPECT_REPORTED_AGE",
      "SUSPECT_SEX",
      "SUSPECT_RACE_DESCRIPTION",
      "SUSPECT_BODY_BUILD_TYPE",
      "SUSPECT_OTHER_DESCRIPTION",
      "STOP_LOCATION_PRECINCT",
      "STOP_LOCATION_FULL_ADDRESS",
      "STOP_LOCATION_STREET_NAME",
      "STOP_LOCATION_PATROL_BORO_NAME",
      "STOP_LOCATION_BORO_NAME",
      "SUSPECTS_ACTIONS_DRUG_TRANSACTIONS_FLAG"
    )
  )

# row bind all 3 years into one data frame
saf <- rbind(saf_17, saf_18, saf_19) 

# clean column names
saf <- saf %>% 
  clean_names()

# view first 5 rows
head(saf, n = 10) %>% 
  kbl(caption = "<b style = 'color:black;'>
       First ten rows of stop and frisk dataset.") %>%
  kable_material_dark(bootstrap_options = c("striped", "hover")) %>%
  row_spec(0, color = "white", background = "#222222") %>%
  scroll_box(width = "100%", height = "300px", 
             fixed_thead = list(enabled = T, background = "#222222"))
Table 1: First ten rows of stop and frisk dataset.
stop_frisk_date stop_frisk_time year2 month2 day2 issuing_officer_rank suspected_crime_description suspect_arrested_flag suspect_arrest_offense officer_in_uniform_flag frisked_flag searched_flag other_contraband_flag firearm_flag knife_cutter_flag other_weapon_flag weapon_found_flag physical_force_cew_flag physical_force_draw_point_firearm_flag physical_force_handcuff_suspect_flag physical_force_oc_spray_used_flag physical_force_other_flag physical_force_restraint_used_flag physical_force_verbal_instruction_flag physical_force_weapon_impact_flag suspects_actions_casing_flag suspects_actions_proximity_to_scene_flag demeanor_of_person_stopped suspect_reported_age suspect_sex suspect_race_description suspect_body_build_type suspect_other_description stop_location_precinct stop_location_full_address stop_location_street_name stop_location_patrol_boro_name stop_location_boro_name suspects_actions_drug_transactions_flag
2017-01-16 1899-12-31 14:26:00 2017 January Monday SGT TERRORISM N (null) Y N N N (null) (null) (null) N (null) (null) (null) (null) (null) (null) Y (null) (null) (null) TERRORISM 39 MALE (null) THN (null) 1 180 GREENWICH STREET GREENWICH STREET PBMS MANHATTAN (null)
2017-01-16 1899-12-31 14:26:00 2017 January Monday SGT TERRORISM N (null) Y N N N (null) (null) (null) N (null) (null) (null) (null) (null) (null) Y (null) (null) (null) TERRORISM 37 MALE (null) MED (null) 1 180 GREENWICH STREET GREENWICH STREET PBMS MANHATTAN (null)
2017-02-08 1899-12-31 11:10:00 2017 February Wednesday POM OTHER N (null) N N N N (null) (null) (null) N (null) (null) (null) (null) (null) (null) Y (null) (null) (null) OTHER (null) FEMALE WHITE THN N/A 1 WALL STREET && BROADWAY WALL STREET PBMS MANHATTAN (null)
2017-02-20 1899-12-31 11:35:00 2017 February Monday POM GRAND LARCENY AUTO N (null) Y Y Y N (null) (null) (null) N (null) (null) Y (null) (null) (null) (null) (null) (null) (null) GRAND LARCENY AUTO 31 MALE BLACK HISPANIC U UNK 1 75 GREENE STREET GREENE STREET PBMS MANHATTAN (null)
2017-02-21 1899-12-31 13:20:00 2017 February Tuesday POM BURGLARY N (null) Y N N N (null) (null) (null) N (null) (null) Y (null) (null) (null) (null) (null) (null) Y BURGLARY (null) FEMALE BLACK THN (null) 1 429 WEST BROADWAY WEST BROADWAY PBMS MANHATTAN (null)
2017-02-17 1899-12-31 21:25:00 2017 February Friday POM CPW Y MENACING Y Y Y N (null) (null) (null) N (null) (null) (null) (null) (null) (null) Y (null) (null) (null) CPW 39 MALE WHITE HISPANIC MED (null) 1 WEST STREET && CHAMBERS STREET WEST STREET PBMS MANHATTAN (null)
2017-02-25 1899-12-31 20:00:00 2017 February Saturday POM CPW N (null) Y N N N (null) (null) (null) N (null) (null) Y (null) (null) Y Y (null) (null) (null) CPW 19 FEMALE BLACK THN (null) 1 CHAMBERS STREET && WEST BROADWAY CHAMBERS STREET PBMS MANHATTAN (null)
2017-02-25 1899-12-31 19:58:00 2017 February Saturday POM CPW N (null) Y Y Y N (null) (null) (null) N (null) (null) (null) (null) (null) Y Y (null) (null) Y NORMAL 15 FEMALE BLACK THN (null) 1 CHAMBERS STREET && WEST BROADWAY CHAMBERS STREET PBMS MANHATTAN (null)
2017-02-21 1899-12-31 13:15:00 2017 February Tuesday POM BURGLARY N (null) Y Y Y N (null) (null) (null) N (null) (null) Y (null) (null) (null) Y (null) (null) Y PLEASANT 43 MALE BLACK HEA (null) 1 429 WEST BROADWAY WEST BROADWAY PBMS MANHATTAN (null)
2017-03-03 1899-12-31 08:16:00 2017 March Friday POM CRIMINAL MISCHIEF Y CRIMINAL POSSESSION OF CONTROLLED SUBSTANCE Y Y Y Y (null) (null) (null) N (null) (null) (null) (null) (null) (null) Y (null) (null) (null) NERVOUS 33 MALE BLACK THN (null) 1 CORTLANDT STREET && CHURCH STREET CORTLANDT STREET PBMS MANHATTAN (null)
Code
nrow(saf)
[1] 36096
Code
saf %>% 
  count(suspect_race_description) %>% 
  summarise(race = suspect_race_description,
            n = n,
            percent = n / sum(n)) %>% 
  gt() %>% 
  tab_header(
    title = "Which Race was Stop and Frisked the Most?",
    subtitle = "Data from 2017-2019"
  ) %>% 
  fmt_percent( 
    columns = vars(percent),
    decimals = 2
    ) %>% 
  data_color(
    columns = vars(n, percent),
    colors = scales::col_numeric(
      palette = c(
        "lightskyblue", "dodgerblue", "royalblue4") ,
      domain = NULL
        )
      ) %>% 
  tab_style(
    style = list(
      cell_text(size = "large",
                font = google_font('Lato'))
    ),
    locations = cells_body(
      columns = 1:3
    )
    ) %>% 
  tab_style(
    style = list(
      cell_text(size = "large",
                font = google_font('Lato'),
                weight = "bold")
    ),
    locations = cells_column_labels(
      columns = 1:3
    )
  )
Which Race was Stop and Frisked the Most?
Data from 2017-2019
race n percent
(null) 420 1.16%
AMER IND 9 0.02%
AMERICAN INDIAN/ALASKAN N 8 0.02%
AMERICAN INDIAN/ALASKAN NATIVE 16 0.04%
ASIAN / PACIFIC ISLANDER 522 1.45%
ASIAN/PAC.ISL 206 0.57%
BLACK 20817 57.67%
BLACK HISPANIC 3102 8.59%
MALE 7 0.02%
WHITE 3266 9.05%
WHITE HISPANIC 7723 21.40%

In the above table we see that the people stopped were mostly described as have one of the following four races: Black, White Hispanic, White, or Black Hispanic.

Data Wrangling and Cleaning

Code
saf_sub <- saf %>% 
  # filter for race in one of the four most saf 
  filter(suspect_race_description %in% c("BLACK", 
                                         "WHITE",
                                         "WHITE HISPANIC",
                                         "BLACK HISPANIC")) %>% 
  # Fill in NA values for nonsense descriptions
  # also change misspellings
  mutate(suspect_other_description = case_when(
    suspect_other_description == "(null)" ~ NA_character_,
    suspect_other_description == "NONE" ~ NA_character_,
    suspect_other_description == "UNK" ~ NA_character_,
    suspect_other_description == "UNKNOWN" ~ NA_character_,
    suspect_other_description == "UNKNOW" ~ NA_character_,
    suspect_other_description == "NO" ~ NA_character_,
    suspect_other_description == "UKNOWN" ~ NA_character_,
    suspect_other_description == "UNKOWN" ~ NA_character_,
    suspect_other_description == "NA" ~ NA_character_,
    suspect_other_description == "N/A" ~ NA_character_,
    suspect_other_description == "0" ~ NA_character_,
    suspect_other_description == 0 ~ NA_character_,
    suspect_other_description == "TATOOS" ~ "TATTOOS",
    suspect_other_description == "TATTOS" ~ "TATTOOS",
    suspect_other_description == "TATOO" ~ "TATTOOS",
    TRUE ~ as.character(suspect_other_description)
  )) %>% 
  # Change unknown sex values to NA
  mutate(suspect_sex = case_when(
    suspect_sex == "(null)" ~ NA_character_,
    suspect_sex == "19" ~ NA_character_,
    suspect_sex == "23" ~ NA_character_,
    suspect_sex == "24" ~ NA_character_,
    suspect_sex == "39" ~ NA_character_,
    TRUE ~ as.character(suspect_sex)
  )) %>% 
  # Change (null) value to No (N) (need to find a better way to do this step)
  mutate(physical_force_cew_flag = case_when(
    physical_force_cew_flag == "(null)" ~ "N",
    TRUE ~ as.character(physical_force_cew_flag)
  )) %>% 
  mutate(physical_force_draw_point_firearm_flag = case_when(
    physical_force_draw_point_firearm_flag == "(null)" ~ "N",
    TRUE ~ as.character(physical_force_draw_point_firearm_flag)
  )) %>% 
  mutate(physical_force_handcuff_suspect_flag = case_when(
    physical_force_handcuff_suspect_flag == "(null)" ~ "N",
    TRUE ~ as.character(physical_force_handcuff_suspect_flag)
  )) %>% 
  mutate(physical_force_oc_spray_used_flag = case_when(
    physical_force_oc_spray_used_flag == "(null)" ~ "N",
    TRUE ~ as.character(physical_force_oc_spray_used_flag)
  )) %>% 
  mutate(physical_force_other_flag = case_when(
    physical_force_other_flag == "(null)" ~ "N",
    TRUE ~ as.character(physical_force_other_flag)
  )) %>% 
  mutate(physical_force_restraint_used_flag = case_when(
    physical_force_restraint_used_flag == "(null)" ~ "N",
    TRUE ~ as.character(physical_force_restraint_used_flag)
  )) %>% 
  mutate(physical_force_verbal_instruction_flag = case_when(
    physical_force_verbal_instruction_flag == "(null)" ~ "N",
    TRUE ~ as.character(physical_force_verbal_instruction_flag)
  )) %>% 
  mutate(physical_force_weapon_impact_flag = case_when(
    physical_force_weapon_impact_flag == "(null)" ~ "N",
    TRUE ~ as.character(physical_force_weapon_impact_flag)
  )) %>% 
  mutate(suspects_actions_drug_transactions_flag = case_when(
    suspects_actions_drug_transactions_flag == "(null)" ~ "N",
    TRUE ~ as.character(suspects_actions_drug_transactions_flag)
  )) %>% 
  mutate(suspects_actions_proximity_to_scene_flag = case_when(
    suspects_actions_proximity_to_scene_flag == "(null)" ~ "N",
    TRUE ~ as.character(suspects_actions_proximity_to_scene_flag)
  )) %>% 
  mutate(other_contraband_flag = case_when(
    other_contraband_flag == "(null)" ~ "N",
    TRUE ~ as.character(other_contraband_flag)
  )) %>% 
  mutate(firearm_flag = case_when(
    firearm_flag == "(null)" ~ "N",
    TRUE ~ as.character(firearm_flag)
  )) %>% 
  mutate(knife_cutter_flag = case_when(
    knife_cutter_flag == "(null)" ~ "N",
    TRUE ~ as.character(knife_cutter_flag)
  )) %>% 
  mutate(weapon_found_flag = case_when(
    weapon_found_flag == "(null)" ~ "N",
    weapon_found_flag == "(" ~ "N",
    TRUE ~ as.character(weapon_found_flag)
  )) %>% 
  mutate(suspect_arrest_offense = case_when(
    suspect_arrest_offense == "(null)" ~ "NO ARREST",
    TRUE ~ as.character(suspect_arrest_offense)
  )) %>% 
  mutate(stop_location_boro_name = case_when(
    stop_location_boro_name == "(null)" ~ NA_character_,
    stop_location_boro_name == "STATEN IS" ~ "STATEN ISLAND",
    stop_location_boro_name == "PBBX" ~ "BRONX",
    stop_location_boro_name == "PBBN" ~ "BROOKLYN",
    stop_location_boro_name == "PBMN" ~ "MANHATTAN",
    stop_location_boro_name == "0208760" ~ NA_character_,
    stop_location_boro_name == "0190241" ~ NA_character_,
    stop_location_boro_name == "0986759" ~ NA_character_,
    stop_location_boro_name == "PBMS" ~ "MANHATTAN",
    stop_location_boro_name == "0210334" ~ NA_character_,
    stop_location_boro_name == "PBSI" ~ "STATEN ISLAND",
    stop_location_boro_name == "0237177" ~ NA_character_,
    stop_location_boro_name == "PBBS" ~ "BROOKLYN",
    stop_location_boro_name == "0155070" ~ NA_character_,
    stop_location_boro_name == "0208169" ~ NA_character_,
    TRUE ~ as.character(stop_location_boro_name)
  )) %>% 
  mutate(stop_location_precinct = case_when(
    stop_location_precinct == "(null)" ~ NA_integer_,
    stop_location_precinct == 208760 ~ NA_integer_,
    TRUE ~ as.integer(stop_location_precinct)
  )) %>% 
  mutate(demeanor_of_person_stopped = case_when(
    demeanor_of_person_stopped == "IRRATE" ~ "IRATE",
    demeanor_of_person_stopped == "1" ~ NA_character_,
    demeanor_of_person_stopped == "NEVEVOUS" ~ "NERVOUS",
    demeanor_of_person_stopped == 1 ~ NA_character_,
    demeanor_of_person_stopped == "N/A" ~ NA_character_,
    TRUE ~ as.character(demeanor_of_person_stopped)
  )) %>% 
  mutate(suspect_reported_age = case_when(
    suspect_reported_age == "(null)" ~ NA_character_,
    suspect_reported_age == "0" ~ NA_character_,
    suspect_reported_age == "1" ~ NA_character_,
    TRUE ~ as.character(suspect_reported_age)
  )) %>% 
  mutate(suspect_reported_age = as.integer(suspect_reported_age))

Due to the nature of the dataset, there had to be some data cleaning conducted in order to organize some of the columns:

Exploratory Data Analysis (EDA)

The cleaned and updated dataset includes 34,784 observations omitting the stops conducting on people that were not described as Black, White, Black Hispanic, or White Hispanic.

Code
# Theme for a few of the EDA plots
theme_eda <- 
  theme(panel.grid.major.y = element_blank(),
        panel.grid.minor.y = element_blank(),
        axis.text = element_text(size = 11,
                                   face = "bold"),
        axis.title.x = element_text(size = 13,
                                   face = "bold"),
        axis.title.y = element_blank(),
        panel.grid.major.x = element_line(color = "grey54"),
        panel.grid.minor.x = element_line(color = "grey27"),
        plot.background = element_rect(colour = "#e7eaea",
                                       fill = "#e7eaea"),
        plot.title = element_text(face = "bold",
                                  size = 14))
# stops by race
saf_sub %>% 
  count(suspect_race_description) %>% 
  ggplot(aes(x = n, y = fct_reorder(suspect_race_description, n))) +
  geom_col(fill = "#7dc8c4") +
  labs(x = "Number of Stops",
       y = "Race",
       title = "Number of Stops by Race") +
  theme_minimal() +
  theme_eda

Code
# stops by sex
saf_sub %>% 
  count(suspect_sex) %>% 
  drop_na() %>% 
  ggplot(aes(x = n, y = fct_reorder(suspect_sex, n))) +
  geom_col(fill = "#7dc8c4") +
  labs(x = "Number of Stops",
       y = "Sex",
       title = "Number of Stops by Sex") +
  theme_minimal() +
  theme_eda

Code
# weekday
d1 <- saf_sub %>% 
  count(day2) %>% 
  ggplot(aes(x = n, y = fct_reorder(day2, n))) +
  geom_col(fill = "#7dc8c4") +
  labs(x = "Number of Stops",
       y = "Weekday") +
  theme_minimal() +
  theme(panel.grid.major.y = element_blank(),
        panel.grid.minor.y = element_blank(),
        axis.text = element_text(size = 10,
                                   face = "bold"),
        axis.title = element_blank(),
        panel.grid.major.x = element_line(color = "grey54"),
        panel.grid.minor.x = element_line(color = "grey27"),
        plot.background = element_rect(colour = "#e7eaea",
                                       fill = "#e7eaea"))
# month
m1 <- saf_sub %>% 
  count(month2) %>% 
  ggplot(aes(x = n, y = fct_reorder(month2, n))) +
  geom_col(fill = "#7dc8c4") +
  labs(x = "Number of Stops",
       y = "Month") +
  theme_minimal() +
  theme(panel.grid.major.y = element_blank(),
        panel.grid.minor.y = element_blank(),
        axis.text = element_text(size = 10,
                                   face = "bold"),
        axis.title = element_blank(),
        panel.grid.major.x = element_line(color = "grey54"),
        panel.grid.minor.x = element_line(color = "grey27"),
        plot.background = element_rect(colour = "#e7eaea",
                                       fill = "#e7eaea"))
# year
y1 <- saf_sub %>% 
  count(year2) %>% 
  mutate(year = fct_reorder(factor(year2), n)) %>% 
  ggplot(aes(x = n, y = year)) +
  geom_col(fill = "#7dc8c4") +
  labs(x = "Number of Stops",
       y = "Month") +
  theme_minimal() +
  theme(panel.grid.major.y = element_blank(),
        panel.grid.minor.y = element_blank(),
        axis.text = element_text(size = 12,
                                   face = "bold"),
        axis.title = element_blank(),
        panel.grid.major.x = element_line(color = "grey54"),
        panel.grid.minor.x = element_line(color = "grey27"),
        plot.background = element_rect(colour = "#e7eaea",
                                       fill = "#e7eaea"))


(d1 + m1) / y1 +
  plot_annotation(title = "Number of Stops by Weekday, Month, and Year") &
  theme(plot.background = element_rect(colour = "#e7eaea",
                                       fill = "#e7eaea"),
        strip.background = element_rect(colour = "#e7eaea",
                                       fill = "#e7eaea"),
        plot.title = element_text(face = "bold",
                                  size = 14))

Code
saf_sub %>% 
  drop_na(suspect_reported_age) %>% 
  filter(suspect_reported_age >= 10 & 
           suspect_reported_age <= 80) %>% 
  ggplot(aes(x = suspect_reported_age)) +
  geom_histogram(fill = "#7dc8c4",color = "white", 
                 binwidth = 5) +
  scale_x_continuous(breaks = seq(10, 80, 10)) +
  labs(x = "Age",
       y = "Number of Stops",
       title = "Number of Stops by Age") +
  theme_minimal() +
    theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        axis.text = element_text(size = 11,
                                   face = "bold"),
        axis.title.x = element_text(size = 13,
                                   face = "bold"),
        axis.title.y = element_blank(),
        panel.grid.major.y = element_line(color = "grey54"),
        panel.grid.minor.y = element_line(color = "grey27"),
        plot.background = element_rect(colour = "#e7eaea",
                                       fill = "#e7eaea"),
        plot.title = element_text(face = "bold",
                                  size = 14))

Code
saf_sub %>% 
  count(stop_location_boro_name) %>%
  drop_na() %>% 
  ggplot(aes(x = n, y = fct_reorder(stop_location_boro_name, n))) +
  geom_col(fill = "#7dc8c4") +
  labs(x = "Number of Stops",
       y = "Borough",
       title = "Number of Stops by Borough") +
  theme_minimal() +
  theme_eda

Code
plotly1 <- saf_sub %>% 
  drop_na(suspect_reported_age, suspect_race_description) %>% 
  filter(suspect_reported_age >= 10 & 
           suspect_reported_age <= 80) %>% 
  count(suspect_race_description, suspect_reported_age) %>% 
  ggplot(aes(x = suspect_race_description, y = suspect_reported_age)) +
  geom_tile(aes(fill = n)) +
    scale_fill_gradient(low = "cadetblue",
                       high = "navyblue") +
  scale_y_continuous(breaks = seq(10, 80, 5)) +
  labs(x = "Race") +
  theme_minimal() +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.text = element_text(size = 11,
                                   face = "bold"),
        axis.title.x = element_text(size = 14,
                                   face = "bold"),
        axis.title.y = element_blank(),
        plot.background = element_rect(colour = "#e7eaea",
                                       fill = "#e7eaea"))

ggplotly(plotly1)
Code
saf_sub %>% 
  count(suspect_race_description, suspected_crime_description) %>% 
  pivot_wider(names_from = suspect_race_description,
              values_from = n) %>% 
  gt() %>% 
  tab_header(
    title = "Race and Alleged Crime Description",
    subtitle = "Data from 2017-2019"
  )  %>% 
  data_color(
    columns = vars(BLACK,`BLACK HISPANIC`,
                   WHITE,`WHITE HISPANIC`),
    colors = scales::col_numeric(
      palette = c(
        "cadetblue1", "navyblue") ,
      domain = NULL
        )
      ) %>% 
  tab_style(
    style = list(
      cell_text(size = "large",
                font = google_font('Lato'))
    ),
    locations = cells_body(
      columns = 1:5
    )
    ) %>% 
  tab_style(
    style = list(
      cell_text(size = "large",
                font = google_font('Lato'),
                weight = "bold")
    ),
    locations = cells_column_labels(
      columns = 1:5
    )
  ) %>% 
  cols_label(
    suspected_crime_description = "Crime"
  )
Race and Alleged Crime Description
Data from 2017-2019
Crime BLACK BLACK HISPANIC WHITE WHITE HISPANIC
ASSAULT 2592 417 342 1021
AUTO STRIPPIG 67 19 17 47
BURGLARY 1213 215 458 631
CPSP 96 19 26 54
CPW 6205 872 551 1948
CRIMINAL MISCHIEF 342 45 82 150
CRIMINAL POSSESSION OF CONTROLLED SUBSTANCE 124 10 69 62
CRIMINAL POSSESSION OF FORGED INSTRUMENT 34 4 3 6
CRIMINAL POSSESSION OF MARIHUANA 365 61 24 166
CRIMINAL SALE OF CONTROLLED SUBSTANCE 115 25 47 80
CRIMINAL SALE OF MARIHUANA 57 9 6 23
CRIMINAL TRESPASS 1020 160 202 482
FORCIBLE TOUCHING 59 7 16 35
GRAND LARCENY 895 152 115 273
GRAND LARCENY AUTO 577 94 136 271
MAKING GRAFFITI 36 15 54 68
MENACING 401 67 56 163
MISD 1 2 NA 1
MISDEMEANOR 1 2 NA 1
MURDER 46 7 4 21
OTHER 871 130 152 321
PETIT LARCENY 1870 216 449 607
PROSTITUTION 13 1 3 8
RAPE 43 8 8 15
RECKLESS ENDANGERMENT 142 18 7 44
ROBBERY 3415 500 290 1104
TERRORISM 7 1 17 3
THEFT OF SERVICES 96 8 25 30
UNAUTHORIZED USE OF A VEHICLE 114 18 106 88
FELONY NA NA 1 NA
Code
saf_sub %>% 
  filter(suspected_crime_description == "CPW") %>% 
  count(suspect_race_description, weapon_found_flag) %>% 
  pivot_wider(names_from = suspect_race_description,
              values_from = n) %>% 
  gt() %>% 
  tab_header(
    title = "Race and Weapons for CPW",
    subtitle = "Data from 2017-2019"
  )  %>% 
  data_color(
    columns = vars(BLACK,`BLACK HISPANIC`,
                   WHITE,`WHITE HISPANIC`),
    colors = scales::col_numeric(
      palette = c(
        "cadetblue1", "navyblue") ,
      domain = NULL
        )
      ) %>% 
  tab_style(
    style = list(
      cell_text(size = "large",
                font = google_font('Lato'))
    ),
    locations = cells_body(
      columns = 1:5
    )
    ) %>% 
  tab_style(
    style = list(
      cell_text(size = "large",
                font = google_font('Lato'),
                weight = "bold")
    ),
    locations = cells_column_labels(
      columns = 1:5
    )
  ) %>% 
  cols_label(
    weapon_found_flag = "Weapon Found?"
  )
Race and Weapons for CPW
Data from 2017-2019
Weapon Found? BLACK BLACK HISPANIC WHITE WHITE HISPANIC
N 5046 701 425 1506
Y 1159 171 126 442
Code
saf_sub %>% 
  count(suspect_race_description, frisked_flag) %>% 
  pivot_wider(names_from = suspect_race_description,
              values_from = n) %>% 
  gt() %>% 
  tab_header(
    title = "Race and Frisked",
    subtitle = "Data from 2017-2019"
  )  %>% 
  data_color(
    columns = vars(BLACK,`BLACK HISPANIC`,
                   WHITE,`WHITE HISPANIC`),
    colors = scales::col_numeric(
      palette = c(
        "cadetblue1", "navyblue") ,
      domain = NULL
        )
      ) %>% 
  tab_style(
    style = list(
      cell_text(size = "large",
                font = google_font('Lato'))
    ),
    locations = cells_body(
      columns = 1:5
    )
    ) %>% 
  tab_style(
    style = list(
      cell_text(size = "large",
                font = google_font('Lato'),
                weight = "bold")
    ),
    locations = cells_column_labels(
      columns = 1:5
    )
  ) %>% 
  cols_label(
    frisked_flag = "Frisked?"
  )
Race and Frisked
Data from 2017-2019
Frisked? BLACK BLACK HISPANIC WHITE WHITE HISPANIC
N 8231 1167 1808 3137
Y 12586 1935 1458 4586
Code
saf_sub %>% 
  count(suspect_race_description, suspect_arrested_flag) %>% 
  pivot_wider(names_from = suspect_race_description,
              values_from = n) %>% 
  gt() %>% 
  tab_header(
    title = "Race and Arrest",
    subtitle = "Data from 2017-2019"
  )  %>% 
  data_color(
    columns = vars(BLACK,`BLACK HISPANIC`,
                   WHITE,`WHITE HISPANIC`),
    colors = scales::col_numeric(
      palette = c(
        "cadetblue1", "navyblue") ,
      domain = NULL
        )
      ) %>% 
  tab_style(
    style = list(
      cell_text(size = "large",
                font = google_font('Lato'))
    ),
    locations = cells_body(
      columns = 1:5
    )
    ) %>% 
  tab_style(
    style = list(
      cell_text(size = "large",
                font = google_font('Lato'),
                weight = "bold")
    ),
    locations = cells_column_labels(
      columns = 1:5
    )
  ) %>% 
  cols_label(
    suspect_arrested_flag = "Arrested?"
  )
Race and Arrest
Data from 2017-2019
Arrested? BLACK BLACK HISPANIC WHITE WHITE HISPANIC
N 14748 2165 2209 5179
Y 6069 937 1057 2544

Spatial Data Analysis

One of the features of this dataset was the inclusion of where the stop occurred, specifically which precinct. Using the shapefile of the NYPD precincts I was able to spatially visualize where in New York City the stops occurred. Utilizing the sf package and plotly in the interactive plots below, I display some maps of stop and frisk data in New York City. There has been considerable work done to analyze precinct level stops and associated crime rates within each precinct (Levchak P.J. 2017).

Code
## nyc shape file
saf_shape <- read_sf(here("_texts", "stop_frisk_R", "data",
                          "nypp_21a", "nypp.shp")) %>% 
  clean_names()

# rename precinct in df
saf_sub_pre <- saf_sub %>% 
  rename(precinct = stop_location_precinct)

## tigris
# combine spatial df and df
saf_tigris <- geo_join(saf_shape, saf_sub_pre,
                       by = "precinct", how = "inner")

# overall precinct numbers
saf_tigris_counts <- saf_tigris %>%
  count(precinct)

theme_maps <- 
    theme(strip.text = element_text(face = "bold",
                                  size = 10),
        strip.background = element_rect(colour = "#e7eaea",
                                       fill = "#e7eaea"),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.text = element_blank(),
        axis.title = element_blank(),
        plot.background = element_rect(colour = "#e7eaea",
                                       fill = "#e7eaea"),
        legend.background = element_rect(colour = "#e7eaea",
                                       fill = "#e7eaea"),
        legend.text = element_text(size = 11,
                                   face = "bold"),
        axis.ticks = element_blank())

g1 <- ggplot(saf_tigris_counts)  +
  geom_sf(aes(fill = n),
          colour = "white",
          size = 0.1) +
  geom_sf_text(aes(label = precinct),
               size = 1.15) +
  scale_fill_gradient(low = "cadetblue1",
                       high = "navyblue") +
  theme_maps

# Make it interactive!
ggplotly(g1, tooltip = c("precinct", "n"))
Code
# facet by borough
saf_tigris_boro <- saf_tigris %>%
  count(precinct, stop_location_boro_name) %>%
  group_by(stop_location_boro_name)

g2 <- ggplot(saf_tigris_boro)  +
  geom_sf(aes(fill = n), color = "white", size = 0.05) +
  geom_sf_text(aes(label = precinct),
               size = 0.5) +
  facet_wrap(~stop_location_boro_name) +
  scale_fill_gradient(low = "cadetblue1",
                       high = "navyblue") +
  theme_maps

# Make it interactive!
ggplotly(g2, tooltip = c("precinct", "n"))
Code
# facet by race
saf_tigris_race <- saf_tigris %>%
  count(precinct, suspect_race_description) %>% 
  group_by(suspect_race_description)
# no missing values here

# Plot by race
g3 <- ggplot(saf_tigris_race)  +
  geom_sf(aes(fill = n), color = "white", size = 0.05) +
  geom_sf_text(aes(label = precinct),
               size = 0.5,
               color = "white") +
  facet_wrap(~suspect_race_description) +
  scale_fill_gradient(low = "cadetblue1",
                       high = "navyblue") +
  theme_maps

# Make it interactive!
ggplotly(g3, tooltip = c("precinct", "n"))
Code
saf_precinct_table <- saf_sub_pre %>% 
  count(precinct, suspect_race_description) %>% 
  pivot_wider(names_from = suspect_race_description,
              values_from = n) %>% 
  drop_na(precinct) %>% 
  filter(BLACK < WHITE |
         BLACK < `WHITE HISPANIC`) %>% 
  pivot_longer(cols = c(BLACK,`BLACK HISPANIC`,
                   WHITE,`WHITE HISPANIC`),
               names_to = "race",
               values_to = "count") %>% 
  pivot_wider(names_from = precinct,
              values_from = count) 

table_names <-  names(saf_precinct_table)[-1]

table_names <- str_replace_all(table_names, 
                               pattern = " ", 
                               replacement = ",")

saf_precinct_table %>% 
  gt() %>% 
  tab_header(
    title = "Stops by Precincts",
    subtitle = "Precincts where Black people were stopped less than White and White Hispanic people"
  )  %>%
  tab_spanner(
    label = md("**precinct**"), 
    columns = vars(table_names)
  ) %>% 
  data_color(
    columns = vars(table_names),
    colors = scales::col_numeric(
      palette = c(
        "cadetblue1", "navyblue") ,
      domain = NULL
        )
      ) %>% 
  tab_style(
    style = list(
      cell_text(size = "large",
                font = google_font('Lato'))
    ),
    locations = cells_body(
      columns = 1:17
    )
    ) %>% 
  tab_style(
    style = list(
      cell_text(size = "large",
                font = google_font('Lato'),
                weight = "bold")
    ),
    locations = cells_column_labels(
      columns = 1:17
    )
  ) 
Stops by Precincts
Precincts where Black people were stopped less than White and White Hispanic people
race precinct
33 34 50 62 66 68 72 104 108 109 110 111 112 115 122 123
BLACK 149 95 61 96 41 38 100 52 54 102 80 62 72 96 142 43
BLACK HISPANIC 120 218 45 13 35 4 42 38 29 39 39 11 19 37 14 4
WHITE 18 28 18 127 29 82 52 69 40 64 27 80 81 19 156 102
WHITE HISPANIC 180 266 67 70 46 55 269 134 99 112 278 61 66 281 53 25

Sentiment Analysis

In these electronic forms there are 2 columns where we can perform sentiment analysis. Sentiment analysis could be used to understand the emotional component of a text. Here I used VADER, a model used to measure the negative, positive, neutral and overall sentiment intensity of a text (Hutto, C.J. et. al., 2014). The first column I chose to run sentiment analysis on was the demeanor_of_person_stopped. Below is a list of the top 10 responses for the demeanor of a person stopped.

Code
saf_sub %>% 
  count(demeanor_of_person_stopped, sort = TRUE) %>% 
  drop_na(demeanor_of_person_stopped) %>% 
  slice(1:10) %>% 
  gt() %>% 
  tab_header(
    title = "Top 10 Descriptions of Demeanor of Person Stopped",
    subtitle = "Data from 2017-2019"
  ) %>% 
  tab_style(
    style = list(
      cell_text(size = "large",
                font = google_font('Lato'))
    ),
    locations = cells_body(
      columns = 1:2
    )
    ) %>% 
  tab_style(
    style = list(
      cell_text(size = "large",
                font = google_font('Lato'),
                weight = "bold")
    ),
    locations = cells_column_labels(
      columns = 1:2
    )
  )  %>% 
  cols_label(
    demeanor_of_person_stopped = "Demeanor"
  )
Top 10 Descriptions of Demeanor of Person Stopped
Data from 2017-2019
Demeanor n
CALM 7280
NERVOUS 3384
CPW 1817
UPSET 1525
ROBBERY 1480
NORMAL 1453
COOPERATIVE 793
PETIT LARCENY 727
COMPLIANT 648
ANGRY 542
Code
get_vader(("CALM"))[2:5]
compound      pos      neu      neg 
 "0.318"      "1"      "0"      "0" 
Code
get_vader(("NERVOUS"))[2:5]
compound      pos      neu      neg 
"-0.273"      "0"      "0"      "1" 
Code
get_vader(("NORMAL"))[2:5]
compound      pos      neu      neg 
     "0"      "0"      "1"      "0" 
Code
# select columns for further analysis
saf_dem_cols <- saf_sub %>% 
  select(suspect_race_description,
         suspect_sex,
         frisked_flag,
         suspect_arrested_flag,
         demeanor_of_person_stopped)

# obtain vader scores
saf_dem_scores <- 
  vader_df(saf_sub$demeanor_of_person_stopped)[3:6]

# join data frames
saf_dem_join <- cbind(saf_dem_cols, saf_dem_scores)
Code
# boxplot race, arrested
boxplot1 <-  saf_dem_join %>% 
  drop_na(suspect_race_description,
          compound) %>% 
ggplot(aes(x = suspect_arrested_flag,
                           y = compound)) +
  geom_boxplot(aes(fill = suspect_arrested_flag),
               outlier.fill = "red",
               position = "dodge2") +
  facet_grid(.~suspect_race_description) +
  scale_fill_manual(values = c("#7dc8c4", "tomato1")) +
  labs(x = "Arrested",
       fill = "Arrested") +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        axis.text = element_text(size = 11,
                                   face = "bold"),
        axis.title.x = element_text(size = 10),
        panel.grid.major.y = element_line(color = "grey40"),
        panel.grid.minor.y = element_line(color = "grey40"),
        plot.background = element_rect(colour = "#e7eaea",
                                       fill = "#e7eaea"),
        strip.background = element_rect(colour = "#e7eaea",
                                       fill = "#e7eaea"),
        strip.text = element_text(size = 13,
                                  face = "bold"),
        legend.background = element_rect(colour = "#e7eaea",
                                       fill = "#e7eaea"),
        legend.title = element_blank(),
        legend.text = element_text(size = 11,
                                   face = "bold"))

ggplotly(boxplot1)

Discussion and Conclusion

In the past 3 years NYPD has stopped young Black males most often. Over 50% of the total stops were of Black people and over 90% of the stops were of males. Geographically, the most stops were made in Brooklyn. The sentiment analysis using the VADER model provided some insightful results. Why is there a difference between the boxplots of arrested and not arrested people for Black and Hispanic people but no differences for White people? Further analysis of the text from the `demeanor_of_person_stopped variable would be of interest. The differences in sex for all racial groups is also evident. Females are said to have more negative demeanor in comparison to males. This could be due to the small sample size of females that are stopped.

There are numerous avenues that could be researched for further analysis from this dataset. Some possible questions: When are people of different races stopped during the day? For each race, what proportion of stops lead to physical force? In the precincts that do not stop Black people the most, what is the racial makeup of that precinct neighborhood?

There are a few questions to address from this dataset. First, these are only the recorded stops that are reported by the NYPD. There could be numerous stops that are not recorded by an officer for various reasons that could alter the data. Also, this data is inputted by the officer and thus they are essentially the data collectors. It would be important to somehow validate some of these stops, possibly by looking through police body cam footage.

This analysis further shows the explicit and implicit bias police have towards Black people. It is imperative during this time we all evaluate our bias toward people of different races. Police should be held to the highest standards since they are the ones tasked with keeping our communities safe and are in positions of authority. Unfortunately research has shown that the sometimes invasive and frequent stops have had detrimental health effects on minorities in racially diverse communities (Sewell A.A. et. al., 2016). Research to uncover biases towards minorities is extremely important and should be further investigated.

References

Citation

For attribution, please cite this work as

Khanjian (2021, March 14). Roupen Khanjian: Stop and Frisk in New York City. Retrieved from https://khanjian.github.io/roupen-website/texts/stop_frisk_R/

BibTeX citation

@misc{khanjian2021stop,
  author = {Khanjian, Roupen},
  title = {Roupen Khanjian: Stop and Frisk in New York City},
  url = {https://khanjian.github.io/roupen-website/texts/stop_frisk_R/},
  year = {2021}
}