An analysis of the stop and frisk policy in NYC from 2017 - 2019.
# Packages
library(tidyverse) # Easily Install and Load the 'Tidyverse', CRAN v1.3.0
library(patchwork) # The Composer of Plots, CRAN v1.1.1
library(here) # A Simpler Way to Find Your Files, CRAN v1.0.1
library(janitor) # Simple Tools for Examining and Cleaning Dirty Data, CRAN v2.1.0
library(tidytext) # Text Mining using 'dplyr', 'ggplot2', and Other Tidy Tools, CRAN v0.3.0
library(textdata) # Download and Load Various Text Datasets, CRAN v0.4.1
library(vader) # Valence Aware Dictionary and sEntiment Reasoner (VADER), CRAN v0.2.1
library(readxl) # Read Excel Files, CRAN v1.3.1
library(lubridate) # Make Dealing with Dates a Little Easier, CRAN v1.7.10
library(gt) # Easily Create Presentation-Ready Display Tables, CRAN v0.2.2
library(kableExtra) # Construct Complex Table with 'kable' and Pipe Syntax, CRAN v1.3.4
library(knitr) # A General-Purpose Package for Dynamic Report Generation in R, CRAN v1.31
library(plotly) # Create Interactive Web Graphics via 'plotly.js', CRAN v4.9.3
library(sf) # Simple Features for R, CRAN v0.9-7
library(tigris) # Load Census TIGER/Line Shapefiles, CRAN v1.0
This article was originally completed in python and can be found here: link
In mid 2020, protests took place all over the USA and the world in response to the murder of George Floyd by the Minneapolis Police. Since the advent of video recordings via cell phones, there have been many instances of police brutality captured against Black people. The relationship between law enforcement and the Black community has always been tenuous and now it has been brought to the attention of the rest of the world. In the landmark Supreme court case of Terry v. Ohio (1968) it was ruled that police could stop, question, and frisk a person if they have reasonable suspicion that the person had committed a crime (Brandes, S.A. et al., 2019).
In the 2000’s to the early 2010’s the New York City (NYC) stop and frisk policy garnered national attention due to the high number of stops and profiling of Black people. At the height of the policy, in 2011 there were 658,724 stops recorded with over 50% of the stops targeting Black people (NYCLU 2019). Since then, the number of stops per year has substantially decreased to 13,459 stop in 2019. Opponents of this policy argue this is still too many stops, especially since in 2019 about 66% of the people stopped were innocent. It has been also shown that the stopping of white people more likely led to an arrest in comparison to Black and Hispanic people, implying the police may be targeting minorities and being more mindful of stopping white people (Gelman, A., et. al., 2007). There has been substantial research conducted showing the psychological distress of a stop and frisk policy on communities of color in NYC (Sewell, A. et al., 2016).
In this project I analyze the Stop, Question and Frisk Data from the New York Police Department (NYPD) from the most current three years: 2017, 2018, and 2019 (NYC Stop and Frisk Data). I chose these years for the following reasons: The years 2018-2019 was not included in the most recent NYCLU report. In 2017 the NYPD moved to an electronic form, as opposed to manually writing down a response for each question in the handwritten forms used prior to 2017. Lastly 2017 was the first year of the Trump presidency and I was curious to investigate if his rhetoric on race may have affected law enforcement’s behavior toward minorities.
Data was downloaded from the NYPD website link. In each dataset each row is a stop of a specific person, and each column is a variable. There are a total of 83 different variables in each dataset.Below is a table of the first 10 stops from this dataset.
# Read in data
saf_17 <- read_excel(here("_texts", "stop_frisk_R", "data",
"sqf_2017.xlsx"))
saf_18 <- read_excel(here("_texts", "stop_frisk_R", "data",
"sqf_2018.xlsx"))
saf_19 <- read_excel(here("_texts", "stop_frisk_R", "data",
"sqf_2019.xlsx"))
# fix date column and select columns to be used for analysis
saf_17 <- saf_17 %>%
mutate(STOP_FRISK_DATE = ymd(STOP_FRISK_DATE)) %>%
select(
c(
"STOP_FRISK_DATE",
"STOP_FRISK_TIME",
"YEAR2",
"MONTH2",
"DAY2",
"ISSUING_OFFICER_RANK",
"SUSPECTED_CRIME_DESCRIPTION",
"SUSPECT_ARRESTED_FLAG",
"SUSPECT_ARREST_OFFENSE",
"OFFICER_IN_UNIFORM_FLAG",
"FRISKED_FLAG",
"SEARCHED_FLAG",
"OTHER_CONTRABAND_FLAG",
"FIREARM_FLAG",
"KNIFE_CUTTER_FLAG",
"OTHER_WEAPON_FLAG",
"WEAPON_FOUND_FLAG",
"PHYSICAL_FORCE_CEW_FLAG",
"PHYSICAL_FORCE_DRAW_POINT_FIREARM_FLAG",
"PHYSICAL_FORCE_HANDCUFF_SUSPECT_FLAG",
"PHYSICAL_FORCE_OC_SPRAY_USED_FLAG",
"PHYSICAL_FORCE_OTHER_FLAG",
"PHYSICAL_FORCE_RESTRAINT_USED_FLAG",
"PHYSICAL_FORCE_VERBAL_INSTRUCTION_FLAG",
"PHYSICAL_FORCE_WEAPON_IMPACT_FLAG",
"SUSPECTS_ACTIONS_CASING_FLAG",
"SUSPECTS_ACTIONS_PROXIMITY_TO_SCENE_FLAG",
"DEMEANOR_OF_PERSON_STOPPED",
"SUSPECT_REPORTED_AGE",
"SUSPECT_SEX",
"SUSPECT_RACE_DESCRIPTION",
"SUSPECT_BODY_BUILD_TYPE",
"SUSPECT_OTHER_DESCRIPTION",
"STOP_LOCATION_PRECINCT",
"STOP_LOCATION_FULL_ADDRESS",
"STOP_LOCATION_STREET_NAME",
"STOP_LOCATION_PATROL_BORO_NAME",
"STOP_LOCATION_BORO_NAME",
"SUSPECTS_ACTIONS_DRUG_TRANSACTIONS_FLAG"
)
)
saf_18 <- saf_18 %>%
mutate(STOP_FRISK_DATE = ymd(STOP_FRISK_DATE)) %>%
select(
c(
"STOP_FRISK_DATE",
"STOP_FRISK_TIME",
"YEAR2",
"MONTH2",
"DAY2",
"ISSUING_OFFICER_RANK",
"SUSPECTED_CRIME_DESCRIPTION",
"SUSPECT_ARRESTED_FLAG",
"SUSPECT_ARREST_OFFENSE",
"OFFICER_IN_UNIFORM_FLAG",
"FRISKED_FLAG",
"SEARCHED_FLAG",
"OTHER_CONTRABAND_FLAG",
"FIREARM_FLAG",
"KNIFE_CUTTER_FLAG",
"OTHER_WEAPON_FLAG",
"WEAPON_FOUND_FLAG",
"PHYSICAL_FORCE_CEW_FLAG",
"PHYSICAL_FORCE_DRAW_POINT_FIREARM_FLAG",
"PHYSICAL_FORCE_HANDCUFF_SUSPECT_FLAG",
"PHYSICAL_FORCE_OC_SPRAY_USED_FLAG",
"PHYSICAL_FORCE_OTHER_FLAG",
"PHYSICAL_FORCE_RESTRAINT_USED_FLAG",
"PHYSICAL_FORCE_VERBAL_INSTRUCTION_FLAG",
"PHYSICAL_FORCE_WEAPON_IMPACT_FLAG",
"SUSPECTS_ACTIONS_CASING_FLAG",
"SUSPECTS_ACTIONS_PROXIMITY_TO_SCENE_FLAG",
"DEMEANOR_OF_PERSON_STOPPED",
"SUSPECT_REPORTED_AGE",
"SUSPECT_SEX",
"SUSPECT_RACE_DESCRIPTION",
"SUSPECT_BODY_BUILD_TYPE",
"SUSPECT_OTHER_DESCRIPTION",
"STOP_LOCATION_PRECINCT",
"STOP_LOCATION_FULL_ADDRESS",
"STOP_LOCATION_STREET_NAME",
"STOP_LOCATION_PATROL_BORO_NAME",
"STOP_LOCATION_BORO_NAME",
"SUSPECTS_ACTIONS_DRUG_TRANSACTIONS_FLAG"
)
)
saf_19 <- saf_19 %>%
mutate(STOP_FRISK_DATE = ymd(STOP_FRISK_DATE)) %>%
select( # select for the years
c(
"STOP_FRISK_DATE",
"STOP_FRISK_TIME",
"YEAR2",
"MONTH2",
"DAY2",
"ISSUING_OFFICER_RANK",
"SUSPECTED_CRIME_DESCRIPTION",
"SUSPECT_ARRESTED_FLAG",
"SUSPECT_ARREST_OFFENSE",
"OFFICER_IN_UNIFORM_FLAG",
"FRISKED_FLAG",
"SEARCHED_FLAG",
"OTHER_CONTRABAND_FLAG",
"FIREARM_FLAG",
"KNIFE_CUTTER_FLAG",
"OTHER_WEAPON_FLAG",
"WEAPON_FOUND_FLAG",
"PHYSICAL_FORCE_CEW_FLAG",
"PHYSICAL_FORCE_DRAW_POINT_FIREARM_FLAG",
"PHYSICAL_FORCE_HANDCUFF_SUSPECT_FLAG",
"PHYSICAL_FORCE_OC_SPRAY_USED_FLAG",
"PHYSICAL_FORCE_OTHER_FLAG",
"PHYSICAL_FORCE_RESTRAINT_USED_FLAG",
"PHYSICAL_FORCE_VERBAL_INSTRUCTION_FLAG",
"PHYSICAL_FORCE_WEAPON_IMPACT_FLAG",
"SUSPECTS_ACTIONS_CASING_FLAG",
"SUSPECTS_ACTIONS_PROXIMITY_TO_SCENE_FLAG",
"DEMEANOR_OF_PERSON_STOPPED",
"SUSPECT_REPORTED_AGE",
"SUSPECT_SEX",
"SUSPECT_RACE_DESCRIPTION",
"SUSPECT_BODY_BUILD_TYPE",
"SUSPECT_OTHER_DESCRIPTION",
"STOP_LOCATION_PRECINCT",
"STOP_LOCATION_FULL_ADDRESS",
"STOP_LOCATION_STREET_NAME",
"STOP_LOCATION_PATROL_BORO_NAME",
"STOP_LOCATION_BORO_NAME",
"SUSPECTS_ACTIONS_DRUG_TRANSACTIONS_FLAG"
)
)
# row bind all 3 years into one data frame
saf <- rbind(saf_17, saf_18, saf_19)
# clean column names
saf <- saf %>%
clean_names()
# view first 5 rows
head(saf, n = 10) %>%
kbl(caption = "<b style = 'color:black;'>
First ten rows of stop and frisk dataset.") %>%
kable_material_dark(bootstrap_options = c("striped", "hover")) %>%
row_spec(0, color = "white", background = "#222222") %>%
scroll_box(width = "100%", height = "300px",
fixed_thead = list(enabled = T, background = "#222222"))
stop_frisk_date | stop_frisk_time | year2 | month2 | day2 | issuing_officer_rank | suspected_crime_description | suspect_arrested_flag | suspect_arrest_offense | officer_in_uniform_flag | frisked_flag | searched_flag | other_contraband_flag | firearm_flag | knife_cutter_flag | other_weapon_flag | weapon_found_flag | physical_force_cew_flag | physical_force_draw_point_firearm_flag | physical_force_handcuff_suspect_flag | physical_force_oc_spray_used_flag | physical_force_other_flag | physical_force_restraint_used_flag | physical_force_verbal_instruction_flag | physical_force_weapon_impact_flag | suspects_actions_casing_flag | suspects_actions_proximity_to_scene_flag | demeanor_of_person_stopped | suspect_reported_age | suspect_sex | suspect_race_description | suspect_body_build_type | suspect_other_description | stop_location_precinct | stop_location_full_address | stop_location_street_name | stop_location_patrol_boro_name | stop_location_boro_name | suspects_actions_drug_transactions_flag |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2017-01-16 | 1899-12-31 14:26:00 | 2017 | January | Monday | SGT | TERRORISM | N | (null) | Y | N | N | N | (null) | (null) | (null) | N | (null) | (null) | (null) | (null) | (null) | (null) | Y | (null) | (null) | (null) | TERRORISM | 39 | MALE | (null) | THN | (null) | 1 | 180 GREENWICH STREET | GREENWICH STREET | PBMS | MANHATTAN | (null) |
2017-01-16 | 1899-12-31 14:26:00 | 2017 | January | Monday | SGT | TERRORISM | N | (null) | Y | N | N | N | (null) | (null) | (null) | N | (null) | (null) | (null) | (null) | (null) | (null) | Y | (null) | (null) | (null) | TERRORISM | 37 | MALE | (null) | MED | (null) | 1 | 180 GREENWICH STREET | GREENWICH STREET | PBMS | MANHATTAN | (null) |
2017-02-08 | 1899-12-31 11:10:00 | 2017 | February | Wednesday | POM | OTHER | N | (null) | N | N | N | N | (null) | (null) | (null) | N | (null) | (null) | (null) | (null) | (null) | (null) | Y | (null) | (null) | (null) | OTHER | (null) | FEMALE | WHITE | THN | N/A | 1 | WALL STREET && BROADWAY | WALL STREET | PBMS | MANHATTAN | (null) |
2017-02-20 | 1899-12-31 11:35:00 | 2017 | February | Monday | POM | GRAND LARCENY AUTO | N | (null) | Y | Y | Y | N | (null) | (null) | (null) | N | (null) | (null) | Y | (null) | (null) | (null) | (null) | (null) | (null) | (null) | GRAND LARCENY AUTO | 31 | MALE | BLACK HISPANIC | U | UNK | 1 | 75 GREENE STREET | GREENE STREET | PBMS | MANHATTAN | (null) |
2017-02-21 | 1899-12-31 13:20:00 | 2017 | February | Tuesday | POM | BURGLARY | N | (null) | Y | N | N | N | (null) | (null) | (null) | N | (null) | (null) | Y | (null) | (null) | (null) | (null) | (null) | (null) | Y | BURGLARY | (null) | FEMALE | BLACK | THN | (null) | 1 | 429 WEST BROADWAY | WEST BROADWAY | PBMS | MANHATTAN | (null) |
2017-02-17 | 1899-12-31 21:25:00 | 2017 | February | Friday | POM | CPW | Y | MENACING | Y | Y | Y | N | (null) | (null) | (null) | N | (null) | (null) | (null) | (null) | (null) | (null) | Y | (null) | (null) | (null) | CPW | 39 | MALE | WHITE HISPANIC | MED | (null) | 1 | WEST STREET && CHAMBERS STREET | WEST STREET | PBMS | MANHATTAN | (null) |
2017-02-25 | 1899-12-31 20:00:00 | 2017 | February | Saturday | POM | CPW | N | (null) | Y | N | N | N | (null) | (null) | (null) | N | (null) | (null) | Y | (null) | (null) | Y | Y | (null) | (null) | (null) | CPW | 19 | FEMALE | BLACK | THN | (null) | 1 | CHAMBERS STREET && WEST BROADWAY | CHAMBERS STREET | PBMS | MANHATTAN | (null) |
2017-02-25 | 1899-12-31 19:58:00 | 2017 | February | Saturday | POM | CPW | N | (null) | Y | Y | Y | N | (null) | (null) | (null) | N | (null) | (null) | (null) | (null) | (null) | Y | Y | (null) | (null) | Y | NORMAL | 15 | FEMALE | BLACK | THN | (null) | 1 | CHAMBERS STREET && WEST BROADWAY | CHAMBERS STREET | PBMS | MANHATTAN | (null) |
2017-02-21 | 1899-12-31 13:15:00 | 2017 | February | Tuesday | POM | BURGLARY | N | (null) | Y | Y | Y | N | (null) | (null) | (null) | N | (null) | (null) | Y | (null) | (null) | (null) | Y | (null) | (null) | Y | PLEASANT | 43 | MALE | BLACK | HEA | (null) | 1 | 429 WEST BROADWAY | WEST BROADWAY | PBMS | MANHATTAN | (null) |
2017-03-03 | 1899-12-31 08:16:00 | 2017 | March | Friday | POM | CRIMINAL MISCHIEF | Y | CRIMINAL POSSESSION OF CONTROLLED SUBSTANCE | Y | Y | Y | Y | (null) | (null) | (null) | N | (null) | (null) | (null) | (null) | (null) | (null) | Y | (null) | (null) | (null) | NERVOUS | 33 | MALE | BLACK | THN | (null) | 1 | CORTLANDT STREET && CHURCH STREET | CORTLANDT STREET | PBMS | MANHATTAN | (null) |
nrow(saf)
[1] 36096
saf %>%
count(suspect_race_description) %>%
summarise(race = suspect_race_description,
n = n,
percent = n / sum(n)) %>%
gt() %>%
tab_header(
title = "Which Race was Stop and Frisked the Most?",
subtitle = "Data from 2017-2019"
) %>%
fmt_percent(
columns = vars(percent),
decimals = 2
) %>%
data_color(
columns = vars(n, percent),
colors = scales::col_numeric(
palette = c(
"lightskyblue", "dodgerblue", "royalblue4") ,
domain = NULL
)
) %>%
tab_style(
style = list(
cell_text(size = "large",
font = google_font('Lato'))
),
locations = cells_body(
columns = 1:3
)
) %>%
tab_style(
style = list(
cell_text(size = "large",
font = google_font('Lato'),
weight = "bold")
),
locations = cells_column_labels(
columns = 1:3
)
)
Which Race was Stop and Frisked the Most? | ||
---|---|---|
Data from 2017-2019 | ||
race | n | percent |
(null) | 420 | 1.16% |
AMER IND | 9 | 0.02% |
AMERICAN INDIAN/ALASKAN N | 8 | 0.02% |
AMERICAN INDIAN/ALASKAN NATIVE | 16 | 0.04% |
ASIAN / PACIFIC ISLANDER | 522 | 1.45% |
ASIAN/PAC.ISL | 206 | 0.57% |
BLACK | 20817 | 57.67% |
BLACK HISPANIC | 3102 | 8.59% |
MALE | 7 | 0.02% |
WHITE | 3266 | 9.05% |
WHITE HISPANIC | 7723 | 21.40% |
In the above table we see that the people stopped were mostly described as have one of the following four races: Black, White Hispanic, White, or Black Hispanic.
saf_sub <- saf %>%
# filter for race in one of the four most saf
filter(suspect_race_description %in% c("BLACK",
"WHITE",
"WHITE HISPANIC",
"BLACK HISPANIC")) %>%
# Fill in NA values for nonsense descriptions
# also change misspellings
mutate(suspect_other_description = case_when(
suspect_other_description == "(null)" ~ NA_character_,
suspect_other_description == "NONE" ~ NA_character_,
suspect_other_description == "UNK" ~ NA_character_,
suspect_other_description == "UNKNOWN" ~ NA_character_,
suspect_other_description == "UNKNOW" ~ NA_character_,
suspect_other_description == "NO" ~ NA_character_,
suspect_other_description == "UKNOWN" ~ NA_character_,
suspect_other_description == "UNKOWN" ~ NA_character_,
suspect_other_description == "NA" ~ NA_character_,
suspect_other_description == "N/A" ~ NA_character_,
suspect_other_description == "0" ~ NA_character_,
suspect_other_description == 0 ~ NA_character_,
suspect_other_description == "TATOOS" ~ "TATTOOS",
suspect_other_description == "TATTOS" ~ "TATTOOS",
suspect_other_description == "TATOO" ~ "TATTOOS",
TRUE ~ as.character(suspect_other_description)
)) %>%
# Change unknown sex values to NA
mutate(suspect_sex = case_when(
suspect_sex == "(null)" ~ NA_character_,
suspect_sex == "19" ~ NA_character_,
suspect_sex == "23" ~ NA_character_,
suspect_sex == "24" ~ NA_character_,
suspect_sex == "39" ~ NA_character_,
TRUE ~ as.character(suspect_sex)
)) %>%
# Change (null) value to No (N) (need to find a better way to do this step)
mutate(physical_force_cew_flag = case_when(
physical_force_cew_flag == "(null)" ~ "N",
TRUE ~ as.character(physical_force_cew_flag)
)) %>%
mutate(physical_force_draw_point_firearm_flag = case_when(
physical_force_draw_point_firearm_flag == "(null)" ~ "N",
TRUE ~ as.character(physical_force_draw_point_firearm_flag)
)) %>%
mutate(physical_force_handcuff_suspect_flag = case_when(
physical_force_handcuff_suspect_flag == "(null)" ~ "N",
TRUE ~ as.character(physical_force_handcuff_suspect_flag)
)) %>%
mutate(physical_force_oc_spray_used_flag = case_when(
physical_force_oc_spray_used_flag == "(null)" ~ "N",
TRUE ~ as.character(physical_force_oc_spray_used_flag)
)) %>%
mutate(physical_force_other_flag = case_when(
physical_force_other_flag == "(null)" ~ "N",
TRUE ~ as.character(physical_force_other_flag)
)) %>%
mutate(physical_force_restraint_used_flag = case_when(
physical_force_restraint_used_flag == "(null)" ~ "N",
TRUE ~ as.character(physical_force_restraint_used_flag)
)) %>%
mutate(physical_force_verbal_instruction_flag = case_when(
physical_force_verbal_instruction_flag == "(null)" ~ "N",
TRUE ~ as.character(physical_force_verbal_instruction_flag)
)) %>%
mutate(physical_force_weapon_impact_flag = case_when(
physical_force_weapon_impact_flag == "(null)" ~ "N",
TRUE ~ as.character(physical_force_weapon_impact_flag)
)) %>%
mutate(suspects_actions_drug_transactions_flag = case_when(
suspects_actions_drug_transactions_flag == "(null)" ~ "N",
TRUE ~ as.character(suspects_actions_drug_transactions_flag)
)) %>%
mutate(suspects_actions_proximity_to_scene_flag = case_when(
suspects_actions_proximity_to_scene_flag == "(null)" ~ "N",
TRUE ~ as.character(suspects_actions_proximity_to_scene_flag)
)) %>%
mutate(other_contraband_flag = case_when(
other_contraband_flag == "(null)" ~ "N",
TRUE ~ as.character(other_contraband_flag)
)) %>%
mutate(firearm_flag = case_when(
firearm_flag == "(null)" ~ "N",
TRUE ~ as.character(firearm_flag)
)) %>%
mutate(knife_cutter_flag = case_when(
knife_cutter_flag == "(null)" ~ "N",
TRUE ~ as.character(knife_cutter_flag)
)) %>%
mutate(weapon_found_flag = case_when(
weapon_found_flag == "(null)" ~ "N",
weapon_found_flag == "(" ~ "N",
TRUE ~ as.character(weapon_found_flag)
)) %>%
mutate(suspect_arrest_offense = case_when(
suspect_arrest_offense == "(null)" ~ "NO ARREST",
TRUE ~ as.character(suspect_arrest_offense)
)) %>%
mutate(stop_location_boro_name = case_when(
stop_location_boro_name == "(null)" ~ NA_character_,
stop_location_boro_name == "STATEN IS" ~ "STATEN ISLAND",
stop_location_boro_name == "PBBX" ~ "BRONX",
stop_location_boro_name == "PBBN" ~ "BROOKLYN",
stop_location_boro_name == "PBMN" ~ "MANHATTAN",
stop_location_boro_name == "0208760" ~ NA_character_,
stop_location_boro_name == "0190241" ~ NA_character_,
stop_location_boro_name == "0986759" ~ NA_character_,
stop_location_boro_name == "PBMS" ~ "MANHATTAN",
stop_location_boro_name == "0210334" ~ NA_character_,
stop_location_boro_name == "PBSI" ~ "STATEN ISLAND",
stop_location_boro_name == "0237177" ~ NA_character_,
stop_location_boro_name == "PBBS" ~ "BROOKLYN",
stop_location_boro_name == "0155070" ~ NA_character_,
stop_location_boro_name == "0208169" ~ NA_character_,
TRUE ~ as.character(stop_location_boro_name)
)) %>%
mutate(stop_location_precinct = case_when(
stop_location_precinct == "(null)" ~ NA_integer_,
stop_location_precinct == 208760 ~ NA_integer_,
TRUE ~ as.integer(stop_location_precinct)
)) %>%
mutate(demeanor_of_person_stopped = case_when(
demeanor_of_person_stopped == "IRRATE" ~ "IRATE",
demeanor_of_person_stopped == "1" ~ NA_character_,
demeanor_of_person_stopped == "NEVEVOUS" ~ "NERVOUS",
demeanor_of_person_stopped == 1 ~ NA_character_,
demeanor_of_person_stopped == "N/A" ~ NA_character_,
TRUE ~ as.character(demeanor_of_person_stopped)
)) %>%
mutate(suspect_reported_age = case_when(
suspect_reported_age == "(null)" ~ NA_character_,
suspect_reported_age == "0" ~ NA_character_,
suspect_reported_age == "1" ~ NA_character_,
TRUE ~ as.character(suspect_reported_age)
)) %>%
mutate(suspect_reported_age = as.integer(suspect_reported_age))
Due to the nature of the dataset, there had to be some data cleaning conducted in order to organize some of the columns:
For most of the columns that are flags, converted (null) or other values to N
(No) values.
Changed non male or female values to NA
in the suspect_sex
column.
In the suspect_other_description
, classified null of other values that were obviously null values to NA
.
In the demeanor_of_person_stopped
corrected some misspellings and classified some observations to NA
if no demeanor was noted.
Organized the stop_location_boro_name
so that it represents one of the 5 boroughs of NYC or NA
if it was not clear how to classify the stop.
Changed the (null) values for the suspect_arrest_offense
to “No Arrest”.
Finally, converted missing values in the stop_location_precinct
column to NA
since the precincts are numbered from 1 to 123.
The cleaned and updated dataset includes 34,784 observations omitting the stops conducting on people that were not described as Black, White, Black Hispanic, or White Hispanic.
# Theme for a few of the EDA plots
theme_eda <-
theme(panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank(),
axis.text = element_text(size = 11,
face = "bold"),
axis.title.x = element_text(size = 13,
face = "bold"),
axis.title.y = element_blank(),
panel.grid.major.x = element_line(color = "grey54"),
panel.grid.minor.x = element_line(color = "grey27"),
plot.background = element_rect(colour = "#e7eaea",
fill = "#e7eaea"),
plot.title = element_text(face = "bold",
size = 14))
# stops by race
saf_sub %>%
count(suspect_race_description) %>%
ggplot(aes(x = n, y = fct_reorder(suspect_race_description, n))) +
geom_col(fill = "#7dc8c4") +
labs(x = "Number of Stops",
y = "Race",
title = "Number of Stops by Race") +
theme_minimal() +
theme_eda
# stops by sex
saf_sub %>%
count(suspect_sex) %>%
drop_na() %>%
ggplot(aes(x = n, y = fct_reorder(suspect_sex, n))) +
geom_col(fill = "#7dc8c4") +
labs(x = "Number of Stops",
y = "Sex",
title = "Number of Stops by Sex") +
theme_minimal() +
theme_eda
# weekday
d1 <- saf_sub %>%
count(day2) %>%
ggplot(aes(x = n, y = fct_reorder(day2, n))) +
geom_col(fill = "#7dc8c4") +
labs(x = "Number of Stops",
y = "Weekday") +
theme_minimal() +
theme(panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank(),
axis.text = element_text(size = 10,
face = "bold"),
axis.title = element_blank(),
panel.grid.major.x = element_line(color = "grey54"),
panel.grid.minor.x = element_line(color = "grey27"),
plot.background = element_rect(colour = "#e7eaea",
fill = "#e7eaea"))
# month
m1 <- saf_sub %>%
count(month2) %>%
ggplot(aes(x = n, y = fct_reorder(month2, n))) +
geom_col(fill = "#7dc8c4") +
labs(x = "Number of Stops",
y = "Month") +
theme_minimal() +
theme(panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank(),
axis.text = element_text(size = 10,
face = "bold"),
axis.title = element_blank(),
panel.grid.major.x = element_line(color = "grey54"),
panel.grid.minor.x = element_line(color = "grey27"),
plot.background = element_rect(colour = "#e7eaea",
fill = "#e7eaea"))
# year
y1 <- saf_sub %>%
count(year2) %>%
mutate(year = fct_reorder(factor(year2), n)) %>%
ggplot(aes(x = n, y = year)) +
geom_col(fill = "#7dc8c4") +
labs(x = "Number of Stops",
y = "Month") +
theme_minimal() +
theme(panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank(),
axis.text = element_text(size = 12,
face = "bold"),
axis.title = element_blank(),
panel.grid.major.x = element_line(color = "grey54"),
panel.grid.minor.x = element_line(color = "grey27"),
plot.background = element_rect(colour = "#e7eaea",
fill = "#e7eaea"))
(d1 + m1) / y1 +
plot_annotation(title = "Number of Stops by Weekday, Month, and Year") &
theme(plot.background = element_rect(colour = "#e7eaea",
fill = "#e7eaea"),
strip.background = element_rect(colour = "#e7eaea",
fill = "#e7eaea"),
plot.title = element_text(face = "bold",
size = 14))
Over 5500 of the stops occurred on Saturdays, while less than 4000 stops occurred on Mondays.
The least number of stops were made in December, while the Spring months of March, April and May had the highest number of stops.
saf_sub %>%
drop_na(suspect_reported_age) %>%
filter(suspect_reported_age >= 10 &
suspect_reported_age <= 80) %>%
ggplot(aes(x = suspect_reported_age)) +
geom_histogram(fill = "#7dc8c4",color = "white",
binwidth = 5) +
scale_x_continuous(breaks = seq(10, 80, 10)) +
labs(x = "Age",
y = "Number of Stops",
title = "Number of Stops by Age") +
theme_minimal() +
theme(panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
axis.text = element_text(size = 11,
face = "bold"),
axis.title.x = element_text(size = 13,
face = "bold"),
axis.title.y = element_blank(),
panel.grid.major.y = element_line(color = "grey54"),
panel.grid.minor.y = element_line(color = "grey27"),
plot.background = element_rect(colour = "#e7eaea",
fill = "#e7eaea"),
plot.title = element_text(face = "bold",
size = 14))
saf_sub %>%
count(stop_location_boro_name) %>%
drop_na() %>%
ggplot(aes(x = n, y = fct_reorder(stop_location_boro_name, n))) +
geom_col(fill = "#7dc8c4") +
labs(x = "Number of Stops",
y = "Borough",
title = "Number of Stops by Borough") +
theme_minimal() +
theme_eda
plotly1 <- saf_sub %>%
drop_na(suspect_reported_age, suspect_race_description) %>%
filter(suspect_reported_age >= 10 &
suspect_reported_age <= 80) %>%
count(suspect_race_description, suspect_reported_age) %>%
ggplot(aes(x = suspect_race_description, y = suspect_reported_age)) +
geom_tile(aes(fill = n)) +
scale_fill_gradient(low = "cadetblue",
high = "navyblue") +
scale_y_continuous(breaks = seq(10, 80, 5)) +
labs(x = "Race") +
theme_minimal() +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
axis.text = element_text(size = 11,
face = "bold"),
axis.title.x = element_text(size = 14,
face = "bold"),
axis.title.y = element_blank(),
plot.background = element_rect(colour = "#e7eaea",
fill = "#e7eaea"))
ggplotly(plotly1)
saf_sub %>%
count(suspect_race_description, suspected_crime_description) %>%
pivot_wider(names_from = suspect_race_description,
values_from = n) %>%
gt() %>%
tab_header(
title = "Race and Alleged Crime Description",
subtitle = "Data from 2017-2019"
) %>%
data_color(
columns = vars(BLACK,`BLACK HISPANIC`,
WHITE,`WHITE HISPANIC`),
colors = scales::col_numeric(
palette = c(
"cadetblue1", "navyblue") ,
domain = NULL
)
) %>%
tab_style(
style = list(
cell_text(size = "large",
font = google_font('Lato'))
),
locations = cells_body(
columns = 1:5
)
) %>%
tab_style(
style = list(
cell_text(size = "large",
font = google_font('Lato'),
weight = "bold")
),
locations = cells_column_labels(
columns = 1:5
)
) %>%
cols_label(
suspected_crime_description = "Crime"
)
Race and Alleged Crime Description | ||||
---|---|---|---|---|
Data from 2017-2019 | ||||
Crime | BLACK | BLACK HISPANIC | WHITE | WHITE HISPANIC |
ASSAULT | 2592 | 417 | 342 | 1021 |
AUTO STRIPPIG | 67 | 19 | 17 | 47 |
BURGLARY | 1213 | 215 | 458 | 631 |
CPSP | 96 | 19 | 26 | 54 |
CPW | 6205 | 872 | 551 | 1948 |
CRIMINAL MISCHIEF | 342 | 45 | 82 | 150 |
CRIMINAL POSSESSION OF CONTROLLED SUBSTANCE | 124 | 10 | 69 | 62 |
CRIMINAL POSSESSION OF FORGED INSTRUMENT | 34 | 4 | 3 | 6 |
CRIMINAL POSSESSION OF MARIHUANA | 365 | 61 | 24 | 166 |
CRIMINAL SALE OF CONTROLLED SUBSTANCE | 115 | 25 | 47 | 80 |
CRIMINAL SALE OF MARIHUANA | 57 | 9 | 6 | 23 |
CRIMINAL TRESPASS | 1020 | 160 | 202 | 482 |
FORCIBLE TOUCHING | 59 | 7 | 16 | 35 |
GRAND LARCENY | 895 | 152 | 115 | 273 |
GRAND LARCENY AUTO | 577 | 94 | 136 | 271 |
MAKING GRAFFITI | 36 | 15 | 54 | 68 |
MENACING | 401 | 67 | 56 | 163 |
MISD | 1 | 2 | NA | 1 |
MISDEMEANOR | 1 | 2 | NA | 1 |
MURDER | 46 | 7 | 4 | 21 |
OTHER | 871 | 130 | 152 | 321 |
PETIT LARCENY | 1870 | 216 | 449 | 607 |
PROSTITUTION | 13 | 1 | 3 | 8 |
RAPE | 43 | 8 | 8 | 15 |
RECKLESS ENDANGERMENT | 142 | 18 | 7 | 44 |
ROBBERY | 3415 | 500 | 290 | 1104 |
TERRORISM | 7 | 1 | 17 | 3 |
THEFT OF SERVICES | 96 | 8 | 25 | 30 |
UNAUTHORIZED USE OF A VEHICLE | 114 | 18 | 106 | 88 |
FELONY | NA | NA | 1 | NA |
saf_sub %>%
filter(suspected_crime_description == "CPW") %>%
count(suspect_race_description, weapon_found_flag) %>%
pivot_wider(names_from = suspect_race_description,
values_from = n) %>%
gt() %>%
tab_header(
title = "Race and Weapons for CPW",
subtitle = "Data from 2017-2019"
) %>%
data_color(
columns = vars(BLACK,`BLACK HISPANIC`,
WHITE,`WHITE HISPANIC`),
colors = scales::col_numeric(
palette = c(
"cadetblue1", "navyblue") ,
domain = NULL
)
) %>%
tab_style(
style = list(
cell_text(size = "large",
font = google_font('Lato'))
),
locations = cells_body(
columns = 1:5
)
) %>%
tab_style(
style = list(
cell_text(size = "large",
font = google_font('Lato'),
weight = "bold")
),
locations = cells_column_labels(
columns = 1:5
)
) %>%
cols_label(
weapon_found_flag = "Weapon Found?"
)
Race and Weapons for CPW | ||||
---|---|---|---|---|
Data from 2017-2019 | ||||
Weapon Found? | BLACK | BLACK HISPANIC | WHITE | WHITE HISPANIC |
N | 5046 | 701 | 425 | 1506 |
Y | 1159 | 171 | 126 | 442 |
saf_sub %>%
count(suspect_race_description, frisked_flag) %>%
pivot_wider(names_from = suspect_race_description,
values_from = n) %>%
gt() %>%
tab_header(
title = "Race and Frisked",
subtitle = "Data from 2017-2019"
) %>%
data_color(
columns = vars(BLACK,`BLACK HISPANIC`,
WHITE,`WHITE HISPANIC`),
colors = scales::col_numeric(
palette = c(
"cadetblue1", "navyblue") ,
domain = NULL
)
) %>%
tab_style(
style = list(
cell_text(size = "large",
font = google_font('Lato'))
),
locations = cells_body(
columns = 1:5
)
) %>%
tab_style(
style = list(
cell_text(size = "large",
font = google_font('Lato'),
weight = "bold")
),
locations = cells_column_labels(
columns = 1:5
)
) %>%
cols_label(
frisked_flag = "Frisked?"
)
Race and Frisked | ||||
---|---|---|---|---|
Data from 2017-2019 | ||||
Frisked? | BLACK | BLACK HISPANIC | WHITE | WHITE HISPANIC |
N | 8231 | 1167 | 1808 | 3137 |
Y | 12586 | 1935 | 1458 | 4586 |
saf_sub %>%
count(suspect_race_description, suspect_arrested_flag) %>%
pivot_wider(names_from = suspect_race_description,
values_from = n) %>%
gt() %>%
tab_header(
title = "Race and Arrest",
subtitle = "Data from 2017-2019"
) %>%
data_color(
columns = vars(BLACK,`BLACK HISPANIC`,
WHITE,`WHITE HISPANIC`),
colors = scales::col_numeric(
palette = c(
"cadetblue1", "navyblue") ,
domain = NULL
)
) %>%
tab_style(
style = list(
cell_text(size = "large",
font = google_font('Lato'))
),
locations = cells_body(
columns = 1:5
)
) %>%
tab_style(
style = list(
cell_text(size = "large",
font = google_font('Lato'),
weight = "bold")
),
locations = cells_column_labels(
columns = 1:5
)
) %>%
cols_label(
suspect_arrested_flag = "Arrested?"
)
Race and Arrest | ||||
---|---|---|---|---|
Data from 2017-2019 | ||||
Arrested? | BLACK | BLACK HISPANIC | WHITE | WHITE HISPANIC |
N | 14748 | 2165 | 2209 | 5179 |
Y | 6069 | 937 | 1057 | 2544 |
One of the features of this dataset was the inclusion of where the stop occurred, specifically which precinct. Using the shapefile of the NYPD precincts I was able to spatially visualize where in New York City the stops occurred. Utilizing the sf
package and plotly
in the interactive plots below, I display some maps of stop and frisk data in New York City. There has been considerable work done to analyze precinct level stops and associated crime rates within each precinct (Levchak P.J. 2017).
## nyc shape file
saf_shape <- read_sf(here("_texts", "stop_frisk_R", "data",
"nypp_21a", "nypp.shp")) %>%
clean_names()
# rename precinct in df
saf_sub_pre <- saf_sub %>%
rename(precinct = stop_location_precinct)
## tigris
# combine spatial df and df
saf_tigris <- geo_join(saf_shape, saf_sub_pre,
by = "precinct", how = "inner")
# overall precinct numbers
saf_tigris_counts <- saf_tigris %>%
count(precinct)
theme_maps <-
theme(strip.text = element_text(face = "bold",
size = 10),
strip.background = element_rect(colour = "#e7eaea",
fill = "#e7eaea"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
axis.text = element_blank(),
axis.title = element_blank(),
plot.background = element_rect(colour = "#e7eaea",
fill = "#e7eaea"),
legend.background = element_rect(colour = "#e7eaea",
fill = "#e7eaea"),
legend.text = element_text(size = 11,
face = "bold"),
axis.ticks = element_blank())
g1 <- ggplot(saf_tigris_counts) +
geom_sf(aes(fill = n),
colour = "white",
size = 0.1) +
geom_sf_text(aes(label = precinct),
size = 1.15) +
scale_fill_gradient(low = "cadetblue1",
high = "navyblue") +
theme_maps
# Make it interactive!
ggplotly(g1, tooltip = c("precinct", "n"))
# facet by borough
saf_tigris_boro <- saf_tigris %>%
count(precinct, stop_location_boro_name) %>%
group_by(stop_location_boro_name)
g2 <- ggplot(saf_tigris_boro) +
geom_sf(aes(fill = n), color = "white", size = 0.05) +
geom_sf_text(aes(label = precinct),
size = 0.5) +
facet_wrap(~stop_location_boro_name) +
scale_fill_gradient(low = "cadetblue1",
high = "navyblue") +
theme_maps
# Make it interactive!
ggplotly(g2, tooltip = c("precinct", "n"))
# facet by race
saf_tigris_race <- saf_tigris %>%
count(precinct, suspect_race_description) %>%
group_by(suspect_race_description)
# no missing values here
# Plot by race
g3 <- ggplot(saf_tigris_race) +
geom_sf(aes(fill = n), color = "white", size = 0.05) +
geom_sf_text(aes(label = precinct),
size = 0.5,
color = "white") +
facet_wrap(~suspect_race_description) +
scale_fill_gradient(low = "cadetblue1",
high = "navyblue") +
theme_maps
# Make it interactive!
ggplotly(g3, tooltip = c("precinct", "n"))