US Police Killings Analysis
Posted on Sun 23 September 2018 in Data Analysis
US Police Killings
The dataset records shootings of civilians by police in the US. It contains information on each police killing in the US from January 2015 to June 2015.
The goal is to investigate these shootings by race, regional income, and state.
import pandas as pd
# Read with Latin-1 (ISO-8859-1) encoding rather than pandas' UTF-8 default
police_killings = pd.read_csv("police_killings.csv", encoding="ISO-8859-1")
police_killings.head(5)
police_killings.columns
count_race = police_killings["raceethnicity"].value_counts()
%matplotlib inline
import matplotlib.pyplot as plt
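Shootings By Race
# One bar per race/ethnicity category; size the axis from the data instead of hard-coding 6
plt.bar(range(len(count_race)), count_race.values)
plt.xticks(range(len(count_race)), count_race.index, rotation="vertical")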
plt.show()
# Share of all shootings in each category
count_race / count_race.sum()
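The proportions computed above can also be obtained directly with pandas' value_counts(normalize=True). A minimal sketch on a small made-up Series (the labels are hypothetical stand-ins for police_killings["raceethnicity"], not values from the dataset), checking that both approaches agree:

```python
import pandas as pd

# Hypothetical race labels standing in for police_killings["raceethnicity"]
race = pd.Series(["White", "Black", "White", "Hispanic/Latino", "White", "Black"])

# Share of each category, computed two equivalent ways
shares_manual = race.value_counts() / race.value_counts().sum()
shares_direct = race.value_counts(normalize=True)

print(shares_direct)
```

The normalize flag avoids a second pass over the counts and always sums to 1.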
Shootings By Regional Income
# Missing incomes are stored as "-"; drop them before converting to int
income = police_killings["p_income"][police_killings["p_income"] != "-"].astype("int")
plt.hist(income, bins=30)
plt.show()
income.median()
According to the Census, the median personal income in the US is $28,567, while the median p_income in our data is $22,348. This suggests that shootings tend to happen in less affluent areas. Our sample is relatively small, though, so it is hard to draw firm conclusions.
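Filtering on "-" works because missing incomes are stored as that string. Below is a sketch of an alternative that coerces any non-numeric placeholder to NaN with pd.to_numeric(..., errors="coerce") and then takes the median, which skips NaN by default. It also computes the relative gap between the two medians quoted above. The Series values are made up for illustration.

```python
import pandas as pd

# Hypothetical p_income values; "-" marks a missing entry, as in the CSV
p_income = pd.Series(["20000", "-", "25000", "18000", "-", "30000"])

# Non-numeric strings become NaN; median() ignores NaN by default
income = pd.to_numeric(p_income, errors="coerce")
sample_median = income.median()

# Relative gap between the article's median and the Census median
census_median = 28567
article_median = 22348
gap = 1 - article_median / census_median
print(f"{gap:.1%}")  # prints 21.8%
```

So the article's median sits roughly a fifth below the national figure.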
Shootings By State
state_pop = pd.read_csv("state_population.csv")
# counts: pandas Series whose index is each state's code
# and whose values are the number of police killings in that state
counts = police_killings["state_fp"].value_counts()
states = pd.DataFrame({"STATE": counts.index, "shootings": counts.values})
# STATE is the column that states and state_pop have in common
states = state_pop.merge(states, on="STATE")
# Killings per million residents
states["pop_millions"] = states["POPESTIMATE2015"] / 1000000
states["rate"] = states["shootings"] / states["pop_millions"]
states.sort_values("rate")
States in the Midwest and South seem to have the highest police killing rates, whereas those in the Northeast seem to have the lowest.
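The rate column is the number of shootings divided by population in millions, that is, killings per million residents. The sketch below runs the same merge-and-rate steps on three made-up states. The state codes, counts and populations are all hypothetical. It sorts with sort_values, the current pandas method for ordering rows by a column.

```python
import pandas as pd

# Hypothetical inputs shaped like state_population.csv and the value_counts() result
state_pop = pd.DataFrame({"STATE": [1, 2, 3],
                          "POPESTIMATE2015": [5_000_000, 2_000_000, 10_000_000]})
counts = pd.Series([10, 2, 5], index=[1, 2, 3])  # shootings per state code

states = pd.DataFrame({"STATE": counts.index, "shootings": counts.values})
states = state_pop.merge(states, on="STATE")

# Shootings per million residents: 10/5 = 2.0, 2/2 = 1.0, 5/10 = 0.5
states["pop_millions"] = states["POPESTIMATE2015"] / 1_000_000
states["rate"] = states["shootings"] / states["pop_millions"]
print(states.sort_values("rate"))
```

Normalizing by population matters here. State 3 has more shootings than state 2 but the lowest rate.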
# Keep only rows where all three census shares are present ("-" marks missing)
pk = police_killings[(police_killings["share_white"] != "-")
                     & (police_killings["share_black"] != "-")
                     & (police_killings["share_hispanic"] != "-")].copy()
# Convert the shares from strings to floats
for col in ["share_white", "share_black", "share_hispanic"]:
    pk[col] = pk[col].astype("float")
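Assigning new column values to a filtered DataFrame that has not been explicitly copied can trigger pandas' SettingWithCopyWarning. The sketch below shows one way to avoid that:

- build the row mask for all share columns at once with ne("-").all(axis=1);
- take an explicit copy of the filtered rows;
- convert the share columns in a single step.

The frame is made up for illustration.

```python
import pandas as pd

share_cols = ["share_white", "share_black", "share_hispanic"]

# Hypothetical rows; "-" marks a missing census share
df = pd.DataFrame({
    "state": ["NY", "AZ", "OH"],
    "share_white": ["60.1", "-", "80.0"],
    "share_black": ["20.0", "5.0", "10.5"],
    "share_hispanic": ["15.2", "40.0", "3.1"],
})

# Keep rows where none of the share columns holds the "-" placeholder
mask = df[share_cols].ne("-").all(axis=1)
clean = df[mask].copy()  # explicit copy, so later assignments only touch `clean`
clean[share_cols] = clean[share_cols].astype(float)
print(clean.dtypes)
```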
# The ten states with the lowest and the ten with the highest shooting rates in the sorted table
lowest_states = ["CT", "PA", "IA", "NY", "MA", "NH", "ME", "IL", "OH", "WI"]
highest_states = ["OK", "AZ", "NE", "HI", "AK", "ID", "NM", "LA", "CO", "DE"]
ls = pk[pk["state"].isin(lowest_states)]
hs = pk[pk["state"].isin(highest_states)]
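The two state lists appear to be copied by hand from the sorted rate table. The sketch below derives them with nsmallest and nlargest instead. It assumes a hypothetical abbrev column holding postal abbreviations. The original states frame is keyed by the numeric STATE code, so such a column would have to come from another source. The rates are invented.

```python
import pandas as pd

# Hypothetical rate table; "abbrev" is an assumed column, not in the original data
states = pd.DataFrame({
    "abbrev": ["NY", "AZ", "OH", "NM", "MA"],
    "rate": [0.3, 3.1, 0.9, 4.2, 0.4],
})

n = 2  # the article uses the ten lowest and ten highest
lowest_states = states.nsmallest(n, "rate")["abbrev"].tolist()
highest_states = states.nlargest(n, "rate")["abbrev"].tolist()
print(lowest_states, highest_states)  # ['NY', 'MA'] ['NM', 'AZ']
```

Deriving the lists keeps them in sync if the underlying data changes.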
Means in States With the Lowest Shooting Rates
ls[["pop", "county_income",
    "share_white", "share_black", "share_hispanic"]].mean()
Means in States With the Highest Shooting Rates
hs[["pop", "county_income",
    "share_white", "share_black", "share_hispanic"]].mean()
States with low shooting rates appear to have a higher proportion of Black residents and a lower proportion of Hispanic residents in the census areas where the shootings occur. The counties where these shootings occur also appear to have higher incomes.
States with high shooting rates tend to have high Hispanic population shares in the counties where shootings occur.
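The two mean() results can also be placed side by side with a single groupby. The sketch below labels each row by group through a dictionary lookup. States in neither list map to NaN, and groupby drops them by default. The rows, the group labels and the shortened state lists are all hypothetical.

```python
import pandas as pd

# Hypothetical, shortened versions of the article's lists
lowest_states = ["NY", "MA"]
highest_states = ["AZ", "NM"]

# Hypothetical shooting-level rows with census shares already converted to float
pk = pd.DataFrame({
    "state": ["NY", "MA", "AZ", "NM", "OH"],
    "county_income": [60000, 70000, 50000, 40000, 55000],
    "share_hispanic": [10.0, 12.0, 30.0, 50.0, 3.0],
})

labels = {**{s: "lowest" for s in lowest_states},
          **{s: "highest" for s in highest_states}}
pk["group"] = pk["state"].map(labels)  # OH is in neither list, so it maps to NaN

# One row per group, one column per variable
comparison = pk.groupby("group")[["county_income", "share_hispanic"]].mean()
print(comparison)
```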
# Full summary statistics for the high-rate group
hs[["pop", "county_income",
    "share_white", "share_black", "share_hispanic"]].describe()