Everyday, residents call New York City's 311 hotline (a 911 equivalent for nonemergencies) to request government services. These services range from simple requests for basic government services to responding to storm damage to filling potholes to confronting noisy neighbors. 311 allows New Yorkers to voice their needs and influence how the City allocates its resources. Unfortunately, New York City does not collect or report information on the callers themselves, making it difficult to evaluate the equity or efficacy of that allocation. This project attempts to address that gap, and ultimately, to help develop a better understanding of which New Yorkers use 311.
We dive into a slice of 311 data specific to street tree damage reported immediately following major storms. The City of New York makes 311 data available on its Open Data Portal, and we will investigate it using the software R. We will wrangle the data with spatial packages like sp, rgdal and maptools. Then, we will plot the data using shapefiles from the City's planning department and R's famous ggplot2 package. Finally, we will combine 311 with data from the US Census Bureau and the City's tree census to construct a statistical model using the package RStan.
Identifying which residents report storm damage has a variety of applications. For example, we could spread awareness of the 311 system to neighborhoods where, for whatever reason, residents report fewer incidents. Or, we can better understand spikes of 311 activity and create policies that more fairly distribute resources after a major storm. For this project, however, our choice of storms is of purely academic interest. We choose to look at tree damage because storms are well-defined, observable events. After each storm, we can safely assume that the tree damage being reported by a 311 user is damage generated from that storm. Since we can safely assume that service requests coming right after a storm will be due only to the storm, we can take advantage of spatial and temporal variations in the call rate to identify who uses 311.
However, the natural experiment provided to us by storms is a luxury not available to users of 311 data in general. In most cases, 311 use rates partially reflect agency specific strategies for addressing service requests. And these strategies can greatly complicate the interpretation of the data. For example, 311 rat sightings are heavily determined by neighborhood specific baiting campaigns by the Department of Health and Mental Hygiene which are themselves set according to reported 311 rat sightings. In situations such as these, it is not possible to adjust for the spatial and temporal changes due to agency policy.
This project began as a collaboration between Columbia University's School of International and Public Affairs and the New York City Mayor's Office and Department of Parks and Recreation. For more information on the data, partnership or preliminary findings, see Eshleman et al, 2013 [1].
The 311 data used here can be found on New York City's Open Data Portal. We've already downloaded the data into a directory called "data". The goal of this section is to explore 311 requests and link it to census tract-level socio-demographic and tree data, which will help us understand who is (and isn't) asking the City for help following major storms. We start by plotting the data over time. First, set up a working drive and load packages. If you have never used these packages, they will need to be "installed" first.
First, let's check to see if we have all the right packages to do this work. We'll first borrow the pkgTest script introduced in the Zillow tutorial, then loop through through a vector of relevant packages to check if a given package is present or needs to be installed:
## This function will check if a package is installed, and if not, install it
pkgTest <- function(x) {
if (!require(x, character.only = TRUE)) {
print(paste("installing",x))
install.packages(x, dep = TRUE)
if (!require(x, character.only = TRUE))
stop("Package not found")
} else{
print(paste(x,"found"))
}
}
## These lines load the required packages
packages <- c("rgeos", "rgdal", "maptools", "plyr", "reshape2", "ggplot2", "ggmap", "rstan","StanHeaders","gridExtra")
##Apply the pkgTest using lapply()
lapply(packages, pkgTest)
## [1] "rgeos found"
## [1] "rgdal found"
## [1] "maptools found"
## [1] "plyr found"
## [1] "reshape2 found"
## [1] "ggplot2 found"
## [1] "ggmap found"
## [1] "rstan found"
## [1] "StanHeaders found"
## [1] "gridExtra found"
## [[1]]
## [1] "rgeos found"
##
## [[2]]
## [1] "rgdal found"
##
## [[3]]
## [1] "maptools found"
##
## [[4]]
## [1] "plyr found"
##
## [[5]]
## [1] "reshape2 found"
##
## [[6]]
## [1] "ggplot2 found"
##
## [[7]]
## [1] "ggmap found"
##
## [[8]]
## [1] "rstan found"
##
## [[9]]
## [1] "StanHeaders found"
##
## [[10]]
## [1] "gridExtra found"
Next, we'll need to read-in 311 reports. There are two files in the /data directory: one for before 2010, and another for 2010 and after. The original files were obtained from here for pre-2010 and here for post-2010.
#311 Reports are divided into two datasets, each is provided in zipped form
#We read both datasets into R separately and then combine them together
temp = tempfile()
unzip("data/311_Service_Requests_for_2009.zip",exdir=getwd())
reports09 <- read.csv("311_Service_Requests_for_2009.csv")
unlink(temp)
temp = tempfile()
unzip("data/311_Service_Requests_from_2010_to_Present.zip",exdir=getwd())
reports10plus <- read.csv("311_Service_Requests_from_2010_to_Present.csv")
unlink(temp)
reports10plus$Resolution.Description <- NULL
#Append 09 with 10+ together
reports = rbind(reports09,reports10plus)
rm(reports09,reports10plus)
#To facilitate our analysis, we convert the report created date into a date object
reports$Date <- as.Date(reports$Created.Date,format="%m/%d/%Y")
#Here we provide the dates for 8 major storms
storm.dates <- as.Date(c("2009-08-18", #BILL
"2009-10-07",
"2010-03-13",
"2010-05-08",
"2010-09-16", #TORNADO
"2011-08-30", #IRENE
"2011-10-29",
"2012-10-29"), #SANDY
format = "%Y-%m-%d")
We plot the log of 311 requests by day over the study period and denote each storm by a vertical line. Daily 311 reports span several orders of magnitude, and the log transformation helps us to visualize these different orders on the same plot.
ggplot() +
theme_bw() +
geom_point(aes(x,log(freq,base=10),alpha=log(freq,base=10)),data=count(reports$Date)) +
geom_vline(aes(xintercept = as.numeric(storm.dates)),color="red",alpha=.7,linetype=2,
data = data.frame(storm.dates)) +
xlim(as.Date("2009-01-01",format="%Y-%m-%d"),as.Date("2013-01-01",format="%Y-%m-%d")) +
theme(legend.position="none") +
labs(x="day",y=expression(log[10](311)))
## Warning: Removed 1169 rows containing missing values (geom_point).