Fare Evasion Data

In April 2013, I filed a FOIA request for data on fare evasion arrests in 2012. My request was bounced from the MTA to the NYPD, rejected, and then approved on appeal. For $17.50, I received 70 printed pages of tables. I was told that there was no way I would be able to get this data as a digital file. Which, okay.

Since then I've been filing requests every three months to get basically "quarterly reports" of fare evasion arrests. They return it in bascially a differently formatted PDF each time. Because they can.

I'm working on making this data machine-readable and available for people to use, and it would be cool if other people helped.

##Files

2013 and 2014 should be self-explanatory.

Original scanned PDFs (thanks to Dave Riordan for scanning and OCR-ing the 2013 data) are in the pdf folder. 2013 PDFs are separated by aggregation type, 2014 just has single files.

Data-fied versions are currently going into the csv folder. I used Tabula to csv-ify the fare-age.pdf file and the fare-gender.pdf file.

##Data Structure

There are 3 categories of data I requested: arrests by age, by gender, and by race. The data were given in aggregate. This is kind of annoying since it offers only flat analyses: we know how many 19-24-year-olds were arrested, and how many Hispanic people were arrested, and how many females were arrested, but not how many of those 19-24-year-olds were Hispanic or female. It's nowhere near as rich as Stop, Question, and Frisk (SQF) data.

###2013 In each dataset, the arrests are organized by what appears to be station entrance rather than stations in total. Most of these are listed as cross-streets, with "&&" instead of "&". For example, there's a MANHATTAN AVENUE && NASSAU AVENUE-BROOKLYN entry and a NASSAU AVENUE && MANHATTAN AVENUE-BROOKLYN for the Nassau G stop.

I am not sure if the naming convention can helpfully indicate which station entrance it refers to (especially since Nassau has, according to the city's dataset of station entrances, six entrances, two of which are actually at Manhattan Avenue & (or &&) Norman Avenue).

Right now I'm thinking it makes the most sense to geocode all the rows of data, then aggregate those rows based on either proximity to each other (faster code-y solution) or based on just verifying stations that are part of the same system (solution that is more time consuming but accounts for stations that have really spread-out entrances, like the Union Square station which has 12 entrances across 3 city blocks (and 7 entries in the FOIA data)). Totally welcome other ideas.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
2013		2013
2014		2014
.gitignore		.gitignore
readme.md		readme.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Fare Evasion Data

About

Releases

Packages

lifewinning/Fare-Evasion-Arrests

Folders and files

Latest commit

History

Repository files navigation

Fare Evasion Data

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages