Note on exclusions #3

Open
moxboxwa opened this issue Sep 29, 2024 · 0 comments

moxboxwa commented Sep 29, 2024

I've also worked on matching Fatal Encounters (FE) data to FARS data for police-pursuit-related fatalities in WA State. In the process I have had numerous conversations with the person who runs the research department at the Washington Traffic Safety Commission (WTSC), the state agency that collects and cleans WA crash data and sends it to NHTSA for FARS. Like you, I used the police-involvement flag to identify the cases in FARS.

The matching I ran produced a Venn diagram: about 50% of the cases were identified in both the public (FE) and official (FARS) datasets, and the other half split roughly evenly between cases found only in one or only in the other. All of the FE cases are backed by media articles, so they're verified. I then hand-checked the FE cases not found in FARS, first for FARS records with an exact date/location match that had not been flagged as police-involved (I found quite a few), and then the remaining cases for records at nearby dates/locations, since the media sometimes get these details wrong (only a couple of these seemed likely matches).
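For anyone wanting to repeat this kind of matching, here is a minimal Python sketch of the two passes, under loud assumptions: each record is reduced to a hypothetical `Crash` with an ID, a date, and latitude/longitude (the real FE and FARS files use their own field names and may need geocoding), and "nearby" is an arbitrary window of `max_days` days and `max_km` kilometres. Setting `max_days=0` approximates the exact-date pass; widening it approximates the nearby-date pass. This is an illustration, not the procedure used for the counts above.

```python
from dataclasses import dataclass
from datetime import date
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Crash:
    # Hypothetical minimal record; real FE/FARS files carry many more fields.
    case_id: str
    crash_date: date
    lat: float
    lon: float


def km_between(a: Crash, b: Crash) -> float:
    """Great-circle (haversine) distance in kilometres."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def match(fe, fars, max_days=1, max_km=5.0):
    """Greedy one-to-one matching; returns (both, fe_only, fars_only)."""
    unmatched_fars = list(fars)
    both, fe_only = [], []
    for f in fe:
        # FARS records within the date window and distance threshold.
        candidates = [
            r for r in unmatched_fars
            if abs((r.crash_date - f.crash_date).days) <= max_days
            and km_between(f, r) <= max_km
        ]
        if candidates:
            # Prefer the closest date, then the closest location.
            best = min(
                candidates,
                key=lambda r: (abs((r.crash_date - f.crash_date).days), km_between(f, r)),
            )
            unmatched_fars.remove(best)
            both.append((f.case_id, best.case_id))
        else:
            fe_only.append(f.case_id)
    fars_only = [r.case_id for r in unmatched_fars]
    return both, fe_only, fars_only


if __name__ == "__main__":
    fe = [Crash("FE1", date(2020, 1, 5), 47.60, -122.33)]
    fars = [Crash("F1", date(2020, 1, 5), 47.61, -122.33)]
    print(match(fe, fars))
```

Matching here is greedy and one-to-one, so each FARS record is claimed by at most one FE record. In dense urban areas an optimal assignment would be safer, and any pair found this way still deserves the hand check described above.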

On the public/FE side, the reason for missing cases is clear: if there's no digital media trace to be scraped, the pursuit will be invisible.

But on the FARS side it's not obvious why cases would be missed.

When I asked the WTSC research director about the missing cases, she said that the police-involvement flag is not a reliable indicator, for several reasons. One is that the definition is very strict (I'd have to pull out my notes to detail this; I will if you're interested). Another is that the system was not originally designed to capture this information, so it often just isn't entered. All of the data entry for this system is voluntary, and the effort made to verify this particular field varies from state to state but is probably minimal, and it is focused more on removing flagged cases that don't meet the definition than on flagging cases that were missed.

I raise all of this because your README states: "We excluded fatalities identified by other research organizations if a) we could not find news reports or other public records indicating a pursuit occurred, and b) we could not find a match in NHTSA’s “pursuit-involved” fatal crash data in FARS." Because each dataset is missing cases that the other one has, it's possible that some cases are in neither dataset.

There is a statistical method for estimating this unobserved fraction: "mark-recapture", also known as "capture-recapture". If anyone in your organization is interested in working on something like this, please let me know. I think there could be both an academic paper and a media article in this, and it might even provide a methodology that the Feds would approve for estimating this on a regular basis.
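As a rough illustration of the idea, the simplest version treats FE and FARS as two independent "captures" of the same population of pursuit fatalities. With `n_fe` cases in FE, `n_fars` in FARS, and `n_both` in both, the Lincoln-Petersen estimate of the total is `n_fe * n_fars / n_both`. The sketch below uses Chapman's bias-corrected form, `(n_fe + 1)(n_fars + 1) / (n_both + 1) - 1`, with Seber's variance and a simple Wald 95% interval. The function name is hypothetical, and the counts in the example are made up to mirror the rough 50/25/25 split described above; they are not real WA figures.

```python
from math import sqrt


def chapman_estimate(n_fe: int, n_fars: int, n_both: int) -> dict:
    """Two-source capture-recapture (Chapman) estimate of total cases."""
    if not (0 <= n_both <= min(n_fe, n_fars)):
        raise ValueError("n_both must be between 0 and min(n_fe, n_fars)")
    # Chapman's bias-corrected Lincoln-Petersen estimator.
    n_hat = (n_fe + 1) * (n_fars + 1) / (n_both + 1) - 1
    # Seber's variance for the Chapman estimator.
    var = ((n_fe + 1) * (n_fars + 1) * (n_fe - n_both) * (n_fars - n_both)
           / ((n_both + 1) ** 2 * (n_both + 2)))
    se = sqrt(var)
    observed = n_fe + n_fars - n_both  # cases seen in at least one list
    return {
        "total": n_hat,
        "observed": observed,
        "unobserved": n_hat - observed,  # estimated cases in neither list
        "ci95": (n_hat - 1.96 * se, n_hat + 1.96 * se),  # Wald interval
    }


if __name__ == "__main__":
    # Illustrative counts only: 50 in both, 25 FE-only, 25 FARS-only.
    print(chapman_estimate(75, 75, 50))
```

With these made-up counts the estimate is about 112 total against 100 observed, i.e. roughly a dozen cases in neither list. The key assumption is independence between the two sources: if crashes that draw media coverage are also more likely to get the police-involvement flag, the two-list estimate will be biased low. With three or more lists (for example, adding another research organization's data), log-linear models can relax that assumption. The Wald interval can also dip below the observed count for small samples; log-transformed intervals are a common alternative.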
