Note on exclusions #3

moxboxwa · 2024-09-29T00:28:22Z

I've also worked on matching Fatal Encounters data to FARS data for WA State for police pursuit related fatalities. In the process I have had numerous conversations with the person who runs the research department for the WA Traffic Safety Commission -- this is our local agency that collects, cleans and sends the WA State data to NHTSA's FARS. Like you, I used the police involvement flag to identify the cases in FARS.

The matching I ran produced a Venn diagram: about 50% of the cases were identified in both the public (FE) and official (FARS) datasets, and the other half were split roughly evenly between cases found in only one or the other. All of the FE cases have media article backup, so they're verified. So I hand-checked the FE cases not found in FARS, first for exact date/location matches that were not identified by the police involvement flag (found quite a few), and then the remaining cases for nearby dates/locations, since the media sometimes get these wrong (only a couple of these seemed likely).
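For anyone wanting to reproduce the hand-check tiers described above, here is a minimal sketch, not the code I actually ran. The record layout (`date`, `county`, `police_flag`), the function name `match_tier` and the 3-day window are all assumptions made for illustration; real FE and FARS extracts use different field names and finer location detail.

```python
from datetime import date

def match_tier(fe_case, fars_records, window_days=3):
    """Classify one FE case against FARS records (hypothetical field names).

    Returns 'flagged', 'exact_unflagged', 'nearby' or 'unmatched'.
    """
    best = "unmatched"
    for rec in fars_records:
        # Location is compared at county level only; this is an assumption.
        if rec["county"] != fe_case["county"]:
            continue
        gap = abs((rec["date"] - fe_case["date"]).days)
        if gap == 0 and rec["police_flag"]:
            return "flagged"          # found via the police involvement flag
        if gap == 0:
            best = "exact_unflagged"  # same date/location, flag not set
        elif gap <= window_days and best == "unmatched":
            best = "nearby"           # candidate only, needs a manual look
    return best
```

Anything that comes back as `nearby` is only a candidate and still needs to be checked by hand against the media report.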

On the public/FE side, the reason for missing cases is clear: if there's no digital media trace to be scraped, the pursuit will be invisible.

But on the FARS side it's not obvious why cases would be missed.

When I asked the WTSC research director about the missing cases, she said that the police involvement flag is not a reliable indicator for many reasons. One is that the definition is very strict (I'd have to pull out my notes to detail this, will do if you're interested); another is that the system was not originally designed to capture this info, and it often just isn't entered. All of the data entry for this system is voluntary, and the effort made to verify this particular field varies from state to state, but is probably minimal, and more focused on removing identified cases if they don't meet the definition than adding unidentified cases.

I raise all of this b/c in your README you state: "We excluded fatalities identified by other research organizations if a) we could not find news reports or other public records indicating a pursuit occurred, and b) we could not find a match in NHTSA’s “pursuit-involved” fatal crash data in FARS." Because each dataset misses cases that the other one has, it's possible that some cases will be in neither dataset.

There is a statistical method for estimating this unobserved fraction -- "mark-recapture" aka "capture-recapture". If anyone in your organization is interested in working on something like this, pls lmk. I think there could be both an academic and a media article in this, and it might even provide a methodology that the Feds would approve to estimate this on a regular basis.
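To make the idea concrete, here is a minimal two-source sketch using the Chapman form of the Lincoln-Petersen estimator. The function name `chapman_estimate` is my own, and the counts in the usage note are made up to mirror the rough 50/25/25 split I described, not real WA numbers. The estimator also assumes the two sources capture cases independently, which probably does not hold here (high-profile pursuits are more likely to show up in both), so treat the result as a rough illustration rather than a finding.

```python
def chapman_estimate(n_fe, n_fars, n_both):
    """Two-source capture-recapture (Chapman estimator).

    n_fe   : cases found in the public (FE) data
    n_fars : cases found in the official (FARS) data
    n_both : cases found in both
    Returns (estimated total cases, estimated cases missed by both sources).
    """
    if min(n_fe, n_fars, n_both) < 0 or n_both > min(n_fe, n_fars):
        raise ValueError("counts must be non-negative and n_both <= each source")
    # Chapman's small-sample correction of n_fe * n_fars / n_both.
    n_hat = (n_fe + 1) * (n_fars + 1) / (n_both + 1) - 1
    observed = n_fe + n_fars - n_both  # distinct cases seen in at least one source
    return n_hat, n_hat - observed
```

With hypothetical counts of 75 FE cases, 75 FARS cases and 50 in both (100 distinct cases observed), `chapman_estimate(75, 75, 50)` gives a total of roughly 112, i.e. about 12 cases that neither dataset would contain.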
