SamanthaToet/va-evictions

Last updated: 2023-05-16

The Virginia Evictors Catalog provides data on plaintiffs filing unlawful detainers (evictions) in Virginia's General District Courts.

This repository contains code for cleaning, standardizing, and aggregating case data and for building the app. Raw case data are provided by the non-profit Legal Services Corporation. (An earlier version of this app used data gathered by Ben Schoenfeld.)

Details about the data cleaning and standardization process are in the expandable section below. The process draws heavily on the ECtools R package, developed by UVA Library Research Data Services/StatLab.

Data cleaning and standardization process
Case data are provided periodically by the Legal Services Corporation. Data on plaintiffs, defendants, hearings, etc. are provided separately; we aggregate all the data for a given case and identify a "primary" plaintiff name, plaintiff address, defendant name, and defendant address for each case based on the first-listed plaintiff/defendant in each court record. We perform this step because many cases have multiple plaintiffs and/or defendants listed.
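
The first-listed rule can be illustrated with a minimal Python sketch. This is not the repository's R code; the field names (`case_id`, `role`, `name`) and the sample rows are hypothetical, and the rows are assumed to arrive in court-record listing order.

```python
from collections import defaultdict

def primary_parties(party_rows):
    """Map each case to its first-listed plaintiff and defendant.

    Assumes rows are already in the order they appear in the court record.
    Field names are hypothetical, chosen for illustration only.
    """
    primary = defaultdict(dict)
    for row in party_rows:
        roles = primary[row["case_id"]]
        # setdefault keeps the first name seen for a role and ignores later ones
        roles.setdefault(row["role"], row["name"])
    return dict(primary)

# Made-up example: one case with two plaintiffs and two defendants
rows = [
    {"case_id": "CASE-1", "role": "plaintiff", "name": "ABC REAL ESTATE, LLC"},
    {"case_id": "CASE-1", "role": "plaintiff", "name": "XYZ MANAGEMENT"},
    {"case_id": "CASE-1", "role": "defendant", "name": "DOE, JANE"},
    {"case_id": "CASE-1", "role": "defendant", "name": "DOE, JOHN"},
]
print(primary_parties(rows))
```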

Names in the case data have both formatting inconsistencies and errors. If left unaddressed, these would severely hamper our ability to identify multiple cases filed by the same plaintiff (e.g., "ABC REAL ESTATE, LLC" and "ABC REAL-ESTATE LLC" would be treated as separate plaintiffs). Plaintiff names are cleaned and standardized by the Legal Services Corporation using their CleanCourt Python library; standardization of defendant names and other data-cleaning processes are implemented using the ECtools package built by UVA Library StatLab. For plaintiff names, LSC identifies probable misspellings/alternative spellings of the same entity using term frequency–inverse document frequency and cosine similarity measures to reduce a set of messy plaintiff names to a cleaned set (e.g., identifying "ABC ENTERPRISES" and "ABC ENTERPRESES" as the same entity and labelling them both as "ABC ENTERPRISES").
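
To make the idea concrete, here is a small standard-library Python sketch of punctuation normalization followed by TF-IDF weighting and cosine similarity. It does not use or reproduce the CleanCourt or ECtools APIs. The choice of character trigrams, the smoothed IDF formula, and the third sample name are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

def normalize(name):
    """Uppercase, turn punctuation into spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^A-Z0-9 ]", " ", name.upper()).split())

def trigrams(name):
    """Character trigrams of the normalized name (an assumed tokenization)."""
    padded = f" {normalize(name)} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def tfidf_vectors(names):
    """Build one sparse TF-IDF vector (a dict) per name."""
    docs = [Counter(trigrams(n)) for n in names]
    # Document frequency: number of names containing each trigram
    doc_freq = Counter(g for d in docs for g in d)
    n_docs = len(docs)
    # Smoothed IDF, so trigrams found in every name keep a nonzero weight
    idf = {g: math.log((1 + n_docs) / (1 + df)) + 1 for g, df in doc_freq.items()}
    return [{g: tf * idf[g] for g, tf in d.items()} for d in docs]

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

# Formatting variants collapse to the same string after normalization
print(normalize("ABC REAL ESTATE, LLC") == normalize("ABC REAL-ESTATE LLC"))

names = ["ABC ENTERPRISES", "ABC ENTERPRESES", "OAK HILL APARTMENTS"]
vecs = tfidf_vectors(names)
print(cosine(vecs[0], vecs[1]))  # misspelled variant: comparatively high similarity
print(cosine(vecs[0], vecs[2]))  # unrelated name: low similarity
```

In a sketch like this, pairs whose similarity exceeds some chosen threshold would be treated as the same entity and given one canonical label; the threshold itself is a tuning decision that the source does not specify.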

With cleaned and standardized names in hand, we then remove duplicate records by identifying cases that have the same filing date, plaintiff name, defendant name, defendant ZIP Code, judgment (outcome), judgment costs, attorney fees, and principal/other amounts. (We retain one record for each set of duplicate cases.)
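
A minimal Python sketch of this kind of key-based deduplication follows. The field names are hypothetical stand-ins for the fields listed above, and keeping the first record of each duplicate set is an assumption; the source only says that one record is retained.

```python
# Hypothetical field names standing in for the fields listed above
DEDUP_KEYS = (
    "filed_date", "plaintiff_name", "defendant_name", "defendant_zip",
    "judgment", "costs", "attorney_fees", "principal_amount", "other_amount",
)

def drop_duplicates(cases, keys=DEDUP_KEYS):
    """Keep the first record for each distinct combination of key fields."""
    seen = set()
    unique = []
    for case in cases:
        signature = tuple(case.get(k) for k in keys)
        if signature not in seen:
            seen.add(signature)
            unique.append(case)
    return unique

# Made-up records: the first two differ only in case_id, which is not a key
base = {
    "filed_date": "2022-01-10", "plaintiff_name": "ABC ENTERPRISES",
    "defendant_name": "DOE, JANE", "defendant_zip": "22903",
    "judgment": "Plaintiff", "costs": 61.0, "attorney_fees": 0.0,
    "principal_amount": 1200.0, "other_amount": 0.0,
}
cases = [
    {**base, "case_id": "A"},
    {**base, "case_id": "B"},
    {**base, "case_id": "C", "filed_date": "2022-03-02"},
]
print([c["case_id"] for c in drop_duplicates(cases)])
```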

We then identify "serial cases," which we consider to be repeated cases filed by a given plaintiff against a given defendant in a given ZIP Code within a 12-month period.
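
The serial-case definition can be sketched in Python as below. The source does not say how the 12-month window is measured, so this sketch assumes a rolling window of 365 days between consecutive filings within the same plaintiff/defendant/ZIP Code group. The field names and sample cases are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

# Assumption: "12-month period" is approximated as a rolling 365-day window
WINDOW = timedelta(days=365)

def flag_serial(cases):
    """Return the case_ids that have another filing by the same plaintiff
    against the same defendant in the same ZIP Code within WINDOW."""
    groups = defaultdict(list)
    for c in cases:
        groups[(c["plaintiff_name"], c["defendant_name"], c["defendant_zip"])].append(c)

    serial = set()
    for group in groups.values():
        group.sort(key=lambda c: c["filed_date"])
        # After sorting, a case's nearest filings are its neighbours in the list
        for earlier, later in zip(group, group[1:]):
            if later["filed_date"] - earlier["filed_date"] <= WINDOW:
                serial.update((earlier["case_id"], later["case_id"]))
    return serial

# Made-up cases: A and B are 142 days apart; C is more than a year after B
cases = [
    {"case_id": "A", "plaintiff_name": "ABC ENTERPRISES", "defendant_name": "DOE, JANE",
     "defendant_zip": "22903", "filed_date": date(2022, 1, 10)},
    {"case_id": "B", "plaintiff_name": "ABC ENTERPRISES", "defendant_name": "DOE, JANE",
     "defendant_zip": "22903", "filed_date": date(2022, 6, 1)},
    {"case_id": "C", "plaintiff_name": "ABC ENTERPRISES", "defendant_name": "DOE, JANE",
     "defendant_zip": "22903", "filed_date": date(2024, 1, 1)},
    {"case_id": "D", "plaintiff_name": "ABC ENTERPRISES", "defendant_name": "ROE, RICHARD",
     "defendant_zip": "22903", "filed_date": date(2022, 2, 1)},
]
print(sorted(flag_serial(cases)))
```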

We then identify and filter out non-residential defendants using a custom-developed regex pattern, as the app displays results only for cases against residential defendants. You can view the full regex pattern here.
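
As a rough illustration of regex-based filtering, the Python sketch below flags defendant names that contain business-like terms. The pattern is a short hypothetical stand-in, not the project's actual pattern, which is considerably more extensive.

```python
import re

# Hypothetical, deliberately tiny pattern; NOT the project's actual regex
NON_RESIDENTIAL = re.compile(
    r"\b(LLC|INC|CORP|LTD|COMPANY|RESTAURANT|CHURCH)\b",
    re.IGNORECASE,
)

def is_residential(defendant_name):
    """Treat a defendant as residential unless the name matches the pattern."""
    return NON_RESIDENTIAL.search(defendant_name) is None

# Made-up defendant names
defendants = ["DOE, JANE", "MAIN STREET DINER, LLC", "ROE, RICHARD", "Example Holdings Inc."]
print([name for name in defendants if is_residential(name)])
```

A keyword approach like this trades recall for simplicity: any business name without one of the listed terms passes through as residential, which is one reason a production pattern needs far more alternatives.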

Cleaned data are then exported and aggregated up to the level of plaintiff, plaintiff/year, and plaintiff/month, which are the levels of summarization available for viewing in the app.
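
The three levels of summarization can be sketched in Python as follows. The repository's own aggregation is in summarize.R; this sketch assumes, for illustration only, that the summary statistic is a simple count of filings and that each case has hypothetical `plaintiff_name` and `filed_date` fields.

```python
from collections import Counter
from datetime import date

def summarize(cases):
    """Count filings per plaintiff, per plaintiff/year, and per plaintiff/month."""
    by_plaintiff, by_year, by_month = Counter(), Counter(), Counter()
    for c in cases:
        name, filed = c["plaintiff_name"], c["filed_date"]
        by_plaintiff[name] += 1
        by_year[(name, filed.year)] += 1
        by_month[(name, filed.year, filed.month)] += 1
    return by_plaintiff, by_year, by_month

# Made-up cases for a single plaintiff
cases = [
    {"plaintiff_name": "ABC ENTERPRISES", "filed_date": date(2022, 1, 10)},
    {"plaintiff_name": "ABC ENTERPRISES", "filed_date": date(2022, 3, 2)},
    {"plaintiff_name": "ABC ENTERPRISES", "filed_date": date(2023, 3, 15)},
]
by_plaintiff, by_year, by_month = summarize(cases)
print(by_plaintiff)
print(by_year)
print(by_month)
```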

Code for the data cleaning and standardization process is in clean.R; code for aggregating cleaned data is in summarize.R; code for the app is in the va-evictors-catalog directory (see app.R).

The case records comprising the data reflected in the app are public; however, to protect defendants from being named against their will or wishes, we do not currently include the raw case data in this repository.


Full contributor acknowledgments are on the app's About the Project page.

  • Code: Jacob Goldstein-Greenwood,σ Michele Claibourn, and Elizabeth Mitchell
  • Subject-matter expertise and conceptual guidance: Kate Howell, Ben Teresa, Barbara Brown Wilson, Michele Claibourn, Hannah Woehrle, and Michael Salgueiro◍ (formerly)
  • Additional project support, coordination, and communications assistance: Hannah Woehrle, Connor White, Michael Salgueiro◍ (formerly), and Atticus Johnson

◍ - The Equity Center at UVA
σ - UVA Library Research Data Services/StatLab
◬ - RVA Eviction Lab at VCU
