This repository contains the foundational data pipelines and processing logic for leasehold datasets, with a primary focus on HM Land Registry (HMLR) leasehold data.
It supports building a clean, structured, high-quality “golden record” of residential leasehold information for scalable analysis and downstream services.
The repository covers:
- Filtering and preparation of residential leasehold data
- Parsing and normalisation of lease attributes (e.g. lease dates, terms, remaining years)
- Data quality improvement using deterministic rules and language models
- Batch ingestion and change-only update processing
- Confidence scoring and quality assurance flags
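
As an illustration of the parsing and normalisation step listed above, here is a minimal sketch of how a free-text lease term could be turned into a start date, a term length, and a count of remaining years. It is not the repository's actual implementation. The input format (`"99 years from 25 March 1980"`) and the names `LeaseTerm`, `parse_term`, and `remaining_years` are assumptions made for this example. Real term text is likely to be more varied, and unparsed values would presumably fall through to the rule-based or language-model steps.

```python
import re
from datetime import date, datetime
from typing import NamedTuple, Optional


class LeaseTerm(NamedTuple):
    """Hypothetical normalised form of a lease term."""
    years: int
    start: date


# Assumed input shape: "<N> years from <day> <month name> <year>".
TERM_RE = re.compile(
    r"(\d+)\s+years?\s+from\s+(\d{1,2}\s+[A-Za-z]+\s+\d{4})",
    re.IGNORECASE,
)


def parse_term(text: str) -> Optional[LeaseTerm]:
    """Return a LeaseTerm, or None if the text does not match the assumed shape."""
    match = TERM_RE.search(text)
    if match is None:
        return None
    try:
        # %B expects an English month name under the default (C) locale.
        start = datetime.strptime(match.group(2), "%d %B %Y").date()
    except ValueError:
        return None
    return LeaseTerm(years=int(match.group(1)), start=start)


def expiry_date(term: LeaseTerm) -> date:
    """Start date shifted forward by the term length."""
    target_year = term.start.year + term.years
    try:
        return term.start.replace(year=target_year)
    except ValueError:
        # 29 February start date landing in a non-leap year.
        return term.start.replace(year=target_year, day=28)


def remaining_years(term: LeaseTerm, as_of: date) -> int:
    """Whole years left on the lease at `as_of` (0 if already expired)."""
    expiry = expiry_date(term)
    if as_of >= expiry:
        return 0
    years = expiry.year - as_of.year
    # Subtract one if the final year is not complete.
    if (as_of.month, as_of.day) > (expiry.month, expiry.day):
        years -= 1
    return years


term = parse_term("99 years from 25 March 1980")
if term is not None:
    print(remaining_years(term, date(2024, 1, 1)))  # prints 55
```

Returning `None` for unmatched text, rather than raising, keeps unparsed records easy to route to a later stage or to mark with a quality assurance flag.
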
This repository supports Phase 2 (Pillar 1: Data Foundations) of the Lease project and is under active development.