Track wire roll serial number extraction source in wire_set_certs table #259

@rhythmatician

Description

When parsing certificate documents, the wire roll serial number (wire_roll_cert_number) can come from different sources depending on the workbook format:

  • Old-format workbooks: sheet "CERT, Wire Roll", cell B6
  • New-format workbooks: sheet "Order Info", cell B3
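As a rough illustration of the two lookups, here is a minimal sketch. It is not the code in parse.py: the workbook is modelled as a plain nested mapping so the example runs without an Excel library, `find_wire_roll_serial` is a hypothetical name, and checking the new-format location first is an assumption.

```python
from typing import Mapping, Optional, Tuple

# (sheet name, cell) pairs taken from the description above.
# The lookup order (new format first) is an assumption.
WIRE_ROLL_SOURCES = (
    ("Order Info", "B3"),       # new-format workbooks
    ("CERT, Wire Roll", "B6"),  # old-format workbooks
)

# Hypothetical stand-in for a parsed workbook: {sheet name: {cell: value}}.
Workbook = Mapping[str, Mapping[str, object]]


def find_wire_roll_serial(workbook: Workbook) -> Tuple[Optional[str], Optional[str]]:
    """Return (serial number, source string), or (None, None) if not found."""
    for sheet, cell in WIRE_ROLL_SOURCES:
        value = workbook.get(sheet, {}).get(cell)
        if value is not None and str(value).strip():
            # Source string uses the same shape as the one quoted in this issue.
            return str(value).strip(), f"sheet='{sheet}', cell={cell}"
    return None, None
```

Returning the source alongside the value keeps the two together, so the caller can persist both instead of only logging the source.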

Currently we log which source was used, but this information isn't persisted. Adding a wire_source column (or similar) to the wire_set_certs table would let us:

  1. Audit/debug — quickly determine where a wire roll SN came from without re-parsing the document
  2. Detect format drift — identify if workbook formats are changing over time
  3. Extend to other fields — the same pattern could track the source of traceability_number (cell B11 vs. content-disposition filename) and service_date (Excel H15 vs. PDF OCR vs. Qualer service record)
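To make the format-drift point concrete, the sketch below tallies persisted source strings per year. It assumes each row can be reduced to a (year, source) pair, for example from service_date; the function name and the "unknown" bucket for rows without a recorded source are illustrative, not existing code.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def source_counts_by_year(
    rows: Iterable[Tuple[int, Optional[str]]],
) -> Dict[int, Counter]:
    """Count how often each source string appears in each year."""
    counts: Dict[int, Counter] = {}
    for year, source in rows:
        # Rows written before the column existed have no source recorded.
        counts.setdefault(year, Counter())[source or "unknown"] += 1
    return counts
```

A shift in the dominant source from one year to the next would suggest that the workbook format changed.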

Current code reference: the TODO is in parse.py, where wire_source is set to a descriptive string but used only in log messages.

Suggested implementation:

  • Add a nullable Text column (e.g. wire_roll_source) to wire_set_certs
  • Populate it with the source string already being built in parse_certificate_data_excel (e.g. "sheet='CERT, Wire Roll', cell=B6")
  • Consider whether to also track sources for traceability_number and service_date in the same migration
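The schema change could look roughly like the sketch below. It uses the standard-library sqlite3 module and an in-memory database purely so the example is runnable; the project's actual database, migration tooling, and the rest of the wire_set_certs schema are not stated in this issue, so the table definition and the helper name `add_wire_roll_source_column` are assumptions.

```python
import sqlite3


def add_wire_roll_source_column(conn: sqlite3.Connection) -> None:
    """Add a nullable wire_roll_source TEXT column if it is not already there."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    cols = {row[1] for row in conn.execute("PRAGMA table_info(wire_set_certs)")}
    if "wire_roll_source" not in cols:
        conn.execute("ALTER TABLE wire_set_certs ADD COLUMN wire_roll_source TEXT")


conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down version of the real table.
conn.execute(
    "CREATE TABLE wire_set_certs ("
    "id INTEGER PRIMARY KEY, wire_roll_cert_number TEXT)"
)
# A row written before the migration; its source stays NULL afterwards.
conn.execute("INSERT INTO wire_set_certs (wire_roll_cert_number) VALUES ('WR-001')")

add_wire_roll_source_column(conn)

# A row written after the migration stores the source string with the value.
conn.execute(
    "INSERT INTO wire_set_certs (wire_roll_cert_number, wire_roll_source) "
    "VALUES (?, ?)",
    ("WR-002", "sheet='Order Info', cell=B3"),
)
rows = conn.execute(
    "SELECT wire_roll_cert_number, wire_roll_source "
    "FROM wire_set_certs ORDER BY id"
).fetchall()
```

Keeping the column nullable means existing rows need no backfill; they simply read as having no recorded source.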


Labels: enhancement
