Pydantic schema for control and validation of OncoLlama v3 outputs

JTpath/Pathollamaschemav3

PathoLlama v3 Schema

This repo defines a Pydantic schema and an LLM prompt for extracting structured data from free-text histopathology reports (macroscopy, microscopy, diagnosis, addenda) into a normalized JSON object suitable for indexing and cohort queries.

What is included

  1. Pydantic schema for histopathology report extraction
  2. Simple validation helpers
  3. Default LLM prompt template

Core capabilities

  • Multi-specimen reports (parts A/B/C)
  • Multiple tumours per specimen
  • Multiple IHC results per specimen or addendum
  • Addenda, amendments, and corrections
  • Normalized enums for key fields (site category, procedure, diagnosis category, margins, invasion)
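To illustrate what a normalized enum buys you, here is a minimal sketch using only the standard library. The class names, the helper coerce_enum, and any value marked "hypothetical" are assumptions for illustration; only "oral_cavity" and "non_neoplastic" are taken from the example output in this README, and the real enums live in the schema module.

```python
from enum import Enum
from typing import Optional, Type, TypeVar

E = TypeVar("E", bound=Enum)


class SiteCategory(str, Enum):
    ORAL_CAVITY = "oral_cavity"  # value seen in the README example
    OTHER = "other"              # hypothetical placeholder value


class DiagnosisCategory(str, Enum):
    NON_NEOPLASTIC = "non_neoplastic"  # value seen in the README example
    MALIGNANT = "malignant"            # hypothetical placeholder value


def coerce_enum(enum_cls: Type[E], raw: Optional[str]) -> Optional[E]:
    """Map loosely formatted text onto an enum member, or None if unknown."""
    if raw is None:
        return None
    # Lowercase and turn spaces or hyphens into underscores before lookup.
    key = raw.strip().lower().replace(" ", "_").replace("-", "_")
    try:
        return enum_cls(key)
    except ValueError:
        # Anything outside the closed vocabulary is rejected, not guessed.
        return None


site = coerce_enum(SiteCategory, "Oral cavity")
category = coerce_enum(DiagnosisCategory, "non-neoplastic")
unknown = coerce_enum(SiteCategory, "kidney")
```

Because the vocabulary is closed, downstream queries can compare against a fixed set of strings instead of matching free text.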

Primary model

The root model is HistopathologyReportModel in oncollamaschemav3/oncollamaschemav3.py.

Minimal example

Input:

Clinical information
Pigmented lesion involving the oral mucosa.

Macroscopic description
Biopsy from the left lower gingiva comprising a cream-to-tan soft tissue fragment measuring 4 x 3 x 2 mm.

Microscopic description
Oral squamous mucosa is present with coarse pigmented material in the subepithelial connective tissue. No dysplasia or malignancy.

Conclusion
Oral mucosal biopsy: Features consistent with an amalgam tattoo.

Output:

{
  "document_is_histopathology_report_flag": true,
  "report_contains_pathology_diagnosis_flag": true,
  "report_metadata": null,
  "sections": {
    "clinical_information_desc": "Pigmented lesion involving the oral mucosa.",
    "macroscopic_description_desc": "Biopsy from the left lower gingiva comprising a cream-to-tan soft tissue fragment measuring 4 x 3 x 2 mm.",
    "microscopic_description_desc": "Oral squamous mucosa is present with coarse pigmented material in the subepithelial connective tissue. No dysplasia or malignancy.",
    "formatted_conclusion_desc": "Oral mucosal biopsy: Features consistent with an amalgam tattoo."
  },
  "specimens": [
    {
      "part_id": null,
      "specimen_label_desc": "left lower gingiva",
      "site_category": "oral_cavity",
      "site_desc": "left lower gingiva",
      "laterality": "left",
      "procedure_type": "biopsy",
      "procedure_desc": "biopsy",
      "specimen_size_desc": "4 x 3 x 2 mm",
      "macroscopic_desc": "Biopsy from the left lower gingiva comprising a cream-to-tan soft tissue fragment measuring 4 x 3 x 2 mm.",
      "microscopic_desc": "Oral squamous mucosa is present with coarse pigmented material in the subepithelial connective tissue. No dysplasia or malignancy.",
      "diagnoses": [
        {
          "diagnosis_name_desc": "Features consistent with an amalgam tattoo",
          "diagnosis_category": "non_neoplastic",
          "diagnosis_desc": "Features consistent with an amalgam tattoo.",
          "is_primary_diagnosis": true
        }
      ],
      "tumours": null,
      "ihc_results": null,
      "lymph_node_findings": null
    }
  ],
  "overall_diagnoses": null,
  "addenda": null
}
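As a rough illustration of the cohort queries this output is meant to support, the sketch below filters reports by site and diagnosis category using only the standard library. The helper names iter_diagnoses and report_matches are hypothetical, and the report is an abbreviated copy of the output above. List fields such as "specimens" and "diagnoses" can be null in the output, so the code treats null as an empty list.

```python
import json
from typing import Any, Dict, Iterator, Tuple

# Abbreviated copy of the example output above.
REPORT_JSON = """
{
  "document_is_histopathology_report_flag": true,
  "specimens": [
    {
      "site_category": "oral_cavity",
      "procedure_type": "biopsy",
      "diagnoses": [
        {"diagnosis_category": "non_neoplastic", "is_primary_diagnosis": true}
      ],
      "tumours": null
    }
  ],
  "addenda": null
}
"""


def iter_diagnoses(
    report: Dict[str, Any],
) -> Iterator[Tuple[Dict[str, Any], Dict[str, Any]]]:
    """Yield (specimen, diagnosis) pairs, treating null lists as empty."""
    for specimen in report.get("specimens") or []:
        for diagnosis in specimen.get("diagnoses") or []:
            yield specimen, diagnosis


def report_matches(
    report: Dict[str, Any], site_category: str, diagnosis_category: str
) -> bool:
    """True if any specimen at the given site carries the given category."""
    return any(
        specimen.get("site_category") == site_category
        and diagnosis.get("diagnosis_category") == diagnosis_category
        for specimen, diagnosis in iter_diagnoses(report)
    )


report = json.loads(REPORT_JSON)
is_benign_oral = report_matches(report, "oral_cavity", "non_neoplastic")
```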

Validation

Use oncollamaschemav3/validate.py to validate JSON against the schema.
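The interface of validate.py is not shown here, so the following is only a sketch of the general pattern, assuming Pydantic is installed. MiniDiagnosis, MiniSpecimen, MiniReport, and check_report are hypothetical cut-down stand-ins that cover a few fields from the example output; in practice you would validate against HistopathologyReportModel instead.

```python
import json
from typing import List, Optional, Tuple

from pydantic import BaseModel, ValidationError


class MiniDiagnosis(BaseModel):
    diagnosis_name_desc: str
    diagnosis_category: str
    is_primary_diagnosis: bool = False


class MiniSpecimen(BaseModel):
    site_category: str
    procedure_type: Optional[str] = None
    diagnoses: Optional[List[MiniDiagnosis]] = None


class MiniReport(BaseModel):
    document_is_histopathology_report_flag: bool
    report_contains_pathology_diagnosis_flag: bool
    specimens: Optional[List[MiniSpecimen]] = None


def check_report(raw_json: str) -> Tuple[Optional[MiniReport], List[str]]:
    """Parse LLM output and return (model, errors); model is None on failure."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc}"]
    if not isinstance(data, dict):
        return None, ["top-level JSON value must be an object"]
    try:
        return MiniReport(**data), []
    except ValidationError as exc:
        # One "path.to.field: message" line per problem found.
        return None, [
            ".".join(str(part) for part in err["loc"]) + ": " + err["msg"]
            for err in exc.errors()
        ]


good = json.dumps({
    "document_is_histopathology_report_flag": True,
    "report_contains_pathology_diagnosis_flag": True,
    "specimens": [{"site_category": "oral_cavity", "procedure_type": "biopsy"}],
})
bad = json.dumps({
    "document_is_histopathology_report_flag": True,
    "report_contains_pathology_diagnosis_flag": True,
    "specimens": [{"procedure_type": "biopsy"}],  # site_category is missing
})

ok_report, ok_errors = check_report(good)
bad_report, bad_errors = check_report(bad)
```

Returning a list of field paths rather than raising makes it easier to log failures per report or to feed them back to the model in a retry prompt.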

Prompt

The default prompt template is in oncollamaschemav3/prompts/infer_prompt.txt. Use create_system_prompt() to inject the schema into the prompt.
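The signature of create_system_prompt() and the placeholder syntax used in infer_prompt.txt are not shown here, so the sketch below only illustrates the general idea of schema injection. The function render_system_prompt, the "{schema}" placeholder, the template text, and the toy schema are all assumptions. With Pydantic v2, a model's JSON Schema can be obtained from model_json_schema(); a hand-written dict stands in for it here.

```python
import json
from typing import Any, Dict

SCHEMA_PLACEHOLDER = "{schema}"  # hypothetical marker, not the repo's own

# Hypothetical template text standing in for infer_prompt.txt.
TEMPLATE = (
    "Extract the histopathology report into JSON.\n"
    "The output must conform to this JSON Schema:\n"
    "{schema}\n"
    "Return JSON only."
)

# Toy schema standing in for the real model's JSON Schema.
TOY_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {"specimens": {"type": ["array", "null"]}},
}


def render_system_prompt(template: str, schema: Dict[str, Any]) -> str:
    """Insert a pretty-printed JSON Schema into the prompt template."""
    if SCHEMA_PLACEHOLDER not in template:
        raise ValueError("template has no schema placeholder")
    schema_text = json.dumps(schema, indent=2, sort_keys=True)
    # str.replace is used instead of str.format because the schema text
    # itself contains braces, which str.format would try to interpret.
    return template.replace(SCHEMA_PLACEHOLDER, schema_text)


prompt = render_system_prompt(TEMPLATE, TOY_SCHEMA)
```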
