[CIEL Bridge] ICD11 E2E on pre-publication dataset

Open
Due by April 2, 2026
Last updated Feb 26, 2026
0% complete

Goal: Run the full ICD-11 E2E evaluation workflow on a prepared validation/sample dataset and produce a repeatable evaluation of re-ranked scores and AI Assistant outputs. This is a milestone toward preparation for the May WHO/HL7 ICD-11 connectathon. The showcase will run before Sunny's post-vacation changes to Mapper are made.

Demo Driver: Filipe / Joe

Steps (proposed):

  1. Load the prepared ICD-11 validation dataset (Tracker 55) — a curated set of CIEL concepts with known ICD-11 ground-truth mappings

  2. Run the full multi-algorithm matching pipeline:

    • OCL CIEL Bridge
    • WHO ICD-11 automatch (Tracker 51)
    • Filipe's LLM-as-terminologist (Tracker 52)
    • Re-ranker: BAAI/bge-reranker-v2-m3
    • OCL AI Assistant (Claude, with ICD-11 tailored prompts from Tracker 4)
  3. Export the candidate results and re-ranked scores

  4. Compare algorithm outputs against the ground-truth mappings:

    • Which algorithm produced the correct top-1 result?
    • Where do algorithms agree vs. disagree?
    • How does the AI Assistant recommendation compare to automatch?
  5. Document evaluation metrics (e.g. top-1 accuracy, top-3 accuracy per algorithm)

  6. Identify failure cases and document them for iterative improvement

  7. Share evaluation summary with team
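
Steps 4 to 6 could be scripted roughly as below. This is a minimal sketch, not the project's actual tooling: the export shape (a dict of ranked candidate codes per algorithm per concept), the function names, and the placeholder concept IDs and codes are all assumptions made for illustration.

```python
# Hypothetical input shapes (assumed, not taken from the Mapper export format):
#   candidates: {algorithm_name: {concept_id: [candidate codes, best first]}}
#   truth:      {concept_id: set of acceptable ground-truth codes}

def top_k_accuracy(candidates, truth, k):
    """Fraction of concepts whose ground-truth code appears in the top k, per algorithm."""
    scores = {}
    for algo, ranked in candidates.items():
        hits = sum(
            1 for cid, gold in truth.items()
            if gold & set(ranked.get(cid, [])[:k])
        )
        scores[algo] = hits / len(truth) if truth else 0.0
    return scores

def top1_disagreements(candidates, truth):
    """Concept IDs where the algorithms do not all return the same top-1 code."""
    disagreeing = []
    for cid in truth:
        tops = {(ranked.get(cid) or [None])[0] for ranked in candidates.values()}
        if len(tops) > 1:
            disagreeing.append(cid)
    return disagreeing

def failure_cases(candidates, truth):
    """Concept IDs where no algorithm placed a ground-truth code at rank 1."""
    return [
        cid for cid, gold in truth.items()
        if not any((ranked.get(cid) or [None])[0] in gold
                   for ranked in candidates.values())
    ]

# Placeholder data only: these are not real CIEL concepts or ICD-11 codes.
truth = {"c1": {"A"}, "c2": {"B"}, "c3": {"C"}}
candidates = {
    "automatch": {"c1": ["A", "X"], "c2": ["X", "B"], "c3": ["X", "Y"]},
    "reranker": {"c1": ["A"], "c2": ["B", "X"], "c3": ["Y", "C"]},
}

for k in (1, 3):
    print(f"top-{k}:", top_k_accuracy(candidates, truth, k))
print("disagree on top-1:", top1_disagreements(candidates, truth))
print("no algorithm correct at top-1:", failure_cases(candidates, truth))
```

Treating the ground truth as a set per concept allows for concepts with more than one acceptable mapping. If the validation dataset has exactly one mapping per concept, a single-element set behaves the same way.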