Goal: Run the full ICD-11 E2E evaluation workflow on a prepared validation/sample dataset and produce a repeatable evaluation of re-ranked scores and AI Assistant outputs. This is a milestone toward preparation for the May WHO/HL7 ICD-11 connectathon. The showcase will run before Sunny's post-vacation Mapper changes land.
Demo Driver: Filipe / Joe
Steps (proposed):
- Load the prepared ICD-11 validation dataset (Tracker 55) — a curated set of CIEL concepts with known ICD-11 ground-truth mappings
- Run the full multi-algorithm matching pipeline:
  - OCL CIEL Bridge
  - WHO ICD-11 automatch (Tracker 51)
  - Filipe's LLM-as-terminologist (Tracker 52)
  - Re-ranker: BAAI/bge-reranker-v2-m3
  - OCL AI Assistant (Claude, with ICD-11-tailored prompts from Tracker 4)
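The pooling-and-re-ranking step above can be sketched in plain Python. Everything here is an assumption for illustration: the function names, data shapes, and the toy scorer are hypothetical, and in the real pipeline the scores would come from BAAI/bge-reranker-v2-m3 applied to (source term, candidate) pairs.

```python
# Sketch: pool candidate mappings from several matchers, then re-rank them.
# The scorer below is a toy stand-in for the bge-reranker-v2-m3 model.

def pool_candidates(per_algorithm):
    """Merge per-algorithm candidate lists, remembering who proposed what."""
    pooled = {}
    for algo, candidates in per_algorithm.items():
        for icd11_code in candidates:
            pooled.setdefault(icd11_code, set()).add(algo)
    return pooled

def rerank(source_term, pooled, score_fn):
    """Order pooled candidates by a cross-encoder-style relevance score."""
    return sorted(pooled, key=lambda c: score_fn(source_term, c), reverse=True)

# Hypothetical example data (ICD-11 codes chosen arbitrarily).
scores = {"CA40.0": 0.92, "CA40.1": 0.61, "1B95": 0.12}
per_algorithm = {
    "ciel_bridge": ["CA40.0", "1B95"],
    "who_automatch": ["CA40.0", "CA40.1"],
    "llm_terminologist": ["CA40.1"],
}
pooled = pool_candidates(per_algorithm)
ranked = rerank("Pneumonia", pooled, lambda term, c: scores[c])
print(ranked)            # candidates ordered by re-rank score
print(pooled[ranked[0]])  # which algorithms proposed the top candidate
```

Keeping the provenance set per candidate makes the later agreement/disagreement analysis cheap, since each re-ranked result already records which algorithms proposed it.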
- Export the candidate results and re-ranked scores
- Compare algorithm outputs against the ground-truth mappings:
  - Which algorithm produced the correct top-1 result?
  - Where do algorithms agree vs. disagree?
  - How does the AI Assistant recommendation compare to automatch?
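The comparison questions above could be answered with a small script over the exported top-1 picks. This is a minimal sketch under assumed data shapes (dicts of concept ID to picked ICD-11 code); it is not the real export format.

```python
# Sketch: score each algorithm's top-1 pick against ground truth, and
# measure pairwise top-1 agreement between algorithms.
from itertools import combinations

def top1_correct(top1_by_algo, truth):
    """Per-algorithm count of concepts whose top-1 pick matches ground truth."""
    return {
        algo: sum(1 for cid, pick in picks.items() if pick == truth[cid])
        for algo, picks in top1_by_algo.items()
    }

def pairwise_agreement(top1_by_algo):
    """For each algorithm pair, fraction of shared concepts where top-1 picks agree."""
    out = {}
    for a, b in combinations(top1_by_algo, 2):
        shared = top1_by_algo[a].keys() & top1_by_algo[b].keys()
        agree = sum(1 for cid in shared if top1_by_algo[a][cid] == top1_by_algo[b][cid])
        out[(a, b)] = agree / len(shared) if shared else 0.0
    return out

# Hypothetical two-concept example.
truth = {"ciel:1": "CA40.0", "ciel:2": "5A11"}
top1 = {
    "who_automatch": {"ciel:1": "CA40.0", "ciel:2": "5A10"},
    "ai_assistant": {"ciel:1": "CA40.0", "ciel:2": "5A11"},
}
print(top1_correct(top1, truth))   # automatch correct on 1 concept, assistant on 2
print(pairwise_agreement(top1))    # the two agree on 1 of 2 shared concepts
```

The same pairwise table also surfaces where algorithms disagree, which is exactly the set of concepts worth inspecting in the failure-case review.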
- Document evaluation metrics (e.g. top-1 and top-3 accuracy per algorithm)
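The proposed metrics reduce to top-k accuracy over each algorithm's ranked candidate list. A minimal sketch, with illustrative concept IDs and codes (not real mappings):

```python
# Sketch: top-k accuracy = share of concepts whose ground-truth ICD-11 code
# appears among the top k ranked candidates.

def top_k_accuracy(ranked_by_concept, truth, k):
    """Fraction of concepts with the ground-truth code in the top k candidates."""
    hits = sum(1 for cid, ranked in ranked_by_concept.items()
               if truth[cid] in ranked[:k])
    return hits / len(ranked_by_concept)

# Hypothetical three-concept example.
truth = {"ciel:1": "CA40.0", "ciel:2": "5A11", "ciel:3": "1B95"}
ranked = {
    "ciel:1": ["CA40.0", "CA40.1"],
    "ciel:2": ["5A10", "5A11", "5A12"],
    "ciel:3": ["XN123", "QA02", "2C25"],
}
print(top_k_accuracy(ranked, truth, 1))  # 1 of 3 concepts correct at top-1
print(top_k_accuracy(ranked, truth, 3))  # 2 of 3 correct within top-3
```

Running this once per algorithm (and once for the re-ranked pooled list) yields the per-algorithm table the step calls for.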
- Identify failure cases and document them for iterative improvement
- Share the evaluation summary with the team
Related issues (OpenConceptLab/ocl_issues, all Open):
- #2385
- #2396
- #2395