OCL Mapper information architecture #2337

@paynejd

Description

Proposal:
Elevate Match Algorithms and Concepts (from the target repo) to first-class entities in OCL Mapper, and separate $lookup from $match

Why?

  • Decoupling candidates from concepts means that an algorithm only needs to return a code/concept ID and match metadata (score, matched fragments, etc.)
  • Using $lookup (in addition to $match) means that a user may configure an authoritative concept lookup source when it is not preferable or not possible to rely on a $match algorithm to provide full concept details
    • e.g. $lookup could point to a different repo in OCL or to an external FHIR service
  • OCL Mapper will load a single canonical representation for each concept that is available in the candidate pool, even if the same code is returned by more than one algorithm
  • Algorithms that don't return full concept details (e.g. ocl-scispacy-loinc and external ICD-11 algorithms) will be linked to the canonical concept representation, so a user will still be able to view full concept details
  • Formal algo definitions bring several benefits:
    • No hard-coded algo definitions in the code
    • Configurable and expandable set of algorithms for each mapper instance
    • Treat external match algos as first-class algos
    • Automate optimization of candidate retrieval by using algo attributes to manage batch size, parallelization, etc.
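To make the "formal algo definition" idea concrete, here is a minimal sketch of what a declarative algorithm record could look like and how its attributes could drive batching. Everything in it is an assumption for illustration: the `MatchAlgorithm` fields, the `plan_batches` helper, the `icd11-external` ID, the example.org endpoints, and the batch sizes are hypothetical and not part of the current Mapper code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class MatchAlgorithm:
    """Hypothetical declarative definition of a match algorithm."""
    id: str
    endpoint: str                        # placeholder URL, not a real service
    external: bool = False               # hosted outside OCL
    returns_full_concepts: bool = False  # if False, details must come from $lookup
    max_batch_size: int = 50             # rows per $match request
    max_parallel_requests: int = 1       # concurrent requests allowed


def plan_batches(algo: MatchAlgorithm, rows: List[str]) -> List[List[str]]:
    """Split input rows into batches no larger than the algorithm allows."""
    size = algo.max_batch_size
    return [rows[i:i + size] for i in range(0, len(rows), size)]


# Illustrative registry: values are made up, only the shape matters.
REGISTRY: Dict[str, MatchAlgorithm] = {
    algo.id: algo
    for algo in [
        MatchAlgorithm("ocl-semantic", "https://example.org/ocl/match",
                       max_batch_size=100, max_parallel_requests=4),
        MatchAlgorithm("ocl-scispacy-loinc", "https://example.org/scispacy/match",
                       max_batch_size=25),
        MatchAlgorithm("icd11-external", "https://example.org/icd11/match",
                       external=True, max_batch_size=10),
    ]
}

rows = [f"row-{i}" for i in range(60)]
batches = plan_batches(REGISTRY["ocl-scispacy-loinc"], rows)
batch_sizes = [len(b) for b in batches]
print(batch_sizes)
```

With definitions held as data like this, adding or tuning an algorithm for a mapper instance would be a configuration change rather than a code change, and external algos go through the same path as built-in ones.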

Current OCL Mapper Information Architecture

flowchart TD
    A[Input Dataset] --> B[Mapping Project]
    B --> C[Match Candidates]
    C --> B
    B --> D[Target Repository]

Planned OCL Mapper Information Architecture

flowchart LR
  A[Input Dataset] --> B[Mapping Project]
  E[Match Algorithms] --> B
  B --> C[Match Candidates]
  D[Target Repository] --> F[Concepts]
  C --> F
  B --> D

Requirements

Decouple Candidates from Concepts

  • Candidates = algo output; code and match metadata (score, matched fragments, etc.) are required; additional attributes are optional – can be minimalist, or fully enriched
  • Concepts = single source of truth for the definition of a concept that is shared across algorithms
    • Retrieved via a dedicated $lookup operation, not an algorithm response
    • As an optional optimization, $match algorithms may return a full concept definition, but that is not on the critical path
  • The decoupled approach means that ocl-scispacy-loinc now points to a fully specified concept
    • Mapper can show a unified view of a concept in the candidates tab (e.g. in Match Quality view), with a single row per concept even when it was returned by more than one algo
  • Users are mapping to Concepts, not Candidates -- meaning Candidates and Match Metadata are linked to the Mapped Concept, but are not directly part of it
  • Re-ranking will be applied to the Concept Pool (both bridge and target concepts), not to the Candidate Pool
  • Updated Retrieval workflow, in order:
    • Create Candidate Pool across all algorithms
    • Generate Concept Pool consisting of both bridge and target codes (codes need to maintain bridge/target relationships)
    • Populate the Concept Pool with full details
    • Re-rank the Concept Pool to get unified scores for all bridge and target codes
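The retrieval workflow above could be sketched roughly as follows. This is an illustration under stated assumptions, not a proposed implementation: the `Candidate` and `Concept` shapes, the `(role, code)` pool key, the in-memory `lookup` function, and all codes and scores are hypothetical. In particular, taking the maximum candidate score is only a stand-in for the actual re-ranking step, which this issue does not specify.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass
class Candidate:
    """Algo output: a code plus match metadata (hypothetical shape)."""
    algo_id: str
    code: str
    score: float
    role: str = "target"               # "target" or "bridge"
    target_code: Optional[str] = None  # for bridge candidates: the target they lead to


@dataclass
class Concept:
    """Single canonical representation shared across algorithms."""
    code: str
    role: str
    details: dict = field(default_factory=dict)
    candidates: List[Candidate] = field(default_factory=list)
    linked_codes: Set[str] = field(default_factory=set)  # bridge/target relationships
    unified_score: float = 0.0


Key = Tuple[str, str]  # (role, code)


def build_concept_pool(candidates: List[Candidate]) -> Dict[Key, Concept]:
    """One Concept per (role, code), however many algorithms returned it."""
    pool: Dict[Key, Concept] = {}
    for cand in candidates:
        concept = pool.setdefault((cand.role, cand.code), Concept(cand.code, cand.role))
        concept.candidates.append(cand)
        if cand.role == "bridge" and cand.target_code:
            # Keep the bridge/target relationship on both sides.
            concept.linked_codes.add(cand.target_code)
            target = pool.setdefault(("target", cand.target_code),
                                     Concept(cand.target_code, "target"))
            target.linked_codes.add(cand.code)
    return pool


def populate(pool: Dict[Key, Concept], lookup: Callable[[str, str], dict]) -> None:
    """Fill in full details via a $lookup-style call, independent of the algos."""
    for concept in pool.values():
        concept.details = lookup(concept.role, concept.code)


def rerank(pool: Dict[Key, Concept]) -> List[Concept]:
    """Placeholder re-ranker: unified score = best candidate score."""
    for concept in pool.values():
        concept.unified_score = max((c.score for c in concept.candidates), default=0.0)
    return sorted(pool.values(), key=lambda c: c.unified_score, reverse=True)


# Made-up codes and scores, for illustration only.
candidates = [
    Candidate("ocl-semantic", "C-100", 0.91),
    Candidate("ocl-scispacy-loinc", "C-100", 0.84),
    Candidate("ocl-semantic", "C-200", 0.60),
    Candidate("ocl-semantic", "B-7", 0.70, role="bridge", target_code="C-100"),
]
names = {("target", "C-100"): "Concept one hundred",
         ("target", "C-200"): "Concept two hundred",
         ("bridge", "B-7"): "Bridge concept seven"}


def lookup(role: str, code: str) -> dict:
    return {"display": names.get((role, code))}


pool = build_concept_pool(candidates)
populate(pool, lookup)
ranked = rerank(pool)
for concept in ranked:
    print(concept.code, concept.role, concept.unified_score, len(concept.candidates))
```

Note that `C-100` ends up as one Concept carrying two Candidates, which is what allows a single row in the candidates tab while still keeping per-algo match metadata.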

Separate $lookup from $match

  • Enables a canonical concept representation without expecting that an algorithm provides this info
  • Required when decoupling candidates from concepts
  • Mapper should be smart enough to configure $lookup on its own when it can (e.g. user selects ocl-semantic or ocl-search algos)
  • User should have the option to configure $lookup manually when they want to
  • $match is still able to return a full concept definition, but this is no longer the default behavior
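One possible shape for the $lookup configuration rules above, as a sketch only: a manual setting wins, otherwise the Mapper infers a source when the selected algos allow it. The `resolve_lookup_source` function, the `AUTO_LOOKUP_ALGOS` set, the example.org URLs, and the choice to default to the target repo are all assumptions made for illustration.

```python
from typing import Iterable, Optional

# Hypothetical: algos for which the Mapper can infer a $lookup source on its own.
AUTO_LOOKUP_ALGOS = {"ocl-semantic", "ocl-search"}


def resolve_lookup_source(
    selected_algos: Iterable[str],
    target_repo_url: str,
    manual_lookup_url: Optional[str] = None,
) -> Optional[str]:
    """Pick the $lookup source for a mapping project.

    Returns None when nothing can be inferred, meaning the user has to
    configure $lookup manually.
    """
    if manual_lookup_url:
        # A manual configuration always takes precedence.
        return manual_lookup_url
    if AUTO_LOOKUP_ALGOS & set(selected_algos):
        # Assumption: auto-configuration points at the target repo.
        return target_repo_url
    return None


repo = "https://example.org/orgs/demo/sources/target"  # placeholder URL
auto = resolve_lookup_source(["ocl-semantic", "ocl-scispacy-loinc"], repo)
manual = resolve_lookup_source(["ocl-semantic"], repo, "https://example.org/fhir")
unresolved = resolve_lookup_source(["ocl-scispacy-loinc"], repo)
print(auto, manual, unresolved)
```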

Formalize Match Algorithms to be a first-class trackable entity

Status: In progress
