Open
Description
User story
As a developer, I want to generate additional CIEL synonyms using an LLM-based prompt approach so we can evaluate whether this improves embedding-based semantic matching accuracy for CIEL Bridge (diagnosis-only) at low cost.
Use case
Generate synonyms for CIEL terms, vectorize them, and compare semantic matching accuracy with and without the generated synonyms.
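A minimal sketch of the with/without comparison, assuming embeddings are already computed and that top-1 accuracy by cosine similarity is an acceptable metric. The issue does not name a metric, so that choice is an assumption. `cosine` and `top1_accuracy` are hypothetical helper names, not part of OCL Mapper or CIEL Bridge.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
IndexEntry = Tuple[str, Vector]   # (CIEL concept ID, embedding of a name or synonym)
Query = Tuple[Vector, str]        # (embedding of the input term, expected concept ID)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top1_accuracy(index: List[IndexEntry], queries: List[Query]) -> float:
    """Fraction of queries whose nearest index entry has the expected concept ID."""
    if not index or not queries:
        return 0.0
    hits = 0
    for query_vec, expected_id in queries:
        best_id, _ = max(index, key=lambda entry: cosine(query_vec, entry[1]))
        hits += best_id == expected_id
    return hits / len(queries)
```

Running `top1_accuracy` once on the baseline index and once on the baseline plus the generated-synonym vectors, with the same query set, gives the two numbers to compare.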
Requirements
- Add a backlog item to prototype synonym generation for CIEL using the prompt approach described in the referenced paper (Ref 2), e.g., with Qwen3-8B running locally.
- Use the generated synonyms to build a more robust embedding dataset for semantic matching evaluation.
- Clarify how/where these synonyms (and any vectorized representation) would be stored/used, and whether OCL Mapper/CIEL Bridge has a place to consume merged synonym data for semantic search.
Notes
- Implement support in OCL Mapper (CIEL Bridge algorithm) to include these artificial synonyms from an external source in the same semantic search.
- Synonyms will be published as a CSV with two fields (CIEL concept ID, synonym) and two rows per concept, as done in the referenced article.
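The CSV layout in the note above could be written and read back roughly as follows. This is a sketch under assumptions: the header names `ciel_concept_id` and `synonym` are hypothetical (the issue only names the two fields), and the concept ID and terms in the example are placeholders, not real CIEL data.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, TextIO, Tuple

# Hypothetical header names; the issue only says "CIEL concept ID, synonym".
FIELDNAMES = ("ciel_concept_id", "synonym")


def write_synonyms(rows: Iterable[Tuple[str, str]], out: TextIO) -> None:
    """Write one (concept ID, synonym) pair per row, preceded by a header."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(FIELDNAMES)
    for concept_id, synonym in rows:
        writer.writerow([concept_id, synonym])


def load_synonyms(src: TextIO) -> Dict[str, List[str]]:
    """Group synonyms by concept ID, skipping rows with an empty field."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for row in csv.DictReader(src):
        concept_id = (row.get("ciel_concept_id") or "").strip()
        synonym = (row.get("synonym") or "").strip()
        if concept_id and synonym:
            grouped[concept_id].append(synonym)
    return dict(grouped)


# Round trip with placeholder data (the ID and terms below are made up).
buffer = io.StringIO()
write_synonyms(
    [("1001", "high blood pressure"), ("1001", "elevated blood pressure")],
    buffer,
)
buffer.seek(0)
synonyms_by_concept = load_synonyms(buffer)
```

Grouping by concept ID on load keeps the file flat (one synonym per row) while giving the consumer a per-concept list to vectorize.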
Acceptance criteria
- A prototype can generate synonyms for CIEL terms using the specified prompt approach.
- Generated synonyms can be vectorized and included in an embedding dataset for evaluation.
- OCL Mapper (CIEL Bridge) can include external artificial synonyms in the same semantic search (based on the published CSV).
- There is a documented decision on storage/consumption (where the data lives and how CIEL Bridge semantic search uses it).
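One possible way to feed external synonyms into the same search is to merge them with the existing CIEL names into a single list of terms before vectorizing. The sketch below assumes that shape. `SearchTerm` and `merge_search_terms` are hypothetical names and do not reflect OCL Mapper's actual data model, which the storage/consumption decision still has to settle.

```python
from typing import Dict, List, NamedTuple, Set, Tuple


class SearchTerm(NamedTuple):
    concept_id: str
    text: str
    source: str  # "ciel" for existing names, "generated" for external synonyms


def merge_search_terms(
    ciel_names: Dict[str, List[str]],
    generated: Dict[str, List[str]],
) -> List[SearchTerm]:
    """Combine existing names and generated synonyms into one term list.

    A generated synonym that duplicates an existing term for the same
    concept (ignoring case and surrounding whitespace) is dropped, so the
    index does not hold two vectors for the same text.
    """
    merged: List[SearchTerm] = []
    seen: Set[Tuple[str, str]] = set()
    for source, mapping in (("ciel", ciel_names), ("generated", generated)):
        for concept_id, texts in mapping.items():
            for text in texts:
                cleaned = text.strip()
                key = (concept_id, cleaned.casefold())
                if not cleaned or key in seen:
                    continue
                seen.add(key)
                merged.append(SearchTerm(concept_id, cleaned, source))
    return merged
```

Keeping the `source` tag on each term makes it possible to switch the generated synonyms on and off for evaluation without rebuilding the CIEL part of the index.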
Ref 1: https://openmrs.slack.com/archives/C0A7S4SDXKR/p1770204859509349?thread_ts=1770148279.476269&cid=C0A7S4SDXKR
Ref 2: https://academic.oup.com/jamia/advance-article/doi/10.1093/jamia/ocag004/8445947