This repository contains the data and source code for the EACL 2023 paper "An Empirical Study of Clinical Note Generation from Doctor-Patient Encounters":
- An Empirical Study of Clinical Note Generation from Doctor-Patient Encounters.
- Asma Ben Abacha, Wen-wai Yim, Yadan Fan and Thomas Lin.
- EACL, May 3-5, 2023, Dubrovnik, Croatia.
@inproceedings{mts-dialog,
title = {An Empirical Study of Clinical Note Generation from Doctor-Patient Encounters},
author = "Ben Abacha, Asma and
Yim, Wen-wai and
Fan, Yadan and
Lin, Thomas",
booktitle = "Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics",
month = may,
year = "2023",
address = "Dubrovnik, Croatia",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.eacl-main.168",
pages = "2291--2302"
}
- The training set consists of 1,201 pairs of conversations and associated summaries.
- The validation set consists of 100 pairs of conversations and their summaries.
- MTS-Dialog includes 2 test sets; each test set consists of 200 conversations and associated section headers and contents:
  - MTS-Dialog-TestSet-1-MEDIQA-Chat-2023.csv: Official test set used in the MEDIQA-Chat 2023 challenge (Task A)
  - MTS-Dialog-TestSet-2-MEDIQA-Sum-2023.csv: Official test set used in the MEDIQA-Sum 2023 challenge (Task A & Task B)
- The full list of normalized section headers:
1. fam/sochx [FAMILY HISTORY/SOCIAL HISTORY]
2. genhx [HISTORY OF PRESENT ILLNESS]
3. pastmedicalhx [PAST MEDICAL HISTORY]
4. cc [CHIEF COMPLAINT]
5. pastsurgical [PAST SURGICAL HISTORY]
6. allergy
7. ros [REVIEW OF SYSTEMS]
8. medications
9. assessment
10. exam
11. diagnosis
12. disposition
13. plan
14. edcourse [EMERGENCY DEPARTMENT COURSE]
15. immunizations
16. imaging
17. gynhx [GYNECOLOGIC HISTORY]
18. procedures
19. other_history
20. labs
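
As a minimal sketch of how the normalized headers might be used, the snippet below counts rows per section header in a CSV file and flags any header that is not in the list above. The column name `section_header`, the helper name `count_section_headers`, and the case-insensitive matching are assumptions made for illustration; check them against the actual files before relying on this.

```python
import csv
import io
from collections import Counter

# The 20 normalized section headers listed above.
NORMALIZED_HEADERS = [
    "fam/sochx", "genhx", "pastmedicalhx", "cc", "pastsurgical",
    "allergy", "ros", "medications", "assessment", "exam",
    "diagnosis", "disposition", "plan", "edcourse", "immunizations",
    "imaging", "gynhx", "procedures", "other_history", "labs",
]


def count_section_headers(csv_file, column="section_header"):
    """Count rows per header; return (counts, unknown_headers).

    `column` is an assumed column name, not confirmed by this README.
    Headers are lowercased before comparison (also an assumption).
    """
    known = set(NORMALIZED_HEADERS)
    counts = Counter()
    unknown = set()
    for row in csv.DictReader(csv_file):
        header = row[column].strip().lower()
        counts[header] += 1
        if header not in known:
            unknown.add(header)
    return counts, unknown


# Tiny in-memory example standing in for a real dataset file.
sample = io.StringIO(
    "ID,section_header,dialogue\n"
    "0,GENHX,Doctor: What brings you in today?\n"
    "1,cc,Doctor: How can I help?\n"
    "2,genhx,Doctor: Any pain?\n"
)
counts, unknown = count_section_headers(sample)
print(counts["genhx"], counts["cc"], sorted(unknown))
```

For a real file, the same function can be called on a handle opened with `open(path, newline="", encoding="utf-8")`.
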
We provide the full augmented training set that we used in the experiments, as well as the separate datasets created using the French and Spanish translation models.
The repository includes the source code for the summarization of doctor-patient conversations and the automatic generation of clinical notes.
- Manual fact-based scores for the evaluation of 400 automatic summaries generated using four summarization models from the validation set of 100 conversations and notes.
- The Factual P/R/F1 Scores, Hallucination and Omission Rates, and Levenshtein Edit Distance are computed based on the fact-based manual counts and correction.
- We used the manual scores to evaluate the performance of several evaluation metrics (e.g., ROUGE, BERTScore, and BLEURT) by computing Pearson's correlation coefficients between the automatic and manual scores, as described in the paper (cf. Section 5.2 and Section 5.3).
- We provide all the data needed to perform this correlation study on other evaluation metrics.
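
To illustrate the kind of correlation study described above, here is a minimal, dependency-free sketch of Pearson's correlation coefficient between manual scores and the scores of an automatic metric. The function name `pearson`, the variable names, and the sample values are hypothetical; this is not the repository's own evaluation script, and the numbers are not results from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up numbers for illustration only; not results from the paper.
manual_f1 = [0.80, 0.55, 0.90, 0.40]
metric_scores = [0.62, 0.48, 0.71, 0.35]
print(round(pearson(manual_f1, metric_scores), 3))
```

To evaluate another metric, one would replace `metric_scores` with that metric's per-summary scores, aligned in the same order as the manual scores.
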
- MEDIQA-Chat 2023: https://github.com/abachaa/MEDIQA-Chat-2023
- MEDIQA-Sum 2023: https://github.com/ImageCLEF/2023_ImageCLEFmed_Mediqa
- This work is published under a Creative Commons Attribution 4.0 International Licence (CC BY). https://creativecommons.org/licenses/by/4.0/
- Asma Ben Abacha (abenabacha at microsoft dot com)
- Wen-wai Yim (yimwenwai at microsoft dot com)