
MedTranslate

Overview

This study focuses on simplifying medical texts to improve health literacy, especially in under-resourced regions. It uses the MedEasi corpus and the ctrlSIM model, with a T5-Large model performing the simplification. To address the computational constraints of under-resourced healthcare settings, a novel knowledge distillation approach trains a T5-Small student model to emulate the T5-Large teacher model. The student model is evaluated with several metrics, including SARI, ROUGE scores, and readability tests. While these conventional metrics show satisfactory results, human evaluations reveal that the student model sometimes fails to simplify complex medical jargon. The research argues for more user-centered evaluation methods and discusses future directions for improving both text simplification and its evaluation frameworks.

See the report paper for full details.

Training the Teacher Model

cd CTRL-SIMP
python training.py

The teacher model uses the dataset and model from Basu et al. We fine-tune T5-Large with their multi-angle approach, modifying the original source code to run under our settings.
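For illustration, here is a minimal sketch of a single seq2seq fine-tuning step on a simplification pair, assuming the HuggingFace transformers API. It is not the modified CTRL-SIMP pipeline, and the example sentence pair is hypothetical.

# Minimal sketch of one T5-Large fine-tuning step on a simplification pair.
# NOT the modified CTRL-SIMP pipeline; only the generic seq2seq update,
# assuming the HuggingFace transformers API is available.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-large")
model = T5ForConditionalGeneration.from_pretrained("t5-large")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Hypothetical (expert, simple) pair; MedEasi supplies such pairs.
src = "simplify: The patient presented with acute myocardial infarction."
tgt = "The patient had a heart attack."

inputs = tokenizer(src, return_tensors="pt")
labels = tokenizer(tgt, return_tensors="pt").input_ids

loss = model(**inputs, labels=labels).loss  # cross-entropy over target tokens
loss.backward()
optimizer.step()
optimizer.zero_grad()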

Knowledge Distillation for the Student Model

We use a novel knowledge distillation loss, inspired by the MiniLM paper:

$$\mathcal{L}_{ENC} = \frac{1}{A^{S}_{h}\,|x|}\sum_{a=1}^{A^{S}_{h}}\sum_{t=1}^{|x|} D_{KL}\!\left(A^{T}_{E,a,t} \,\|\, A^{S}_{E,a,t}\right)$$

$$\mathcal{L}_{DEC} = \frac{1}{A^{S}_{h}\,|x|}\sum_{a=1}^{A^{S}_{h}}\sum_{t=1}^{|x|} D_{KL}\!\left(A^{T}_{D,a,t} \,\|\, A^{S}_{D,a,t}\right)$$

$$\mathcal{L}_{total} = \mathcal{L}_{ENC} + \mathcal{L}_{DEC}$$

to minimize the total loss $\mathcal{L}_{total}$. Here $A^{S}_{h}$ is the number of student attention heads, $|x|$ is the input length, and $A^{T}_{\cdot,a,t}$, $A^{S}_{\cdot,a,t}$ are the teacher and student attention distributions of head $a$ at position $t$ in the encoder ($E$) or decoder ($D$).
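As a rough illustration, the sketch below computes this attention-transfer KL loss in PyTorch. The function and tensor names are illustrative, not the actual kdMiniLM.py API; it assumes attention maps exported with output_attentions=True and matching head counts (MiniLM-style relation heads can bridge a mismatch).

# Minimal sketch of the attention-transfer KL loss, assuming PyTorch
# and HuggingFace-style attention tensors; names are hypothetical.
import torch
import torch.nn.functional as F

def attention_kl(teacher_attn, student_attn, eps=1e-9):
    """KL(teacher || student) averaged over heads and query positions.

    teacher_attn, student_attn: (batch, heads, seq_len, seq_len) attention
    distributions from the last encoder (or decoder) layer.
    """
    t = teacher_attn.clamp_min(eps)
    s = student_attn.clamp_min(eps)
    # D_KL(T || S) summed over the key dimension, then averaged over
    # batch, heads, and query positions, matching the 1/(A_h |x|) factor.
    kl = (t * (t.log() - s.log())).sum(dim=-1)
    return kl.mean()

# L_total = L_ENC + L_DEC, e.g.:
# l_enc = attention_kl(teacher_enc_attn, student_enc_attn)
# l_dec = attention_kl(teacher_dec_attn, student_dec_attn)
# loss = l_enc + l_dec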

You can train the student model with:

python kdMiniLM.py -tr

We performed the evaluation in the Python notebook in the evaluation directory.
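For reference, here is a minimal sketch of scoring simplifications with SARI and ROUGE using the HuggingFace evaluate library; the notebook itself may compute these differently, and the example sentences are hypothetical.

# Minimal sketch of SARI and ROUGE scoring, assuming the HuggingFace
# `evaluate` library; example sentences are hypothetical.
import evaluate

sari = evaluate.load("sari")
rouge = evaluate.load("rouge")

sources = ["The patient presented with acute myocardial infarction."]
predictions = ["The patient had a heart attack."]
references = [["The patient suffered a heart attack."]]

# SARI compares predictions against both sources and references.
print(sari.compute(sources=sources, predictions=predictions, references=references))
# ROUGE compares predictions against references only.
print(rouge.compute(predictions=predictions, references=[r[0] for r in references]))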

