A comprehensive Natural Language Processing project focused on clinical dialogue analysis, featuring medical text classification and summarization using traditional machine learning, deep learning, and transformer-based approaches.
This project explores multiple NLP techniques to process clinical dialogues between doctors and patients, converting conversational text into structured medical notes. The work is based on the MEDIQA-Chat and MEDIQA-Sum 2023 datasets and implements various classification and summarization models.
- Medical Dialogue Classification: Categorizing clinical conversations into predefined section headers (e.g., GENHX, MEDICATIONS, CC, PASTMEDICALHX)
- Clinical Conversation Summarization: Generating concise medical summaries from doctor-patient dialogues
The project uses the MTS-Dialog dataset from MEDIQA 2023:
- Training Set: MTS-Dialog-TrainingSet.csv
- Validation Set: MTS-Dialog-ValidationSet.csv
- Test Sets: MTS-Dialog-TestSet-1-MEDIQA-Chat-2023.csv and MTS-Dialog-TestSet-2-MEDIQA-Sum-2023.csv
Each dataset contains clinical dialogues with corresponding section headers and structured section texts.
.
├── dataset/                                   # MEDIQA-Chat and MEDIQA-Sum 2023 datasets
├── embedding_projector/                       # Embedding visualizations
│   ├── clinical_bert_embeddings_tsv.tsv
│   ├── elmo_embeddings_tsv.tsv
│   └── projector_config.pbtxt
├── processed/                                 # Processed models and artifacts
│   └── custom_clinical_word2vec.model
├── second_delivery/                           # Initial embeddings and preprocessing
│   └── second_delivery.ipynb
├── third_delivery/                            # Classical ML and custom DL approaches
│   ├── classification/
│   │   ├── classif_shallow_ml.ipynb           # TF-IDF + classical ML classifiers
│   │   └── classif_cnn.ipynb                  # CNN with Clinical-BERT
│   └── summarisation/
│       ├── sum_shallow_ml.ipynb               # TF-IDF + TextRank summarization
│       └── sum_cnn.ipynb                      # LSTM encoder-decoder
└── fourth_delivery/                           # Transformer-based approaches
    ├── cnn_classification/
    │   └── BI-LSTM.ipynb                      # Bidirectional LSTM implementation
    ├── transformers_classification/
    │   └── transformers_classification.ipynb  # DistilBERT classification
    ├── transformers_sumarisation/
    │   └── transformers_summarisation.ipynb   # T5-small summarization
    └── raspberry_llm_lab/
        └── distilGPT2.ipynb                   # Distil GPT 2 model experiments
- Python 3.8+
- Virtual environment recommended
- CUDA-capable GPU (6-12 GB VRAM recommended for transformer models)
- Clone the repository:
git clone <repository-url>
cd Clinic-Note-NLP
- Create and activate a virtual environment:
python -m venv .venv
source .venv/bin/activate # On Linux/Mac
# or
.venv\Scripts\activate # On Windows
- Install dependencies:
pip install -r requirements.txt
- Exploration of contextual and non-contextual embeddings
- Custom Clinical Word2Vec model training (see the sketch after this list)
- Analysis of vocabulary size and tokenization characteristics
- Embedding visualization using TensorBoard Projector
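A minimal sketch of how the custom Clinical Word2Vec model can be trained with gensim. The file path, the "dialogue" column name, and the hyperparameters below are assumptions for illustration, not the exact settings used in the notebook:

```python
import pandas as pd
from gensim.models import Word2Vec
from gensim.utils import simple_preprocess

# File path and column name "dialogue" are assumptions about the dataset layout
df = pd.read_csv("dataset/MTS-Dialog-TrainingSet.csv")
sentences = [simple_preprocess(text) for text in df["dialogue"].astype(str)]

# Train a small Word2Vec model on the clinical dialogues only
w2v = Word2Vec(sentences=sentences, vector_size=100, window=5, min_count=2, workers=4, epochs=10)
w2v.save("processed/custom_clinical_word2vec.model")
```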
- Shallow ML Approach: TF-IDF and Count Vectorizer with classical ML classifiers (Naive Bayes, SVM, Random Forest, etc.), sketched after this list
- Deep Learning Approach: Fine-tuned Clinical-BERT classification head with CNN architecture
- Shallow ML Approach: TF-IDF and TextRank algorithms
- Deep Learning Approach: LSTM encoder-decoder with frozen and fine-tuned embeddings
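As a concrete illustration of the shallow classification baseline, here is a hedged scikit-learn sketch pairing TF-IDF features with a linear SVM. The file paths and the "dialogue"/"section_header" column names are assumptions about the CSV layout:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# File paths and column names are assumptions about the dataset layout
train = pd.read_csv("dataset/MTS-Dialog-TrainingSet.csv")
valid = pd.read_csv("dataset/MTS-Dialog-ValidationSet.csv")

clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=2)),  # word + bigram features
    ("svm", LinearSVC()),                                      # any classical classifier can be swapped in
])
clf.fit(train["dialogue"], train["section_header"])
print(classification_report(valid["section_header"], clf.predict(valid["dialogue"]), zero_division=0))
```

CountVectorizer can replace TfidfVectorizer for the count-based variant, and the other classical classifiers (Naive Bayes, Random Forest) drop into the same pipeline.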
- Model: DistilBERT (distilbert-base-uncased)
- Techniques: Full fine-tuning vs. partial fine-tuning (frozen weights)
- Evaluation: Accuracy, Precision, Recall, F1 Score, Confusion Matrix
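A minimal sketch of the partial fine-tuning setup: the DistilBERT encoder is frozen so only the classification head is updated. The number of labels is an assumption about how many section headers are used:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# num_labels=20 is an assumption about the number of section headers
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=20)

# Partial fine-tuning: freeze the encoder, train only the classification head
for param in model.distilbert.parameters():
    param.requires_grad = False
```

For full fine-tuning, the freezing loop is simply omitted and the whole model is updated during training.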
- Model: T5-small (text-to-text)
- Evaluation: ROUGE scores, BERTScore
- Baseline Comparison: Against TF-IDF, TextRank, LSTM, and Clinical-BERT
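An illustrative snippet of how T5-small can be prompted for dialogue summarization; the example dialogue and generation settings are toy values, not the project's exact configuration:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Toy dialogue; real inputs come from the MTS-Dialog CSVs
dialogue = "Doctor: What brings you in today? Patient: I have had a headache for three days."
inputs = tokenizer("summarize: " + dialogue, return_tensors="pt", truncation=True, max_length=512)
summary_ids = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```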
- Bidirectional LSTM architectures
- Distil GPT2 model exploration
- Accuracy
- Precision, Recall, F1 Score
- Confusion Matrix
- Inference testing on validation sets
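These metrics can be computed with scikit-learn; the labels below are toy values, not project results:

```python
from sklearn.metrics import accuracy_score, confusion_matrix, precision_recall_fscore_support

# Toy section-header labels standing in for real validation predictions
y_true = ["GENHX", "MEDICATIONS", "CC", "GENHX"]
y_pred = ["GENHX", "CC", "CC", "GENHX"]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="weighted", zero_division=0)
print(f"Accuracy {accuracy:.2f} | Precision {precision:.2f} | Recall {recall:.2f} | F1 {f1:.2f}")
print(confusion_matrix(y_true, y_pred))
```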
- ROUGE (ROUGE-1, ROUGE-2, ROUGE-L)
- BERTScore
- Qualitative analysis of generated summaries
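A short sketch of scoring summaries with the Hugging Face evaluate library; the prediction/reference pair is a toy example:

```python
import evaluate

rouge = evaluate.load("rouge")
bertscore = evaluate.load("bertscore")

predictions = ["Patient reports a three-day headache."]          # toy model output
references = ["The patient has had a headache for three days."]  # toy gold summary

print(rouge.compute(predictions=predictions, references=references))
print(bertscore.compute(predictions=predictions, references=references, lang="en"))
```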
- Deep Learning: PyTorch, TensorFlow
- Transformers: Hugging Face Transformers, Accelerate
- NLP: NLTK, spaCy
- Machine Learning: scikit-learn
- Data Processing: pandas, numpy
- Evaluation: datasets, evaluate
- Clinical-BERT
- DistilBERT
- T5-small
- ELMo
- distil-GPT2
- Custom Clinical Word2Vec
- Start Jupyter:
jupyter notebook
- Navigate to the desired delivery folder and open the notebook of interest
- Run cells sequentially to reproduce experiments
- Embeddings: Custom clinical embeddings capture domain-specific semantics better than general-purpose embeddings
- Classification: Transformer models (DistilBERT) outperform classical ML, with proper fine-tuning strategies being crucial
- Summarization: T5-small provides competitive results while remaining computationally feasible on limited hardware
- Trade-offs: Balance between model size, performance, and computational resources is critical for clinical NLP applications
The project is designed to work with limited GPU resources (6-12 GB VRAM). Strategies employed:
- Use of smaller transformer variants (DistilBERT, T5-small)
- Gradient accumulation for larger effective batch sizes
- Mixed precision training (FP16)
- Memory cleanup between training sessions
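A hedged example of how these strategies map onto Hugging Face TrainingArguments plus a manual cleanup step; the specific values are illustrative, not the exact settings used in the notebooks:

```python
import gc

import torch
from transformers import TrainingArguments

# Illustrative settings for fitting transformer fine-tuning into 6-12 GB of VRAM
training_args = TrainingArguments(
    output_dir="checkpoints",
    per_device_train_batch_size=4,   # small physical batch per step
    gradient_accumulation_steps=8,   # effective batch size of 32
    fp16=True,                       # mixed precision training
    num_train_epochs=3,
)

# Memory cleanup between training sessions
gc.collect()
torch.cuda.empty_cache()
```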
This is an academic project. For questions or suggestions, please open an issue.
See the LICENSE file for details.
- MEDIQA 2023 Challenge organizers for providing the datasets
- Hugging Face for their transformers library and model hub
- The authors of Clinical-BERT, T5, and distil-GPT2 for their pre-trained models
For more detailed information about this project, please refer to the individual delivery README files.
Note: Due to size constraints, trained model weights are not included in this repository. The notebooks contain code to reproduce all models from scratch.