# Deep Learning Based Heart Sound Classification using Multimodal Features

## Abstract

Cardiovascular diseases remain one of the leading causes of death worldwide. Manual auscultation requires years of clinical expertise and is prone to subjectivity and noise sensitivity.
This project proposes a robust deep learning-based multimodal framework for heart sound classification using MFCC and spectrogram features. The system combines a CNN branch for spatial feature extraction with an LSTM branch for temporal modeling, and fuses the two feature streams to improve discriminative performance.
The framework is benchmarked on:
- PhysioNet Challenge 2016 (Heart Sound Classification)
- PhysioNet Challenge 2022 (Heart Murmur Detection)
## Methodology

### Preprocessing
- Resampling to 22,050 Hz
- Fixed duration truncation/padding
- Log-mel spectrogram conversion
- MFCC extraction (40 coefficients)
### Model Architecture

- 2D Mel-Spectrogram → CNN branch
- MFCC sequences → LSTM branch
CNN Output + LSTM Output → Concatenation → Fully Connected Layers → Softmax
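The fusion architecture can be sketched in PyTorch as below. Channel counts, hidden sizes, and pooling choices are illustrative assumptions, not the repository's exact configuration; the softmax is left to the loss function (`CrossEntropyLoss`) as is idiomatic in PyTorch.

```python
import torch
import torch.nn as nn

class MultimodalHeartSoundNet(nn.Module):
    """CNN branch for mel-spectrograms + LSTM branch for MFCC sequences,
    fused by concatenation. Layer sizes are illustrative assumptions."""

    def __init__(self, n_classes=2, n_mfcc=40, lstm_hidden=64):
        super().__init__()
        # CNN branch: input (batch, 1, n_mels, time)
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),   # fixed-size output regardless of clip length
            nn.Flatten(),                   # -> (batch, 32 * 4 * 4)
        )
        # LSTM branch: input (batch, time, n_mfcc)
        self.lstm = nn.LSTM(n_mfcc, lstm_hidden, batch_first=True)
        # Fusion head: concatenated features -> fully connected -> class logits
        self.fc = nn.Sequential(
            nn.Linear(32 * 4 * 4 + lstm_hidden, 64), nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, spec, mfcc):
        cnn_out = self.cnn(spec)
        _, (h_n, _) = self.lstm(mfcc)               # final hidden state per sequence
        fused = torch.cat([cnn_out, h_n[-1]], dim=1)
        return self.fc(fused)                        # logits; softmax at inference

model = MultimodalHeartSoundNet()
logits = model(torch.randn(8, 1, 128, 216), torch.randn(8, 216, 40))
print(logits.shape)  # torch.Size([8, 2])
```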
## Results

| Dataset | Accuracy |
|---|---|
| PhysioNet 2016 | 92.31% |
| PhysioNet 2022 | 82.35% |
## Evaluation Metrics

- Accuracy
- Precision
- Recall
- F1-score
- Confusion Matrix
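All of these metrics are available in scikit-learn; a minimal sketch on hypothetical labels (1 = abnormal), not actual model predictions:

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

# Hypothetical ground truth and predictions for a small validation split
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 1, 0, 1, 0]

acc = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred)   # positive class = 1 (abnormal)
rec = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)    # rows = true class, cols = predicted

print(f"acc={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")
print(cm)
```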
## Datasets

- PhysioNet 2016: https://archive.physionet.org/challenge/2016/
- PhysioNet 2022: https://moody-challenge.physionet.org/2022/
Place the downloaded recordings inside:

```
data/physionet2016/
data/physionet2022/
```
## Installation

```shell
pip install -r requirements.txt
```
## Usage

Train a model:

```shell
python train.py --dataset physionet2016
```

Evaluate a trained model:

```shell
python train.py --dataset physionet2016 --eval_only
```
## Author

Muhammad Naveed Shahzad
AI Researcher | Deep Learning Engineer