This project aims to classify respiratory sounds using deep learning models, focusing on detecting wheezes and crackles from raw audio recordings. The system is built using PyTorch and Hugging Face Transformers and incorporates Weights & Biases (W&B) for tracking experiments.
```
ICBHI_2017/
├── checkpoints/          # Model checkpoints
├── data/                 # Data directory
│   └── test/             # Test dataset
├── logs/                 # Training logs
├── src/                  # Source code
│   ├── __init__.py
│   ├── data.py           # Data processing functions
│   ├── inference.py      # Inference pipeline
│   ├── logger.py         # Logging configuration
│   ├── models.py         # Model definitions and loading
│   └── train.py          # Training pipeline
├── .gitignore
├── LICENSE
├── README.md
└── requirements.txt      # Required packages
```
- Operating System: Linux, macOS, or Windows
- Python Version: 3.10
- Conda (Miniconda or Anaconda)
```bash
git clone https://github.com/landeros10/ICBHI_2017
cd ICBHI_2017
conda create -n resp_sounds python=3.10 -y
conda activate resp_sounds
pip install -r requirements.txt
```
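After installing, you can optionally confirm that the environment is set up. The quick check below only assumes that the packages named in the introduction (PyTorch, Hugging Face Transformers, W&B) are included in `requirements.txt`:

```python
# Optional environment check: confirm the core dependencies import and
# report whether a CUDA-capable GPU is visible to PyTorch.
import torch
import transformers
import wandb

print("PyTorch:", torch.__version__)
print("Transformers:", transformers.__version__)
print("Weights & Biases:", wandb.__version__)
print("CUDA available:", torch.cuda.is_available())
```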
To perform inference on test data, provide either the path to a single WAV file or a directory containing multiple WAV files; the script automatically resolves the base directory `data_dir` from this path. If the `--evaluate_metrics` flag is used, the script also looks for a `labels.json` file in the same directory and evaluates the model's predictions against the ground-truth labels. Predictions and evaluation metrics are saved in `data_dir` as `predictions.json` and `inference_metrics.json`, respectively.
```bash
python src/inference.py audio_path [--model_path MODEL_PATH] [--evaluate_metrics] [--generate_vis]

python src/inference.py ./data_dir/sample.wav --model_path ./checkpoints/inference_model.pth --evaluate_metrics
python src/inference.py ./data_dir/ --generate_vis
```
- `audio_path`: The path to the input data. This can be a single WAV file (e.g., `./data_dir/sample.wav`) or a directory containing multiple WAV files (e.g., `./data_dir/`).
- `--evaluate_metrics`: Optional flag to evaluate the model's predictions against ground-truth labels. Requires a `labels.json` file in the `data_dir`.
- `--model_path`: Path to the trained model used for inference. Defaults to `./checkpoints/inference_model.pth`.
- `--generate_vis`: Optional flag to visualize attention maps during inference. Visualizations are saved in the `data_dir` on a per-file basis.
After inference, per-file predictions are stored in `predictions.json`, inference metrics in `inference_metrics.json`, and visualizations as `[audio_file_name].png`. All outputs are saved in the same directory as the test data, `data_dir`.
Both `labels.json` and `predictions.json` share the same structure: each key is the path of a test audio file, and the value is a dictionary indicating the presence or absence of crackles and wheezes as binary integer values.
```json
{
  "./data/test/sample1.wav": {
    "crackles": 0,
    "wheezes": 0
  },
  "./data/test/sample2.wav": {
    "crackles": 1,
    "wheezes": 0
  }
}
```
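Because both files share this structure, they can be compared directly with a few lines of Python. The snippet below is an illustrative sketch only: it assumes the default locations described above (`labels.json` and `predictions.json` sitting together in `data_dir`, here `./data/test`) and reports simple per-label agreement.

```python
import json

# Compare predictions.json against labels.json and report per-label accuracy.
# Paths assume the default test data directory described above.
data_dir = "./data/test"

with open(f"{data_dir}/labels.json") as f:
    labels = json.load(f)
with open(f"{data_dir}/predictions.json") as f:
    preds = json.load(f)

for label_name in ("crackles", "wheezes"):
    # Only score files that appear in both ground truth and predictions.
    shared = [k for k in labels if k in preds]
    correct = sum(labels[k][label_name] == preds[k][label_name] for k in shared)
    print(f"{label_name}: {correct}/{len(shared)} files correct")
```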
If the `--generate_vis` flag is used during inference, the script generates interpretability visualizations for the test audio files. Each visualization shows:
- The raw audio waveform.
- Regions of importance and attention identified by the model.
- Segmented respiratory cycles labeled for crackles, wheezes, or both.
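For a rough idea of what such an overlay looks like, the snippet below shades high-attention regions on top of a waveform. It is a minimal, self-contained sketch: the waveform and per-frame attention weights are synthetic stand-ins, whereas the actual script derives them from the test WAV file and the model's attention maps.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-ins for a real recording and the model's attention weights.
sr = 4000                                  # sample rate (Hz), placeholder
t = np.linspace(0, 4, 4 * sr)              # 4 seconds of "audio"
waveform = 0.5 * np.sin(2 * np.pi * 220 * t) * np.exp(-0.3 * t)

# One attention weight per 0.25 s frame, upsampled to the waveform length.
frame_attention = np.random.default_rng(0).random(16)
attention = np.repeat(frame_attention, len(t) // len(frame_attention))

# Plot the waveform and shade regions where attention exceeds a threshold.
fig, ax = plt.subplots(figsize=(10, 3))
ax.plot(t, waveform, linewidth=0.5, label="waveform")
ax.fill_between(t, -1, 1, where=attention > 0.7, alpha=0.3, label="high attention")
ax.set_xlabel("Time (s)")
ax.set_ylabel("Amplitude")
ax.legend(loc="upper right")
fig.savefig("sample_attention_overlay.png", dpi=150)
```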