Project focused on exploring and modeling the T1DiabetesGranada dataset, which contains clinical, biochemical, and continuous glucose monitoring (CGM) data from patients with Type 1 Diabetes.

MatteoAv/T1DiabetesGranadaDataset-Analysis

🩺 Diabetes Granada Analysis

📖 Overview

This project analyzes the T1DiabetesGranada dataset, a collection of clinical and biochemical data from patients with Type 1 Diabetes Mellitus (T1DM).
The dataset includes continuous glucose measurements, biochemical test results, patient demographics, and diagnostic codes for complications.

The goal of the study is to understand the relationships between glucose control, biochemical parameters, and diabetes-related complications, using both statistical and machine learning techniques.


🎯 Objectives

  • Explore and assess the quality and structure of the data.
  • Analyze the relationship between glucose levels and diabetic complications.
  • Study correlations between biochemical parameters and glycemic control.
  • Apply clustering (K-Means, Gaussian Mixture Models) to identify patient subgroups.
  • Implement classification models (Random Forest) to predict the presence of complications.
  • Validate findings through statistical tests and cross-validation techniques.

🧠 Methodology

1. Data Exploration

Four main datasets were analyzed:

  • Patient_info.csv – demographic and summary information about each patient
  • Glucose_measurements.csv – continuous glucose monitoring (CGM) data, one reading every 15 minutes
  • Biochemical_parameters.csv – laboratory test results for 17 biochemical parameters
  • Diagnostics.csv – ICD-9-CM diagnostic codes describing patient complications
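
A basic quality check on the CGM file is to confirm the 15-minute sampling interval and count missing values. The sketch below is one possible way to do this, not the project's actual code. The column names (`Patient_ID`, `Measurement_date`, `Measurement_time`, `Measurement`) and the helper `sampling_gaps` are assumptions made for illustration, and the small in-memory table stands in for `Glucose_measurements.csv`.

```python
import pandas as pd


def sampling_gaps(glucose: pd.DataFrame, expected: str = "15min") -> pd.Series:
    """Count, per patient, consecutive readings spaced more than `expected` apart.

    Column names are assumed for illustration and may differ in the real files.
    """
    timestamps = pd.to_datetime(
        glucose["Measurement_date"] + " " + glucose["Measurement_time"]
    )
    ordered = glucose.assign(timestamp=timestamps).sort_values(
        ["Patient_ID", "timestamp"]
    )
    # Time elapsed since the previous reading of the same patient (NaT for the first).
    deltas = ordered.groupby("Patient_ID")["timestamp"].diff()
    too_long = deltas > pd.Timedelta(expected)
    return too_long.groupby(ordered["Patient_ID"]).sum()


# Tiny synthetic stand-in for Glucose_measurements.csv (values are made up).
glucose = pd.DataFrame(
    {
        "Patient_ID": ["A", "A", "A", "B", "B"],
        "Measurement_date": ["2020-01-01"] * 5,
        "Measurement_time": ["00:00:00", "00:15:00", "01:00:00", "00:00:00", "00:15:00"],
        "Measurement": [110, 115, 98, 140, 150],
    }
)

gaps = sampling_gaps(glucose)          # patient A has one 45-minute gap
missing = glucose.isna().sum()         # missing values per column
print(gaps)
print(missing)
```

With the real data, the same function could be applied to the frame returned by `pd.read_csv("Glucose_measurements.csv")`, provided the column names match.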

2. Statistical Analysis

  • Computation of TIR (Time In Range), TAR (Time Above Range), and TBR (Time Below Range) indicators.
  • Statistical comparison between patients with and without complications using the Mann–Whitney U test.
  • Correlation analysis between glucose levels and biochemical parameters.
  • Visualization of distributions by gender and age group.
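
The indicator computation and the group comparison above can be sketched as follows. This is a minimal illustration on synthetic data, not the project's implementation. It assumes the commonly used 70–180 mg/dL target range, the column names `Patient_ID` and `Measurement`, a hypothetical per-patient complication flag, and SciPy for the Mann–Whitney U test (SciPy is not in the project's technology list).

```python
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu

# Assumed target range in mg/dL; the project may use different thresholds.
LOW, HIGH = 70, 180


def glycemic_indicators(glucose: pd.DataFrame) -> pd.DataFrame:
    """Per-patient percentage of readings below (TBR), in (TIR), and above (TAR) range."""
    flags = glucose.assign(
        TBR=glucose["Measurement"] < LOW,
        TIR=glucose["Measurement"].between(LOW, HIGH),  # inclusive on both ends
        TAR=glucose["Measurement"] > HIGH,
    )
    return flags.groupby("Patient_ID")[["TBR", "TIR", "TAR"]].mean() * 100


# Synthetic readings: 20 patients, 96 readings each (one day at 15-minute spacing).
rng = np.random.default_rng(0)
glucose = pd.DataFrame(
    {
        "Patient_ID": np.repeat(np.arange(20), 96),
        "Measurement": rng.normal(150, 50, size=20 * 96).round(),
    }
)
indicators = glycemic_indicators(glucose)

# Hypothetical complication flag: every other patient, purely for illustration.
has_complication = pd.Series(indicators.index % 2 == 0, index=indicators.index)

stat, p_value = mannwhitneyu(
    indicators.loc[has_complication, "TBR"],
    indicators.loc[~has_complication, "TBR"],
    alternative="two-sided",
)
print(f"U = {stat:.1f}, p = {p_value:.3f}")
```

The Mann–Whitney U test compares two groups without assuming normality, which suits percentage indicators such as TBR that are often skewed.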

3. Clustering

  • Construction of a derived dataset for unsupervised learning.
  • Application of K-Means and Gaussian Mixture Models (GMM) to group patients based on glycemic and biochemical profiles.
  • Evaluation of clustering results using ANOVA and visualization of cluster-specific complication patterns.
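
The clustering step can be sketched along these lines. It is a hedged example on synthetic features rather than the project's derived dataset. The number of clusters, the feature scaling, and the use of scikit-learn's `f_classif` for the per-feature one-way ANOVA are all assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_selection import f_classif
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the derived per-patient dataset:
# 100 patients, 3 features, drawn from two well-separated groups.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(4, 1, (50, 3))])

# Both algorithms are distance/variance based, so features are standardized first.
X_scaled = StandardScaler().fit_transform(X)

# The number of clusters (2) is an assumption for this example.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)
gmm_labels = GaussianMixture(n_components=2, random_state=0).fit_predict(X_scaled)

# One-way ANOVA per feature: do the K-Means clusters differ in their means?
f_values, p_values = f_classif(X_scaled, kmeans_labels)
for i, (f, p) in enumerate(zip(f_values, p_values)):
    print(f"feature {i}: F = {f:.1f}, p = {p:.2e}")
```

K-Means assigns each patient to exactly one cluster, while a GMM also provides membership probabilities through `predict_proba`, which can help when patient profiles overlap.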

4. Classification

  • Implementation of Random Forest classifiers to predict the presence or type of complications.
  • Experiments with balanced and imbalanced datasets.
  • Validation through K-Fold Cross Validation and TrainFold/TestFixed setups.
  • Analysis of feature importance and feature selection (RFE).
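
A minimal sketch of the classification workflow is shown below, using synthetic data from `make_classification` in place of the patient features. The hyperparameters, the `class_weight="balanced"` option for handling imbalance, the stratified variant of K-Fold, and the number of features kept by RFE are illustrative assumptions, not the settings used in the project.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic, imbalanced binary problem (about 80% negatives, 20% positives).
X, y = make_classification(
    n_samples=300,
    n_features=10,
    n_informative=4,
    weights=[0.8, 0.2],
    random_state=0,
)

# class_weight="balanced" is one way to compensate for class imbalance (assumed here).
clf = RandomForestClassifier(n_estimators=100, class_weight="balanced", random_state=0)

# Stratified K-Fold keeps the class ratio similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, scoring="balanced_accuracy")
print(f"balanced accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# Recursive Feature Elimination: repeatedly drop the least important feature.
selector = RFE(clf, n_features_to_select=4).fit(X, y)
selected = np.flatnonzero(selector.support_)
print("features kept by RFE:", selected)

# Impurity-based feature importances from a model fitted on all features.
importances = clf.fit(X, y).feature_importances_
print("importances:", importances.round(3))
```

Balanced accuracy is used here because plain accuracy can look high on an imbalanced dataset even when the minority class is rarely predicted.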

📊 Key Findings

  • Among the glycemic indicators (TIR, TAR, TBR), TBR (Time Below Range) showed a statistically significant difference between patients with and without complications.
  • This suggests that time spent in hypoglycemia (low glucose levels) is the glycemic measure most closely associated with the presence of complications in this dataset.
  • Hemoglobin A1c (HbA1c) correlates with average glucose levels, confirming known medical relationships.
  • Clustering and classification models highlighted consistent patterns across data-driven and supervised analyses.

🧩 Technologies Used

  • Python
  • pandas
  • numpy
  • matplotlib
  • seaborn
  • scikit-learn
  • Jupyter Notebook

👥 Authors

Matteo Avella
Gabriele Gaudiosi
University of Salerno – Department of Computer Science
Course: Foundations of Computer Vision and Biometrics
Academic Year: 2024/2025


📌 License

This project is intended for educational and research purposes only.
All rights reserved © 2025 Matteo Avella & Gabriele Gaudiosi.
