🩺 Diabetes Granada Analysis

📖 Overview

This project analyzes the T1DiabetesGranada dataset, a collection of medical and biochemical data from patients with Type 1 Diabetes Mellitus (T1DM).
The dataset includes continuous glucose measurements, biochemical test results, patient demographics, and diagnostic codes for complications.

The goal of the study is to understand the relationships between glucose control, biochemical parameters, and diabetes-related complications, using both statistical and machine learning techniques.

🎯 Objectives

Explore and assess the quality and structure of the data.
Analyze the relationship between glucose levels and diabetic complications.
Study correlations between biochemical parameters and glycemic control.
Apply clustering (K-Means, Gaussian Mixture Models) to identify patient subgroups.
Implement classification models (Random Forest) to predict the presence of complications.
Validate findings through statistical tests and cross-validation techniques.

🧠 Methodology

1. Data Exploration

Four main datasets were analyzed:

Patient_info.csv – demographic and summary information about each patient
Glucose_measurements.csv – continuous glucose monitoring data (every 15 minutes)
Biochemical_parameters.csv – laboratory test results for 17 biochemical parameters
Diagnostics.csv – ICD-9-CM diagnostic codes describing patient complications
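A minimal sketch of how the four files could be loaded and checked for basic quality issues with pandas. The file names come from the list above; the "data" directory and the helper names load_tables and quality_report are hypothetical and not taken from the project notebooks.

```python
from pathlib import Path

import pandas as pd

# Table names as listed above; the directory they live in is an assumption.
TABLES = [
    "Patient_info",
    "Glucose_measurements",
    "Biochemical_parameters",
    "Diagnostics",
]


def load_tables(data_dir):
    """Read every listed CSV found under data_dir into a dict of DataFrames."""
    data_dir = Path(data_dir)
    return {
        name: pd.read_csv(data_dir / f"{name}.csv")
        for name in TABLES
        if (data_dir / f"{name}.csv").exists()
    }


def quality_report(df):
    """Per-column dtype, missing-value count and share, and distinct-value count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "pct_missing": (df.isna().mean() * 100).round(2),
        "n_unique": df.nunique(),  # NaN is not counted as a distinct value
    })


if __name__ == "__main__":
    # "data" is a placeholder path; nothing is printed if the files are absent.
    for name, df in load_tables("data").items():
        print(name, df.shape, "duplicate rows:", int(df.duplicated().sum()))
        print(quality_report(df))
```

Running the report on each table before any analysis makes it easier to spot columns with many missing values or unexpected types.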

2. Statistical Analysis

Computation of TIR (Time In Range), TAR (Time Above Range), and TBR (Time Below Range) indicators.
Statistical comparison between patients with and without complications using the Mann–Whitney U test.
Correlation analysis between glucose levels and biochemical parameters.
Visualization of distributions by gender and age group.
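The indicator computation and the group comparison above can be sketched as follows. This is an illustration under stated assumptions, not the code used in the notebooks: the 70 to 180 mg/dL target range is the commonly used consensus range and is not confirmed by this README, the column names Patient_ID and Measurement are assumed, and scipy is used for the Mann–Whitney U test although it is not listed among the project's technologies.

```python
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu

# Assumed target range in mg/dL; the project may use different thresholds.
LOW, HIGH = 70, 180


def glycemic_indicators(glucose, id_col="Patient_ID", value_col="Measurement"):
    """Percentage of readings below, within, and above [LOW, HIGH] per patient."""
    g = glucose[[id_col, value_col]].dropna()
    flags = pd.DataFrame({
        id_col: g[id_col],
        "TBR": g[value_col] < LOW,
        "TIR": g[value_col].between(LOW, HIGH),  # inclusive on both ends
        "TAR": g[value_col] > HIGH,
    })
    # The mean of a boolean column is the fraction of True values.
    return flags.groupby(id_col).mean() * 100


def compare_groups(indicators, has_complication, column):
    """Two-sided Mann-Whitney U test on one indicator, split by complication status."""
    mask = has_complication.reindex(indicators.index, fill_value=False).astype(bool)
    with_c = indicators.loc[mask, column]
    without_c = indicators.loc[~mask, column]
    return mannwhitneyu(with_c, without_c, alternative="two-sided")


if __name__ == "__main__":
    # Synthetic readings for illustration only; not real patient data.
    rng = np.random.default_rng(0)
    ids = ["P1", "P2", "P3", "P4"]
    demo = pd.DataFrame({
        "Patient_ID": np.repeat(ids, 96),  # 96 readings = one day at 15-minute steps
        "Measurement": rng.normal(150, 50, 4 * 96).round(),
    })
    ind = glycemic_indicators(demo)
    status = pd.Series([True, True, False, False], index=ids)
    print(ind.round(1))
    print(compare_groups(ind, status, "TBR"))
```

Because every reading falls into exactly one of the three bands, TBR, TIR, and TAR sum to 100 for each patient, which is a quick sanity check on the output.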

3. Clustering

Construction of a derived dataset for unsupervised learning.
Application of K-Means and Gaussian Mixture Models (GMM) to group patients based on glycemic and biochemical profiles.
Evaluation of clustering results using ANOVA and visualization of cluster-specific complication patterns.
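The clustering steps above can be sketched with scikit-learn as follows. The number of clusters, the standardization step, and the helper names are assumptions made for illustration; the README does not state the settings actually used, and scipy's f_oneway stands in for whatever ANOVA routine the notebooks call.

```python
import numpy as np
from scipy.stats import f_oneway
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler


def cluster_patients(X, n_clusters=3, seed=0):
    """Standardize the features, then return K-Means and GMM cluster labels."""
    Xs = StandardScaler().fit_transform(X)
    km_labels = KMeans(
        n_clusters=n_clusters, n_init=10, random_state=seed
    ).fit_predict(Xs)
    gmm_labels = GaussianMixture(
        n_components=n_clusters, random_state=seed
    ).fit_predict(Xs)
    return km_labels, gmm_labels


def anova_by_cluster(values, labels):
    """One-way ANOVA of a single feature across the clusters."""
    values = np.asarray(values)
    groups = [values[labels == k] for k in np.unique(labels)]
    return f_oneway(*groups)


if __name__ == "__main__":
    # Synthetic two-feature data with three separated groups, for illustration only.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(c, 0.5, size=(30, 2)) for c in (0.0, 10.0, 20.0)])
    km, gmm = cluster_patients(X, n_clusters=3)
    print("K-Means cluster sizes:", np.bincount(km))
    print("GMM cluster sizes:", np.bincount(gmm))
    print(anova_by_cluster(X[:, 0], km))
```

Standardizing first keeps features measured on large scales from dominating the distance computation in K-Means. The ANOVA then indicates whether a given feature differs across the resulting clusters.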

4. Classification

Implementation of Random Forest classifiers to predict the presence or type of complications.
Experiments with balanced and imbalanced datasets.
Validation through K-Fold Cross Validation and TrainFold/TestFixed setups.
Analysis of feature importance and feature selection with Recursive Feature Elimination (RFE).
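A minimal sketch of the classification workflow described above, using scikit-learn. Several choices here are assumptions rather than the project's settings: stratified K-fold is used as a variant of K-fold that preserves class proportions, class_weight="balanced" stands in for the project's separately balanced dataset, macro F1 is an arbitrary choice of metric, and the data is synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import StratifiedKFold, cross_val_score


def evaluate_random_forest(X, y, n_splits=5, balanced=False, seed=0):
    """Cross-validated macro-F1 scores of a Random Forest, one score per fold."""
    clf = RandomForestClassifier(
        n_estimators=100,
        class_weight="balanced" if balanced else None,  # reweighting, not resampling
        random_state=seed,
    )
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return cross_val_score(clf, X, y, cv=cv, scoring="f1_macro")


def select_features(X, y, n_features=5, seed=0):
    """Boolean mask of the features kept by RFE with a Random Forest estimator."""
    rfe = RFE(
        RandomForestClassifier(n_estimators=50, random_state=seed),
        n_features_to_select=n_features,
    )
    rfe.fit(X, y)
    return rfe.support_


if __name__ == "__main__":
    # Synthetic imbalanced data (about 80/20) for illustration only.
    X, y = make_classification(
        n_samples=300, n_features=10, n_informative=4,
        weights=[0.8, 0.2], random_state=0,
    )
    for balanced in (False, True):
        scores = evaluate_random_forest(X, y, balanced=balanced)
        print(f"balanced={balanced}: mean macro-F1 = {scores.mean():.3f}")
    print("Selected feature indices:", np.flatnonzero(select_features(X, y)))
```

If the selected features are later evaluated with cross-validation, running RFE inside each training fold (for example with a scikit-learn Pipeline) avoids leaking test information into the selection step.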

📊 Key Findings

Among glycemic indicators, TBR (Time Below Range) showed a statistically significant difference between patients with and without complications.
Hypoglycemia (low glucose levels) therefore appears more strongly associated with the presence of complications than time in range or time above range.
Hemoglobin A1c (HbA1c) correlates with average glucose levels, consistent with the established clinical relationship between the two.
The unsupervised (clustering) and supervised (classification) analyses highlighted consistent patterns.

🧩 Technologies Used

Python
pandas
numpy
matplotlib
seaborn
scikit-learn
Jupyter Notebook

👥 Authors

Matteo Avella
Gabriele Gaudiosi
University of Salerno – Department of Computer Science
Course: Foundations of Computer Vision and Biometrics
Academic Year: 2024/2025

