A comprehensive machine learning repository dedicated to predicting the Compressive Strength of Concrete. This project goes beyond simple prediction by incorporating rigorous Hyperparameter Tuning, Model Explainability (XAI), and Uncertainty Quantification to ensure robust and reliable engineering applications.
This repository implements a full data science pipeline for concrete strength analysis. It addresses the following key challenges:
- Optimization: Finding the absolute best model configurations using state-of-the-art frameworks like Optuna.
- Transparency: Using SHAP and LIME to unbox "black-box" models and understand feature impact.
- Reliability: Estimating prediction intervals using Conformal Prediction and Probabilistic Regression to gauge model confidence.
The repository is organized into four main modules. Click on the headings to navigate to the files.
1. 🗂️ Data
Contains the raw dataset split into training and testing sets.
- train.csv: Historical data used for model training.
- test.csv: Unseen data used for final evaluation.
2. ⚙️ Hyperparameter Tuning
We employ two distinct approaches to optimization:
Hyperparameter_tuning.ipynb
- Implements Grid Search, Random Search, Bayesian Optimization, and Hyperband.
- Compares convergence speed and final model performance across these methods.
- Optuna_1/: Tests various Optuna Samplers (TPE, CmaEs) and Pruners (Hyperband, Median).
- Optuna_2/: Extended study focusing on maximizing the objective function for complex boosting models.
- Optuna_autosampler/: Investigates Optuna's automatic sampling capabilities.
- Optuna_PGBM/: Specialized tuning for Probabilistic Gradient Boosting Machines.
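To illustrate the search strategies compared in the tuning notebooks, here is a minimal sketch contrasting Grid Search with Random Search using scikit-learn on synthetic data (the parameter grid, model, and data are illustrative stand-ins, not the repository's actual configuration):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Synthetic stand-in for the concrete data (8 features, as in train.csv).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],
}

# Grid Search exhaustively tries all 8 combinations.
grid = GridSearchCV(GradientBoostingRegressor(random_state=0), param_grid,
                    scoring="neg_root_mean_squared_error", cv=3).fit(X, y)

# Random Search samples a budget of configurations from the same space.
rand = RandomizedSearchCV(GradientBoostingRegressor(random_state=0), param_grid,
                          n_iter=4, scoring="neg_root_mean_squared_error",
                          cv=3, random_state=0).fit(X, y)

print("grid best:", grid.best_params_, "RMSE:", -grid.best_score_)
print("random best:", rand.best_params_, "RMSE:", -rand.best_score_)
```

Random Search evaluates only half the budget here, which is why it is typically used first to narrow the space before a Bayesian method refines it.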
3. 🧠 Model Explanations
Model_explainations.ipynb
- SHAP: Generates summary plots and dependence plots to show global feature importance.
- LIME: Explains individual predictions to validate model behavior on specific concrete samples.
4. 📊 Uncertainty Quantification
A critical component for engineering safety.
Conformal Prediction
- Uses MAPIE and PUNCC to generate rigorous prediction intervals (e.g., 90% confidence) with guaranteed coverage properties.
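The mechanics that libraries like MAPIE automate can be sketched from scratch with split conformal prediction; this is an illustrative from-scratch version of the idea, not the repository's MAPIE/PUNCC code:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=8, noise=15.0, random_state=0)
X_fit, X_cal, y_fit, y_cal = train_test_split(X, y, test_size=0.5, random_state=0)

model = GradientBoostingRegressor(random_state=0).fit(X_fit, y_fit)

# Split conformal: calibrate on absolute residuals from a held-out set.
alpha = 0.1  # target 90% coverage
scores = np.abs(y_cal - model.predict(X_cal))
n = len(scores)
# Finite-sample-corrected quantile guarantees the coverage property.
level = np.ceil((n + 1) * (1 - alpha)) / n
q = np.quantile(scores, level, method="higher")

# Interval for any new mix: [prediction - q, prediction + q]
pred = model.predict(X_cal)
lower, upper = pred - q, pred + q
```

The guarantee is distribution-free: regardless of the underlying model, roughly 90% of true strengths fall inside the interval.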
Probabilistic Distribution (IBUG)
- Implements NGBoost and PGBM to output full probability distributions (mean and variance) for each prediction, rather than single point estimates.
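NGBoost and PGBM learn the distribution parameters jointly; as a minimal stand-in for that idea, the sketch below fits one model for the mean and a second for the log-variance, yielding a per-mix Normal distribution (this is an illustrative two-model approximation, not the NGBoost/PGBM implementation):

```python
import numpy as np
from scipy import stats
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=800, n_features=8, noise=20.0, random_state=0)

# Model 1: predicts the conditional mean.
mean_model = GradientBoostingRegressor(random_state=0).fit(X, y)
residuals = y - mean_model.predict(X)

# Model 2: predicts log-variance, so sigma can vary per sample (heteroscedastic).
var_model = GradientBoostingRegressor(random_state=0).fit(
    X, np.log(residuals**2 + 1e-6))

mu = mean_model.predict(X[:3])
sigma = np.sqrt(np.exp(var_model.predict(X[:3])))

# Each prediction is a full N(mu, sigma); e.g., a 90% interval per sample:
lo, hi = stats.norm.interval(0.9, loc=mu, scale=sigma)
```

A large predicted sigma flags concrete mixes the model is unsure about, which is exactly the signal used for engineering safety decisions.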
Quantile Regression
- Trains models to predict specific quantiles (5th and 95th percentiles) directly, providing a non-parametric way to estimate uncertainty.
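Direct quantile training is supported out of the box by gradient boosting via the pinball loss; a minimal sketch with scikit-learn (illustrative data and settings, not the repository's tuned models):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=800, n_features=8, noise=20.0, random_state=0)

# One model per quantile; the pinball (quantile) loss trains each directly.
q05 = GradientBoostingRegressor(loss="quantile", alpha=0.05, random_state=0).fit(X, y)
q95 = GradientBoostingRegressor(loss="quantile", alpha=0.95, random_state=0).fit(X, y)

lower, upper = q05.predict(X), q95.predict(X)
# Empirical coverage of the resulting 90% band (in-sample here).
coverage = np.mean((y >= lower) & (y <= upper))
```

Unlike the Normal-distribution approach, no distributional shape is assumed, which is why this is called non-parametric.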
The dataset comprises physical and chemical properties of concrete. The target variable is Compressive Strength.
| Feature | Description | Unit |
|---|---|---|
| C | Cement content | kg/m³ |
| mp | Mineral Admixtures / Slag | kg/m³ |
| FA | Fine Aggregate | kg/m³ |
| CA | Coarse Aggregate | kg/m³ |
| F | Fly Ash / Filler | kg/m³ |
| W_P | Water-to-Powder Ratio | Ratio |
| Adm | Admixture (Superplasticizer) | kg/m³ |
| str | Compressive Strength (Target) | MPa |
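A quick sketch of the schema in pandas, separating features from the target (the single row of values is purely illustrative, not taken from train.csv):

```python
import pandas as pd

# Column names follow the feature table above; values are made up.
df = pd.DataFrame({
    "C": [540.0], "mp": [0.0], "FA": [676.0], "CA": [1055.0],
    "F": [0.0], "W_P": [0.30], "Adm": [2.5], "str": [79.9],
})

# str (compressive strength, MPa) is the target; everything else is a feature.
X, y = df.drop(columns="str"), df["str"]
```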
Before finalizing a model, we perform extensive tuning to minimize RMSE.
- Model Selection: We evaluate candidate algorithms such as XGBoost, LightGBM, and CatBoost.
- Optimization:
- We start with Random Search to narrow the search space.
- We apply Bayesian Optimization and Optuna (TPE Sampler) to fine-tune learning rates, tree depths, and regularization parameters.
- Final Selection: The configuration with the best cross-validation score is saved for final training.
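The final selection step above can be sketched as a cross-validated RMSE comparison; scikit-learn estimators stand in here for XGBoost/LightGBM/CatBoost:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

# Stand-in candidates; in the repo these would be the tuned boosting models.
candidates = {
    "gbm": GradientBoostingRegressor(random_state=0),
    "rf": RandomForestRegressor(random_state=0),
}

# Mean cross-validated RMSE per candidate (lower is better).
rmse = {name: -cross_val_score(m, X, y, cv=5,
                               scoring="neg_root_mean_squared_error").mean()
        for name, m in candidates.items()}
best = min(rmse, key=rmse.get)
print(best, rmse[best])
```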
To ensure the model learns physics-compliant rules (e.g., more cement usually equals higher strength):
- We run SHAP analysis on the best performing model.
- We verify that features like C (Cement) and Adm (Admixture) have positive SHAP values.
- We use LIME to audit outliers where the model predicts unusually high or low strength.
We acknowledge that no model is perfect.
- Method A (Conformal): We generate a prediction interval [Lower, Upper]. If the interval is too wide, the model is uncertain about that specific concrete mix.
- Method B (Probabilistic): We model the output as a Normal distribution $\mathcal{N}(\mu, \sigma)$. A high $\sigma$ indicates high uncertainty (aleatoric uncertainty).