Linear Regression Diagnostics Toolkit

This repository implements a modular Python pipeline for analyzing and validating linear regression assumptions using the Body Measurements Dataset. It focuses on statistical rigor through assumption checks, not just predictive performance.

🧠 Overview

The codebase provides tools to test and visualize the four critical assumptions of linear regression:

Linearity
Independence of errors
Homoscedasticity (constant variance)
Normality of residuals

All diagnostics are implemented as reusable Python modules; no Jupyter notebooks are involved.

📂 File Structure

.
├── main.py                    # Entry point for model fitting and diagnostics
├── utils.py                   # Helper functions (e.g., data cleaning, visualization)
├── check_linearity.py         # Linearity test via partial regression plots
├── check_normality.py         # Normality check (Q-Q plot, Shapiro-Wilk test)
├── check_homoscedasticity.py  # Breusch-Pagan & White tests
├── check_independence.py      # Durbin-Watson test and residual autocorrelation
└── data/                      # (Expected) Contains the CSV dataset

📊 Dataset

Source: Kaggle - Body Measurements Dataset
Content: Anthropometric data such as age, height, weight, and body part circumferences
Target Variable: Varies—commonly height, weight, or body fat percentage

🚀 How to Run

Make sure you have uv installed, then install the dependencies declared in pyproject.toml:

uv pip install -r pyproject.toml

Then run the diagnostics:

uv run main.py

✨ The pipeline will fit a linear regression model and sequentially check each assumption, printing results and plotting visuals using matplotlib and seaborn.

🧪 Diagnostic Modules

Each assumption test is cleanly separated into its own script:

✅ `check_linearity.py`

Uses residual plots and added variable plots
Flags multicollinearity via variance inflation factors (VIFs)

✅ `check_independence.py`

Calculates Durbin-Watson statistic
Plots residuals against time/index order
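The Durbin-Watson statistic is the sum of squared successive residual differences divided by the sum of squared residuals. The sketch below computes it directly with NumPy so the formula is visible; statsmodels also ships a `durbin_watson` function in `statsmodels.stats.stattools`. The function name `durbin_watson_stat` and the simulated residuals are assumptions for illustration.

```python
import numpy as np


def durbin_watson_stat(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2), for residuals in time/index order."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))


rng = np.random.default_rng(1)
white_noise = rng.normal(size=500)

# AR(1) residuals with positive autocorrelation (rho = 0.8).
rho = 0.8
ar1 = np.zeros(500)
for t in range(1, 500):
    ar1[t] = rho * ar1[t - 1] + white_noise[t]

dw_white = durbin_watson_stat(white_noise)
dw_ar1 = durbin_watson_stat(ar1)
print(f"independent residuals: DW = {dw_white:.2f}")
print(f"AR(1) residuals:       DW = {dw_ar1:.2f}")
```

Values near 2 are consistent with uncorrelated errors; values well below 2 suggest positive autocorrelation and values well above 2 suggest negative autocorrelation.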

✅ `check_homoscedasticity.py`

Breusch-Pagan and White tests
Residuals vs fitted plot with confidence bands

✅ `check_normality.py`

Shapiro-Wilk test
Histogram and Q-Q plots
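A minimal sketch of the numeric side of these checks, assuming SciPy's `shapiro` and `probplot` are used; the helper `normality_report` and the simulated residuals are hypothetical. Plotting is left out so the example runs without a display.

```python
import numpy as np
from scipy import stats


def normality_report(residuals):
    """Return the Shapiro-Wilk result and the Q-Q plot correlation for residuals."""
    w_stat, p_value = stats.shapiro(residuals)
    # probplot returns the Q-Q points plus a least-squares fit (slope, intercept, r);
    # r close to 1 means the points lie near a straight line.
    (_, _), (_, _, r) = stats.probplot(residuals, dist="norm")
    return {"shapiro_W": float(w_stat), "shapiro_p": float(p_value), "qq_r": float(r)}


rng = np.random.default_rng(3)
normal_resid = rng.normal(0, 1, 200)
skewed_resid = rng.exponential(1.0, 200) - 1.0  # centered but right-skewed

normal_report = normality_report(normal_resid)
skewed_report = normality_report(skewed_resid)
print("normal:", normal_report)
print("skewed:", skewed_report)
```

With large samples the Shapiro-Wilk test flags even minor departures from normality, so it is worth reading the p-value alongside the Q-Q plot and not in isolation.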

📦 Dependencies

All dependencies are defined in pyproject.toml and managed with uv.

Main packages used:

pandas, numpy
scikit-learn
statsmodels
seaborn, matplotlib
scipy

🎯 Key Highlights

🧱 Modular design for each assumption
🧪 Automated statistical testing + visual plots
🧰 Ready for integration into larger ML workflows
🔍 Emphasizes statistical validation before prediction
