A machine learning project that uses a regression model to predict health insurance expenses from personal and lifestyle data. Built with TensorFlow 2.x and trained on real-world data from `insurance.csv`.
The dataset contains the following features:
- `age`: Age of the primary beneficiary
- `sex`: Gender (`male`, `female`)
- `bmi`: Body mass index
- `children`: Number of dependents
- `smoker`: Whether the person smokes (`yes`, `no`)
- `region`: Residential area in the US (`northeast`, `northwest`, etc.)
- `expenses`: Medical costs billed by health insurance
- One-hot encoding applied to `sex`, `smoker`, and `region` (with `drop_first=True` to avoid the dummy variable trap)
- `expenses` column popped as the target variable
- Train-test split: 80% training / 20% testing
- `StandardScaler` used to normalize feature columns
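
The preprocessing steps above could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the `preprocess` helper, the `seed` value, and the toy rows are all made up for the example, and fitting the scaler on the training rows only is an assumption about how the scaling was done.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def preprocess(df, test_size=0.2, seed=42):
    """Hypothetical helper: encode, split off the target, split, and scale."""
    # One-hot encode the categorical columns; drop_first avoids the dummy variable trap.
    # dtype=float keeps every feature column numeric for the scaler.
    df = pd.get_dummies(
        df, columns=["sex", "smoker", "region"], drop_first=True, dtype=float
    )
    # Pop the target column so only features remain in df.
    labels = df.pop("expenses")
    # 80% training / 20% testing.
    X_train, X_test, y_train, y_test = train_test_split(
        df, labels, test_size=test_size, random_state=seed
    )
    # Assumption: the scaler is fitted on training rows only, then reused on test rows.
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)
    return X_train, X_test, y_train.to_numpy(), y_test.to_numpy()


# Made-up rows with the same column names as insurance.csv (illustrative values only).
toy = pd.DataFrame({
    "age": [23, 45, 31, 52, 38, 27, 60, 41, 19, 35],
    "sex": ["female", "male"] * 5,
    "bmi": [22.5, 30.1, 27.3, 33.8, 25.0, 29.4, 31.2, 24.6, 21.9, 28.7],
    "children": [0, 2, 1, 3, 0, 1, 0, 2, 0, 1],
    "smoker": ["no", "yes", "no", "no", "yes", "no", "no", "yes", "no", "no"],
    "region": ["northeast", "northwest", "southeast", "southwest", "northeast",
               "northwest", "southeast", "southwest", "northeast", "northwest"],
    "expenses": [2500.0, 32000.0, 4800.0, 11500.0, 21000.0,
                 3900.0, 13200.0, 24500.0, 1800.0, 5600.0],
})

X_train, X_test, y_train, y_test = preprocess(toy)
```

With the real data, `toy` would be replaced by something like `pd.read_csv("insurance.csv")`.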
Built using the TensorFlow Keras Sequential API:
- `Dense(256)`: ReLU
- `Dropout(0.1)`
- `Dense(128)`: ReLU
- `Dropout(0.1)`
- `Dense(64)`: ReLU
- `Dense(1)`: Output layer (regression)
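
As a rough sketch, the layer stack above might be written like this. The `build_model` function name and the input width of 8 are assumptions for illustration; the actual width is whatever number of feature columns remains after encoding.

```python
import tensorflow as tf


def build_model(n_features):
    """Hypothetical builder for the layer stack listed above."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(n_features,)),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dropout(0.1),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.1),
        tf.keras.layers.Dense(64, activation="relu"),
        # Single linear unit: the predicted expense.
        tf.keras.layers.Dense(1),
    ])


# Assumed input width of 8; replace with X_train.shape[1] in practice.
model = build_model(n_features=8)
```

The final layer has no activation, so the network outputs an unbounded real value, which suits a regression target such as `expenses`.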
Compiled with:
- Loss: Mean Squared Error (MSE)
- Optimizer: Adam
- Metrics: Mean Absolute Error (MAE)

`EarlyStopping` is used to prevent overfitting.
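
A compile-and-train step with these settings might look like the sketch below. The `compile_and_fit` helper, the monitored quantity (`val_loss`), the `patience` value, the validation split, and the synthetic data are all assumptions, since the EarlyStopping configuration is not stated here.

```python
import numpy as np
import tensorflow as tf


def compile_and_fit(model, X, y, epochs=100):
    """Hypothetical helper: compile with MSE/Adam/MAE and train with early stopping."""
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    # Assumed settings: stop when validation loss stalls for 10 epochs,
    # then roll back to the best weights seen so far.
    early_stop = tf.keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=10, restore_best_weights=True
    )
    return model.fit(
        X, y,
        validation_split=0.2,  # assumed hold-out fraction for monitoring
        epochs=epochs,
        callbacks=[early_stop],
        verbose=0,
    )


# Synthetic data and a deliberately tiny model, just to exercise the helper.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8)).astype("float32")
y = (X @ rng.normal(size=8)).astype("float32")
tiny_model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(1),
])
history = compile_and_fit(tiny_model, X, y, epochs=3)
```

MSE penalizes large errors more heavily during training, while MAE is reported in the same units as `expenses`, which makes it easier to compare against the 3500 target.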
- Evaluated on unseen test set
- Achieved MAE < 3500, passing the freeCodeCamp challenge
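
For reference, the MAE check amounts to averaging the absolute differences between true and predicted expenses and comparing the result to 3500. The sketch below uses made-up numbers, not results from this project, and the `mean_absolute_error` helper is written out by hand for illustration.

```python
import numpy as np


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    # Keras predictions usually have shape (n, 1); flatten them to match y_true.
    y_pred = np.asarray(y_pred, dtype=float).reshape(y_true.shape)
    return float(np.mean(np.abs(y_true - y_pred)))


# Illustrative values only.
y_true = [1000.0, 5000.0, 12000.0, 30000.0]
y_pred = [1500.0, 4000.0, 15000.0, 28000.0]

mae = mean_absolute_error(y_true, y_pred)  # (500 + 1000 + 3000 + 2000) / 4 = 1625.0
passed = mae < 3500
```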
- Load the notebook in Google Colab
- Run all cells (training will auto-start)
- Final cell evaluates the model and displays predictions vs true values on a scatter plot
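
A predictions-versus-true-values scatter plot like the one mentioned in the last step could be drawn along these lines. This is a sketch with a hypothetical `plot_predictions` helper and made-up numbers; the notebook's own plotting code may differ.

```python
import matplotlib.pyplot as plt


def plot_predictions(y_true, y_pred):
    """Hypothetical helper: scatter predictions against true values."""
    fig, ax = plt.subplots(figsize=(6, 6))
    ax.scatter(y_true, y_pred, alpha=0.6)
    # Diagonal reference line: points on it are perfect predictions.
    lo = min(min(y_true), min(y_pred))
    hi = max(max(y_true), max(y_pred))
    ax.plot([lo, hi], [lo, hi], linestyle="--", color="gray")
    ax.set_xlabel("True values (expenses)")
    ax.set_ylabel("Predictions (expenses)")
    ax.set_aspect("equal")
    return ax


# Illustrative values only.
ax = plot_predictions(
    [1000.0, 5000.0, 12000.0, 30000.0],
    [1500.0, 4000.0, 15000.0, 28000.0],
)
```

In a notebook the figure renders when the cell finishes; in a script, call `plt.show()` afterwards.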
Goal: train a regression model that predicts healthcare costs with a mean absolute error under $3500 on new, unseen data. The model meets this target.