This project focuses on predicting the number of calories burned during workouts using various Machine Learning and Deep Learning models.
It involves both Regression (continuous calorie prediction) and Classification (categorizing calorie burn level) tasks.
Developed as part of the Fanshawe College AI & ML coursework, this project demonstrates practical model comparison, evaluation, and interpretability for real-world fitness analytics.
The main goal is to predict the calories burned based on biometric and activity features such as age, gender, height, weight, workout type, and heart rate.
Two complementary approaches were implemented:
- Regression: Predict actual calories burned (continuous value).
- Classification: Predict calorie burn category (High / Low).
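
The classification target has to be derived from the continuous `Calories_Burned` value. How the High / Low cut-off was chosen is not stated here, so the sketch below assumes a median split. The function name `to_burn_category` and the `threshold` parameter are hypothetical and only illustrate the idea.

```python
import numpy as np

def to_burn_category(calories, threshold=None):
    """Map continuous calorie values to "High" / "Low" labels.

    Assumption: when no threshold is given, the median of the input is used.
    The project's actual cut-off is not documented here.
    """
    calories = np.asarray(calories, dtype=float)
    if threshold is None:
        threshold = np.median(calories)
    # Values strictly above the threshold are "High", the rest are "Low".
    return np.where(calories > threshold, "High", "Low")

labels = to_burn_category([300, 500, 900, 1200])
```

A median split keeps the two classes roughly balanced, which makes plain accuracy easier to interpret. A fixed threshold can be passed instead if a domain-specific cut-off is preferred.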
- Source: Kaggle – Calories Burned Prediction Dataset
- Rows: 1,500+
- Features:
| Category | Features |
|---|---|
| Demographics | Age, Gender, Height, Weight, BMI |
| Workout Stats | Max_BPM, Avg_BPM, Resting_BPM, Session_Duration |
| Lifestyle | Workout_Type, Workout_Frequency, Experience_Level, Water_Intake |
| Target | Calories_Burned |
- Label encoding for categorical variables (Gender, Workout_Type, etc.)
- Standard scaling of numeric features for model consistency
- Derived new feature BMI = weight / height²
- Removed outliers using IQR filtering
- Split dataset → 80% Train / 20% Test
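
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code. Column names are taken from the feature table, while the helper names `iqr_filter` and `preprocess`, the IQR multiplier `k=1.5`, and `random_state=42` are assumptions. The sketch also fits the scaler on the training split only to avoid leaking test statistics, which may differ from the order used in the notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

def iqr_filter(df, columns, k=1.5):
    """Keep rows whose values lie within [Q1 - k*IQR, Q3 + k*IQR] for every column."""
    mask = pd.Series(True, index=df.index)
    for col in columns:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask &= df[col].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]

def preprocess(df, target="Calories_Burned", categorical=("Gender", "Workout_Type")):
    df = df.copy()
    categorical = list(categorical)

    # Derived feature: BMI = weight / height^2 (assumes kg and metres).
    df["BMI"] = df["Weight"] / df["Height"] ** 2

    # Label-encode each categorical column to integer codes.
    for col in categorical:
        df[col] = LabelEncoder().fit_transform(df[col])

    # Numeric features are everything except the categoricals and the target.
    numeric = [c for c in df.columns if c not in categorical and c != target]
    df[numeric] = df[numeric].astype(float)

    # Remove outliers on the numeric features, then split 80 / 20.
    df = iqr_filter(df, numeric)
    X, y = df.drop(columns=target), df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    # Fit the scaler on the training split only, then apply it to both splits.
    scaler = StandardScaler().fit(X_train[numeric])
    X_train, X_test = X_train.copy(), X_test.copy()
    X_train[numeric] = scaler.transform(X_train[numeric])
    X_test[numeric] = scaler.transform(X_test[numeric])
    return X_train, X_test, y_train, y_test
```

Calling `preprocess(df)` on a DataFrame with the columns listed above returns scaled train and test feature frames together with their targets.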
| Model | Accuracy | F1-Score | Observation |
|---|---|---|---|
| Logistic Regression | 95.89% | 0.959 | Best performing model |
| Neural Network | 94.35% | 0.944 | Strong deep learning alternative |
| SVC | 93.84% | 0.937 | High precision |
| Gradient Boosting | 92.82% | 0.928 | Balanced results |
| Decision Tree | 90.25% | 0.900 | Slight overfitting |
✅ Logistic Regression achieved the highest accuracy (95.89%) and F1-score (0.959) of the five classifiers.
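
For reference, the Accuracy and F1-Score columns can be computed from confusion-matrix counts as in the sketch below. The helper `binary_metrics` is hypothetical, and it treats "High" as the positive class. Whether the reported F1 values are binary, macro, or weighted averages is not stated, so this shows the plain binary definition only.

```python
import numpy as np

def binary_metrics(y_true, y_pred, positive="High"):
    """Accuracy, precision, recall and F1 for a two-class problem."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)

    # Confusion-matrix counts with `positive` as the positive class.
    tp = int(np.sum((y_true == positive) & (y_pred == positive)))
    fp = int(np.sum((y_true != positive) & (y_pred == positive)))
    fn = int(np.sum((y_true == positive) & (y_pred != positive)))
    tn = int(np.sum((y_true != positive) & (y_pred != positive)))

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy labels for illustration only (not project data).
metrics = binary_metrics(
    ["High", "High", "Low", "Low", "High", "Low"],
    ["High", "Low", "Low", "High", "High", "Low"],
)
```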
| Model | MSE | MAE | Observation |
|---|---|---|---|
| Gradient Boosting | 851.25 | 20.06 | Best regressor |
| Neural Network | 1102.09 | 26.16 | Competitive results |
| Linear Regression | 1646.18 | 30.27 | Baseline model |
| SVR | 1692.80 | 29.82 | Generalizes well |
| Decision Tree | 4538.18 | 50.58 | Overfitting observed |
✅ Gradient Boosting Regressor achieved the lowest MSE (851.25) and MAE (20.06) of the five regressors.
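
The regression metrics in the table follow their standard definitions, sketched below with NumPy. The helper name `regression_metrics` and the toy values are illustrative assumptions, not project code or data. RMSE is included because it is in the same unit as the target: an MSE of 851.25 corresponds to an RMSE of about 29.2 calories.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """MSE, MAE, RMSE and R^2 for continuous predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred

    mse = float(np.mean(err ** 2))
    mae = float(np.mean(np.abs(err)))
    # R^2 = 1 - (residual sum of squares / total sum of squares).
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    return {"mse": mse, "mae": mae, "rmse": float(np.sqrt(mse)), "r2": r2}

# Toy values for illustration only.
scores = regression_metrics([100, 200, 300, 400], [110, 190, 320, 380])
```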
- Classification: Accuracy, Precision, Recall, F1-score, Confusion Matrix
- Regression: Mean Absolute Error (MAE), Mean Squared Error (MSE), R² Score
- Cross-validation: 5-fold validation
- Hyperparameter Tuning: RandomizedSearchCV for optimal parameters
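
A minimal sketch of 5-fold cross-validation combined with `RandomizedSearchCV` is shown below. It runs on synthetic data from `make_regression`, and the search space, `n_iter=5`, and the random seeds are assumptions chosen for illustration. The grids actually used in the notebook are not listed here.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, RandomizedSearchCV

# Synthetic stand-in for the preprocessed training data.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

# Hypothetical search space; each candidate is sampled from these lists.
param_distributions = {
    "n_estimators": [50, 100, 200],
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
    "max_depth": [2, 3, 4],
    "subsample": [0.7, 0.85, 1.0],
}

search = RandomizedSearchCV(
    estimator=GradientBoostingRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,                           # number of sampled configurations
    scoring="neg_mean_squared_error",   # scikit-learn maximizes, hence the sign
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    random_state=0,
)
search.fit(X, y)

best_params = search.best_params_
best_cv_mse = -search.best_score_       # flip the sign back to a plain MSE
```

Each sampled configuration is scored as the mean over the five folds, and the best one is refit on all of the data passed to `fit`, so `search.best_estimator_` can be evaluated directly on a held-out test set.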
- Session Duration and Avg BPM were the two most influential features.
- Weight and Experience Level also contributed significantly.
Feature importance was visualized using Gradient Boosting feature importances and SHAP values.
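
As a rough illustration of the Gradient Boosting part, the sketch below ranks features by the model's `feature_importances_` attribute. The data is synthetic and deliberately constructed so that duration and average heart rate dominate. The coefficients and value ranges are made up for the example and are not project results. SHAP is left out to keep the sketch dependent on scikit-learn only.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
feature_names = ["Session_Duration", "Avg_BPM", "Weight", "Age"]

# Synthetic features in loosely plausible ranges (assumed, not from the dataset).
X = np.column_stack([
    rng.uniform(0.5, 2.0, 500),    # session duration
    rng.uniform(110, 170, 500),    # average BPM
    rng.uniform(50, 110, 500),     # weight
    rng.uniform(18, 60, 500),      # age
])
# Synthetic target: driven mostly by duration, then BPM, slightly by weight.
y = 400 * X[:, 0] + 4 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(0, 10, 500)

model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Impurity-based importances are normalized to sum to 1; sort descending.
ranking = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranking:
    print(f"{name:>16}: {importance:.3f}")
```

Impurity-based importances can favour high-cardinality features, which is one reason to cross-check them against SHAP values as the project does.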
- Simpler models can outperform more complex ones when preprocessing is done well: Logistic Regression (95.89% accuracy) beat the Neural Network (94.35%) on classification.
- Neural networks add flexibility but require tuning and longer training time.
- Heart rate and workout duration are strong predictors of calorie burn.
- Python 3.10+
- Pandas, NumPy, Matplotlib, Seaborn
- Scikit-learn
- TensorFlow / Keras
- XGBoost / GradientBoostingRegressor
- Jupyter Notebook
# 1️⃣ Install dependencies
pip install -r requirements.txt
# 2️⃣ Open the Jupyter notebook
jupyter notebook Predicting_Calories_Burned.ipynb

Predicting_Calories_Burned/
│
├── Predicting_Calories_Burned.ipynb # Main notebook
├── requirements.txt
└── README.md
| Task | Best Model | Metric | Score |
|---|---|---|---|
| Classification | Logistic Regression | Accuracy | 95.89% |
| Regression | Gradient Boosting | MSE | 851.25 |
- Fitness tracking systems and smartwatches
- Personalized calorie estimation for users
- AI fitness assistants and gym dashboards
Ei Ei Khaing
Graduate Certificate in Artificial Intelligence & Machine Learning
Fanshawe College | London, Ontario, Canada
Machine Learning, Regression, Classification, Gradient Boosting, Logistic Regression, Neural Network, Calorie Prediction, Fitness Analytics