Student Performance Prediction Using Machine Learning

Project Overview

This project focuses on predicting student academic performance using multiple machine learning regression models. The target variable is the final grade (G3), predicted using academic, demographic, family, and lifestyle-related features.

The objective is to compare and optimize several regression models through systematic preprocessing and hyperparameter tuning, and to identify the most accurate model for grade prediction.

Problem Statement

Student performance is influenced by a wide range of factors such as:

Previous academic scores
Family background
Study habits
Social and lifestyle attributes

Accurately predicting final performance can help educational institutions:

Identify at-risk students early
Design targeted academic interventions
Improve overall learning outcomes

This project addresses the question:

Which machine learning regression model best predicts student performance, and how much improvement can be achieved through hyperparameter tuning?

Dataset Description

Dataset: Student Performance Dataset (student-mat.csv)
Total samples: 395
Total columns: 33 (32 input attributes plus the target G3)
Target variable: Final grade (G3)
No missing values

The dataset contains a mix of numerical and categorical features, including:

Academic scores (G1, G2)
Family background
Study time and failures
Health, absences, and lifestyle indicators

Data Preprocessing

Categorical features converted to numerical values using Label Encoding
Feature scaling performed using StandardScaler
Dataset split into training and testing sets (80:20 split)
Exploratory analysis performed using:
- Correlation matrix
- Scatter matrix visualization
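The preprocessing steps above can be sketched with scikit-learn as follows. This is a minimal illustration, not the project's notebook code. The helper name `preprocess`, the `random_state` value, and the toy DataFrame (made-up values under a few of the dataset's column names) are assumptions. The scaler is fit on the training split only, so test-set statistics do not leak into training. Note that Label Encoding assigns arbitrary integer codes to nominal categories. Tree-based models can usually work around this, but linear models and SVR treat the codes as meaningful magnitudes, so one-hot encoding is a common alternative for them.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler


def preprocess(df: pd.DataFrame, target: str = "G3", test_size: float = 0.2, seed: int = 42):
    """Label-encode categorical columns, split 80:20, then standardize the features."""
    df = df.copy()

    # Label Encoding: replace each non-numeric column with integer category codes.
    for col in list(df.columns):
        if not pd.api.types.is_numeric_dtype(df[col]):
            df[col] = LabelEncoder().fit_transform(df[col])

    X = df.drop(columns=[target]).to_numpy(dtype=float)
    y = df[target].to_numpy(dtype=float)

    # 80:20 train/test split (test_size=0.2).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )

    # Fit the scaler on the training split only and reuse it for the test split,
    # so test-set statistics do not leak into training.
    scaler = StandardScaler().fit(X_train)
    return scaler.transform(X_train), scaler.transform(X_test), y_train, y_test


# Toy stand-in with a few of the dataset's column names and made-up values.
# With the real UCI file (semicolon-delimited):
#   df = pd.read_csv("student-mat.csv", sep=";")
rng = np.random.default_rng(0)
n = 50
toy = pd.DataFrame({
    "school": rng.choice(["GP", "MS"], n),
    "sex": rng.choice(["F", "M"], n),
    "studytime": rng.integers(1, 5, n),
    "G1": rng.integers(0, 21, n),
    "G2": rng.integers(0, 21, n),
})
toy["G3"] = toy["G2"]  # hypothetical target so the example runs end to end

X_train, X_test, y_train, y_test = preprocess(toy)
print(X_train.shape, X_test.shape)  # (40, 5) (10, 5)
```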

Models Implemented

The following regression models were implemented and evaluated:

Linear Regression
Support Vector Regression (RBF Kernel)
Random Forest Regressor
AdaBoost Regressor
Gradient Boosting Regressor
Decision Tree Regressor
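A sketch of how the six models might be fit and compared on a common split. It uses default hyperparameters, an assumed `random_state=0`, and synthetic data from `make_regression` in place of the student features. The printed scores are therefore not the project's results. The `evaluate` helper is hypothetical.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# The six regressors listed above, with default (untuned) settings.
models = {
    "Linear Regression": LinearRegression(),
    "SVR (RBF Kernel)": SVR(kernel="rbf"),
    "Random Forest Regressor": RandomForestRegressor(random_state=0),
    "AdaBoost Regressor": AdaBoostRegressor(random_state=0),
    "Gradient Boosting Regressor": GradientBoostingRegressor(random_state=0),
    "Decision Tree Regressor": DecisionTreeRegressor(random_state=0),
}


def evaluate(models, X_train, X_test, y_train, y_test):
    """Fit each model on the training split and return its test-set R^2."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = r2_score(y_test, model.predict(X_test))
    return scores


# Synthetic data stands in for the preprocessed student features.
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

scores = evaluate(models, X_train, X_test, y_train, y_test)
for name, score in scores.items():
    print(f"{name:<28} R^2 = {score:.3f}")
```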

Hyperparameter Tuning

To improve model performance, extensive tuning was performed using:

GridSearchCV
RandomizedSearchCV

Key tuned parameters included:

Regularization strength and kernel parameters (SVR)
Number of estimators, depth, and feature selection (Tree-based models)
Learning rate and ensemble size (Boosting models)
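The sketch below shows both search strategies on synthetic data. The parameter grids and ranges are illustrative assumptions, because the project's exact search spaces are not listed here. SVR is tuned with `GridSearchCV` inside a `Pipeline`, so `StandardScaler` is re-fit on each cross-validation fold. Gradient boosting is tuned with `RandomizedSearchCV`, which samples `n_iter` configurations instead of trying all of them. Both searches use `scoring="r2"` to match the evaluation metric.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the preprocessed student features.
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# GridSearchCV: try every combination of SVR regularization strength (C) and
# RBF kernel coefficient (gamma). Scaling lives inside the pipeline so it is
# re-fit on each cross-validation training fold.
svr_search = GridSearchCV(
    Pipeline([("scale", StandardScaler()), ("svr", SVR(kernel="rbf"))]),
    param_grid={"svr__C": [1, 10, 100], "svr__gamma": ["scale", 0.01, 0.1]},
    scoring="r2",
    cv=5,
)
svr_search.fit(X_train, y_train)

# RandomizedSearchCV: sample 8 of the 36 possible combinations of learning
# rate, ensemble size and tree depth for gradient boosting.
gb_search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_distributions={
        "learning_rate": [0.01, 0.05, 0.1, 0.2],
        "n_estimators": [50, 100, 200],
        "max_depth": [2, 3, 4],
    },
    n_iter=8,
    scoring="r2",
    cv=3,
    random_state=0,
)
gb_search.fit(X_train, y_train)

for name, search in [("SVR", svr_search), ("Gradient Boosting", gb_search)]:
    # best_score_ is the mean cross-validated R^2; score() is R^2 on the test split.
    print(name, search.best_params_,
          f"CV R^2 = {search.best_score_:.3f}",
          f"test R^2 = {search.score(X_test, y_test):.3f}")
```

Grid search cost grows multiplicatively with each added parameter, which is why randomized search is often preferred for the larger boosting and forest search spaces.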

Evaluation Metric

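R² Score (coefficient of determination) was used as the primary evaluation metric
R² is the fraction of the variance in G3 explained by the model: 1.0 is a perfect fit, 0 is no better than always predicting the mean grade, and values can be negative on held-out data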
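For reference, R² is defined as R² = 1 - SS_res / SS_tot. Here SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the true grades from their mean. The plain-Python helper below is hypothetical and is intended to match `sklearn.metrics.r2_score` for a single target. It makes the three reference points concrete. The grade values are made up.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Undefined (division by zero) if every y_true value is identical.
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    return 1.0 - ss_res / ss_tot


grades = [10, 12, 15, 8, 14]               # hypothetical G3 values (mean 11.8)
print(r2(grades, grades))                  # perfect predictions -> 1.0
print(r2(grades, [11.8] * 5))              # always predicting the mean -> 0.0
print(r2(grades, [15, 8, 10, 14, 12]))     # worse than the mean -> about -2.23
```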

Performance Summary

Model	R² Score
Linear Regression	0.75
SVR (RBF Kernel)	0.79
Random Forest Regressor	0.83
AdaBoost Regressor	0.83
Gradient Boosting Regressor	0.81
Decision Tree Regressor	0.85

Best performing model: Decision Tree Regressor (after hyperparameter tuning)

Key Observations

Ensemble methods (R² 0.81 to 0.83) consistently outperformed the linear regression baseline (R² 0.75)
Hyperparameter tuning significantly improved model performance
Decision Tree and ensemble-based models captured non-linear relationships effectively
Feature scaling played a critical role in SVR performance
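The observation about scaling and SVR can be illustrated with a small synthetic experiment. This uses made-up data, not the student dataset: one informative feature on a 0 to 1 scale and one irrelevant feature on a 0 to 1000 scale. Because the RBF kernel depends on Euclidean distances, the large-scale feature dominates the unscaled model. Standardizing inside a pipeline puts both features on a comparable footing. Exact scores depend on the random data, so none are quoted here.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
n = 300
x_small = rng.uniform(0, 1, n)      # informative feature on a 0-1 scale
x_large = rng.uniform(0, 1000, n)   # irrelevant feature on a 0-1000 scale
X = np.column_stack([x_small, x_large])
y = np.sin(2 * np.pi * x_small)     # target depends only on the small-scale feature

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Same RBF SVR, with and without standardization of the inputs.
raw = SVR(kernel="rbf").fit(X_train, y_train)
scaled = make_pipeline(StandardScaler(), SVR(kernel="rbf")).fit(X_train, y_train)

r2_raw = r2_score(y_test, raw.predict(X_test))
r2_scaled = r2_score(y_test, scaled.predict(X_test))
print(f"unscaled R^2 = {r2_raw:.3f}, scaled R^2 = {r2_scaled:.3f}")
```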

Visual Analysis

The project includes:

Correlation and scatter matrix visualizations for feature relationships
Bar chart comparison of R² scores across all models

These visualizations help interpret both feature influence and model effectiveness.
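The following is a minimal Matplotlib sketch of the R² bar chart, using the scores reported in the Performance Summary table. The styling, figure size, and output filename `r2_comparison.png` are assumptions, not the project's actual plotting code.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line when running in a notebook
import matplotlib.pyplot as plt

# R^2 scores as reported in the Performance Summary table.
scores = {
    "Linear Regression": 0.75,
    "SVR (RBF)": 0.79,
    "Random Forest": 0.83,
    "AdaBoost": 0.83,
    "Gradient Boosting": 0.81,
    "Decision Tree": 0.85,
}

fig, ax = plt.subplots(figsize=(8, 4))
bars = ax.bar(list(scores), list(scores.values()), color="steelblue")
ax.bar_label(bars, fmt="%.2f")          # print each score above its bar
ax.set_ylabel("R² score")
ax.set_ylim(0, 1)
ax.set_title("R² comparison across regression models")
ax.tick_params(axis="x", labelrotation=30)
fig.tight_layout()
fig.savefig("r2_comparison.png", dpi=150)
```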

Technologies Used

Python
NumPy
Pandas
Scikit-learn
Matplotlib

Conclusion

This project demonstrates how model selection, preprocessing, and hyperparameter tuning can substantially influence regression performance: R² ranged from 0.75 (Linear Regression) to 0.85 (tuned Decision Tree) across the models evaluated. Tree-based and ensemble models proved to be the most effective for predicting student academic outcomes in this dataset.

Future Enhancements

Feature importance analysis
Cross-dataset validation
Neural network-based regression
Model interpretability using SHAP or LIME
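As a starting point for the feature importance item, the following sketch compares a random forest's impurity-based `feature_importances_` with `sklearn.inspection.permutation_importance`. The data is synthetic (G3 is generated from G2 plus noise) and only borrows column names from the dataset, so the ranking it prints says nothing about the real students. Permutation importance is measured on the test split and, for a regressor, uses the model's default R² score. SHAP and LIME require their own third-party packages and are not shown.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 300
feature_names = ["G2", "absences", "studytime"]  # names borrowed from the dataset; values are synthetic
X = np.column_stack([
    rng.integers(0, 21, n),
    rng.integers(0, 30, n),
    rng.integers(1, 5, n),
]).astype(float)
y = X[:, 0] + rng.normal(0, 1, n)  # synthetic G3 driven almost entirely by G2

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

# Impurity-based importances are computed from the training data during fitting.
impurity = dict(zip(feature_names, model.feature_importances_))

# Permutation importance: mean drop in test R^2 when one column is shuffled.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
perm = dict(zip(feature_names, result.importances_mean))

for name in feature_names:
    print(f"{name:<10} impurity={impurity[name]:.3f} permutation={perm[name]:.3f}")
```

Impurity-based importances can favor high-cardinality features, so permutation importance on held-out data is often the more reliable of the two.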

Author

Aadithya K L

This project emphasizes understanding model behavior and performance trade-offs rather than relying on default configurations.

Repository Contents

KLAadithya__MLProject_phase2.ipynb
KLAadithya__MLProject_phase2.pdf
KLAadithya__MLProject_phase3.ipynb
KLAadithya__MLProject_phase3.pdf
KLAadithya__baseline_experiment.ipynb
KLAadithya__baseline_experiment.pdf
README.md

GitHub repository: Aadithya-kl/ML_Project_student-performance
