Student Performance Analysis

Project Overview

This project conducts a comprehensive analysis of student performance using a dataset extracted from a MySQL database. It adheres to the complete data science lifecycle, encompassing data ingestion, preprocessing, model development, evaluation, and deployment. Advanced tools such as MLflow and DVC are utilized for experiment tracking and data versioning, ensuring a reproducible and efficient workflow. Version control is maintained via Git and GitHub, providing a clear audit trail of project progress through iterative commits.

The objective is to leverage machine learning algorithms to forecast academic performance, enabling proactive interventions and data-driven educational strategies.

Technologies Used

Data Science Tools

MLflow for experiment tracking
DVC for data version control
Pandas for data manipulation
NumPy for numerical computations
Matplotlib for data visualization
MySQL for data storage and retrieval

Introduction

This project aims to implement predictive analytics to model student performance using machine learning. By adhering to a structured data science workflow, we systematically approach data handling, model development, and evaluation. The project serves as a practical application of machine learning methodologies in educational data mining.

Data Ingestion

Data was ingested from a MySQL database into a Pandas DataFrame. This process involved querying the database, handling data types, and ensuring consistency in the data structure for further analysis.
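As a rough illustration of this step, the sketch below reads a table into a Pandas DataFrame with `pandas.read_sql_query` and then fixes the column types. The table name `students`, its columns, and the helper `load_table` are hypothetical. An in-memory SQLite connection stands in for the MySQL database so the example runs without a server; against MySQL, the same call would typically receive a SQLAlchemy engine instead.

```python
import sqlite3

import pandas as pd


def load_table(conn, table_name):
    """Read a whole table into a DataFrame (table_name is assumed to be trusted)."""
    return pd.read_sql_query(f"SELECT * FROM {table_name}", conn)


# Stand-in database: the real project reads from MySQL, not SQLite.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (gender TEXT, math_score INTEGER, reading_score INTEGER)"
)
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [("female", 72, 80), ("male", 65, 61), ("female", 90, 95)],
)
conn.commit()

df = load_table(conn, "students")

# Give the numeric columns an explicit, consistent dtype before analysis.
df = df.astype({"math_score": "int64", "reading_score": "int64"})
print(df.dtypes)
```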

Data Transformation

The transformation phase covered data preprocessing: one-hot encoding of categorical features, normalization of numerical features, and feature engineering. This step is critical for model performance and data quality.
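The two preprocessing operations named above can be sketched with Pandas alone. The column names and sample values are assumptions made for illustration, and z-score standardization is used here as one common form of normalization; the project may use a different scaler.

```python
import pandas as pd

# Hypothetical sample; the real columns come from the MySQL table.
df = pd.DataFrame(
    {
        "gender": ["female", "male", "female", "male"],
        "reading_score": [80, 61, 95, 70],
        "writing_score": [78, 58, 93, 67],
    }
)

categorical = ["gender"]
numerical = ["reading_score", "writing_score"]

# One-hot encoding: one 0/1 indicator column per category value.
dummies = pd.get_dummies(df[categorical], dtype=int)

# Standardization: shift each numerical column to zero mean and unit variance.
# ddof=0 uses the population standard deviation.
scaled = (df[numerical] - df[numerical].mean()) / df[numerical].std(ddof=0)

features = pd.concat([scaled, dummies], axis=1)
print(features.round(2))
```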

Exploratory Data Analysis (EDA)

EDA was conducted using Matplotlib and Seaborn, focusing on statistical summaries and visualizations. Insights were drawn regarding data distribution, correlations, and potential anomalies, which guided the feature selection and model development process.
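The numerical side of this analysis might look like the sketch below, which computes summary statistics, correlations with a target column, and a simple outlier check. The column names, the sample values, and the choice of `math_score` as the target are assumptions; the 1.5 × IQR rule is one common anomaly heuristic, not necessarily the one used in the project. Plots would be drawn from the same DataFrame.

```python
import pandas as pd

# Hypothetical scores; column names are assumptions.
df = pd.DataFrame(
    {
        "math_score": [72, 65, 90, 47, 76, 71, 88, 40, 64, 38],
        "reading_score": [72, 61, 95, 57, 78, 83, 95, 43, 64, 60],
        "writing_score": [74, 58, 93, 44, 75, 78, 92, 39, 67, 50],
    }
)

# Statistical summary: count, mean, std, min, quartiles, max per column.
summary = df.describe()

# Pearson correlation of every other column with the assumed target.
correlations = (
    df.corr()["math_score"].drop("math_score").sort_values(ascending=False)
)

# Flag potential anomalies in the target with the 1.5 * IQR rule.
q1, q3 = df["math_score"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[
    (df["math_score"] < q1 - 1.5 * iqr) | (df["math_score"] > q3 + 1.5 * iqr)
]

print(summary.round(2))
print(correlations.round(3))
print(f"{len(outliers)} potential outliers")
```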

Model Training

Several regression models were trained: Linear Regression, Decision Tree, XGBRegressor, Random Forest Regressor, AdaBoost, and CatBoost. GridSearchCV was applied to each model for hyperparameter tuning to identify its best configuration. After all models were evaluated, Linear Regression emerged as the best performer, with the highest R² score and the lowest error metrics among the tested algorithms.
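A minimal sketch of this model comparison, assuming scikit-learn, is shown below. Synthetic data from `make_regression` stands in for the student dataset, the hyperparameter grids are illustrative rather than the project's, and XGBRegressor and CatBoost are left out to keep the example dependency-light; both follow the same estimator interface and could be added to the `candidates` dictionary.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic, mostly linear data stands in for the student dataset.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Model name -> (estimator, hyperparameter grid). Grids are illustrative.
candidates = {
    "Linear Regression": (LinearRegression(), {"fit_intercept": [True, False]}),
    "Decision Tree": (
        DecisionTreeRegressor(random_state=0),
        {"max_depth": [3, 5, None]},
    ),
    "Random Forest": (
        RandomForestRegressor(random_state=0),
        {"n_estimators": [20, 50]},
    ),
    "AdaBoost": (AdaBoostRegressor(random_state=0), {"learning_rate": [0.1, 1.0]}),
}

scores = {}
for name, (model, grid) in candidates.items():
    # 3-fold cross-validated search over the grid, scored by R².
    search = GridSearchCV(model, grid, cv=3, scoring="r2")
    search.fit(X_train, y_train)
    # Evaluate the refitted best estimator on the held-out split.
    scores[name] = search.best_estimator_.score(X_test, y_test)

best_name = max(scores, key=scores.get)
print(best_name, round(scores[best_name], 3))
```

Scoring each tuned model on the same held-out split keeps the comparison between model families independent of the data used for tuning.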

Results

Performance metrics of the Linear Regression model include:

RMSE: 5.39
R² Score: 0.88
MAE: 4.21

An R² of 0.88 indicates that the model explains roughly 88% of the variance in the target, while the RMSE and MAE give the typical size of the prediction errors in the target's own units.
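For reference, the three metrics can be computed from true and predicted values as in the sketch below. The helper name `regression_metrics` and the sample values are hypothetical and unrelated to the figures reported above.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, and R² for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    rmse = float(np.sqrt(np.mean(residuals**2)))
    mae = float(np.mean(np.abs(residuals)))

    # R² = 1 - (residual sum of squares / total sum of squares).
    ss_res = float(np.sum(residuals**2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot

    return {"rmse": rmse, "mae": mae, "r2": r2}


# Hypothetical values, for illustration only.
metrics = regression_metrics([70, 80, 90, 60], [72, 78, 91, 63])
print(metrics)
```

RMSE penalizes large errors more heavily than MAE, so RMSE is always at least as large as MAE on the same predictions.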

DagsHub Experiments

DagsHub was utilized for experiment tracking and collaborative development. This platform enabled efficient version control and seamless collaboration among team members.

MLflow Tracking

The project employed MLflow for comprehensive experiment tracking. Below are the configuration details for MLflow:

MLFLOW_TRACKING_URI: https://dagshub.com/38832/mlproject.mlflow
MLFLOW_TRACKING_USERNAME: 38832
MLFLOW_TRACKING_PASSWORD: <your access token>

MLflow ensured a streamlined tracking process, capturing all model parameters, metrics, and artifacts, thereby facilitating reproducibility and transparency.
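One way to wire these settings into a training script is sketched below, assuming the three variables are exported in the environment rather than written into code. MLflow reads `MLFLOW_TRACKING_USERNAME` and `MLFLOW_TRACKING_PASSWORD` from the environment for HTTP basic authentication. The helper names `read_tracking_config` and `log_results` are hypothetical and not part of the project.

```python
import os

REQUIRED_VARS = (
    "MLFLOW_TRACKING_URI",
    "MLFLOW_TRACKING_USERNAME",
    "MLFLOW_TRACKING_PASSWORD",
)


def read_tracking_config(env=os.environ):
    """Return the MLflow settings from env, or raise if any is missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing MLflow settings: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


def log_results(params, metrics):
    """Log one run's parameters and metrics to the configured tracking server."""
    # Imported lazily so the configuration check works without MLflow installed.
    import mlflow

    config = read_tracking_config()
    mlflow.set_tracking_uri(config["MLFLOW_TRACKING_URI"])
    with mlflow.start_run():
        mlflow.log_params(params)
        mlflow.log_metrics(metrics)
```

A call such as `log_results({"model": "LinearRegression"}, {"r2": 0.88})` would then record one run, provided the tracking server is reachable.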

Conclusion

The project provided a detailed analysis of factors influencing student performance and demonstrated the applicability of machine learning in educational settings. The insights gained and the models developed can be leveraged for targeted interventions and strategic decision-making in educational institutions. Future work will focus on expanding the dataset and integrating additional predictive features to further enhance model accuracy.
