Student Performance Prediction

Predict final student grades (G3) from demographic, school, and historical grade features.

About

This project trains and ships a production-ready machine learning pipeline to predict final student grades (G3) using the UCI Student Performance dataset. The delivered artifact is a saved sklearn Pipeline (preprocessing + model) and a FastAPI application for inference.

Key goals:

Reproducible preprocessing with ColumnTransformer and Pipeline
Robust baseline and ensemble models (Linear Regression, Random Forest)
Clear evaluation and model persistence
Lightweight FastAPI inference endpoint

Features

End-to-end pipeline: data → preprocessing → training → evaluation → model saving
Preprocessing implemented with SimpleImputer, StandardScaler, and OneHotEncoder
Baseline and Random Forest regression models
Saved sklearn pipeline, so inference applies exactly the same preprocessing as training
FastAPI server with Pydantic input validation

Repository Structure

student-performance-prediction/
├── data/
│   └── raw/                 # Raw data files
│
├── models/                  # Saved models
│   └── student_performance_rf.pkl
├── notebooks/               # EDA and experiments
│   └── 01_eda.ipynb
├── src/                     # Source code modules
│   ├── __init__.py
│   ├── evaluate.py
│   ├── predict.py
│   ├── preprocessing.py
│   └── train.py
├── main.py                  # FastAPI application entry point
├── pyproject.toml           # Project configuration
├── requirements.txt         # Dependencies
├── uv.lock                  # Dependency lock file
├── .gitignore
├── .python-version
└── README.md

Dataset

Source: UCI Student Performance dataset (Kaggle mirror recommended)
Files: student-mat.csv (semicolon-separated)
Target column: G3 (final grade, range 0–20)

Note: keep raw data under data/raw/ and never commit sensitive/raw files to public repos.
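
Because student-mat.csv is semicolon-separated, it has to be read with an explicit delimiter. The following is a minimal sketch using pandas; the helper name load_student_data and the inline sample rows are illustrative assumptions, not code or data from this repository.

```python
import io

import pandas as pd

TARGET = "G3"  # final grade, per the dataset description above


def load_student_data(source):
    """Read the semicolon-separated CSV and split features from the target.

    `source` may be a path such as data/raw/student-mat.csv or any file-like
    object. Hypothetical helper, not taken from the repository.
    """
    df = pd.read_csv(source, sep=";")
    X = df.drop(columns=[TARGET])
    y = df[TARGET]
    return X, y


# Tiny inline sample (made-up rows) so the sketch runs without the real file.
sample = io.StringIO(
    "school;sex;age;G1;G2;G3\n"
    "GP;F;16;11;10;11\n"
    "MS;M;19;5;4;0\n"
)
X, y = load_student_data(sample)
print(X.shape, y.tolist())
```

With the real file in place, the same function could be called as load_student_data("data/raw/student-mat.csv").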

Quickstart

Create and activate a virtual environment

python -m venv .venv
# mac / linux
source .venv/bin/activate
# windows (powershell)
.\.venv\Scripts\Activate.ps1

Install dependencies

pip install -r requirements.txt

Place student-mat.csv in data/raw/
Run training (example)

python src/train.py --data_path data/raw/student-mat.csv --output models/student_performance_rf.pkl

Start the API

fastapi dev main.py
# open http://127.0.0.1:8000/docs

Preprocessing & Modeling

Numerical features: median imputation + StandardScaler
Categorical features: most-frequent imputation + OneHotEncoder(handle_unknown='ignore')
Preprocessing implemented via build_preprocessor(cat_cols, num_cols) in src/preprocessing.py
Models available in src/train.py (Linear Regression baseline, Random Forest)
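
The preprocessing steps listed above can be sketched as follows. This is one plausible shape for build_preprocessor, written from the description in this README rather than copied from src/preprocessing.py; the step names, the Random Forest settings, and the sample rows are assumptions made for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_preprocessor(cat_cols, num_cols):
    """Assumed shape of the helper in src/preprocessing.py (not the actual source)."""
    numeric = Pipeline([
        ("imputer", SimpleImputer(strategy="median")),
        ("scaler", StandardScaler()),
    ])
    categorical = Pipeline([
        ("imputer", SimpleImputer(strategy="most_frequent")),
        ("encoder", OneHotEncoder(handle_unknown="ignore")),
    ])
    return ColumnTransformer([
        ("num", numeric, num_cols),
        ("cat", categorical, cat_cols),
    ])


# Made-up rows using a few of the dataset's column names.
X = pd.DataFrame({
    "school": ["GP", "MS", "GP", "MS"],
    "studytime": [2.0, 1.0, float("nan"), 4.0],  # NaN is filled with the median
    "G2": [10, 4, 12, 19],
})
y = [11, 0, 12, 19]

# Preprocessing and model live in one Pipeline, so both are fitted together.
model = Pipeline([
    ("preprocess", build_preprocessor(cat_cols=["school"], num_cols=["studytime", "G2"])),
    ("regressor", RandomForestRegressor(n_estimators=50, random_state=0)),
])
model.fit(X, y)

# An unseen category ("XX") is encoded as all zeros instead of raising.
new_row = pd.DataFrame({"school": ["XX"], "studytime": [3.0], "G2": [15]})
prediction = model.predict(new_row)
print(prediction)
```

Wrapping the preprocessor and the regressor in a single Pipeline means the imputation statistics, scaling parameters, and one-hot categories learned during training are reused unchanged at prediction time.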

Evaluation

Model evaluation scripts (src/evaluate.py) produce MAE, RMSE and R² metrics.

Example baseline results (expected range):

MAE ≈ 1.2–1.8
RMSE ≈ 2.0–2.5
R² ≈ 0.6–0.85
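
As a rough illustration of how these three metrics are computed, here is a small sketch using sklearn.metrics. The helper name regression_report is hypothetical and the grades are made up; they are not results from this project or output of src/evaluate.py.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_report(y_true, y_pred):
    """Return MAE, RMSE and R² as a dict. Hypothetical helper name."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "mae": mean_absolute_error(y_true, y_pred),
        "rmse": float(np.sqrt(mse)),  # RMSE is the square root of MSE
        "r2": r2_score(y_true, y_pred),
    }


# Made-up grades purely for illustration; not results from this project.
y_true = [10, 12, 8, 15]
y_pred = [11, 12, 6, 14]
metrics = regression_report(y_true, y_pred)
print(metrics)
```

MAE and RMSE are expressed in grade points on the 0–20 scale, while R² is unitless, so the first two are usually easier to interpret for this target.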

API

The FastAPI application loads the serialized sklearn Pipeline and exposes a /predict endpoint.

Input: JSON matching app/schema.py (Pydantic model)

Output: `{ "predicted_G3": float }`

Example request payloads, each with the approximate range the prediction is expected to fall in:

High achiever (positive test). Expected output: 18–20

{
  "school": "GP", 
  "sex": "M", 
  "age": 17, 
  "address": "U", 
  "famsize": "LE3", 
  "Pstatus": "T",
  "Medu": 4, 
  "Fedu": 4, 
  "Mjob": "health", 
  "Fjob": "services", 
  "reason": "reputation",
  "guardian": "mother", 
  "traveltime": 1, 
  "studytime": 4, 
  "failures": 0, 
  "schoolsup": "no",
  "famsup": "yes", 
  "paid": "yes", 
  "activities": "yes", 
  "nursery": "yes", 
  "higher": "yes",
  "internet": "yes", 
  "romantic": "no", 
  "famrel": 5, 
  "freetime": 2, 
  "goout": 2,
  "Dalc": 1, 
  "Walc": 1, 
  "health": 5, 
  "absences": 0, 
  "G1": 18, 
  "G2": 19
}

At-risk student (negative test). Expected output: 0–6

{
  "school": "MS", 
  "sex": "M", 
  "age": 19, 
  "address": "R", 
  "famsize": "GT3", 
  "Pstatus": "T",
  "Medu": 1, 
  "Fedu": 1, 
  "Mjob": "other", 
  "Fjob": "other", 
  "reason": "course",
  "guardian": "other", 
  "traveltime": 3, 
  "studytime": 1, 
  "failures": 3, 
  "schoolsup": "no",
  "famsup": "no", 
  "paid": "no", 
  "activities": "no", 
  "nursery": "no", 
  "higher": "no",
  "internet": "no", 
  "romantic": "yes", 
  "famrel": 2, 
  "freetime": 4, 
  "goout": 5,
  "Dalc": 3, 
  "Walc": 4, 
  "health": 2, 
  "absences": 20, 
  "G1": 5, 
  "G2": 4
}

Average student (boundary test). Expected output: 10–12

{
  "school": "GP", 
  "sex": "F", 
  "age": 16, 
  "address": "U", 
  "famsize": "GT3", 
  "Pstatus": "T",
  "Medu": 2, 
  "Fedu": 2, 
  "Mjob": "services", 
  "Fjob": "other", 
  "reason": "home",
  "guardian": "father", 
  "traveltime": 1, 
  "studytime": 2, 
  "failures": 0, 
  "schoolsup": "yes",
  "famsup": "yes", 
  "paid": "no", 
  "activities": "yes", 
  "nursery": "yes", 
  "higher": "yes",
  "internet": "yes", 
  "romantic": "no", 
  "famrel": 4, 
  "freetime": 3, 
  "goout": 3,
  "Dalc": 1, 
  "Walc": 2, 
  "health": 4, 
  "absences": 6, 
  "G1": 11, 
  "G2": 10
}
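
The payloads above can be sent to a running server with any HTTP client. The sketch below uses only the Python standard library and assumes that /predict accepts a JSON POST at the local dev address from the Quickstart; the helper names build_predict_request and predict_grade are hypothetical.

```python
import json
import urllib.request

# Assumed local address, matching the dev server URL in the Quickstart.
API_URL = "http://127.0.0.1:8000/predict"


def build_predict_request(payload, url=API_URL):
    """Build a JSON POST request for /predict (POST is an assumption)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_grade(payload, url=API_URL):
    """Send the payload and return the predicted_G3 value from the response."""
    with urllib.request.urlopen(build_predict_request(payload, url)) as resp:
        return json.load(resp)["predicted_G3"]


# Abbreviated for readability; a real call needs every field shown in the
# example payloads above.
payload = {"school": "GP", "sex": "F", "age": 16, "G1": 11, "G2": 10}
request = build_predict_request(payload)
print(request.get_method(), request.full_url)
# With the server running: print(predict_grade(full_payload))
```

Building the request separately from sending it keeps the sketch runnable without a live server; the interactive page at /docs is an alternative way to try the same payloads.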

Roadmap & Improvements

Hyperparameter tuning (Grid / Random / Bayesian)
Cross-validation & CI checks
Model explainability (SHAP) and fairness checks
Monitoring: latency, error-rate, prediction drift
Add unit & integration tests for API

Contributing

Contributions are welcome. Please open an issue or PR. Follow the code style and add tests for new functionality.

License

MIT License — see LICENSE file.

Contact

Ali Sulman — https://github.com/alisulmanpro
