This project aims to understand how student performance (test scores) is affected by various variables such as Gender, Ethnicity, Parental Level of Education, Lunch, and Test Preparation Course.
The dataset used in this project includes the following features:
- Gender: Sex of students (Male/Female).
- Race/Ethnicity: Ethnicity of students (Group A, B, C, D, E).
- Parental Level of Education: Highest level of education attained by the student's parents (Bachelor's Degree, Some College, Master's Degree, Associate's Degree, High School).
- Lunch: Type of lunch the student receives (Standard or Free/Reduced).
- Test Preparation Course: Whether the student completed a test preparation course before the test.
- Math Score: Score of a particular student in math.
- Reading Score: Score of a particular student in reading.
- Writing Score: Score of a particular student in writing.
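
As a rough illustration of how these features might be represented in Pandas, the sketch below builds a small table with the same columns. The snake_case column names, the category spellings, and the three rows are assumptions made up for this example and are not taken from the actual dataset.

```python
import pandas as pd

# Hypothetical column names; the real CSV header may be spelled differently.
CATEGORICAL = [
    "gender",
    "race_ethnicity",
    "parental_level_of_education",
    "lunch",
    "test_preparation_course",
]
SCORES = ["math_score", "reading_score", "writing_score"]

# Made-up rows for illustration only, not records from the dataset.
rows = [
    ("female", "group B", "bachelor's degree", "standard", "none", 70, 75, 72),
    ("male", "group C", "some college", "free/reduced", "completed", 60, 58, 55),
    ("female", "group A", "high school", "standard", "completed", 82, 90, 88),
]
df = pd.DataFrame(rows, columns=CATEGORICAL + SCORES)

# One-hot encode the categorical columns; the score columns pass through unchanged.
encoded = pd.get_dummies(df, columns=CATEGORICAL)

# Per-student mean of the three test scores.
average_score = df[SCORES].mean(axis=1)
```

The five categorical columns are the model inputs described above, and the three score columns are the quantities the project analyzes and predicts.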
This project consists of the following steps:
- Data Cleaning: The dataset was cleaned to handle missing values, outliers, and inconsistencies. This involved imputation, outlier removal, and data normalization.
- Exploratory Data Analysis (EDA): Exploratory data analysis was performed to gain insights into the relationships between different variables and identify patterns in the data.
- Feature Engineering: Additional features were created or transformed to enhance the predictive power of the model. This may include feature scaling, one-hot encoding, or feature extraction.
- Model Training: Several machine learning models were trained and evaluated to predict student performance based on the given features. Models such as linear regression, decision trees, and ensemble methods were considered.
- Model Evaluation: The performance of each model was evaluated using metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), and R-squared. Cross-validation was used to check the robustness of the models.
- Model Deployment: The best-performing model was deployed using Flask on AWS Elastic Beanstalk to create a web application for predicting student performance.
- Web Application: A user-friendly web interface was developed where users can input student information and get predictions for student performance.
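
The feature engineering, training, and evaluation steps above can be sketched with Scikit-learn roughly as follows. This is a minimal illustration and not the project's actual code: the data is synthetic, the column names are hypothetical, and the choice of `math_score` as the prediction target and linear regression as the model are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in data (NOT the real dataset); column names are hypothetical.
df = pd.DataFrame({
    "gender": rng.choice(["female", "male"], n),
    "lunch": rng.choice(["standard", "free/reduced"], n),
    "test_preparation_course": rng.choice(["none", "completed"], n),
    "reading_score": rng.integers(30, 101, n),
    "writing_score": rng.integers(30, 101, n),
})
# Assumed target: a math score loosely tied to the other two scores plus noise.
noise = rng.normal(0, 5, n)
df["math_score"] = (
    0.5 * df["reading_score"] + 0.4 * df["writing_score"] + noise
).clip(0, 100)

cat_cols = ["gender", "lunch", "test_preparation_course"]
num_cols = ["reading_score", "writing_score"]
X = df[cat_cols + num_cols]
y = df["math_score"]

# Feature engineering: one-hot encode categoricals, scale numeric columns.
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
    ("num", StandardScaler(), num_cols),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("regressor", LinearRegression()),
])

# Model training on a held-out split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model.fit(X_train, y_train)
pred = model.predict(X_test)

# Model evaluation: MAE, MSE, and R-squared on the test split.
mae = mean_absolute_error(y_test, pred)
mse = mean_squared_error(y_test, pred)
r2 = r2_score(y_test, pred)

# 5-fold cross-validated R-squared on the full data.
cv_r2 = cross_val_score(model, X, y, cv=5, scoring="r2")
```

Wrapping preprocessing and the regressor in one `Pipeline` means the encoder and scaler are refit inside each cross-validation fold, which avoids leaking information from the validation fold into training. Swapping `LinearRegression` for a decision tree or an ensemble regressor only changes the final pipeline step.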
- Python: For data cleaning, analysis, and model training.
- Pandas: For data manipulation and cleaning.
- Matplotlib and Seaborn: For data visualization.
- Scikit-learn: For machine learning model training and evaluation.
- Flask: For web application development.
- AWS Elastic Beanstalk: For model deployment.
AWS Elastic Beanstalk link: http://studentsexamperformaceindicator-env.eba-eainjvrs.us-east-1.elasticbeanstalk.com/predictdata
This project developed a predictive model for student performance based on demographic and educational factors. The deployed web application gives stakeholders such as educators and policymakers a simple interface for exploring predicted student performance and informing their decisions.