This project analyzes student academic performance using descriptive and inferential statistics to uncover how different factors like gender, parental education, lunch type, and test preparation affect scores in Math, Reading, and Writing.
The objective is to apply core statistical techniques to real-world education data in order to:
- Explore trends and patterns in student scores
- Understand which demographic or behavioral factors influence performance
- Perform hypothesis testing to validate assumptions
We used the Students Performance in Exams Dataset from Kaggle, which includes:
- Gender
- Race/Ethnicity
- Parental Level of Education
- Lunch Type
- Test Preparation Course
- Math, Reading, and Writing Scores
Tools and libraries used:
- Python
- Pandas, NumPy
- Matplotlib, Seaborn
- SciPy, Statsmodels
- Jupyter Notebook / Google Colab
Preprocessing and descriptive statistics:
- Cleaned and standardized column names for analysis
- Calculated the mean, median, mode, range, standard deviation, and IQR for each score
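The two steps above could look roughly like the sketch below. The helper names `clean_columns` and `describe_scores`, the snake_case naming scheme, and the toy data are assumptions made for illustration; they are not taken from the notebook.

```python
import pandas as pd


def clean_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with lowercase, underscore-separated column names."""
    out = df.copy()
    out.columns = (
        out.columns.str.strip()
        .str.lower()
        .str.replace(r"[^0-9a-z]+", "_", regex=True)  # spaces, slashes -> "_"
        .str.strip("_")
    )
    return out


def describe_scores(scores: pd.Series) -> dict:
    """Compute the summary statistics listed above for one score column."""
    q1, q3 = scores.quantile([0.25, 0.75])
    return {
        "mean": scores.mean(),
        "median": scores.median(),
        "mode": scores.mode().tolist(),  # a column can have several modes
        "range": scores.max() - scores.min(),
        "std": scores.std(),  # sample standard deviation (ddof=1)
        "iqr": q3 - q1,
    }


# Toy data standing in for the real CSV (values are made up).
toy = pd.DataFrame(
    {
        "Math Score": [40, 50, 60, 60, 70, 80, 90],
        "Race/Ethnicity": ["A", "B", "C", "A", "B", "C", "A"],
    }
)
toy = clean_columns(toy)
summary = describe_scores(toy["math_score"])
print(summary)
```

With the real data, the same helpers would be applied to the frame returned by `pd.read_csv("student_performance.csv")` and to each of the three score columns.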
Here are some key plots used to support the analysis:
- Correlation Matrix: Strong positive correlation among Math, Reading, and Writing scores.
- Math Score Distribution: Math score distribution shows a slight right skew, peaking around 65–75.
- Dataset Preview: Cleaned dataset preview after column renaming and preprocessing.
- Math Scores by Gender: Slight difference in Math scores between male and female students, shown using a boxplot.
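As a rough illustration of the numbers behind the correlation and distribution plots, the sketch below computes a Pearson correlation matrix and a skewness value with pandas. The column names and the toy scores are assumptions for the example, not values from the dataset.

```python
import pandas as pd

# Toy scores standing in for the real columns (values are made up).
scores = pd.DataFrame(
    {
        "math_score": [40, 55, 62, 70, 85],
        "reading_score": [45, 60, 65, 72, 90],
        "writing_score": [43, 58, 66, 70, 88],
    }
)

# Pairwise Pearson correlation between the three subjects.
corr = scores.corr()

# Sample skewness: positive means a longer right tail, negative a longer left tail.
math_skew = scores["math_score"].skew()

print(corr.round(3))
print(f"math skewness: {math_skew:.3f}")
```

Passing `corr` to `seaborn.heatmap` would render the matrix as a plot, and the sign of the skewness value is a quick check on the direction of skew read off a histogram.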
Hypothesis tests:
- Test Preparation vs Math Scores
  ✅ Students who completed the prep course performed significantly better in Math.
  ✔️ t-test p-value < 0.05
- Gender vs Reading Scores
  🚫 No statistically significant difference found.
  ✔️ t-test p-value > 0.05
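A two-sample comparison like the ones above can be sketched with `scipy.stats.ttest_ind`. The README does not say which t-test variant was used, so this example assumes Welch's test (`equal_var=False`); the helper `compare_groups`, the column names, and the toy data are also illustrative assumptions.

```python
import pandas as pd
from scipy import stats


def compare_groups(df: pd.DataFrame, group_col: str, score_col: str, alpha: float = 0.05):
    """Run a two-sample t-test on score_col split by the two levels of group_col."""
    groups = [g[score_col].to_numpy() for _, g in df.groupby(group_col)]
    if len(groups) != 2:
        raise ValueError("expected exactly two groups")
    # Assumption: Welch's t-test, which does not require equal variances.
    result = stats.ttest_ind(groups[0], groups[1], equal_var=False)
    return result.statistic, result.pvalue, bool(result.pvalue < alpha)


# Toy data standing in for the real dataset (values are made up).
toy = pd.DataFrame(
    {
        "test_preparation_course": ["completed"] * 6 + ["none"] * 6,
        "math_score": [78, 82, 85, 88, 90, 79, 55, 60, 58, 62, 65, 57],
    }
)

t_stat, p_value, significant = compare_groups(toy, "test_preparation_course", "math_score")
print(f"t = {t_stat:.2f}, p = {p_value:.4g}, significant at 0.05: {significant}")
```

The same call with `group_col="gender"` and `score_col="reading_score"` would cover the second comparison. A p-value above 0.05 means only that the test did not detect a difference, not that the groups are identical.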
Key insights:
- ✅ Test preparation improves performance in all subjects
- ✅ Parental education moderately impacts academic results
- ✅ Gender does not significantly affect final scores
- ✅ High correlation exists between all three subject scores
- ✅ Lunch type, a proxy for socioeconomic status, is associated with performance
Project structure:

    Student-Performance-Analysis/
    ├── student_performance.csv
    ├── Student_Performance_Analysis.ipynb
    ├── README.md
    └── images/
        ├── correlation_matrix.png
        ├── Distribution_ofmathscores.png
        ├── dataset.png
        └── maths_scoresbygender.png

Developed by Sinchana P as part of placement-focused learning in Data Science and Statistics.
Dataset sourced from Kaggle.