This project analyzes loan application data to identify factors that contribute to loan default. It uses real-world datasets for data cleaning, exploratory data analysis (EDA), and statistical testing, and derives actionable insights from the results.
    Loan-default-analysis/
    │
    ├── data/               # Raw CSV files
    ├── notebooks/          # Jupyter Notebook for EDA & testing
    ├── output/             # Saved charts/plots
    ├── docs/               # Summary and documentation
    ├── requirements.txt    # Required libraries
    └── README.md           # Project overview
The objective is to uncover which demographic and financial factors influence whether a loan is approved, and to support risk assessment in loan applications.
| Step | Description |
|---|---|
| ✅ Data Cleaning | Handled missing values, verified data types |
| ✅ EDA | Univariate & Bivariate analysis (age, income, etc.) |
| ✅ Visualization | Countplot, Histogram, Boxplot, Heatmap |
| ✅ Statistical Testing | Chi-square, t-test, ANOVA (SciPy) |
| ✅ Documentation | Summary reports in Markdown files |
| ✅ Organized Code | Jupyter Notebook, structured folders |
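
The cleaning and testing steps in the table can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the notebook's actual code: the column names `income`, `education`, and `default` are assumptions, and the real dataset in `data/` may use different names and a different imputation strategy.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in data (hypothetical columns; not the project's dataset).
rng = np.random.default_rng(0)
n = 200
income = rng.normal(50_000, 12_000, n)
income[::25] = np.nan  # inject a few missing values to clean up below

df = pd.DataFrame({
    "income": income,
    "education": rng.choice(["secondary", "higher"], n),
    "default": rng.integers(0, 2, n),
})

# Data cleaning: fill missing numeric values with the column median.
df["income"] = df["income"].fillna(df["income"].median())

# Chi-square test of independence: categorical feature vs. default flag.
contingency = pd.crosstab(df["education"], df["default"])
chi2, p_chi2, dof, expected = stats.chi2_contingency(contingency)

# Welch's t-test: mean income of defaulters vs. non-defaulters.
income_default = df.loc[df["default"] == 1, "income"]
income_repaid = df.loc[df["default"] == 0, "income"]
t_stat, p_t = stats.ttest_ind(income_default, income_repaid, equal_var=False)

print(f"chi-square p-value: {p_chi2:.3f}")
print(f"t-test p-value:     {p_t:.3f}")
```

For a numeric variable compared across three or more groups, `scipy.stats.f_oneway` provides the one-way ANOVA in the same style. Because the data here is random, the p-values carry no meaning beyond showing the call pattern.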
- Python: pandas, numpy, seaborn, matplotlib, scipy
- Jupyter Notebook
- Markdown
- VS Code / GitHub
All visuals are stored in the output/ folder.
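
One way a chart might be written to that folder is sketched below. The file name `income_histogram.png` and the synthetic income values are hypothetical and used only for illustration; the notebook may name and style its plots differently.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Create the output folder if it does not exist yet.
output_dir = Path("output")
output_dir.mkdir(exist_ok=True)

# Synthetic values standing in for a real numeric column.
rng = np.random.default_rng(0)
income = rng.normal(50_000, 12_000, 200)

# Draw a histogram and label the axes.
fig, ax = plt.subplots(figsize=(6, 4))
ax.hist(income, bins=20)
ax.set_xlabel("Income")
ax.set_ylabel("Number of applicants")
ax.set_title("Income distribution (synthetic data)")

# Save the figure to the output folder, then release its memory.
chart_path = output_dir / "income_histogram.png"
fig.savefig(chart_path, dpi=150, bbox_inches="tight")
plt.close(fig)
```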
- 📘 EDA Report → docs/eda_summary.md
- 📘 Final Summary → docs/summary.md
Tarun Kumar Malviya
Data Analyst | Python | SQL | Tableau | EDA | Hypothesis Testing
For collaboration or queries:
📧 tarunmalviya804@gmail.com
- ✅ Completed: EDA + Hypothesis Testing
- 🟡 Optional Next Phase: ML Modeling (Logistic Regression)



