Accurately predicting loan default risk is critical for financial institutions, lenders, and investors. This project develops a data-driven solution for loan default prediction that combines data preprocessing, feature engineering, extensive data analysis, and machine learning algorithms.
The dataset used for this project is historical loan data from the U.S. Small Business Administration (SBA). It contains 899,164 records and 27 features. The classes are imbalanced: only 17.5% of the loans defaulted.
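One common way to compensate for this kind of imbalance is to weight each class inversely to its frequency. The sketch below is illustrative only and is not taken from this project's code: it computes weights with the formula `n_samples / (n_classes * class_count)`, the same heuristic scikit-learn applies for `class_weight="balanced"`. The helper name `balanced_class_weights` and the toy label list are assumptions made for the example.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight} using n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Toy labels mirroring the 17.5% default rate (1 = defaulted, 0 = not defaulted).
# These are synthetic values, not rows from the SBA dataset.
toy_labels = [1] * 175 + [0] * 825

weights = balanced_class_weights(toy_labels)
print(weights)
```

The minority (defaulted) class receives the larger weight, so misclassifying a default costs more during training than misclassifying a repaid loan.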
- Python
- Scikit-learn
- Pandas
- Streamlit
- FastAPI
- Install the dependencies.

      pip install -r requirements.txt

- Change the directory to the src folder and run the FastAPI server.

      uvicorn server:app --reload

- Run the Streamlit app from the root directory.

      streamlit run app.py
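Both processes need to be running because the front end talks to the API over HTTP. As a rough sketch of that interaction, the snippet below builds a JSON POST request against Uvicorn's default address (`http://127.0.0.1:8000`). The `/predict` route, the helper names, and the payload fields are hypothetical; the actual route and request schema are whatever `server.py` defines.

```python
import json
from urllib.request import Request, urlopen

# Uvicorn binds to 127.0.0.1:8000 by default; "/predict" is a hypothetical route.
BASE_URL = "http://127.0.0.1:8000"


def build_predict_request(features, base_url=BASE_URL):
    """Build a JSON POST request for the (assumed) /predict endpoint."""
    body = json.dumps(features).encode("utf-8")
    return Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features, base_url=BASE_URL, timeout=10):
    """Send the request and decode the JSON reply (the server must be running)."""
    with urlopen(build_predict_request(features, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Illustrative payload only; field names and values are assumptions.
example_features = {"Term": 84, "NoEmp": 5}
request = build_predict_request(example_features)
print(request.get_method(), request.full_url)
```

Calling `predict(example_features)` only works once the FastAPI server from the previous step is up and exposes a matching route.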
- Min Li, Amy Mickel & Stanley Taylor (2018) “Should This Loan be Approved or Denied?”: A Large Dataset with Class Assignment Guidelines, Journal of Statistics Education, 26:1, 55-66, DOI: 10.1080/10691898.2018.1434342