This application predicts credit risk by applying machine learning to the German credit dataset. It combines a preprocessing pipeline with several trained models (Random Forest, Logistic Regression, SVC, and XGBoost) and a stacking ensemble built on top of them.
- Interactive Web Interface: User-friendly Streamlit application for real-time credit risk predictions
- Advanced Preprocessing: Feature engineering, outlier handling, and data imputation techniques
- Ensemble Modeling: Stacking ensemble approach combining multiple ML algorithms
- Model Interpretation: SHAP analysis for explainable AI
- Performance Metrics: Confusion matrices and ROC curves for transparent model evaluation
- Python: Core programming language
- Scikit-learn: Machine learning algorithms and preprocessing
- XGBoost: Gradient boosting implementation
- SMOTE: Handling class imbalance
- Streamlit: Web application framework
- Pandas/NumPy: Data manipulation
- Matplotlib/Plotly: Data visualization
- SHAP: Model interpretation
.
├── creditrisk.py # Preprocessing and feature engineering
├── train_models.py # Model training script
├── shap_analysis.py # Model interpretation with SHAP
├── performance_plots.py # Generate performance visualizations
├── credit_risk_streamlitapp.py # Web application
├── data/
│ ├── german_credit_data.csv # Original dataset
│ └── preprocessed_credit_data.csv # Preprocessed data
├── models/ # Saved models and encoders
└── plots/ # Performance and interpretation plots
# Clone the repository
git clone https://github.com/Kpreya/Credit_Risk_Predictor_App.git
cd Credit_Risk_Predictor_App
# Install dependencies
pip install -r requirements.txt
python creditrisk.py
This script loads the German credit dataset, performs cleaning, feature engineering, and preprocessing, and saves the artifacts for model training.
python train_models.py
Trains multiple machine learning models and saves them for later use.
python shap_analysis.py
python performance_plots.py
Generates model interpretation plots and performance metrics.
streamlit run credit_risk_streamlitapp.py
Launches the interactive web application for credit risk prediction.
The application combines multiple base models in a stacking ensemble, in which a meta-model learns from the base models' predictions. Model evaluation includes:
- Confusion matrix
- ROC curve
- SHAP feature importance
- Cross-validation scores
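
As a rough illustration of the stacking and cross-validation steps listed above, the following minimal sketch uses scikit-learn's `StackingClassifier`. It is not the code in `train_models.py`: the synthetic data, the hyperparameters, the choice of Logistic Regression as meta-model, and the ROC AUC scoring are all assumptions, and XGBoost is left out so the example depends only on scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for the preprocessed credit data
# (assumption: the real features live in data/preprocessed_credit_data.csv).
X, y = make_classification(
    n_samples=300, n_features=10, weights=[0.7, 0.3], random_state=42
)

# Base models; the scale-sensitive ones are wrapped with a scaler.
# All hyperparameters here are illustrative, not the project's settings.
base_models = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
    ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("svc", make_pipeline(StandardScaler(), SVC(random_state=42))),
]

# The meta-model is trained on out-of-fold predictions of the base models.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

# 5-fold cross-validated ROC AUC of the whole ensemble.
scores = cross_val_score(stack, X, y, cv=5, scoring="roc_auc")
print("ROC AUC per fold:", scores.round(3))
print("Mean ROC AUC:", scores.mean().round(3))
```

Training the meta-model on out-of-fold predictions (the `cv=5` argument) keeps it from simply memorizing base models that have already seen the training labels.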
Key factors in credit risk prediction include:
- Financial security (savings level relative to credit amount)
- Checking account status
- Credit amount and duration
- Savings account level
- Purpose of the loan
- Age and employment status
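
To make the "financial security" factor above more concrete, here is one hedged way such a ratio could be computed. The exact formula in `creditrisk.py` is not documented here, so the ordinal encoding, the category labels (`little`, `moderate`, `quite rich`, `rich`), the scaling per 1,000 currency units, and the function name `financial_security` are all hypothetical.

```python
# Hypothetical ordinal encoding of the savings categories; the labels are an
# assumption about how the dataset names them.
SAVINGS_LEVELS = {"little": 1, "moderate": 2, "quite rich": 3, "rich": 4}


def financial_security(savings_label, credit_amount):
    """Return the savings level relative to credit amount (per 1,000 units)."""
    if credit_amount <= 0:
        raise ValueError("credit_amount must be positive")
    # Unknown or missing savings information is treated as level 0.
    level = SAVINGS_LEVELS.get(savings_label, 0)
    return level / (credit_amount / 1000)


applicants = [
    {"savings": "rich", "credit_amount": 2000},
    {"savings": "little", "credit_amount": 8000},
    {"savings": None, "credit_amount": 1500},
]
for applicant in applicants:
    applicant["financial_security"] = financial_security(
        applicant["savings"], applicant["credit_amount"]
    )
    print(applicant)
```

Under these assumptions, a large loan backed by little savings yields a low score, while a small loan backed by substantial savings yields a high one.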
- Additional model algorithms for performance comparison
- Deep learning approaches for complex pattern recognition
- API deployment for integration with banking systems
- Enhanced visualization options for better interpretability
Developed by Krishnopreya C. Contact: krish6.ch@gmail.com
- German Credit Data from UCI Machine Learning Repository
- SHAP library for model interpretation
- Streamlit for web application framework