Web API for predicting fraud claims from data trained and evaluated on a Decision Tree Classifier.

tosi-n/Fraud-predict-web-service

Fraud-predict-web-service

Tosin Dairo MAY 13, 2020

Web API for predicting fraud claims from data trained and evaluated on a Decision Tree Classifier. The training data is the Auto Insurance Claims dataset sourced from Kaggle. It consists of 39 data points and 1,000 observations. To select a classifier for the prediction service, multiple machine learning algorithms were trained to produce a binary prediction output. After evaluation of the 6 trained predictive models, the most efficient model was serialized and deployed as a production-ready prediction service.

Predictive API Endpoints

Trained Machine Learning Algorithms

  • Logistic Regression
  • K Nearest Neighbor
  • Decision Tree
  • Random Forest
  • Linear Discriminant Analysis
  • Support Vector Machine
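A comparison across the six algorithms listed above could be sketched as follows. This is a minimal illustration, assuming scikit-learn is installed; it uses a synthetic dataset from `make_classification` as a stand-in for the claims data, and the hyperparameters and the helper name `compare_models` are assumptions rather than the settings used in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the claims data (assumption).
X, y = make_classification(
    n_samples=300, n_features=10, weights=[0.75, 0.25], random_state=0
)

# One estimator per algorithm named in the list above.
candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K Nearest Neighbor": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Linear Discriminant Analysis": LinearDiscriminantAnalysis(),
    "Support Vector Machine": SVC(),
}


def compare_models(models, X, y, folds=5):
    """Return the mean cross-validated ROC AUC for each model."""
    return {
        name: cross_val_score(model, X, y, cv=folds, scoring="roc_auc").mean()
        for name, model in models.items()
    }


scores = compare_models(candidates, X, y)
for name, auc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {auc:.3f}")
```

Scoring every candidate with the same folds and the same metric keeps the comparison fair; the ranking on synthetic data says nothing about the ranking on the real claims data.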

Requirements

Python 3 or 3.6

Setup

With venv: create a new virtual environment and install packages.

virtualenv -p python3 venv

source ./venv/bin/activate

Install requirements

pip3 install -r requirements.txt

Usage

Model training can be done from the notebook fraud_claim.ipynb, which is documented in sufficient detail.

The trained and selected Decision Tree model is saved as a pickle-serialized file in the directory /src/model_weight/
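Saving and reloading a model with pickle might look like the sketch below. It assumes scikit-learn is installed, trains a small tree on synthetic data, and writes to a temporary directory; the file name `decision_tree.pkl` is hypothetical and is not the name used under /src/model_weight/.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the claims data (assumption).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
model = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)

# Hypothetical file name, written to a temporary directory for this sketch.
model_path = os.path.join(tempfile.mkdtemp(), "decision_tree.pkl")

# Serialize the fitted model to disk.
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Load it back, as a prediction service would do at startup.
with open(model_path, "rb") as f:
    restored = pickle.load(f)

# The restored model should reproduce the original predictions exactly.
same_predictions = bool((restored.predict(X) == model.predict(X)).all())
print("Predictions match after reload:", same_predictions)
```

Pickle files can execute arbitrary code when loaded, so only model files from a trusted source should be unpickled.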

In order to effectively evaluate the predictive model, a calibration and discrimination technique is applied. The figure below shows a ROC Curve for the Decision Tree Classifier with an AUC value of 0.71, which is above the 0.5 expected from random guessing. This means that the Decision Tree Classifier carries useful, though limited, information for predicting fraud claims.

ROC Curve
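To make the AUC figure concrete: AUC equals the probability that a randomly chosen fraudulent claim receives a higher score than a randomly chosen legitimate one, with ties counted as half. The sketch below computes it directly from that definition using only the standard library; the function name `roc_auc` and the toy labels and scores are illustrative, not taken from the project.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = fraud, 0 = legitimate.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)  # 0.75: three of the four positive/negative pairs are ordered correctly
```

A model that scores claims at random would land near 0.5 under this definition, and a perfect ranking gives 1.0.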

Run the API using Django REST framework

./manage.py createsuperuser

./manage.py runserver

To test the API prediction endpoints, an API client application such as Postman can be used on localhost to send prediction requests.
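As an alternative to Postman, a request can be scripted. The sketch below is hedged: the endpoint path `/api/predict/`, the JSON field names, and the values are all assumptions made for illustration and should be replaced with the routes and fields the project actually defines. Only the host and port (`127.0.0.1:8000`) follow the Django development server default.

```python
import json
import urllib.request

# Hypothetical endpoint path; check the project's URL configuration.
API_URL = "http://127.0.0.1:8000/api/predict/"

# Illustrative claim features; field names and values are assumptions.
payload = {
    "months_as_customer": 328,
    "policy_annual_premium": 1406.91,
    "incident_severity": "Major Damage",
}


def build_request(url, features):
    """Build a JSON POST request without sending it."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_request(API_URL, payload)

# Uncomment once the development server is running:
# print(send(request))
```

Keeping request construction separate from sending makes it possible to inspect the payload before any network call is made.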


Challenges

  • Missing data required the use of a Multiple Imputation by Chained Equations package for imputation
  • The data had imbalanced classes. This was addressed by oversampling the minority class using the SMOTE package
  • Model metrics can be improved for better discrimination by training the model on more data samples for fraud prediction
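The oversampling idea behind SMOTE can be illustrated with a simplified standard-library sketch. This is not the SMOTE package's implementation, only the core step: each synthetic sample is placed at a random point on the line between a minority sample and one of its nearest minority neighbours. The function name `smote_like_oversample` and the toy points are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen point, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: squared_distance(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two features per sample.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote_like_oversample(minority, n_new=6)
print(len(new_points))  # 6
```

Because every synthetic point lies between two real minority samples, the new data stays inside the region the minority class already occupies. Oversampling of this kind is normally applied to the training split only, so that evaluation data contains no synthetic samples.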
