This is a group assignment project for WQD7006 Machine Learning for Data Science, part of the Master of Data Science at University of Malaya (UM). The project uses the Credit Card Fraud Detection Dataset 2023 from Kaggle. The report and slides for this project can be found in the report folder.
Logistic Regression, Random Forest, and SVM are the models selected in this project. Random Forest was found to have the best accuracy, although at the precision reported in the table below both Random Forest and SVM round to 1.0.
The results of the trained models are as follows:

Model | Accuracy (Training Set) | Accuracy (Testing Set) |
---|---|---|
Logistic Regression | 0.96 | 0.96 |
Random Forest | 1.0 | 1.0 |
SVM | 1.0 | 1.0 |
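
The actual training code lives in Model_Training.ipynb. As a rough illustration of how a comparison like the one in the table can be produced, here is a minimal sketch that assumes scikit-learn is installed. It uses synthetic data from `make_classification` rather than the Kaggle dataset, and the function name `evaluate_models`, the 80/20 split, the feature scaling, and all hyperparameters are assumptions made for this example, not the settings used in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def evaluate_models(X, y, seed=42):
    """Fit three classifiers and return {name: (train accuracy, test accuracy)}.

    Hypothetical helper for illustration; not part of this repository.
    """
    # Hold out 20% of the rows for testing, keeping the class ratio (assumed split).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )

    # Default-ish hyperparameters (assumed). Scaling helps the linear model and the SVM.
    models = {
        "Logistic Regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "SVM": make_pipeline(StandardScaler(), SVC()),
    }

    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        results[name] = (
            accuracy_score(y_train, model.predict(X_train)),
            accuracy_score(y_test, model.predict(X_test)),
        )
    return results


if __name__ == "__main__":
    # Synthetic stand-in for the real dataset.
    X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
    for name, (train_acc, test_acc) in evaluate_models(X, y).items():
        print(f"{name} | {train_acc:.2f} | {test_acc:.2f} |")
```

Reporting training and testing accuracy side by side, as the table does, makes it easier to notice a model that fits the training set much better than unseen data.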
The model is then deployed with Streamlit Community Cloud: https://ummlassignment-g5.streamlit.app. The link may be temporarily unavailable because the application enters sleep mode when inactive.
- Download and install Python 3.10
- Clone this repository

      git clone https://github.com/samueltan3972/ML-Assignment.git
- Install the necessary dependencies

      cd <bla>
      pip install pipenv                # Optional, but using a virtual environment is recommended
      python3 -m pipenv shell
      pipenv install                    # If using pipenv; the method below works with or without pipenv
      pip install -r requirements.txt   # Run only one install command; skip this if you ran `pipenv install`
- Start Jupyter Lab and open Model_Training.ipynb to see the training process. The model has already been trained, so no further training is required.

      jupyter lab
- Run the application developed for deployment

      streamlit run Credit_Card_Fraud_Detection_App.py
Dataset Link: https://www.kaggle.com/datasets/nelgiriyewithana/credit-card-fraud-detection-dataset-2023