This project implements a machine learning pipeline for detecting fraudulent transactions using ensemble and distance-based classifiers. The pipeline includes preprocessing, model training, hyperparameter tuning, evaluation, and experiment tracking via MLflow.
Two models were trained and evaluated:
- ExtraTreesClassifier
- KNeighborsClassifier
After cross-validation and evaluation on the test set:
- ExtraTreesClassifier consistently outperformed KNN on:
  - F1-score
  - Recall
  - ROC AUC
- It also generalized better and was more robust to the imbalanced class distribution.
📌 Conclusion: ExtraTreesClassifier is the recommended model for this fraud detection task.
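The comparison above is based on F1, recall and ROC AUC rather than accuracy. The README does not show how the pipeline computes these metrics (a scikit-learn project would usually call `sklearn.metrics`). The dependency-free sketch below only shows what each metric measures. It assumes fraud is encoded as label `1` and that the model outputs a fraud score for each transaction. The function names are illustrative and are not part of this repository.

```python
from __future__ import annotations

from typing import Sequence


def recall_precision_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[float, float, float]:
    """Recall, precision and F1 for the fraud class (assumed to be label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, precision, f1


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random fraud case outscores a random
    legitimate one (ties count as half); O(n_pos * n_neg), for illustration only."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs both classes in y_true")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 8 legitimate transactions, 2 frauds
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
print(recall_precision_f1(y_true, [0] * 10))  # (0.0, 0.0, 0.0) despite 80% accuracy
scores = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.4, 0.2, 0.9, 0.35]
print(roc_auc(y_true, scores))  # 0.9375
```

On the toy data, a model that never flags fraud reaches 80% accuracy but zero recall. This is why recall and F1 are the more informative criteria on imbalanced fraud data.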
Fraud_detection/
├── src/ # Source code modules
├── notebooks/ # Jupyter notebooks for exploration and prototyping
├── configs/
│ ├── config.yaml # Data and path configurations
│ └── params.yaml # Model parameters
├── .env # Environment variables (see below)
├── main.py # Entry point for running the pipeline
├── requirements.txt # Python dependencies
└── README.md # Project documentation
Follow these steps to set up the environment and run the pipeline.
git clone https://github.com/Shahriyar-1988/Fraud_detection.git
cd Fraud_detection
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
Create a .env file in the project root with the following content:
MLFLOW_TRACKING_URI=http://localhost:5000
⚠️ This URI points to the MLflow server used for experiment tracking. Make sure the server is running before executing the pipeline.
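The README does not say how `main.py` reads `.env`. Projects commonly use the `python-dotenv` package for this. The hypothetical helper below is a minimal standard-library stand-in that only illustrates the mechanism. It parses `KEY=VALUE` lines and exports them to `os.environ`, where MLflow reads `MLFLOW_TRACKING_URI` automatically.

```python
from __future__ import annotations

import os
from pathlib import Path


def load_env_file(path: str | Path = ".env", override: bool = False) -> dict[str, str]:
    """Hypothetical stand-in for python-dotenv: parse KEY=VALUE lines into os.environ."""
    loaded: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return loaded
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments and anything without '='
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Trim whitespace and one layer of surrounding quotes
        key, value = key.strip(), value.strip().strip('"').strip("'")
        loaded[key] = value
        # Existing environment variables take precedence unless override=True
        if override or key not in os.environ:
            os.environ[key] = value
    return loaded
```

Calling `load_env_file()` from the project root before any MLflow call should be enough. A variable already set in the shell keeps its value unless `override=True` is passed.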
You can start a local MLflow UI using:
mlflow ui
Then open http://localhost:5000 in your browser.
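The pipeline expects the server to be running, so a quick pre-flight check lets a run fail fast instead of erroring midway. This is a hypothetical helper and is not part of the repository. It only confirms that something accepts TCP connections at the host and port in the tracking URI. It does not verify that the listener is actually an MLflow server.

```python
import socket
from urllib.parse import urlparse


def tracking_server_reachable(uri: str, timeout: float = 2.0) -> bool:
    """Hypothetical check: True if a TCP connection to the URI's host:port succeeds."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError(f"expected an http(s) tracking URI, got {uri!r}")
    # Fall back to the scheme's default port when the URI omits one
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:  # connection refused, unreachable host, or timeout
        return False
```

For example, `tracking_server_reachable(os.environ.get("MLFLOW_TRACKING_URI", "http://localhost:5000"))` could be checked before training starts.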
Run the pipeline:
python main.py
This project uses MLflow for:
- Logging models and metrics
- Comparing model performance
- Storing artifacts like confusion matrices and classification reports
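The README does not show how the repository logs its artifacts, so treat the following as a sketch only. It counts the four confusion-matrix cells, assuming fraud is label `1`. Inside `log_evaluation`, it records those counts in MLflow as metrics with `mlflow.log_metrics` and as a plain-text artifact with `mlflow.log_text`. Both helper names and the artifact file name are hypothetical. The real project may log a rendered confusion-matrix image instead.

```python
from __future__ import annotations

from collections import Counter
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, int]:
    """Hypothetical helper: TN/FP/FN/TP counts, assuming 0 = legitimate, 1 = fraud."""
    counts = Counter(zip(y_true, y_pred))
    return {"tn": counts[(0, 0)], "fp": counts[(0, 1)], "fn": counts[(1, 0)], "tp": counts[(1, 1)]}


def log_evaluation(model_name: str, y_true: Sequence[int], y_pred: Sequence[int]) -> None:
    """Hypothetical helper: log confusion-matrix counts as metrics plus a text artifact."""
    import mlflow  # imported lazily so confusion_counts works without MLflow installed

    cm = confusion_counts(y_true, y_pred)
    report = "\n".join(f"{name}: {value}" for name, value in cm.items())
    with mlflow.start_run(run_name=model_name):
        mlflow.log_metrics({f"cm_{name}": float(value) for name, value in cm.items()})
        mlflow.log_text(report, f"{model_name}_confusion_matrix.txt")
```

Logging the fitted model itself would go through a model flavor such as `mlflow.sklearn`.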
Feel free to fork this repo and submit pull requests. Contributions for improving model performance, optimizing preprocessing, or enhancing logging are welcome!
This project is licensed under the MIT License. See LICENSE for details.