🕵️ Fraud Detection Pipeline

This project implements a machine learning pipeline for detecting fraudulent transactions using ensemble and distance-based classifiers. The pipeline includes preprocessing, model training, hyperparameter tuning, evaluation, and experiment tracking via MLflow.

📊 Models Compared

Two models were trained and evaluated:

- ExtraTreesClassifier
- KNeighborsClassifier

✅ Performance Summary

After cross-validation and evaluation on the test set, ExtraTreesClassifier consistently outperformed KNeighborsClassifier (KNN) on:
- F1-score
- Recall
- ROC AUC

It also generalized better and was more robust to the imbalanced class distribution.

📌 Conclusion: ExtraTreesClassifier is the recommended model for this fraud detection task.
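
For readers unfamiliar with why recall and F1-score are reported instead of plain accuracy, here is a minimal pure-Python sketch of the definitions. It is not the project's evaluation code (which presumably relies on a metrics library), and the labels below are made-up toy data, not results from this pipeline.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 marks fraud."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall_and_f1(y_true, y_pred):
    """Recall and F1-score for the positive (fraud) class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, f1


# Toy, made-up labels: 95 legitimate transactions followed by 5 fraudulent ones.
y_true = [0] * 95 + [1] * 5
# A model that never flags fraud.
always_legit = [0] * 100
# A model with 2 false alarms that catches 4 of the 5 frauds.
catches_most = [0] * 93 + [1] * 2 + [1] * 4 + [0]

for name, y_pred in [("always_legit", always_legit), ("catches_most", catches_most)]:
    recall, f1 = recall_and_f1(y_true, y_pred)
    print(f"{name}: accuracy={accuracy(y_true, y_pred):.2f} recall={recall:.2f} f1={f1:.2f}")
```

On this toy data the model that never flags fraud still reaches 95% accuracy while its recall and F1-score are both zero, which is why the comparison above focuses on F1-score, recall, and ROC AUC.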

📁 Project Structure

Fraud_detection/
├── src/                 # Source code modules
├── notebooks/           # Jupyter notebooks for exploration and prototyping
├── configs/
│   ├── config.yaml      # Data and path configurations
│   └── params.yaml      # Model parameters
├── .env                 # Environment variables (see below)
├── main.py              # Entry point for running the pipeline
├── requirements.txt     # Python dependencies
└── README.md            # Project documentation

🚀 Getting Started

Follow these steps to set up the environment and run the pipeline.

1. Clone the Repository

git clone https://github.com/Shahriyar-1988/Fraud_detection.git
cd Fraud_detection

2. Create and Activate a Virtual Environment

python -m venv venv
source venv/bin/activate       # On Windows: venv\Scripts\activate

3. Install Dependencies

pip install -r requirements.txt

4. Configure Environment Variables

Create a .env file in the project root with the following content:

MLFLOW_TRACKING_URI=http://localhost:5000

⚠️ This URI points to the MLflow server used for experiment tracking. Make sure the server is running before executing the pipeline.

You can start a local MLflow UI, which listens on port 5000 by default, using:

mlflow ui

Then open http://localhost:5000 in your browser.
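
How the pipeline itself loads the .env file is not documented here. As a rough, standard-library-only sketch, the variable could be read and the server checked before a run along these lines. The helper names `parse_env`, `resolve_tracking_uri`, and `server_is_reachable` are hypothetical and not part of this repository.

```python
import os
import urllib.request


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_tracking_uri(env_text, environ=None):
    """Prefer a variable already set in the process, then fall back to the .env text."""
    environ = os.environ if environ is None else environ
    file_values = parse_env(env_text)
    return environ.get("MLFLOW_TRACKING_URI") or file_values.get("MLFLOW_TRACKING_URI")


def server_is_reachable(uri, timeout=2.0):
    """Return True if an HTTP GET to the URI succeeds, False on any connection or HTTP error."""
    try:
        with urllib.request.urlopen(uri, timeout=timeout):
            return True
    except (OSError, ValueError):
        # URLError, HTTPError, and timeouts are OSError subclasses; ValueError covers malformed URLs.
        return False
```

A typical call would be `uri = resolve_tracking_uri(open(".env").read())` followed by `server_is_reachable(uri)` before running the pipeline.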

5. Run the Pipeline

python main.py

🧪 Experiment Tracking with MLflow

This project uses MLflow for:

- Logging models and metrics
- Comparing model performance
- Storing artifacts like confusion matrices and classification reports

🤝 Contributing

Feel free to fork this repo and submit pull requests. Contributions for improving model performance, optimizing preprocessing, or enhancing logging are welcome!

📄 License

This project is licensed under the MIT License. See LICENSE for details.
