A machine learning-based system for detecting potentially fraudulent transactions in real time.
This project implements a fraud detection system using machine learning to identify suspicious transactions. The system analyzes various features of transactions, including:
- Transaction amount and timing
- Customer transaction history
- Terminal transaction history
- Temporal patterns
Here are examples of how the system evaluates transactions:
- The user interface for entering transaction details and historical statistics
- A normal transaction with low fraud probability and no risk factors
- A suspicious transaction flagged with multiple risk factors:
  - High transaction amount (>220)
  - Unusual amount for the customer
  - Unusual amount for the terminal
The system provides the following features:
- Real-time fraud detection
- Web-based interface using Streamlit
- Model training and evaluation
- Transaction testing capabilities
- Feature importance visualization
- Risk factor analysis
The repository is organized as follows:
fraud_detection/
├── app.py # Streamlit web application
├── fraud_detection.py # Main training and model code
├── test_fraud_detection.py # Transaction testing script
├── convert_to_csv.py # Data conversion utility
├── requirements.txt # Python dependencies
└── README.md # Project documentation
To set up the project:
- Clone the repository:
git clone https://github.com/yourusername/fraud_detection.git
cd fraud_detection
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Place your transaction data in the data directory
- Run the training script:
python fraud_detection.py
- Use the test script to check individual transactions:
python test_fraud_detection.py
- Start the Streamlit app:
streamlit run app.py
- Open your browser and navigate to http://localhost:8501
The model uses the following input features:
- Transaction Amount
- Log-transformed Amount
- Time of Transaction (Hour, Day, Month)
- Customer Statistics:
  - Mean Transaction Amount
  - Standard Deviation
  - Transaction Count
- Terminal Statistics:
  - Mean Transaction Amount
  - Standard Deviation
  - Transaction Count
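The features listed above could be derived roughly as follows. This is a minimal sketch using only the standard library, not the code in fraud_detection.py. The function name `build_features`, the output keys, the use of `log1p` for the log transform, and reading "Day" as day of month are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev


def build_features(transactions):
    """Return one feature dict per transaction (hypothetical helper).

    `transactions` is a list of dicts keyed by the dataset's column names,
    with TX_DATETIME already parsed to a datetime. Statistics are computed
    over the whole list for simplicity; a real pipeline would likely use
    only the history that precedes each transaction.
    """
    by_customer = defaultdict(list)
    by_terminal = defaultdict(list)
    for tx in transactions:
        by_customer[tx["CUSTOMER_ID"]].append(tx["TX_AMOUNT"])
        by_terminal[tx["TERMINAL_ID"]].append(tx["TX_AMOUNT"])

    rows = []
    for tx in transactions:
        ts = tx["TX_DATETIME"]
        cust = by_customer[tx["CUSTOMER_ID"]]
        term = by_terminal[tx["TERMINAL_ID"]]
        rows.append({
            "amount": tx["TX_AMOUNT"],
            "log_amount": math.log1p(tx["TX_AMOUNT"]),  # log(1 + amount)
            "hour": ts.hour,
            "day": ts.day,  # assumed: day of month
            "month": ts.month,
            "customer_mean": mean(cust),
            "customer_std": pstdev(cust),  # population std; 0.0 for one sample
            "customer_count": len(cust),
            "terminal_mean": mean(term),
            "terminal_std": pstdev(term),
            "terminal_count": len(term),
        })
    return rows
```

Computing the statistics over all transactions keeps the sketch short, but it leaks future information into each row. For training, restricting each row to earlier transactions would be the safer choice.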
The system uses a Random Forest classifier with the following characteristics:
- Handles class imbalance
- Provides probability estimates
- Evaluates multiple risk factors
- Visualizes feature importance
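A classifier with these properties might look like the sketch below. It assumes scikit-learn and NumPy are among the dependencies in requirements.txt, which this README does not confirm. The synthetic data, the hyperparameters, and the use of `class_weight="balanced"` to handle imbalance are illustrative assumptions; the project may use a different strategy, such as resampling.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic, illustrative data: 2 features, roughly 5% fraud labels.
n = 1000
X = rng.normal(size=(n, 2))
y = (rng.random(n) < 0.05).astype(int)
X[y == 1] += 3.0  # shift the fraud class so there is something to learn

model = RandomForestClassifier(
    n_estimators=100,
    class_weight="balanced",  # weight classes inversely to their frequency
    random_state=0,
)
model.fit(X, y)

# Probability of the fraud class (column 1) for each sample.
fraud_proba = model.predict_proba(X)[:, 1]

# Impurity-based importances, one value per feature.
importances = model.feature_importances_
```

The `importances` array is what a bar chart of feature importance would plot, one bar per input feature.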
To contribute:
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Data preprocessing techniques
- Feature engineering approaches
- Machine learning model implementation
The dataset contains simulated transaction data with the following columns:
- TRANSACTION_ID: Unique identifier for the transaction
- TX_DATETIME: Date and time of the transaction
- CUSTOMER_ID: Unique identifier for the customer
- TERMINAL_ID: Unique identifier for the merchant terminal
- TX_AMOUNT: Amount of the transaction
- TX_FRAUD: Binary variable (0 for legitimate, 1 for fraudulent)
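A minimal sketch of reading rows with these columns into typed records, using only the standard library. The function name `read_transactions`, the timestamp format, the integer IDs, and the sample rows are assumptions for illustration; the actual output of convert_to_csv.py may differ.

```python
import csv
import io
from datetime import datetime

# Assumed timestamp layout; adjust to match the actual data.
DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def read_transactions(file_obj):
    """Parse CSV rows into dicts with typed values (hypothetical helper)."""
    rows = []
    for rec in csv.DictReader(file_obj):
        rows.append({
            "TRANSACTION_ID": int(rec["TRANSACTION_ID"]),  # assumed numeric IDs
            "TX_DATETIME": datetime.strptime(rec["TX_DATETIME"], DATETIME_FORMAT),
            "CUSTOMER_ID": int(rec["CUSTOMER_ID"]),
            "TERMINAL_ID": int(rec["TERMINAL_ID"]),
            "TX_AMOUNT": float(rec["TX_AMOUNT"]),
            "TX_FRAUD": int(rec["TX_FRAUD"]),  # 0 = legitimate, 1 = fraudulent
        })
    return rows


# Made-up sample rows, for illustration only.
sample = io.StringIO(
    "TRANSACTION_ID,TX_DATETIME,CUSTOMER_ID,TERMINAL_ID,TX_AMOUNT,TX_FRAUD\n"
    "0,2018-04-01 00:00:31,596,3156,57.16,0\n"
    "1,2018-04-01 00:02:10,4961,3412,281.94,1\n"
)
transactions = read_transactions(sample)
```

To read from disk instead, pass an open file, for example `open(path, newline="")`, in place of the `StringIO` object.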
The system detects fraud based on three main patterns:
- Transactions with amount > 220
- Transactions from compromised terminals
- Unusual spending patterns from compromised customer accounts
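The amount-based checks can be sketched as simple rules. The 220 threshold comes from the list above. Everything else is an assumption for illustration: the function name `risk_factors`, and in particular the definition of "unusual" as more than three standard deviations from the historical mean. Detecting compromised terminals would need terminal-level fraud history, which this sketch does not cover.

```python
AMOUNT_THRESHOLD = 220.0  # from the rule "amount > 220"
Z_THRESHOLD = 3.0         # assumed cut-off for "unusual"; not from the source


def risk_factors(amount, customer_mean, customer_std, terminal_mean, terminal_std):
    """Return human-readable risk factors for one transaction (hypothetical helper)."""
    factors = []
    if amount > AMOUNT_THRESHOLD:
        factors.append("High transaction amount (>220)")
    # Skip the z-score checks when there is no spread to compare against.
    if customer_std > 0 and abs(amount - customer_mean) / customer_std > Z_THRESHOLD:
        factors.append("Unusual amount for the customer")
    if terminal_std > 0 and abs(amount - terminal_mean) / terminal_std > Z_THRESHOLD:
        factors.append("Unusual amount for the terminal")
    return factors
```

For example, `risk_factors(500, 50, 20, 60, 25)` reports all three factors, while `risk_factors(40, 50, 20, 60, 25)` returns an empty list.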