This project demonstrates how to use Apache Kafka Streams to detect fraudulent activity by analyzing IP logs in real time. As log data streams in, the system flags potential fraud by identifying suspicious patterns, such as repeated login attempts or access from unusual IP addresses. Kafka Streams offers a scalable and fault-tolerant way to process large volumes of log data, which makes it well suited to real-time fraud detection.
- Real-time Processing: Streams IP logs continuously from Kafka topics for real-time detection.
- Fraud Detection: Identifies and flags suspicious activities based on predefined rules and patterns.
- Scalable Architecture: Leverages Kafka Streams for horizontal scalability and fault tolerance.
- Customizable Rules: Allows for configurable fraud detection rules to adjust sensitivity.
- Kafka Streams: Core for stream processing.
- Apache Kafka: Message broker for distributing IP log data.
- Java: Programming language used for building the Kafka Streams application.
- Maven: Dependency management and build tool.
- Zookeeper: Manages Kafka cluster and configuration.
- Docker (Optional): For containerized deployment of Kafka and Zookeeper.
- Prometheus & Grafana (Optional): For monitoring and visualizing Kafka metrics.
- Apache Kafka installed locally or available via a cluster (or Docker setup).
- Java 8+ installed.
- Maven installed.
- Zookeeper installed (if running Kafka locally).
- Clone the repository:
    git clone https://github.com/sabareh/fraud-detection-using-kafka-streams.git
    cd fraud-detection-using-kafka-streams
- Install Dependencies:
  From the project directory, build the project. Maven downloads the dependencies as part of the build:
    mvn clean install
- Start Kafka and Zookeeper:
  If using Docker, you can use the provided docker-compose.yml:
    docker-compose up
  Alternatively, start Kafka and Zookeeper manually.
- Configure Kafka Topics:
  Create the necessary topics for streaming logs and fraud detection alerts:
    kafka-topics.sh --create --topic ip-logs --bootstrap-server localhost:9092 --partitions 3 --replication-factor 1
    kafka-topics.sh --create --topic fraud-alerts --bootstrap-server localhost:9092 --partitions 3 --replication-factor 1
- Run the Application:
  Use Maven to run the application, replacing com.yourpackage with the package that contains the main class:
    mvn exec:java -Dexec.mainClass=com.yourpackage.FraudDetectionApp
- Send IP Logs to Kafka:
  You can simulate IP log data by producing messages to the ip-logs topic. For example:
    kafka-console-producer.sh --topic ip-logs --bootstrap-server localhost:9092
  Send some test log data to see how the system reacts.
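If you need test data, a small generator along these lines may help. This is only a sketch: the line format (epochMillis,ip,user,status), the status values, and the class name SampleLogGenerator are assumptions made for illustration and are not taken from this repository, so adjust them to whatever format the application actually parses.

```java
import java.util.ArrayList;
import java.util.List;

// Emits sample log lines in a HYPOTHETICAL format: epochMillis,ip,user,status.
// The format and status values are assumptions, not this project's real schema.
final class SampleLogGenerator {

    // Builds a burst of failed logins from one IP, one second apart,
    // followed by a single successful login from a different IP.
    static List<String> burst(String ip, String user, int failures, long startMillis) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < failures; i++) {
            lines.add((startMillis + i * 1000L) + "," + ip + "," + user + ",LOGIN_FAILED");
        }
        lines.add((startMillis + failures * 1000L) + ",198.51.100.20,bob,LOGIN_OK");
        return lines;
    }

    public static void main(String[] args) {
        // 203.0.113.7 is a reserved documentation address, used here as a stand-in.
        for (String line : burst("203.0.113.7", "alice", 5, System.currentTimeMillis())) {
            System.out.println(line);
        }
    }
}
```

The program prints one record per line, so its standard output can be piped into the console producer, which sends each input line as a separate message.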
Modify the CacheIPLookup.java file to adjust the detection logic. For example, you can change the rule that flags IPs with more than X failed logins within Y minutes or alerts on access from specific geo-locations.
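To make the "more than X failed logins within Y minutes" rule concrete, here is a minimal, library-free sketch of a sliding-window counter. It is not the code in CacheIPLookup.java: the class name FailedLoginRule and its parameters are hypothetical, and it assumes events for an IP arrive in non-decreasing timestamp order.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical rule: flag an IP once it has more than maxFailures failed
// logins inside a sliding time window. Not taken from CacheIPLookup.java.
final class FailedLoginRule {
    private final int maxFailures;   // the "X" in the rule
    private final long windowMillis; // the "Y minutes", in milliseconds
    private final Map<String, Deque<Long>> failuresByIp = new HashMap<>();

    FailedLoginRule(int maxFailures, long windowMillis) {
        this.maxFailures = maxFailures;
        this.windowMillis = windowMillis;
    }

    // Records one failed login and returns true if the IP is now over the limit.
    // Assumes timestamps for a given IP never go backwards.
    boolean recordFailure(String ip, long timestampMillis) {
        Deque<Long> times = failuresByIp.computeIfAbsent(ip, k -> new ArrayDeque<>());
        times.addLast(timestampMillis);
        // Drop failures that have fallen out of the window.
        while (!times.isEmpty() && timestampMillis - times.peekFirst() >= windowMillis) {
            times.removeFirst();
        }
        return times.size() > maxFailures;
    }

    public static void main(String[] args) {
        // Example setting: more than 3 failures within 5 minutes.
        FailedLoginRule rule = new FailedLoginRule(3, 5 * 60_000L);
        for (int i = 0; i < 5; i++) {
            boolean flagged = rule.recordFailure("203.0.113.7", i * 10_000L);
            System.out.println("attempt " + (i + 1) + " flagged=" + flagged);
        }
    }
}
```

Raising maxFailures or shortening windowMillis makes the rule less sensitive. In a Kafka Streams application, the same idea would more likely be expressed as a windowed count per IP key than as an in-memory map.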
- src/main/java: Contains the Kafka Streams application and logic for fraud detection.
- pom.xml: Maven project file containing dependencies and build configuration.
- docker-compose.yml: Optional Docker configuration for running Kafka and Zookeeper locally.
- prometheus.yml: (Optional) Configuration file for setting up Prometheus for monitoring.
Imagine monitoring login activity for an online banking system. The fraud detection system identifies abnormal login patterns, such as multiple failed attempts from a single IP or login attempts from IP addresses associated with known proxies or malicious sources.
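The check for IP addresses associated with known proxies or malicious sources could be sketched as a lookup against a list of IPv4 CIDR ranges. The IpBlocklist class below is hypothetical and not part of this repository, it handles IPv4 only, and the ranges in main are reserved documentation addresses standing in for real threat data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical blocklist of IPv4 CIDR ranges (IPv4 only, no real threat data).
final class IpBlocklist {
    private final List<int[]> ranges = new ArrayList<>(); // each entry: {network, mask}

    // Adds a range such as "203.0.113.0/24".
    void addCidr(String cidr) {
        String[] parts = cidr.split("/");
        if (parts.length != 2) throw new IllegalArgumentException("Not CIDR: " + cidr);
        int prefix = Integer.parseInt(parts[1]);
        if (prefix < 0 || prefix > 32) throw new IllegalArgumentException("Bad prefix: " + cidr);
        // Java masks shift counts to 5 bits, so /0 needs a special case.
        int mask = prefix == 0 ? 0 : -1 << (32 - prefix);
        ranges.add(new int[] {toInt(parts[0]) & mask, mask});
    }

    // Returns true if the address falls inside any listed range.
    boolean contains(String ip) {
        int addr = toInt(ip);
        for (int[] range : ranges) {
            if ((addr & range[1]) == range[0]) return true;
        }
        return false;
    }

    // Packs a dotted-quad IPv4 address into a 32-bit int.
    private static int toInt(String ip) {
        String[] octets = ip.split("\\.");
        if (octets.length != 4) throw new IllegalArgumentException("Not IPv4: " + ip);
        int value = 0;
        for (String octet : octets) {
            int n = Integer.parseInt(octet);
            if (n < 0 || n > 255) throw new IllegalArgumentException("Bad octet: " + ip);
            value = (value << 8) | n;
        }
        return value;
    }

    public static void main(String[] args) {
        IpBlocklist blocklist = new IpBlocklist();
        blocklist.addCidr("203.0.113.0/24");
        System.out.println(blocklist.contains("203.0.113.42"));
        System.out.println(blocklist.contains("198.51.100.1"));
    }
}
```

A linear scan is fine for a short list. A large blocklist would probably call for a prefix tree or a sorted interval structure instead.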
- Implement more advanced fraud detection using machine learning models.
- Integrate with real-time alerting services such as Slack or email.
- Store detected fraud events in a database for audit purposes.
- Support geo-fencing by integrating with a geo-location API.
Feel free to fork this repository and open pull requests. Any contributions, suggestions, or improvements are welcome!
This project is licensed under the MIT License. See the LICENSE file for details.