This project implements a machine learning-powered Web Application Firewall (WAF) to detect and prevent web-based attacks, including threats from the OWASP Top 10. Built using Python, FastAPI, and Scikit-learn, the WAF combines NLP-based supervised learning with anomaly detection to provide real-time threat classification and mitigation.
Traditional WAFs rely heavily on static rules and signatures, making them less effective against zero-day attacks or evasive payloads. This project enhances traditional detection capabilities using machine learning models trained on both synthetic and real-world attack traffic.
The firewall is integrated into a modular FastAPI backend and can be used as a standalone microservice or within an existing web stack.
| Capability | Description |
|---|---|
| ML-Based Threat Detection | Detects malicious payloads using classification models trained on attack patterns. |
| OWASP Coverage | Focuses on high-impact vulnerabilities including SQLi, XSS, SSRF, RCE, and more. |
| Anomaly Detection | Uses unsupervised models to identify outliers and unknown attacks. |
| Real-Time Classification | Handles live traffic and provides immediate threat feedback. |
| FastAPI Integration | Easy-to-deploy REST API for modular usage or CI/CD testing. |
| Burp Suite Compatibility | Supports model evaluation against attacks and traffic simulated with Burp Suite. |
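
The way the supervised and anomaly-detection capabilities above might work together can be sketched with Scikit-learn. This is a minimal illustration, not the project's actual pipeline: the toy payloads, the character n-gram TF-IDF features, the `assess` helper, and the 0.5 threshold are all assumptions made for the example.

```python
from sklearn.ensemble import IsolationForest
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy, hand-written samples for illustration only (not the project's dataset).
benign = [
    "id=42&sort=asc",
    "q=blue+shoes&page=2",
    "user=alice&lang=en",
    "category=books&limit=10",
]
malicious = [
    "id=1' OR '1'='1",
    "q=<script>alert(1)</script>",
    "url=http://169.254.169.254/latest/meta-data",
    "cmd=;cat /etc/passwd",
]
texts = benign + malicious
labels = [0] * len(benign) + [1] * len(malicious)

# Supervised path: character n-gram TF-IDF feeding a logistic regression.
classifier = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
    LogisticRegression(max_iter=1000),
)
classifier.fit(texts, labels)

# Unsupervised path: an Isolation Forest fitted on benign traffic only,
# so payloads unlike anything seen before can be reported as outliers.
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3))
detector = IsolationForest(n_estimators=100, random_state=0)
detector.fit(vectorizer.fit_transform(benign))


def assess(payload: str, threshold: float = 0.5) -> dict:
    """Combine both signals; `assess` and `threshold` are hypothetical."""
    probability = float(classifier.predict_proba([payload])[0][1])
    # IsolationForest.predict returns -1 for outliers and 1 for inliers.
    is_outlier = int(detector.predict(vectorizer.transform([payload]))[0]) == -1
    return {
        "malicious_probability": probability,
        "is_outlier": is_outlier,
        "flagged": probability >= threshold or is_outlier,
    }


print(assess("id=1' OR '1'='1"))
```

Flagging a payload when either signal fires favors recall over precision; a real deployment would likely tune the threshold and the anomaly detector on far more data than shown here.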
| Component | Description |
|---|---|
| Python | Primary programming language |
| FastAPI | Backend framework for serving the WAF API |
| Scikit-learn | ML models for classification and anomaly detection |
| Regex & NLP | Used for pattern extraction and text feature engineering |
| Burp Suite | Used for payload simulation and traffic replay |
- Dataset: Mix of real-world payloads, synthetic attack vectors, and clean traffic.
- Features:
  - Regex patterns
  - Token frequency
  - Length and entropy measures
- Algorithms:
  - Supervised: Random Forest, Logistic Regression
  - Unsupervised: Isolation Forest, One-Class SVM
- Validation: Burp Suite replay to simulate attack traffic and test detection effectiveness.
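
The feature families listed above (regex patterns, token frequency, length, and entropy) could be computed along the following lines. This is a sketch using only the standard library; the `SUSPICIOUS_PATTERNS` table, the feature names, and the `extract_features` helper are illustrative assumptions and do not reflect the project's actual rule set.

```python
import math
import re
from collections import Counter

# Hypothetical patterns for illustration; the project's real rules may differ.
SUSPICIOUS_PATTERNS = {
    "sqli": re.compile(r"\b(union\s+select|or\s+1\s*=\s*1)\b|--", re.IGNORECASE),
    "xss": re.compile(r"<\s*script|\bon\w+\s*=", re.IGNORECASE),
    "path_traversal": re.compile(r"\.\./"),
}


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(text).values()
    )


def extract_features(payload: str) -> dict:
    """Turn a raw payload string into a flat dictionary of numeric features."""
    tokens = re.findall(r"\w+", payload.lower())
    token_counts = Counter(tokens)
    features = {
        "length": len(payload),
        "entropy": shannon_entropy(payload),
        "token_count": len(tokens),
        # Highest number of times any single token repeats (0 if no tokens).
        "max_token_frequency": max(token_counts.values(), default=0),
    }
    # One binary feature per regex pattern: 1 if it matches, else 0.
    for name, pattern in SUSPICIOUS_PATTERNS.items():
        features[f"regex_{name}"] = int(bool(pattern.search(payload)))
    return features


print(extract_features("id=1 UNION SELECT password FROM users"))
```

A dictionary of numeric features like this can be turned into a matrix (for example with Scikit-learn's `DictVectorizer`) before being passed to any of the algorithms named above.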
| Method | Endpoint | Description |
|---|---|---|
| POST | /scan | Accepts a web request payload and classifies it as benign or malicious. |
| GET | /status | Health check endpoint. |
| GET | /rules | Returns current regex-based rule set. |
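
A client call to the `/scan` endpoint might look like the sketch below. The request and response schemas are not documented here, so the base URL, the JSON field name `payload`, and the helper names `build_scan_request` and `scan` are all assumptions; adjust them to match the running service.

```python
import json
import urllib.request

# Assumption: the service runs locally on port 8000.
BASE_URL = "http://127.0.0.1:8000"


def build_scan_request(payload: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for /scan; the 'payload' field name is assumed."""
    body = json.dumps({"payload": payload}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/scan",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def scan(payload: str, base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """Send the payload to the WAF and return its parsed JSON response."""
    request = build_scan_request(payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not touch the network, so it can be inspected first.
request = build_scan_request("q=<script>alert(1)</script>")
print(request.get_method(), request.full_url)
```

Calling `scan(...)` requires the service to be running; the `/status` and `/rules` endpoints can be queried the same way with a plain GET request.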
This project is licensed under the MIT License.
Please refer to the LICENSE file for more details.
Abhijit Rai
- GitHub: https://github.com/aerostorm19
- LinkedIn: https://www.linkedin.com/in/abhijit-rai-163214280/
- Email: abhi160407@gmail.com