
Intrusion detection in IoT networks using machine learning and GAN-based data augmentation


DianaC01/iot-attack-detection


IoT Attack Detection with GAN-based Data Augmentation 🔐📶

Intrusion detection for IoT networks using machine learning / deep learning and GAN-based synthetic data to improve class balance.
This repository contains a Google Colab notebook and a concise report-style README that summarizes the theory and the implementation steps.


📚 Background (Short Theory)

Internet of Things (IoT) deployments are exposed to a wide spectrum of attacks (e.g., port scans, DoS, brute-force, botnet traffic). Signature-based IDS struggle with novel or rare patterns, while ML/DL classifiers can generalize better, provided the training data is representative.
However, many IoT datasets are imbalanced: some attack classes are under-represented, which hurts recall on exactly those classes. Generative Adversarial Networks (GANs), here CTGAN for tabular data, can synthesize additional samples for the minority classes, with the aim of balancing the dataset and improving detection metrics.
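
As a rough illustration of what "balancing" means in practice, the sketch below counts the samples per class and computes how many synthetic rows each class would need to match the largest one. The function name `synthetic_needed` and the label values are hypothetical and only serve the example; matching the majority class exactly is one possible target, not necessarily the one used in the notebook.

```python
from collections import Counter


def synthetic_needed(labels):
    """Return {class: extra samples needed to match the largest class}."""
    counts = Counter(labels)
    target = max(counts.values())  # size of the majority class
    return {cls: target - n for cls, n in counts.items()}


# Made-up label column, for illustration only (not taken from IoT-23).
labels = ["benign"] * 6 + ["ddos"] * 3 + ["port_scan"] * 1

for cls, extra in synthetic_needed(labels).items():
    print(f"{cls}: generate {extra} synthetic samples")
```

A smaller target (for example, a fixed fraction of the majority count) may be preferable when the minority class has very few real samples to learn from.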


🗄️ Dataset

  • Based on the IoT-23 dataset (Stratosphere Laboratory, CTU in Prague) or a cleaned derivative.
  • Expected input: CSVs with network-flow features, a label column (normal / multiple attack types), and optional train/test splits.

Dataset files are not included in this repo. Place them under data/ when running locally or mount from Drive in Colab.


🧩 Methodology

  1. Preprocessing
    • Load CSV(s), drop irrelevant columns, handle missing values, encode categorical features, scale numeric features.
  2. Baseline Model
    • Train a neural network (Keras MLP) or a classic ML model as a baseline; record metrics (Accuracy, Precision/Recall/F1, Confusion Matrix).
  3. Synthetic Data with CTGAN
    • Train CTGAN on the training split—focusing on minority classes—and generate synthetic samples.
  4. Retrain with Augmented Data
    • Concatenate real + synthetic data; retrain a robust model (e.g., RandomForest or improved MLP).
  5. Evaluation
    • Compare baseline vs. augmented: class-wise precision/recall/F1, macro-F1, ROC-AUC (if applicable), and visualize the confusion matrix.
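
The class-wise comparison in step 5 can be sketched without any ML library. The helper below, with the hypothetical name `per_class_report`, computes precision, recall and F1 per class plus the macro-F1 from two label lists. The labels and predictions are invented to show the mechanics and are not results from this project.

```python
def per_class_report(y_true, y_pred):
    """Return ({class: (precision, recall, f1)}, macro_f1)."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    report = {}
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[c] = (precision, recall, f1)
    # Macro-F1 weights every class equally, so rare classes count as much as common ones.
    macro_f1 = sum(f1 for _, _, f1 in report.values()) / len(report)
    return report, macro_f1


# Invented labels and predictions, for illustration only.
y_true = ["benign", "benign", "benign", "ddos", "ddos", "scan"]
baseline_pred = ["benign", "benign", "benign", "ddos", "benign", "benign"]
augmented_pred = ["benign", "benign", "ddos", "ddos", "ddos", "scan"]

for name, pred in [("baseline", baseline_pred), ("augmented", augmented_pred)]:
    report, macro_f1 = per_class_report(y_true, pred)
    print(name, {c: round(r, 2) for c, (_, r, _) in report.items()}, round(macro_f1, 3))
```

Macro-F1 is used here rather than accuracy because a model that ignores a rare attack class can still score high accuracy on an imbalanced test set.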

🛠️ Implementation Steps (Notebook)

  1. Environment setup (Colab): install libs, mount Google Drive (optional).
  2. Load & preprocess: read CSV(s), encode & scale, split into train/test.
  3. Train baseline: fit model, log metrics, save artifacts to results/.
  4. CTGAN training: fit on minority classes, generate N samples per class.
  5. Augmented training: mix real + synthetic, refit model, log metrics.
  6. Evaluation & plots: classification report, confusion matrix, and (optionally) feature importance for tree-based models.
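
For step 2, a minimal pure-Python sketch of a stratified split followed by min-max scaling is shown here. It assumes the scaler is fitted on the training rows only and then applied to the test rows, which is one common way to keep test statistics out of training; the notebook may order these operations differently. All function names and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Split row indices class by class so each class keeps a similar share."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Keep at least one test row per class when the class has 2+ rows.
        n_test = max(1, round(len(idxs) * test_fraction)) if len(idxs) > 1 else 0
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


def fit_min_max(rows):
    """Return one (min, max) pair per column, computed from the given rows."""
    return [(min(col), max(col)) for col in zip(*rows)]


def apply_min_max(rows, params):
    """Scale each column with previously fitted (min, max) pairs."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, params)]
        for row in rows
    ]


# Toy numeric features and labels, for illustration only.
rows = [[float(i), float(i * 10)] for i in range(12)]
labels = ["benign"] * 8 + ["ddos"] * 4

train_idx, test_idx = stratified_split(labels)
params = fit_min_max([rows[i] for i in train_idx])       # fit on train only
train_scaled = apply_min_max([rows[i] for i in train_idx], params)
test_scaled = apply_min_max([rows[i] for i in test_idx], params)
print(len(train_scaled), "train rows,", len(test_scaled), "test rows")
```

Because the scaler only sees training rows, scaled test values can fall slightly outside [0, 1]; that is expected with this approach.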

▶️ How to Run (Google Colab)

  1. Open Google Colab and upload notebooks/Untitled10.ipynb (or open from GitHub).
  2. Prepare data:
    • Upload your CSV(s) to Colab, or
    • Mount Google Drive and point the notebook to your data folder.
  3. Run the notebook cells in order (setup → preprocessing → baseline → CTGAN → retrain → evaluation).
  4. Results (figures, CSVs, models) can be saved under results/.
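
Since the data can live either on a mounted Drive or in a local data/ folder, a small helper like the one below can pick whichever location actually contains CSV files. This is only a sketch: `resolve_data_dir` is a hypothetical helper, and the Drive subfolder name `iot23` is an assumption (only the `/content/drive/MyDrive` prefix is the usual Colab mount point).

```python
from pathlib import Path

# Candidate locations, checked in order. The "iot23" folder name is an assumption.
DEFAULT_CANDIDATES = [
    "/content/drive/MyDrive/iot23",  # Google Drive mounted in Colab
    "data",                          # local folder next to the notebook
]


def resolve_data_dir(candidates=DEFAULT_CANDIDATES):
    """Return the first candidate directory that contains at least one CSV file."""
    for candidate in candidates:
        path = Path(candidate)
        if path.is_dir() and any(path.glob("*.csv")):
            return path
    raise FileNotFoundError(
        "No CSV files found in: " + ", ".join(str(c) for c in candidates)
    )
```

In a notebook cell this could be called as `data_dir = resolve_data_dir()` before loading the CSVs; it raises `FileNotFoundError` if none of the locations holds data.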

📂 Repository Structure

iot-attack-detection/
├─ notebooks/
│  └─ Untitled10.ipynb      # main Colab notebook
├─ results/                 # generated plots/reports (ignored by Git)
├─ requirements.txt         # Python dependencies
├─ .gitignore
└─ README.md

🔖 Recommended Topics

iot, intrusion-detection, cybersecurity, machine-learning, deep-learning, gan, ctgan, tabular-data


📝 Notes

  • Replace Untitled10.ipynb with a clearer name (e.g., iot23_gan_augmentation.ipynb) once you finalize it.
  • If you need to reproduce on CPU-only machines, consider using RandomForest for both the baseline and the augmented retraining (fast and strong on tabular data).
  • Keep large datasets outside the repo (data/ is ignored via .gitignore).
