- Create a Virtual Environment (VE):
  `python -m venv venv`
- Activate the VE:
  `source venv/bin/activate`
- Install the packages:
  `pip install -r requirements.txt`
  (If there is an error installing scapy, install it from Scapy.)
- Install tcpslice (only needed for evaluating MUDgee; not required for the framework):
  `sudo apt install tcpslice`

artifact
├── dataset
├── evaluation
│ ├── devicemein
│ ├── geniotid
│ ├── iotdevid
│ ├── iot-sentinel
│ ├── mudgee
│ └── your-smart-home
├── framework
│ └── eval_modules
├── scripts
└── main.py

Code for the Main Controller is in this file.
Download the dataset from Zenodo and put the unzipped directories used for the evaluations in this directory.
This is the workspace directory where all the relevant data for the evaluations, such as configs, processed datasets, trained models, and evaluation results, are stored. We create a separate directory for each method we evaluate.
An example config file for each method is included in the repository. The other files (processed data, models, and evaluation results) are generated by the framework and saved here.
The workspace for MUDgee contains additional scripts and config files for the datasets used for our evaluations. It also contains the executable for the original code that generates MUD profiles.
The code for the framework is organized in this directory. Each module of the framework is implemented as a separate class, and each class is placed in its own file:
- data_ingestion.py - Code for the Data Ingestion module.
- data_preprocessing.py - Code for the Data Preprocessing module.
- model_training.py - Base class for Model Training.
- model_testing.py - Base class for Model Testing, including a common function for generating evaluation reports.
- packet.py - Code for the Packet class, which holds the information extracted from network packets (see the sketch after this list).
- util.py - Common utility functions.
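For orientation, here is a hedged sketch of what a minimal per-packet container in the spirit of packet.py might look like. The class and field names below are assumptions for illustration only, not the framework's actual API.

```python
from dataclasses import dataclass

# Hypothetical sketch only: the real Packet class in packet.py may expose
# different attributes and helper methods.
@dataclass
class PacketSketch:
    timestamp: float   # capture time of the packet
    src_mac: str       # source MAC address (used to map traffic to a device)
    dst_mac: str       # destination MAC address
    src_ip: str        # source IP address
    dst_ip: str        # destination IP address
    protocol: str      # transport/application protocol label
    length: int        # packet size in bytes

# Example usage:
pkt = PacketSketch(0.0, "aa:bb:cc:dd:ee:ff", "11:22:33:44:55:66",
                   "192.168.1.10", "8.8.8.8", "UDP", 128)
print(pkt.src_mac, pkt.length)
```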
To evaluate a new method:
- Create a new class for training and another for testing.
- Inherit from the respective base classes.
- Implement your method-specific logic inside these child classes.
- Save the new classes in the eval_modules directory.
The methods we re-implemented and evaluated are listed here.
This directory contains miscellaneous scripts used to automate parts of the evaluation process; a hedged example of this kind of automation is sketched below.
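The sketch below illustrates the kind of automation these scripts perform: it loops over the per-method config files and invokes the framework for each one. It assumes the `python main.py <config.yml>` invocation documented later in this README and is not one of the repository's actual scripts.

```python
# Hypothetical automation sketch: run the framework once per method by pointing
# main.py at every config.yml under evaluation/ (CLI usage is documented below).
import glob
import subprocess
import sys

for config in sorted(glob.glob("evaluation/*/config.yml")):
    print(f"Running evaluation for {config}")
    subprocess.run([sys.executable, "main.py", config], check=True)
```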
1. Verify method code structure
   Ensure the original code for your method has separate functions for training and testing the model.
2. Add a training module
   - Create a new file named `<method-name>_training.py` inside the `framework/eval_modules` directory.
   - In this file, define a class named `<MethodName>Training` that inherits from the `ModelTraining` class in the `framework` directory (a skeleton is sketched after this list).
3. Follow framework examples
   Check the existing files in `framework/eval_modules` for examples on how to properly call your method's training functions from the framework.
4. Add a testing module
   - Repeat steps 2–3, but for testing.
   - Create `<method-name>_testing.py` with a class `<MethodName>Testing` that inherits from the `ModelTesting` class.
5. Set up evaluation directory
   - Inside the `evaluation` directory, create a new subdirectory named after your method: `evaluation/<method-name>/`
6. Add configuration file
   - Place a `config.yml` file inside your method's evaluation directory: `evaluation/<method-name>/config.yml`
   - Refer to Part 2 below for details on filling out the config file.
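To make the naming conventions above concrete, here is a hedged skeleton for a hypothetical method called demo. The import paths and the train()/test() hooks are assumptions; adapt them to whatever the ModelTraining and ModelTesting base classes in the framework directory actually define.

```python
# framework/eval_modules/demo_training.py and demo_testing.py (hypothetical
# skeletons shown together for brevity; in the repository each class goes in
# its own file). The base-class import paths and method names are assumptions.
from framework.model_training import ModelTraining
from framework.model_testing import ModelTesting


class DemoTraining(ModelTraining):
    def train(self, train_data):
        # Call the original method's training code here and return the trained
        # model (or a path to it) so the framework can hand it to testing.
        raise NotImplementedError("method-specific training logic goes here")


class DemoTesting(ModelTesting):
    def test(self, model, test_data):
        # Run the original method's inference code and return predictions in
        # the format expected by the common report-generation function.
        raise NotImplementedError("method-specific testing logic goes here")
```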
To configure the framework for evaluating a new method, edit the config.yml file.
(See the example config.yml in the main directory for reference.)
This section contains metadata about the method being tested.
- method-name: Name of the method (no spaces).
- input-type: Type of dataset input:
  - `mixed` → traffic from all devices is combined.
  - `per_device` → traffic is split by device into separate directories.
- log-dir: Path where evaluation logs will be stored. The directory is created automatically.
- report-dir: Path where evaluation reports/results will be stored.
general:
method-name: "<name-of-the-method>"
input-type: "<mixed/per_device>"
log-dir: "./logs"

Defines which datasets to use for evaluation.
- type: Dataset structure (`mixed` or `per_device`).
- load: Not fully implemented; keep it `no`.
- list-datasets: Add one or more datasets.
  - If one dataset is used → the framework applies cross-validation.
  - If two datasets are used → the framework trains on one and tests on the other (see data-preprocessing).
data-ingestion:
list-datasets:
- name: "<dataset-1>"
path: "<path-to-dataset-dir>"
type: "<mixed/per_device>"
load: no
- name: "<dataset-2>"
path: "<path-to-dataset-dir>"
type: "<mixed/per_device>"
load: no

Defines how datasets are used for training/testing.
- required-data-format: Keep as raw (don’t change).
- use-known-devices: If yes, testing only includes devices seen in training.
- train-dataset: Dataset to use for training.
- test-dataset: Dataset to use for testing.
- threshold: Cutoff for detecting unknown devices (if the score falls below this value, the device is marked unknown; see the sketch below).
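The threshold rule can be illustrated with a small, hypothetical helper; this is not the framework's actual implementation.

```python
# Hypothetical illustration of the unknown-device cutoff: if the classifier's
# confidence for its best guess falls below the threshold, report "unknown".
def label_with_threshold(predicted_label: str, score: float, threshold: float = 0.7) -> str:
    return predicted_label if score >= threshold else "unknown"

print(label_with_threshold("amazon_echo", 0.81))  # -> amazon_echo
print(label_with_threshold("amazon_echo", 0.42))  # -> unknown
```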
data-preprocessing:
required-data-format: raw
use-known-devices: no
train-dataset: "<dataset-1>"
test-dataset: "<dataset-2>"
threshold: 0.7
- Example configuration for when the train and test datasets are different:
data-ingestion:
list-datasets:
- name: "Exp-1_iot"
path: "dataset/Exp-1_iot_per_device"
type: "per_device"
load: no
- name: "Exp-1_iot_mobile"
path: "dataset/Exp-1_iot_mobile_per_device"
type: "per_device"
load: no
data-preprocessing:
required-data-format: raw
use-known-devices: yes
train-dataset: "Exp-1_iot"
test-dataset: "Exp-1_iot_mobile"
threshold: 0.7

- Example configuration for when the train and test datasets are the same:
data-ingestion:
list-datasets:
- name: "Exp-1_iot"
path: "dataset/Exp-1_iot_per_device"
type: "per_device"
load: no
data-preprocessing:
required-data-format: raw
use-known-devices: yes
train-dataset: "Exp-1_iot"
test-dataset: "Exp-1_iot"
threshold: 0.7

Specifies how the framework should run training.
- class-name: Training class name (must extend ModelTraining).
- class-path: Path to the training file (inside framework/eval_modules/); see the loading sketch after this list.
- train-model:
  - yes → train a new model.
  - no → use an existing model.
- paths:
  - repo: Path to the original method's codebase.
  - eval-dir: Directory for storing processed data/reports.
  - model-dir: Directory for saving trained models.
  - device-file: File listing devices and MAC addresses (ground truth).
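As context for the class-name and class-path fields, the sketch below shows one common way a config-driven controller can resolve them via importlib. This is a hedged illustration under assumed paths, not necessarily how main.py does it.

```python
# Hypothetical sketch of config-driven class loading (requires PyYAML).
import importlib.util
import yaml

with open("evaluation/demo/config.yml") as fh:   # hypothetical method directory
    cfg = yaml.safe_load(fh)

training_cfg = cfg["model-training"]
spec = importlib.util.spec_from_file_location("training_module", training_cfg["class-path"])
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)

# Instantiate the class named in the config; constructor arguments are method-specific.
TrainingClass = getattr(module, training_cfg["class-name"])
trainer = TrainingClass()
```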
model-training:
class-name: "<name-of-the-method>Training"
class-path: "framework/eval_modules/<name-of-the-method>_training.py"
train-model: yes
paths:
repo: "<path-to-the-code-repository-to-evaluate>"
eval-dir: "./evaluation/<name-of-the-method>"
model-dir: "./evaluation/<name-of-the-method>/models"
device-file: "{}/devices.txt"
model-testing:
class-name: "<name-of-the-method>Testing"
class-path: "framework/eval_modules/<name-of-the-method>_testing.py"
report-dir: "./evaluation/<name-of-the-method>/reports/"

- Enter the `artifact` directory:
  `cd artifact`
- Run the framework:
  `python main.py evaluation/<name-of-the-method>/config.yml`

Reproduce the results from the paper by following these steps.
If the evaluation ran without error:
- The report will be generated in the `evaluation/<method-name>/reports/` directory.
- If it is unclear which report to check:
  - Open the latest log file.
  - The last line will contain the path to the report file, as in the example below (the sketch that follows automates this lookup):
2025-09-11 13:20:55,723 - root - DEBUG - [Model Testing] : Result report generated at path: ./evaluation/iot-sentinel/reports/iot-sentinel_Exp-1_iot_0.json
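If you would rather not open the log manually, the hedged snippet below picks the most recently modified file under the default log directory (`./logs`) and prints its last line, which holds the report path after a successful run.

```python
# Hypothetical convenience snippet: find the newest log file and print its last
# line, which contains the report path after a successful run (log-dir: ./logs).
import glob
import os

log_files = glob.glob("logs/*")
if not log_files:
    raise SystemExit("no log files found under logs/")

latest = max(log_files, key=os.path.getmtime)   # timestamped logs: newest wins
with open(latest) as fh:
    lines = fh.read().splitlines()

print(f"latest log: {latest}")
print(f"last line : {lines[-1] if lines else '(empty)'}")
```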
If the evaluation faced an error:
- Check the log file in the `logs/` directory.
- Log files are timestamped; the most recent one corresponds to your last run.
- The log will contain details about the error.
- The top of the report contains information about the experiment configuration, the time of the experiment, and the list of devices used for the evaluation.
"method": "geniotid",
"timestamp": "09/09/2025, 16:00:35",
"eval-config": {
"required-data-format": "raw",
"use-known-devices": false,
"train-dataset": "Exp-1_iot",
"test-dataset": "Exp-1_iot",
"threshold": 0.7
},
"cross-validation": true,
"devices": [
"amazon_echo",
"bosch_dishwasher",
"calex_motion_sensor",
"calex_slimme_led_vloerlamp",
"chromecast_v3",
"echo_dot",
"google_home_mini",
"google_nest_cam",
"ilife_t10s",
"nest_smokealarm",
"philips_hue",
"philips_luchtreiniger_600-serie",
"philips_tv",
"pni_safe_house_pg600lr",
"princess_smart_aerofryer",
"ring_doorbell",
"salcar_radiatorthermostaat_trv801w",
"samsung_fridge",
"smartplug-hs110",
"smartplug-kp105",
"watersensor",
"welltobe_pet_feeder"
]

- The full-classification-report lists performance values for each device, followed by the overall performance values.
- When `use-known-devices` is set to `no`, the performance for identifying unknown devices is included as a separate entry in the report.
"full-classification-report": {
"amazon_echo": {
"precision": 0.5827123695976155,
"recall": 0.8947368421052632,
"f1-score": 0.7057761732851986,
"support": 437.0
},
"bosch_dishwasher": {
"precision": 0.9846153846153847,
"recall": 0.8,
"f1-score": 0.8827586206896552,
"support": 80.0
},
.
.
.
"weighted avg": {
"precision": 0.8822505454629617,
"recall": 0.8459188127455259,
"f1-score": 0.8445809172986433,
"support": 4582.0
}
}
- The predictions field contains the list of all true values, predicted values, and their prediction probabilities (a loading sketch follows the excerpt below).
"predictions": [
[
[
"smartplug-kp105",
"smartplug-kp105",
0.8009739101768557
],
[
"smartplug-kp105",
"amazon_echo",
0.08665806288730057
],
.
.
.
]
]
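Finally, a hedged sketch for post-processing a generated report: it loads the JSON file named in the example log line above and prints per-device F1 scores from the full-classification-report. The field names are taken from the excerpts in this section; adjust the path to your own run's output.

```python
# Hypothetical report-inspection helper; adjust report_path to your own run.
import json

report_path = "./evaluation/iot-sentinel/reports/iot-sentinel_Exp-1_iot_0.json"

with open(report_path) as fh:
    report = json.load(fh)

print(f"method: {report['method']}, cross-validation: {report['cross-validation']}")
for name, metrics in report["full-classification-report"].items():
    if isinstance(metrics, dict):  # skip scalar entries such as an overall accuracy
        print(f"{name:40s} f1={metrics['f1-score']:.3f} support={metrics['support']:.0f}")
```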