- Create a Virtual Environment (VE):
  `python -m venv venv`
- Activate the VE:
  `source venv/bin/activate`
- Install the packages:
  `pip install -r requirements.txt`
  (If there is an error installing scapy, install it from Scapy.)
- Install tcpslice (only needed for evaluating MUDgee; not required for the framework):
  `sudo apt install tcpslice`

artifact
├── dataset
├── evaluation
│ ├── devicemein
│ ├── geniotid
│ ├── iotdevid
│ ├── iot-sentinel
│ ├── mudgee
│ └── your-smart-home
├── framework
│ └── eval_modules
├── scripts
└── main.py

Code for the Main Controller is in this file.
Download the dataset from Zenodo and put the unzipped directories used for the evaluations in this directory.
This is the workspace directory where all the relevant data for the evaluations, such as configs, processed datasets, trained models, and evaluation results, are stored. We create a separate directory for each method we evaluate.
An example config file for each method is included in the repository. The other files (processed data, models, and evaluation results) are generated by the framework and saved here.
The workspace for MUDgee contains additional scripts and config files for the datasets used for our evaluations. It also contains the executable for the original code that generates MUD profiles.
The code for the framework is organized in this directory. Each module of the framework is implemented as a separate class, and each class is placed in its own file:
- data_ingestion.py - Code for the Data Ingestion module.
- data_preprocessing.py - Code for the Data Preprocessing module.
- model_training.py - Base class for Model Training.
- model_testing.py - Base class for Model Testing, including a common function for generating evaluation reports.
- packet.py - Code for the Packet class, which holds the information extracted from network packets (see the sketch after this list).
- util.py - Common utility functions.
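For orientation, here is a hedged sketch of what a minimal per-packet container in the spirit of packet.py might look like. The class and field names below are assumptions for illustration only, not the framework's actual API.

```python
from dataclasses import dataclass

# Hypothetical sketch only: the real Packet class in packet.py may expose
# different attributes and helper methods.
@dataclass
class PacketSketch:
    timestamp: float   # capture time of the packet
    src_mac: str       # source MAC address (used to map traffic to a device)
    dst_mac: str       # destination MAC address
    src_ip: str        # source IP address
    dst_ip: str        # destination IP address
    protocol: str      # transport/application protocol label
    length: int        # packet size in bytes

# Example usage:
pkt = PacketSketch(0.0, "aa:bb:cc:dd:ee:ff", "11:22:33:44:55:66",
                   "192.168.1.10", "8.8.8.8", "UDP", 128)
print(pkt.src_mac, pkt.length)
```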
To evaluate a new method:
- Create a new class for training and another for testing.
- Inherit from the respective base classes.
- Implement your method-specific logic inside these child classes.
- Save the new classes in the eval_modules directory.
The methods we re-implemented and evaluated are listed here.
This directory contains miscellaneous scripts used to automate parts of the evaluation process; a hedged example of this kind of automation is sketched below.
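The sketch below illustrates the kind of automation these scripts perform: it loops over the per-method config files and invokes the framework for each one. It assumes the `python main.py <config.yml>` invocation documented later in this README and is not one of the repository's actual scripts.

```python
# Hypothetical automation sketch: run the framework once per method by pointing
# main.py at every config.yml under evaluation/ (CLI usage is documented below).
import glob
import subprocess
import sys

for config in sorted(glob.glob("evaluation/*/config.yml")):
    print(f"Running evaluation for {config}")
    subprocess.run([sys.executable, "main.py", config], check=True)
```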
1. Verify method code structure
   Ensure the original code for your method has separate functions for training and testing the model.
2. Add a training module
   - Create a new file named `<method-name>_training.py` inside the `framework/eval_modules` directory.
   - In this file, define a class named `<MethodName>Training` that inherits from the `ModelTraining` class in the `framework` directory (a skeleton is sketched after this list).
3. Follow framework examples
   Check the existing files in `framework/eval_modules` for examples on how to properly call your method's training functions from the framework.
4. Add a testing module
   - Repeat steps 2–3, but for testing.
   - Create `<method-name>_testing.py` with a class `<MethodName>Testing` that inherits from the `ModelTesting` class.
5. Set up evaluation directory
   - Inside the `evaluation` directory, create a new subdirectory named after your method: `evaluation/<method-name>/`
6. Add configuration file
   - Place a `config.yml` file inside your method's evaluation directory: `evaluation/<method-name>/config.yml`
   - Refer to Part 2 below for details on filling out the config file.
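To make the naming conventions above concrete, here is a hedged skeleton for a hypothetical method called demo. The import paths and the train()/test() hooks are assumptions; adapt them to whatever the ModelTraining and ModelTesting base classes in the framework directory actually define.

```python
# framework/eval_modules/demo_training.py and demo_testing.py (hypothetical
# skeletons shown together for brevity; in the repository each class goes in
# its own file). The base-class import paths and method names are assumptions.
from framework.model_training import ModelTraining
from framework.model_testing import ModelTesting


class DemoTraining(ModelTraining):
    def train(self, train_data):
        # Call the original method's training code here and return the trained
        # model (or a path to it) so the framework can hand it to testing.
        raise NotImplementedError("method-specific training logic goes here")


class DemoTesting(ModelTesting):
    def test(self, model, test_data):
        # Run the original method's inference code and return predictions in
        # the format expected by the common report-generation function.
        raise NotImplementedError("method-specific testing logic goes here")
```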
To configure the framework for evaluating a new method, edit the config.yml file.
(See the example config.yml in the main directory for reference.)
This section contains metadata about the method being tested.
- method-name: Name of the method (no spaces).
- input-type: Type of dataset input:
  - `mixed` → traffic from all devices is combined.
  - `per_device` → traffic is split by device into separate directories.
- log-dir: Path where evaluation logs will be stored. The directory is created automatically.
- report-dir: Path where evaluation reports/results will be stored.
general:
method-name: "<name-of-the-method>"
input-type: "<mixed/per_device>"
log-dir: "./logs"

Defines which datasets to use for evaluation.
- type: Dataset structure (`mixed` or `per_device`).
- load: Not fully implemented; keep it `no`.
- list-datasets: Add one or more datasets.
  - If one dataset is used → the framework applies cross-validation.
  - If two datasets are used → the framework trains on one and tests on the other (see data-preprocessing).
data-ingestion:
list-datasets:
- name: "<dataset-1>"
path: "<path-to-dataset-dir>"
type: "<mixed/per_device>"
load: no
- name: "<dataset-2>"
path: "<path-to-dataset-dir>"
type: "<mixed/per_device>"
load: no

Defines how datasets are used for training/testing.
- required-data-format: Keep as raw (don’t change).
- use-known-devices: If yes, testing only includes devices seen in training.
- train-dataset: Dataset to use for training.
- test-dataset: Dataset to use for testing.
- threshold: Cutoff for detecting unknown devices (if the score falls below this value, the device is marked unknown; see the sketch below).
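The threshold rule can be illustrated with a small, hypothetical helper; this is not the framework's actual implementation.

```python
# Hypothetical illustration of the unknown-device cutoff: if the classifier's
# confidence for its best guess falls below the threshold, report "unknown".
def label_with_threshold(predicted_label: str, score: float, threshold: float = 0.7) -> str:
    return predicted_label if score >= threshold else "unknown"

print(label_with_threshold("amazon_echo", 0.81))  # -> amazon_echo
print(label_with_threshold("amazon_echo", 0.42))  # -> unknown
```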
data-preprocessing:
required-data-format: raw
use-known-devices: no
train-dataset: "<dataset-1>"
test-dataset: "<dataset-2>"
threshold: 0.7
- Example configuration for when the train and test datasets are different:
data-ingestion:
list-datasets:
- name: "Exp-1_iot"
path: "dataset/Exp-1_iot_per_device"
type: "per_device"
load: no
- name: "Exp-1_iot_mobile"
path: "dataset/Exp-1_iot_mobile_per_device"
type: "per_device"
load: no
data-preprocessing:
required-data-format: raw
use-known-devices: yes
train-dataset: "Exp-1_iot"
test-dataset: "Exp-1_iot_mobile"
threshold: 0.7

- Example configuration for when the train and test datasets are the same:
data-ingestion:
list-datasets:
- name: "Exp-1_iot"
path: "dataset/Exp-1_iot_per_device"
type: "per_device"
load: no
data-preprocessing:
required-data-format: raw
use-known-devices: yes
train-dataset: "Exp-1_iot"
test-dataset: "Exp-1_iot"
threshold: 0.7

Specifies how the framework should run training.
- class-name: Training class name (must extend ModelTraining).
- class-path: Path to the training file (inside framework/eval_modules/); see the loading sketch after this list.
- train-model:
  - yes → train a new model.
  - no → use an existing model.
- paths:
  - repo: Path to the original method's codebase.
  - eval-dir: Directory for storing processed data/reports.
  - model-dir: Directory for saving trained models.
  - device-file: File listing devices and MAC addresses (ground truth).
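As context for the class-name and class-path fields, the sketch below shows one common way a config-driven controller can resolve them via importlib. This is a hedged illustration under assumed paths, not necessarily how main.py does it.

```python
# Hypothetical sketch of config-driven class loading (requires PyYAML).
import importlib.util
import yaml

with open("evaluation/demo/config.yml") as fh:   # hypothetical method directory
    cfg = yaml.safe_load(fh)

training_cfg = cfg["model-training"]
spec = importlib.util.spec_from_file_location("training_module", training_cfg["class-path"])
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)

# Instantiate the class named in the config; constructor arguments are method-specific.
TrainingClass = getattr(module, training_cfg["class-name"])
trainer = TrainingClass()
```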
model-training:
class-name: "<name-of-the-method>Training"
class-path: "framework/eval_modules/<name-of-the-method>_training.py"
train-model: yes
paths:
repo: "<path-to-the-code-repository-to-evaluate>"
eval-dir: "./evaluation/<name-of-the-method>"
model-dir: "./evaluation/<name-of-the-method>/models"
device-file: "{}/devices.txt"
model-testing:
class-name: "<name-of-the-method>Testing"
class-path: "framework/eval_modules/<name-of-the-method>_testing.py"
report-dir: "./evaluation/<name-of-the-method>/reports/"

- Enter the `artifact` directory:
  `cd artifact`
- Run the framework:
  `python main.py evaluation/<name-of-the-method>/config.yml`

Reproduce the results from the paper by following these steps.
If the evaluation ran without error:
- The report will be generated in the `evaluation/<method-name>/reports/` directory.
- If it is unclear which report to check:
  - Open the latest log file.
  - The last line will contain the path to the report file, as in the example below (the sketch that follows automates this lookup):
2025-09-11 13:20:55,723 - root - DEBUG - [Model Testing] : Result report generated at path: ./evaluation/iot-sentinel/reports/iot-sentinel_Exp-1_iot_0.json
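If you would rather not open the log manually, the hedged snippet below picks the most recently modified file under the default log directory (`./logs`) and prints its last line, which holds the report path after a successful run.

```python
# Hypothetical convenience snippet: find the newest log file and print its last
# line, which contains the report path after a successful run (log-dir: ./logs).
import glob
import os

log_files = glob.glob("logs/*")
if not log_files:
    raise SystemExit("no log files found under logs/")

latest = max(log_files, key=os.path.getmtime)   # timestamped logs: newest wins
with open(latest) as fh:
    lines = fh.read().splitlines()

print(f"latest log: {latest}")
print(f"last line : {lines[-1] if lines else '(empty)'}")
```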
If the evaluation faced an error:
- Check the log file in the `logs/` directory.
- Log files are timestamped; the most recent one corresponds to your last run.
- The log will contain details about the error.
- The top of the report contains information about the experiment configuration, the time of the experiment, and the list of devices used for the evaluation.
"method": "geniotid",
"timestamp": "09/09/2025, 16:00:35",
"eval-config": {
"required-data-format": "raw",
"use-known-devices": false,
"train-dataset": "Exp-1_iot",
"test-dataset": "Exp-1_iot",
"threshold": 0.7
},
"cross-validation": true,
"devices": [
"amazon_echo",
"bosch_dishwasher",
"calex_motion_sensor",
"calex_slimme_led_vloerlamp",
"chromecast_v3",
"echo_dot",
"google_home_mini",
"google_nest_cam",
"ilife_t10s",
"nest_smokealarm",
"philips_hue",
"philips_luchtreiniger_600-serie",
"philips_tv",
"pni_safe_house_pg600lr",
"princess_smart_aerofryer",
"ring_doorbell",
"salcar_radiatorthermostaat_trv801w",
"samsung_fridge",
"smartplug-hs110",
"smartplug-kp105",
"watersensor",
"welltobe_pet_feeder"
]

- The full-classification-report lists performance values for each device, followed by the overall performance values.
- When `use-known-devices` is set to `no`, the performance for identifying unknown devices is included as a separate entry in the report.
"full-classification-report": {
"amazon_echo": {
"precision": 0.5827123695976155,
"recall": 0.8947368421052632,
"f1-score": 0.7057761732851986,
"support": 437.0
},
"bosch_dishwasher": {
"precision": 0.9846153846153847,
"recall": 0.8,
"f1-score": 0.8827586206896552,
"support": 80.0
},
.
.
.
"weighted avg": {
"precision": 0.8822505454629617,
"recall": 0.8459188127455259,
"f1-score": 0.8445809172986433,
"support": 4582.0
}
}
- The predictions field contains the list of all true values, predicted values, and their prediction probabilities (a loading sketch follows the excerpt below).
"predictions": [
[
[
"smartplug-kp105",
"smartplug-kp105",
0.8009739101768557
],
[
"smartplug-kp105",
"amazon_echo",
0.08665806288730057
],
.
.
.
]
]
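Finally, a hedged sketch for post-processing a generated report: it loads the JSON file named in the example log line above and prints per-device F1 scores from the full-classification-report. The field names are taken from the excerpts in this section; adjust the path to your own run's output.

```python
# Hypothetical report-inspection helper; adjust report_path to your own run.
import json

report_path = "./evaluation/iot-sentinel/reports/iot-sentinel_Exp-1_iot_0.json"

with open(report_path) as fh:
    report = json.load(fh)

print(f"method: {report['method']}, cross-validation: {report['cross-validation']}")
for name, metrics in report["full-classification-report"].items():
    if isinstance(metrics, dict):  # skip scalar entries such as an overall accuracy
        print(f"{name:40s} f1={metrics['f1-score']:.3f} support={metrics['support']:.0f}")
```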