Detecting Masquerade Attacks in Controller Area Networks Using Graph Machine Learning

Project Overview

Main Objective

Design and develop a unified framework to detect both masquerade attacks in the CAN bus that can be deployed on edge devices. We aim to model CAN message streams as CAN graph messages, embedding both node and edge attributes. From this, we can train a model using normal data to detect attacks in the test dataset.

Key Questions & Hypothesis

Question: Can we detect masquerade attacks in the CAN bus using a graph ML?
Hypothesis:
1. CAN messages graphs (embedding both node/edge attributes) may characterize normal/attack conditions in CAN.
2. Fabrication/masquerade attacks scenarios can be framed as graph classification framework.

Workflow

Phase 1 - Literature Review
- Familiarization with CAN protocol, in-vehicle security, CAN IDS, and Graph ML.
- Experimentation with the ROAD dataset.
Phase 2 - Building & Annotating CAN Message Graphs
- Parse CAN data to build CAN message graphs.
- Partition CAN streaming data into temporal CAN message graphs.
- Embed attributes from CAN signals into nodes and frequency of messages into edges.
Phase 3 - Experiments with GNN for Graph Classification
- Build training datasets based on CAN graph messages.
- Train the Graph ML framework for IDS in CAN graph messages.
- Test the trained GNN framework on attack captures and compute classification metrics.

Dataset:

ROAD Dataset: This dataset contains multiple hours of recorded data, with 3 hours for training and 30 minutes for testing, covering various driving scenarios. It includes five masquerade attacks: correlated signal, max engine, max speedometer, reverse light off, and reverse light on attacks. For more details, refer to the ROAD dataset paper. Here is a detailed tree structure for the dataset used in this project:

road/
├── ambient
├── attacks
├── data_table.csv
├── readme.md
└── signal_extractions
├── ambient
│ ├── ambient_dyno_drive_basic_long.csv
│ ├── ambient_dyno_drive_basic_short.csv
│ ├── ambient_dyno_drive_benign_anomaly.csv
│ ├── ambient_dyno_drive_extended_long.csv
│ ├── ambient_dyno_drive_extended_short.csv
│ ├── ambient_dyno_drive_radio_infotainment.csv
│ ├── ambient_dyno_drive_winter.csv
│ ├── ambient_dyno_exercise_all_bits.csv
│ ├── ambient_dyno_idle_radio_infotainment.csv
│ ├── ambient_dyno_reverse.csv
│ ├── ambient_highway_street_driving_diagnostics.csv
│ ├── ambient_highway_street_driving_long.csv
├── attacks
│ ├── accelerator_attack_drive_1.csv
│ ├── accelerator_attack_drive_2.csv
│ ├── accelerator_attack_reverse_1.csv
│ ├── accelerator_attack_reverse_2.csv
│ ├── correlated_signal_attack_1_masquerade.csv
│ ├── correlated_signal_attack_2_masquerade.csv
│ ├── correlated_signal_attack_3_masquerade.csv
│ ├── max_engine_coolant_temp_attack_masquerade.csv
│ ├── max_speedometer_attack_1_masquerade.csv
│ ├── max_speedometer_attack_2_masquerade.csv
│ ├── max_speedometer_attack_3_masquerade.csv
│ ├── metadata.json
│ ├── reverse_light_off_attack_1_masquerade.csv
│ ├── reverse_light_off_attack_2_masquerade.csv
│ ├── reverse_light_off_attack_3_masquerade.csv
│ ├── reverse_light_on_attack_1_masquerade.csv
│ ├── reverse_light_on_attack_2_masquerade.csv
│ ├── reverse_light_on_attack_3_masquerade.csv
└── DBC

Code Structure

Python Scripts:

ambient_dyno_drive_basic_long.py:
- Script to process a specific benign CAN log file ambient_dyno_drive_basic_long.log.
data_processing.py:
- Functions for reading and preprocessing the CAN logs.
- Functions to partition the CAN data into temporal graphs.
embedding_generation.py:
- Functions to generate node embeddings using Node2Vec.
- Functions to generate edge embeddings.
imports.py:
- Script to import necessary modules and set up the environment for other scripts.
main_script.py:
- Main script to run the core experiments and analysis on CAN log data.
main_script_after_integration_timeseries_signals.py:
- Script to process and analyze CAN log data after integrating time-series signals.
new_experiment_samples.py:
- Functions for analyzing CAN log data using various combinations of window sizes and offsets based on the number of samples.
- The goal is to understand how different sample-based window sizes and offsets affect the detection and analysis of CAN network activities, both for benign and attack scenarios.
- This experiment aims to optimize the parameters for better anomaly detection and network activity characterization.
process_all_correlated_masquerade_attack_files.py:
- Script to process all correlated masquerade attack files.
process_all_files.py:
- Script to process all benign files.
process_all_max_speedometer_attack_files.py:
- Script to process all max speedometer masquerade attack files.
process_all_reverse_light_off_attack_files.py:
- Script to process all reverse light off masquerade attack files.
process_all_reverse_light_on_attack_files.py:
- Script to process all reverse light on masquerade attack files.
process_max_engine_coolant_temp_attack_masquerade_file.py:
- Script to process max engine coolant temperature masquerade attack files.
requirements.txt:
- File containing a list of all the required packages along with their specific versions.
run_experiments.py:
- Script to run various experiments such as process_all_files_script and process_all_attacks_script at a go on the CAN log data.

Installation of Dependencies

To ensure that you have all the necessary Python packages and dependencies for this project, you can use the requirements.txt file provided in the repository. This file contains a list of all the required packages along with their specific versions.

Installing Dependencies

Using Conda: If you are using Conda, you can create a new environment and install the dependencies from the requirements.txt file by running the following commands:
```
conda create --name your_env_name python=3.8
conda activate your_env_name
pip install -r requirements.txt
```

Getting Started

Running the Code (the window sizes and offsets can be altered depending on overlaps and time slices)

Benign Files:

To run only the first benign file, which is ambient_dyno_drive_basic_long.log:

python ambient_dyno_drive_basic_long.py --window-size 10 --offset 10 --pkl-folder road_ambient_dyno_drive_basic_long_050305_002000 "C:\Users\willi\OneDrive\Desktop\Research\oak_ridge_in_vehicle\road\ambient\ambient_dyno_drive_basic_long.log"

To run all benign files:

python process_all_files.py --window-size 4 --offset 4

Attack Files:

To process all correlated masquerade attack files:

python process_all_correlated_masquerade_attack_files.py --window-size 4 --offset 4

To process all reverse light off masquerade attack files:

python process_all_reverse_light_off_attack_files.py --window-size 4 --offset 4

To process all reverse light on masquerade attack files:

python process_all_reverse_light_on_attack_files.py --window-size 4 --offset 4

To process all max engine coolant temperature masquerade attack files:

python process_max_engine_coolant_temp_attack_masquerade_file.py --window-size 4 --offset 4

To process all max speedometer masquerade attack files:

python process_all_max_speedometer_attack_files.py --window-size 4 --offset 4

New Experiment: Sample-Based Analysis

This new set of experiments focuses on analyzing CAN log data using various combinations of window sizes and offsets based on the number of samples, rather than time. The goal is to understand how different sample-based window sizes and offsets affect the detection and analysis of CAN network activities, both for benign and attack scenarios. This experiment aims to optimize the parameters for better anomaly detection and network activity characterization.

Running the New Experiment:

To process new experiment samples for attack scenarios:

python new_experiment_samples.py --window-size 50 --offset 50

To process new experiment samples for benign scenarios:

python new_experiment_samples.py --window-size 50 --offset 50 --attack-type benign

Acknowledgement

This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The publisher, by accepting the article for publication, acknowledges that the U.S. Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of the manuscript, or allow others to do so, for U.S. Government purposes. The DOE will provide public access to these results in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). This research was sponsored in part by Oak Ridge National Laboratory’s (ORNL’s) Laboratory Directed Research and Development program through the Sustainable Research Pathways (SRP) program and by the DOE. This research is also partially funded by US Department of Energy Award No. DE-FE0032089. There was no additional external funding received for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of this manuscript.

Name		Name	Last commit message	Last commit date
Latest commit History 259 Commits
Codes		Codes
Data		Data
Figures		Figures
Notebooks		Notebooks
License.md		License.md
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Detecting Masquerade Attacks in Controller Area Networks Using Graph Machine Learning

Project Overview

Main Objective

Key Questions & Hypothesis

Workflow

Dataset:

Code Structure

Python Scripts:

Installation of Dependencies

Installing Dependencies

Getting Started

Running the Code (the window sizes and offsets can be altered depending on overlaps and time slices)

Benign Files:

Attack Files:

New Experiment: Sample-Based Analysis

Running the New Experiment:

Acknowledgement

About

Releases

Packages

Languages

License

billmj/GraphML-CONTROLLER-AREA-NETWORK-Attack-Detection

Folders and files

Latest commit

History

Repository files navigation

Detecting Masquerade Attacks in Controller Area Networks Using Graph Machine Learning

Project Overview

Main Objective

Key Questions & Hypothesis

Workflow

Dataset:

Code Structure

Python Scripts:

Installation of Dependencies

Installing Dependencies

Getting Started

Running the Code (the window sizes and offsets can be altered depending on overlaps and time slices)

Benign Files:

Attack Files:

New Experiment: Sample-Based Analysis

Running the New Experiment:

Acknowledgement

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages