
My Bachelor Thesis about using LLMs in steganography and assessing their security claims with additional experiments


xsenyaaax/SteganographyWithLLM


Steganography in text generated by autoregressive models

About

This repository includes the source code and additional files for my Bachelor Thesis at CTU FIT.

My thesis extends the experiments of known steganographic approaches that use LLMs to hide messages. Specifically, it investigates how different samplers, temperatures, and other factors affect the security of those algorithms.

This is why I created a pluggable app with a unified interface for steganographic algorithms, LLMs, and samplers. I can easily swap out different LLMs, samplers, and steganographic algorithms, and then, for example, generate datasets to assess security using machine learning models.

This app, specifically the steganographic algorithms, builds upon the source code of Arithmetic Coding, Meteor, and Discop. Each algorithm uses GPT-2, publicly available through the Hugging Face library.

Also, during my experiments I created several datasets of randomly sampled text or stegotext (generated sequences with hidden messages). These are located in the experiments folder; each subfolder contains the datasets used in that particular experiment. For example, experiment1: Distinguishing Text by Sampling Methods contains randomly sampled sequences for different samplers (Top P, Top K, etc.). Further information about each experiment can be found in the Measuring the Security section of my thesis.

Datasets

Each dataset name follows the same naming convention: {llm_model_name}-{omitted token indices}-{sampler}-{sampler parameter}-{steganographic algorithm}-temp-{temperature parameter}

There are two types of files with that name: a CSV file and a TXT file.

  • The CSV file includes the columns: algorithm name and individual tokens (in this case token1...token50).
  • The TXT file includes the decoded tokens (generated text) for each row of the CSV file.
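Since each TXT line corresponds to a CSV row, the two files can be zipped together; below is a minimal sketch using only the standard library (the miniature file contents are hypothetical, and real files have 50 token columns):

```python
import csv
import io

# Hypothetical miniature versions of a CSV/TXT dataset pair.
csv_text = "algorithm,token1,token2,token3\nmeteor,464,318,262\n"
txt_text = "The quick fox\n"

# Row i of the CSV corresponds to line i of the TXT file.
rows = list(csv.DictReader(io.StringIO(csv_text)))
texts = txt_text.splitlines()

for row, text in zip(rows, texts):
    print(row["algorithm"], "->", text)
```

In the real datasets, `io.StringIO(...)` would simply be replaced by `open(...)` on the matching `.csv` and `.txt` files.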

Additional notes:

  • If no steganographic algorithm is used, that part of the name is omitted entirely.
  • If no tokens are omitted, the placeholder [] is used to represent an empty list.
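As an illustration, a name following this convention can be split back into its parts. The example name below is hypothetical, and the parsing accounts for the optional steganographic-algorithm part:

```python
# Hypothetical dataset base name following the convention
# {llm}-{omitted tokens}-{sampler}-{sampler param}-{algorithm}-temp-{temperature}
name = "gpt2-[]-top_p-0.9-meteor-temp-1.0"

# The first four fields never contain hyphens, so a bounded split suffices.
llm, omitted, sampler, sampler_param, rest = name.split("-", 4)

if "-temp-" in rest:
    algorithm, temperature = rest.split("-temp-")
else:
    # No steganographic algorithm in the name (random sampling dataset).
    algorithm, temperature = None, rest.removeprefix("temp-")

print(llm, omitted, sampler, sampler_param, algorithm, temperature)
```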

Start app

You need to be in the steganography_llm directory. Create a Python virtual environment: python -m venv .venv. Activate it: source .venv/bin/activate. Then:

  1. pip install -r requirements.txt
  2. Modify the necessary parameters at the end of the relevant script, then run:
    • python scripts/generate_random_sampling_datasets_script.py to generate random sampling datasets (no hidden messages) with different parameters
    • python scripts/generate_steganography_datasets_script.py to generate stego datasets (with random hidden messages) with different parameters
    • python scripts/train_models_script.py to train classification models on pairs of random sampling and stego datasets
    • python app.py to try the app itself with different messages and see the encoded result

Source Code Structure

.
├── abstract_classes  # abstract class definitions for a unified interface
├── plugins # Pluggable components
│   ├── models # different LLMs (e.g. GPT-2, Llama)
│   ├── samplers # different samplers (Top P, Top K etc.)
│   └── steganography_algorithms # currently only 3 (Arithmetic, Meteor, Discop)
├── app.py # interface for encoding and decoding with different parameters
├── config.py # Configuration settings
├── scripts
│   ├── generate_random_sampling_datasets_script.py # Script to create different random sampling datasets with 1 command
│   ├── generate_steganography_datasets_script.py # Script to create different steganographic datasets with 1 command
│   └── train_models_script.py # Script to train Random Forest, GBDT on those created datasets
└── utils.py # Helper functions
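The pluggable design implies that each component type implements a shared abstract interface from abstract_classes. The following is only a hypothetical sketch of what such interfaces might look like; the names and signatures are mine, not the repository's:

```python
from abc import ABC, abstractmethod


class Sampler(ABC):
    """Hypothetical unified sampler interface (e.g. Top-P, Top-K)."""

    @abstractmethod
    def filter_logits(self, logits):
        """Return logits truncated/reweighted according to the sampling rule."""


class SteganographyAlgorithm(ABC):
    """Hypothetical unified interface for Arithmetic, Meteor, Discop."""

    @abstractmethod
    def encode(self, message: bytes, model, sampler: Sampler) -> str:
        """Hide `message` in text generated by `model` under `sampler`."""

    @abstractmethod
    def decode(self, stegotext: str, model, sampler: Sampler) -> bytes:
        """Recover the hidden message from `stegotext`."""
```

With interfaces like these, swapping a component means registering a new concrete subclass in plugins, which is the design the About section describes.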
