LLM-TD toolkit and data sets
============================

Introduction
------------
This repository provides an implementation of the LLM-TD algorithm and the
Linux security syslog data sets used for evaluating LLM-TD with different
local LLMs. Note that LLM-TD relies on local LLMs which are accessed through
the Ollama framework, so you have to install Ollama before using LLM-TD.
The data sets are provided in the 'logs' directory and the ground truth data
can be found in the 'ground-truth' directory, while the 'results'
directory contains the templates detected by the evaluated approaches.
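
Since Ollama is a prerequisite, a quick pre-flight check can save a failed
run. The following is a minimal sketch and not part of the toolkit: the
helper name have_cmd is hypothetical, and whether the model has to be pulled
in advance is an assumption ('ollama pull' is a standard Ollama subcommand).

```shell
# have_cmd NAME: succeeds if NAME is found on PATH (hypothetical helper)
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd ollama; then
  echo "ollama found"
  # Optionally fetch the model ahead of time (assumption: not done for you):
  # ollama pull openchat
else
  echo "ollama not found on PATH; install it before running llm-td.pl" >&2
fi
```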

Here is an example command line for invoking LLM-TD to detect templates
from the event log logs/sshd.log with the OpenChat LLM:

./llm-td.pl --model=openchat --logfile=logs/sshd.log --script=./llm-query.sh

To learn more about the command line options of LLM-TD, execute:

./llm-td.pl --help

The scripts in the 'ground-truth' directory contain regular expressions for
each distinct event type in the event log, and also provide information
about the ground truth and the result produced by each evaluated approach
(LLM-TD was used with the default settings).

For example, to produce a summary of all event types in logs/sshd.log,
execute the following command line:

cat logs/sshd.log | ground-truth/sshd-results.pl

Information about the ground truth and the results from the evaluated
approaches is provided as comments inside the ground truth script. For
example, consider the following stanza in ground-truth/sshd-results.pl:

  # template: sshd[<*>]: Accepted <*> for <*> from <*> port <*> ssh2
  # openchat: yes (sshd[<*>]: Accepted <*> for <*> from <*> port <*> ssh2)
  # drain: no (two more specific templates detected instead of one)
  # mistral: yes (sshd[<*>]: Accepted <*> for <*> from <*> port <*> ssh2)
  # wizardlm2: no

In the above stanza, the 'template' keyword provides the ground truth template,
whereas the 'openchat', 'drain', 'mistral', and 'wizardlm2' keywords indicate
whether the relevant approach managed to identify the ground truth template.
Note that 'yes' indicates a successful detection according to the P1 and P2
heuristic principles, and after 'yes' the identified template is provided
in parentheses. If the detection was not successful, 'no' may optionally be
followed by a relevant comment.
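
The stanza format described above lends itself to simple tallying. The
following is a minimal sketch and not part of the toolkit: the function name
count_verdicts is hypothetical, and the sketch assumes that every verdict is
a comment line of the form '# <approach>: yes' or '# <approach>: no', with
optional leading whitespace, as in the stanza shown.

```shell
# count_verdicts APPROACH FILE
# Hypothetical helper: counts "yes" and "no" verdicts for APPROACH
# in the comment lines of a ground truth script.
count_verdicts() {
  awk -v name="$1" '
    # Only comment lines can carry a verdict
    /^[ \t]*#/ {
      line = $0
      # Strip leading whitespace and the comment marker
      sub(/^[ \t]*#[ \t]*/, "", line)
      if (index(line, name ": yes") == 1) n_yes++
      else if (index(line, name ": no") == 1) n_no++
    }
    END { printf "%s yes=%d no=%d\n", name, n_yes, n_no }
  ' "$2"
}
```

For example, 'count_verdicts drain ground-truth/sshd-results.pl' would print
one line with the number of ground truth templates that Drain did and did not
identify, provided that the script follows the assumed comment format.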

LLM-TD has been primarily designed for syslog log files and it uses syslog
program names for recognizing templates in LLM responses. However, LLM-TD can
also be employed for analyzing non-syslog log files which contain free-form
textual messages. If the messages do not begin with the name of the logging
program, you can add a custom program name (e.g., AppName) to the beginning
of each log message, so that LLM-TD can use it for recognizing templates:

cat mylog | sed 's/^/AppName /' >test.log

After that, you can configure LLM-TD to pick up the custom program name
from the beginning of each log message:

./llm-td.pl --model=openchat --logfile=test.log --script=./llm-query.sh \
  --regexp='(?<line>(?<program>AppName).+)'
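
The prefixing step can be sanity-checked on a few sample lines before
processing a full log. The following is a minimal sketch: the function name
add_program_name is hypothetical, and awk is used instead of sed only so that
the program name can be passed as a parameter.

```shell
# add_program_name NAME
# Hypothetical helper: reads log lines from standard input and writes
# them to standard output with NAME and a space prepended to each line.
add_program_name() {
  awk -v prog="$1" '{ print prog " " $0 }'
}

# Preview the result on the first lines of a log (mylog is a placeholder):
# head -n 3 mylog | add_program_name AppName
```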


Academic attribution
--------------------
When you use the software or data sets from this repository for your research,
please provide a reference to the following paper in your publication:

Risto Vaarandi and Hayretdin Bahsi, 
"Using large language models for template detection from security event logs," 
International Journal of Information Security, vol. 24, article 104, 2025,
https://doi.org/10.1007/s10207-025-01018-y


Availability and licensing
--------------------------
This toolkit is available from https://github.com/ristov/llm-td,
and is distributed under the terms of the GNU General Public License
version 2. The data sets in the 'logs' directory are distributed under the
terms of the Creative Commons Attribution 4.0 International License.


Author
------
Risto Vaarandi (firstname d0t lastname at gmail d0t c0m)
