Skip to content

cti-graph-reasoner/TITAN

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

42 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

TITAN - Threat Intelligence Through Automated Navigation

A Typed & Interpretable Framework for Cyber Threat Intelligence Reasoning
Bridging MITRE ATT&CK, Knowledge Graphs, and Large Language Models

TITAN Framework


TITAN is a typed, bidirectional knowledge graph framework for Cyber Threat Intelligence (CTI) reasoning and question answering. It integrates data from the MITRE ATT&CK STIX bundles, builds a TITAN Ontology, generates reasoning (CoT) and non-reasoning (NoCoT) datasets, and provides an end-to-end pipeline for model training, evaluation, and graph execution.

🎬 Demos

TITAN with Chain of Thought (CoT)
TITAN with Chain of Thought (CoT)

No Chain of Thought (Example 1)
No Chain of Thought (Example 1)

No Chain of Thought (Example 2)
No Chain of Thought (Example 2)

No Chain of Thought (Example 1)
TITAN as a tool for a Cybersecurity Agent


Overview

TITAN implements the full pipeline described in the paper TITAN: Graph-Executable Reasoning for Cyber Threat Intelligence.
It comprises:

  1. Typed Graph Construction β€” builds a bidirectional knowledge graph from MITRE ATT&CK STIX data using the TITAN Ontology, where each edge is semantically typed (e.g., uses_attack_pattern, mitigates_attack_pattern).
  2. Dataset Generation β€” creates large-scale QA/navigation datasets in both CoT and NoCoT formats, with executable relational paths (<PATH>…</PATH>).
  3. Data Splitting β€” produces train/validation/test splits across CTI sections.
  4. Path-Planner Training β€” fine-tunes LLMs for path generation using LoRA adapters (Unsloth + TRL).
  5. Graph Execution β€” executes generated paths over the TITAN Graph to return grounded entities and interpretable reasoning traces.

Repository Structure

TITAN/
β”œβ”€ datasets/
β”‚  β”œβ”€ CoT/
β”‚  β”œβ”€ NoCoT/
β”‚  └─ create_dataset_splits.py          # split into train/val/test
β”œβ”€ utils/
β”‚  β”œβ”€ build_graph.py                    # STIX β†’ TITAN Ontology Graph (GraphML)
β”‚  β”œβ”€ build_dataset.py                  # Graph + YAML templates β†’ dataset JSON
β”‚  β”œβ”€ paraphrase.py                     # optional: generate target variations via LLM
β”‚  └─ useful_cot.yaml                   # question templates with <PATH>...</PATH> and target
β”œβ”€ graph_algorithm.py                   # deterministic path execution utilities
β”œβ”€ train_titan.py                       # LoRA SFT training (Unsloth + TRL)
β”œβ”€ test_titan.py                        # interactive tester for path planning & execution
β”œβ”€ modify_target.py                     # apply paraphrased targets to YAML/JSON
└─ README.md

Notes

  • paraphrase.py is optional and not used unless applied via modify_target.py.
  • Update the <img src="images/..."> path if your image file name differs.

Requirements

  • Python 3.9+
  • Local MITRE ATT&CK STIX JSON bundles (e.g., ../attack-stix-data/)
  • (Optional) GPU for LLM steps (paraphrase.py, training)

Installation

python -m venv .venv
source .venv/bin/activate           # Windows: .venv\Scripts\activate
pip install -U pip

pip install networkx pandas pyyaml tqdm scikit-learn
# For model training and testing:
pip install torch transformers accelerate datasets trl unsloth

1. Build the TITAN Graph

Script: utils/build_graph.py
Generates titan_graph.graphml (bidirectional, typed graph).

python utils/build_graph.py --base ../attack-stix-data --out titan_graph.graphml --log-file build_log.txt

The resulting graph follows the TITAN Ontology, distinguishing semantic directions (e.g., uses_attack_pattern ↔ used_by_intrusion_set) and ensuring all relations are mirrored with coherent inverse semantics.


2. Generate CoT / NoCoT Datasets

Script: utils/build_dataset.py
Inputs:

  • titan_graph.graphml
  • utils/useful_cot.yaml β€” templates with <PATH>...</PATH> and target

Outputs:

  • datasets/CoT/NAVIGATION_DATASET.json
  • datasets/CoT/NAVIGATION_QUESTION_PER_SECTION.json

Example:

python utils/build_dataset.py \
  --templates utils/useful_cot.yaml \
  --graph titan_graph.graphml \
  --out datasets/CoT/NAVIGATION_DATASET.json \
  --out datasets/CoT/NAVIGATION_QUESTION_PER_SECTION.json

Re-run for NoCoT using the corresponding output folder:

datasets/NoCoT/

(Optional) Convert JSON β†’ CSV

python - <<'PY'
import json, pandas as pd, os
inp="datasets/CoT/NAVIGATION_DATASET.json"; out="datasets/CoT/NAVIGATION_DATASET.csv"
data=json.load(open(inp,"r",encoding="utf-8"))
df=pd.DataFrame(data)
if "question" in df.columns: df=df.rename(columns={"question":"Question"})
os.makedirs(os.path.dirname(out), exist_ok=True)
df.to_csv(out, index=False, encoding="utf-8")
print("Saved", out)
PY

3. Enhance Targets with LLM (Optional)

You may refine the Objective/target terms using utils/paraphrase.py.
This creates target_variations.csv, which can be applied to YAML or JSON via modify_target.py.

Apply to YAML templates

python modify_target.py --csv target_variations.csv \
  --in utils/useful_cot.yaml --out utils/useful_cot.improved.yaml --pick first

Apply to dataset JSON

python modify_target.py --csv target_variations.csv \
  --in datasets/CoT/NAVIGATION_DATASET.json \
  --out datasets/CoT/NAVIGATION_DATASET.improved.json \
  --pick longest

4. Create Train/Val/Test Splits

Script: datasets/create_dataset_splits.py
Inputs:

  • CSV dataset (Question column required)
  • Section mapping JSON

Outputs:

datasets/CoT/COMPLETE/train_dataset.csv
datasets/CoT/COMPLETE/val_dataset.csv
datasets/CoT/COMPLETE/test_dataset.csv

Example:

python datasets/create_dataset_splits.py \
  --csv datasets/CoT/NAVIGATION_DATASET.csv \
  --json datasets/CoT/NAVIGATION_QUESTION_PER_SECTION.json \
  --out datasets/CoT/COMPLETE \
  --train 0.80 --val 0.05 --test 0.15 --seed 42

5. Train the Path-Planner (LoRA SFT)

Script: train_titan.py β€” fine-tunes an LLM (e.g., Phi-3.5, LLaMA, Qwen) using LoRA adapters.

Dataset directory structure:

TITAN_COMPLETE_DATASET/
  β”œβ”€ train_dataset.csv
  β”œβ”€ val_dataset.csv
  └─ test_dataset.csv

Example:

python train_titan.py \
  --data TITAN_COMPLETE_DATASET \
  --out MODELS/phi_titan \
  --model unsloth/Phi-3.5-mini-instruct \
  --lr 3e-4 --train-bsz 8 --eval-bsz 8 --grad-accum 2 \
  --epochs 8 --seq-len 2048 --seed 42

This script saves LoRA adapters and tokenizer into the --out directory.
Reduce --train-bsz or increase --grad-accum if GPU memory is insufficient.


6. Interactive Testing and Graph Execution

Script: test_titan.py
Loads the trained model, generates an executable <PATH>...</PATH> plan, and executes it over the TITAN Graph.

python test_titan.py \
  --model MODELS/phi_titan \
  --names NAMES.txt \
  --graph titan_graph.graphml \
  --rels Relationship_Descriptions.txt

Example query:

Which mitigations apply to techniques used by the Carberp malware?

The system generates a CoT reasoning trace, an executable path, and the final grounded entities.


Troubleshooting

  • Missing columns β€” rename question β†’ Question before splitting.
  • Unknown mappings β€” may be excluded or labeled as Unknown.
  • Small sections β€” the splitter balances small groups automatically.
  • GPU unavailable β€” training runs on CPU but will be slow.
  • CLI arguments not supported β€” set paths directly in scripts.

Quick CoT Pipeline Example

# 1. Build graph
python utils/build_graph.py --base ../attack-stix-data --out titan_graph.graphml

# 2. Build dataset
python utils/build_dataset.py \
  --templates utils/useful_cot.yaml \
  --graph titan_graph.graphml \
  --out datasets/CoT/NAVIGATION_DATASET.json \
  --out datasets/CoT/NAVIGATION_QUESTION_PER_SECTION.json

# 3. (Optional) Apply paraphrased targets
python modify_target.py --csv target_variations.csv \
  --in datasets/CoT/NAVIGATION_DATASET.json \
  --out datasets/CoT/NAVIGATION_DATASET.improved.json

# 4. Convert to CSV
python - <<'PY'
import json, pandas as pd, os
inp="datasets/CoT/NAVIGATION_DATASET.json"; out="datasets/CoT/NAVIGATION_DATASET.csv"
data=json.load(open(inp,"r",encoding="utf-8")); df=pd.DataFrame(data)
if "question" in df.columns: df=df.rename(columns={"question":"Question"})
os.makedirs(os.path.dirname(out), exist_ok=True); df.to_csv(out, index=False, encoding="utf-8")
print("Saved", out)
PY

# 5. Split
python datasets/create_dataset_splits.py \
  --csv datasets/CoT/NAVIGATION_DATASET.csv \
  --json datasets/CoT/NAVIGATION_QUESTION_PER_SECTION.json \
  --out datasets/CoT/COMPLETE --train 0.80 --val 0.05 --test 0.15

# 6. Train
python train_titan.py --data TITAN_COMPLETE_DATASET --out MODELS/phi_titan

# 7. Test
python test_titan.py --model MODELS/phi_titan --names NAMES.txt --graph titan_graph.graphml --rels Relationship_Descriptions.txt

About

A research prototype for navigating Cyber Threat Intelligence (CTI) graphs and answering questions through automated reasoning.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages