TITAN - Threat Intelligence Through Automated Navigation

A Typed & Interpretable Framework for Cyber Threat Intelligence Reasoning
Bridging MITRE ATT&CK, Knowledge Graphs, and Large Language Models

TITAN is a typed, bidirectional knowledge graph framework for Cyber Threat Intelligence (CTI) reasoning and question answering. It integrates data from MITRE ATT&CK STIX bundles into a graph built on the TITAN Ontology, generates reasoning (Chain-of-Thought, CoT) and non-reasoning (NoCoT) datasets, and provides an end-to-end pipeline for model training, evaluation, and graph execution.

🎬 Demos

TITAN with Chain of Thought (CoT)

No Chain of Thought (Example 1)

No Chain of Thought (Example 2)

TITAN as a tool for a Cybersecurity Agent

Overview

TITAN implements the full pipeline described in the paper TITAN: Graph-Executable Reasoning for Cyber Threat Intelligence.
It comprises:

Typed Graph Construction — builds a bidirectional knowledge graph from MITRE ATT&CK STIX data using the TITAN Ontology, where each edge is semantically typed (e.g., uses_attack_pattern, mitigates_attack_pattern).
Dataset Generation — creates large-scale QA/navigation datasets in both CoT and NoCoT formats, with executable relational paths (<PATH>…</PATH>).
Data Splitting — produces train/validation/test splits across CTI sections.
Path-Planner Training — fine-tunes LLMs for path generation using LoRA adapters (Unsloth + TRL).
Graph Execution — executes generated paths over the TITAN Graph to return grounded entities and interpretable reasoning traces.

Repository Structure

TITAN/
├─ datasets/
│  ├─ CoT/
│  ├─ NoCoT/
│  └─ create_dataset_splits.py          # split into train/val/test
├─ utils/
│  ├─ build_graph.py                    # STIX → TITAN Ontology Graph (GraphML)
│  ├─ build_dataset.py                  # Graph + YAML templates → dataset JSON
│  ├─ paraphrase.py                     # optional: generate target variations via LLM
│  └─ useful_cot.yaml                   # question templates with <PATH>...</PATH> and target
├─ graph_algorithm.py                   # deterministic path execution utilities
├─ train_titan.py                       # LoRA SFT training (Unsloth + TRL)
├─ test_titan.py                        # interactive tester for path planning & execution
├─ modify_target.py                     # apply paraphrased targets to YAML/JSON
└─ README.md

Notes

paraphrase.py is optional and not used unless applied via modify_target.py.

Update the <img src="images/..."> path if your image file name differs.

Requirements

Python 3.9+
Local MITRE ATT&CK STIX JSON bundles (e.g., ../attack-stix-data/)
(Optional) GPU for LLM steps (paraphrase.py, training)

Installation

python -m venv .venv
source .venv/bin/activate           # Windows: .venv\Scripts\activate
pip install -U pip

pip install networkx pandas pyyaml tqdm scikit-learn
# For model training and testing:
pip install torch transformers accelerate datasets trl unsloth

1. Build the TITAN Graph

Script: utils/build_graph.py
Generates titan_graph.graphml (bidirectional, typed graph).

python utils/build_graph.py --base ../attack-stix-data --out titan_graph.graphml --log-file build_log.txt

The resulting graph follows the TITAN Ontology, distinguishing semantic directions (e.g., uses_attack_pattern ↔ used_by_intrusion_set) and ensuring all relations are mirrored with coherent inverse semantics.
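To illustrate what typed, mirrored edges mean, here is a minimal stdlib-only sketch (the repository itself uses networkx and GraphML). It assumes each relation name is a verb plus the endpoint's STIX type, which matches the examples above (uses_attack_pattern, used_by_intrusion_set), but the INVERSE_VERB table, the name mitigated_by_course_of_action, and the toy node IDs are hypothetical and not taken from utils/build_graph.py.

```python
from collections import defaultdict

# Hypothetical verb -> inverse-verb table (illustrative, not TITAN's actual list).
INVERSE_VERB = {"uses": "used_by", "mitigates": "mitigated_by"}


def stix_type(stix_id: str) -> str:
    """STIX IDs have the form '<type>--<uuid>'; return the type with underscores."""
    return stix_id.split("--", 1)[0].replace("-", "_")


def add_typed_pair(graph: dict, src: str, verb: str, dst: str) -> None:
    """Add a forward edge typed by the target's type and a mirrored inverse edge
    typed by the source's type, e.g. uses_attack_pattern <-> used_by_intrusion_set."""
    graph[src].append((f"{verb}_{stix_type(dst)}", dst))
    graph[dst].append((f"{INVERSE_VERB[verb]}_{stix_type(src)}", src))


# node -> list of (relation, neighbour); toy IDs instead of real STIX UUIDs
graph = defaultdict(list)
add_typed_pair(graph, "intrusion-set--example", "uses", "attack-pattern--example")
add_typed_pair(graph, "malware--example", "uses", "attack-pattern--example")
add_typed_pair(graph, "course-of-action--example", "mitigates", "attack-pattern--example")

# Outgoing edges of the attack pattern distinguish who uses it and what mitigates it.
for relation, neighbour in graph["attack-pattern--example"]:
    print(relation, "->", neighbour)
```

Because inverse names carry the source type, the attack pattern's edges separate used_by_intrusion_set from used_by_malware instead of collapsing both into a generic "used_by".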

2. Generate CoT / NoCoT Datasets

Script: utils/build_dataset.py
Inputs:

titan_graph.graphml
utils/useful_cot.yaml — templates with <PATH>...</PATH> and target

Outputs:

datasets/CoT/NAVIGATION_DATASET.json
datasets/CoT/NAVIGATION_QUESTION_PER_SECTION.json

Example:

python utils/build_dataset.py \
  --templates utils/useful_cot.yaml \
  --graph titan_graph.graphml \
  --out datasets/CoT/NAVIGATION_DATASET.json \
  --out datasets/CoT/NAVIGATION_QUESTION_PER_SECTION.json

Re-run for NoCoT using the corresponding output folder:

datasets/NoCoT/
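Each template pairs a question with a <PATH>...</PATH> relational path and a target. The sketch below shows one way a template could be instantiated for a concrete entity. The keys (question, path, target), the {entity} placeholder, and the "->" separator inside <PATH> are assumptions for illustration; the actual schema of utils/useful_cot.yaml may differ.

```python
import re

PATH_RE = re.compile(r"<PATH>(.*?)</PATH>", re.DOTALL)

# Hypothetical template record; the real useful_cot.yaml keys and path syntax may differ.
TEMPLATE = {
    "question": "Which mitigations apply to techniques used by {entity}?",
    "path": "<PATH>uses_attack_pattern -> mitigated_by_course_of_action</PATH>",
    "target": "mitigations",
}


def instantiate(template: dict, entity: str) -> dict:
    """Fill the {entity} placeholder and split the <PATH> body into relation names."""
    match = PATH_RE.search(template["path"])
    if match is None:
        raise ValueError("template has no <PATH>...</PATH> block")
    relations = [r.strip() for r in match.group(1).split("->") if r.strip()]
    return {
        "question": template["question"].format(entity=entity),
        "start": entity,
        "path": relations,
        "target": template["target"],
    }


record = instantiate(TEMPLATE, "Carberp")
print(record["question"])
print(record["path"])
```

In a full generator, the relation list would then be executed over the graph from the start entity to obtain the grounded answers for the record.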

(Optional) Convert JSON → CSV

python - <<'PY'
import json, os
import pandas as pd

inp = "datasets/CoT/NAVIGATION_DATASET.json"
out = "datasets/CoT/NAVIGATION_DATASET.csv"
with open(inp, "r", encoding="utf-8") as f:
    data = json.load(f)
df = pd.DataFrame(data)
# The splitter in step 4 expects a "Question" column.
if "question" in df.columns:
    df = df.rename(columns={"question": "Question"})
os.makedirs(os.path.dirname(out), exist_ok=True)
df.to_csv(out, index=False, encoding="utf-8")
print("Saved", out)
PY

3. Enhance Targets with LLM (Optional)

You may refine the Objective/target terms using utils/paraphrase.py.
This creates target_variations.csv, which can be applied to YAML or JSON via modify_target.py.

Apply to YAML templates

python modify_target.py --csv target_variations.csv \
  --in utils/useful_cot.yaml --out utils/useful_cot.improved.yaml --pick first

Apply to dataset JSON

python modify_target.py --csv target_variations.csv \
  --in datasets/CoT/NAVIGATION_DATASET.json \
  --out datasets/CoT/NAVIGATION_DATASET.improved.json \
  --pick longest
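The --pick flag chooses one paraphrase per target when several are available. The following sketch shows one plausible reading of that behavior: first keeps the first listed variation, longest keeps the longest. The CSV columns (target, variation) and the record layout are assumptions, not the format actually read by modify_target.py.

```python
import csv
import io
from collections import defaultdict


def load_variations(handle) -> dict:
    """Group paraphrases by original target (assumed columns: target, variation)."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle):
        groups[row["target"]].append(row["variation"])
    return groups


def pick(options: list, strategy: str) -> str:
    """'first' keeps the first listed paraphrase; 'longest' the longest one."""
    if strategy == "first":
        return options[0]
    if strategy == "longest":
        return max(options, key=len)
    raise ValueError(f"unknown strategy: {strategy}")


def apply_targets(records: list, groups: dict, strategy: str) -> list:
    """Return copies of records with targets replaced where a paraphrase exists."""
    updated = []
    for rec in records:
        new = dict(rec)
        options = groups.get(rec["target"])
        if options:
            new["target"] = pick(options, strategy)
        updated.append(new)
    return updated


sample = io.StringIO(
    "target,variation\n"
    "mitigations,defensive measures\n"
    "mitigations,security controls that reduce a technique's impact\n"
)
groups = load_variations(sample)
records = [{"question": "q1", "target": "mitigations"}, {"question": "q2", "target": "groups"}]
print(apply_targets(records, groups, "longest"))
```

Records whose target has no paraphrase are passed through unchanged, and the input list is not mutated.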

4. Create Train/Val/Test Splits

Script: datasets/create_dataset_splits.py
Inputs:

CSV dataset (Question column required)
Section mapping JSON

Outputs:

datasets/CoT/COMPLETE/train_dataset.csv
datasets/CoT/COMPLETE/val_dataset.csv
datasets/CoT/COMPLETE/test_dataset.csv

Example:

python datasets/create_dataset_splits.py \
  --csv datasets/CoT/NAVIGATION_DATASET.csv \
  --json datasets/CoT/NAVIGATION_QUESTION_PER_SECTION.json \
  --out datasets/CoT/COMPLETE \
  --train 0.80 --val 0.05 --test 0.15 --seed 42
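The section mapping lets each CTI section be represented across all three splits. Below is a stdlib sketch of a per-section split with the 0.80/0.05/0.15 ratios and a fixed seed. The mapping shape (question to section name) and the Unknown bucket for unmapped questions are assumptions; create_dataset_splits.py may group and balance sections differently.

```python
import random
from collections import defaultdict


def split_by_section(questions, section_of, train=0.80, val=0.05, test=0.15, seed=42):
    """Shuffle and split each section separately so every section keeps the ratios.
    Train receives the remainder after test and val are taken."""
    if abs(train + val + test - 1.0) > 1e-9:
        raise ValueError("train + val + test must equal 1")
    by_section = defaultdict(list)
    for q in questions:
        # Questions missing from the mapping go to an 'Unknown' bucket here.
        by_section[section_of.get(q, "Unknown")].append(q)

    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for section in sorted(by_section):  # fixed order -> reproducible with the seed
        items = list(by_section[section])
        rng.shuffle(items)
        n_test = round(len(items) * test)
        n_val = round(len(items) * val)
        splits["test"] += items[:n_test]
        splits["val"] += items[n_test:n_test + n_val]
        splits["train"] += items[n_test + n_val:]
    return splits


questions = [f"A{i}" for i in range(20)] + [f"B{i}" for i in range(20)]
section_of = {q: q[0] for q in questions}
splits = split_by_section(questions, section_of)
print({name: len(items) for name, items in splits.items()})
```

Rounding per section means a very small section may contribute no items to val or test, which is one reason a real splitter needs explicit handling for small groups.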

5. Train the Path-Planner (LoRA SFT)

Script: train_titan.py — fine-tunes an LLM (e.g., Phi-3.5, LLaMA, Qwen) using LoRA adapters.

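--data must point to a directory containing the three split files, for example the datasets/CoT/COMPLETE output of step 4 (named TITAN_COMPLETE_DATASET below):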

TITAN_COMPLETE_DATASET/
  ├─ train_dataset.csv
  ├─ val_dataset.csv
  └─ test_dataset.csv

Example:

python train_titan.py \
  --data TITAN_COMPLETE_DATASET \
  --out MODELS/phi_titan \
  --model unsloth/Phi-3.5-mini-instruct \
  --lr 3e-4 --train-bsz 8 --eval-bsz 8 --grad-accum 2 \
  --epochs 8 --seq-len 2048 --seed 42

The script saves the LoRA adapters and tokenizer to the --out directory.
If GPU memory is insufficient, reduce --train-bsz and raise --grad-accum accordingly to keep the same effective batch size.
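With gradient accumulation, the optimizer steps once every --grad-accum micro-batches, so the effective batch size is train-bsz × grad-accum (times the number of GPUs under data parallelism). For the example above that is 8 × 2 = 16, and 4 × 4 gives the same 16 with a smaller micro-batch. The helper below is a back-of-the-envelope aid, not part of train_titan.py; the step count is approximate because trainers may round partial accumulation windows differently.

```python
import math


def effective_batch(train_bsz: int, grad_accum: int, num_gpus: int = 1) -> int:
    """Examples contributing to one optimizer step under gradient accumulation."""
    return train_bsz * grad_accum * num_gpus


def approx_optimizer_steps(num_examples: int, train_bsz: int, grad_accum: int,
                           epochs: int, num_gpus: int = 1) -> int:
    """Rough total step count; real trainers may round partial windows differently."""
    per_epoch = math.ceil(num_examples / effective_batch(train_bsz, grad_accum, num_gpus))
    return per_epoch * epochs


print(effective_batch(8, 2))  # 16, the README example
print(effective_batch(4, 4))  # 16, same effective batch with a smaller micro-batch
```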

6. Interactive Testing and Graph Execution

Script: test_titan.py
Loads the trained model, generates an executable <PATH>...</PATH> plan, and executes it over the TITAN Graph.

python test_titan.py \
  --model MODELS/phi_titan \
  --names NAMES.txt \
  --graph titan_graph.graphml \
  --rels Relationship_Descriptions.txt

Example query:

Which mitigations apply to techniques used by the Carberp malware?

The system generates a CoT reasoning trace, an executable path, and the final grounded entities.
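Executing a path means starting at the named entity and following each typed relation in order, carrying the set of reached nodes forward. The sketch below runs this on a toy in-memory graph. The "->" separator inside <PATH>, the relation name mitigated_by_course_of_action, and all node names are hypothetical; graph_algorithm.py may also resolve entity names (e.g., via NAMES.txt) and handle dead ends differently.

```python
import re

PATH_RE = re.compile(r"<PATH>(.*?)</PATH>", re.DOTALL)

# Toy typed graph: node -> list of (relation, neighbour). Names are illustrative only.
GRAPH = {
    "Carberp": [("uses_attack_pattern", "technique-A"), ("uses_attack_pattern", "technique-B")],
    "technique-A": [("used_by_malware", "Carberp"),
                    ("mitigated_by_course_of_action", "mitigation-X")],
    "technique-B": [("used_by_malware", "Carberp"),
                    ("mitigated_by_course_of_action", "mitigation-X"),
                    ("mitigated_by_course_of_action", "mitigation-Y")],
    "mitigation-X": [("mitigates_attack_pattern", "technique-A"),
                     ("mitigates_attack_pattern", "technique-B")],
    "mitigation-Y": [("mitigates_attack_pattern", "technique-B")],
}


def parse_path(model_output: str) -> list:
    """Extract relation names from the first <PATH>...</PATH> block (assumed '->' separator)."""
    match = PATH_RE.search(model_output)
    if match is None:
        raise ValueError("no <PATH>...</PATH> block in model output")
    return [r.strip() for r in match.group(1).split("->") if r.strip()]


def execute_path(graph: dict, start: str, relations: list):
    """Follow each typed relation from the current frontier; return final entities and a trace."""
    frontier = {start}
    trace = []
    for rel in relations:
        frontier = {dst for node in frontier for r, dst in graph.get(node, []) if r == rel}
        trace.append((rel, sorted(frontier)))
    return sorted(frontier), trace


output = "...reasoning... <PATH>uses_attack_pattern -> mitigated_by_course_of_action</PATH>"
entities, trace = execute_path(GRAPH, "Carberp", parse_path(output))
print(entities)  # ['mitigation-X', 'mitigation-Y']
for rel, nodes in trace:
    print(rel, nodes)
```

Because every hop is a typed edge lookup, the per-hop trace doubles as an interpretable explanation of how the final entities were reached.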

Troubleshooting

Missing columns: rename question → Question before splitting.
Unknown mappings: questions with no entry in the section mapping JSON may be excluded or labeled as Unknown.
Small sections: the splitter balances small groups automatically.
GPU unavailable: training runs on CPU but will be slow.
CLI arguments not supported: if a script does not accept the flags shown here, set the paths directly in the script.

Quick CoT Pipeline Example

# 1. Build graph
python utils/build_graph.py --base ../attack-stix-data --out titan_graph.graphml

# 2. Build dataset
python utils/build_dataset.py \
  --templates utils/useful_cot.yaml \
  --graph titan_graph.graphml \
  --out datasets/CoT/NAVIGATION_DATASET.json \
  --out datasets/CoT/NAVIGATION_QUESTION_PER_SECTION.json

# 3. (Optional) Apply paraphrased targets
python modify_target.py --csv target_variations.csv \
  --in datasets/CoT/NAVIGATION_DATASET.json \
  --out datasets/CoT/NAVIGATION_DATASET.improved.json

# 4. Convert to CSV
python - <<'PY'
import json, os
import pandas as pd

# If step 3 was applied, point inp at NAVIGATION_DATASET.improved.json instead.
inp = "datasets/CoT/NAVIGATION_DATASET.json"
out = "datasets/CoT/NAVIGATION_DATASET.csv"
with open(inp, "r", encoding="utf-8") as f:
    df = pd.DataFrame(json.load(f))
if "question" in df.columns:
    df = df.rename(columns={"question": "Question"})
os.makedirs(os.path.dirname(out), exist_ok=True)
df.to_csv(out, index=False, encoding="utf-8")
print("Saved", out)
PY

# 5. Split
python datasets/create_dataset_splits.py \
  --csv datasets/CoT/NAVIGATION_DATASET.csv \
  --json datasets/CoT/NAVIGATION_QUESTION_PER_SECTION.json \
  --out datasets/CoT/COMPLETE --train 0.80 --val 0.05 --test 0.15

# 6. Train
python train_titan.py --data datasets/CoT/COMPLETE --out MODELS/phi_titan

# 7. Test
python test_titan.py --model MODELS/phi_titan --names NAMES.txt --graph titan_graph.graphml --rels Relationship_Descriptions.txt
