GliDe - Glitch Detection in Gameplay Videos

A LangGraph-based multimodal LLM pipeline for automated video game glitch detection.

Architecture

GliDe processes a video through five sequential stages:

Preprocess — Extracts frames at a fixed FPS (default 4 fps) and stitches them into windows (default 8 frames per window) for downstream processing.
Scanner — Runs a fast initial screening over every window to produce a glitch hypothesis (has_glitch, category, confidence) and a game_context description used as a RAG-like knowledge base by later stages.
Analyzer — For windows flagged by the Scanner, runs an iterative investigation loop: a Planner selects the next tool, an Executor runs it, and a Reflector evaluates the result via an adversarial debate between an Advocate (game test engineer, argues for glitch), a Skeptic (game designer, argues for normal behavior), and a Judge (tech lead, makes the ruling).
Grounder — Clusters analysis results across windows, merges adjacent occurrences of the same glitch, and performs bidirectional temporal boundary refinement.
Summarizer — Converts grounded glitch records into the final report, translating frame indices to timestamps and using an LLM to produce clean, coherent descriptions.
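The Analyzer's investigation loop can be pictured as follows. This is a minimal sketch, not GliDe's implementation. `plan`, `execute`, and `reflect` are hypothetical stand-ins for the Planner, Executor, and Reflector; the real Reflector runs the Advocate/Skeptic/Judge debate through an LLM. The defaults mirror `max_iterations` and `confidence_threshold` from the Configuration Reference.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Verdict:
    """Hypothetical Judge ruling returned by the Reflector."""
    is_glitch: bool
    confidence: float


@dataclass
class Investigation:
    """Evidence gathered for one flagged window (hypothetical structure)."""
    observations: List[str] = field(default_factory=list)
    verdict: Verdict = field(default_factory=lambda: Verdict(False, 0.0))
    iterations: int = 0


def investigate(
    plan: Callable[[Investigation], str],
    execute: Callable[[str], str],
    reflect: Callable[[Investigation], Verdict],
    max_iterations: int = 5,
    confidence_threshold: float = 0.70,
) -> Investigation:
    """Run Planner -> Executor -> Reflector cycles until the Judge is confident enough."""
    state = Investigation()
    for _ in range(max_iterations):
        tool = plan(state)                        # Planner: choose the next tool
        state.observations.append(execute(tool))  # Executor: run it, keep the result
        state.verdict = reflect(state)            # Reflector: Advocate vs. Skeptic, Judge rules
        state.iterations += 1
        if state.verdict.confidence >= confidence_threshold:
            break                                 # confident ruling: stop early
    return state


# Toy stand-ins: VQA first, then zoom_in; the Judge grows more confident per observation.
result = investigate(
    plan=lambda s: "zoom_in" if s.observations else "vqa",
    execute=lambda tool: f"{tool}: red car hovering above the road",
    reflect=lambda s: Verdict(True, 0.3 * len(s.observations)),
)
print(result.iterations)  # 3 (confidence 0.3, 0.6, then about 0.9)
```

The early stop keeps cheap, clear-cut windows to one or two tool calls, while `max_iterations` bounds the cost of ambiguous ones.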

Tools

Tool	Status	Description
`vqa`	Active	Visual QA on the full stitched window image via MLLM
`zoom_in`	Active	Crop and magnify a region of interest, then run VQA
`object_tracking`	Optional	Frame-by-frame SAM3 tracking + automatic physics analysis (requires SAM3 installation)

object_tracking is lazily initialized. SAM3 is only loaded on the first call, and the tool disables itself gracefully if SAM3 is not installed.
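The lazy initialization described above follows a common pattern. Below is a minimal sketch, assuming SAM3 is importable as a module named `sam3` (the actual package name is not given here). The `LazyTool` class and its return format are hypothetical and do not reproduce GliDe's tool interface.

```python
import importlib
from types import ModuleType
from typing import Optional


class LazyTool:
    """Tool wrapper that imports its heavy backend on first use (hypothetical sketch)."""

    def __init__(self, name: str, backend_module: str):
        self.name = name
        self.backend_module = backend_module  # e.g. "sam3" (assumed module name)
        self._backend: Optional[ModuleType] = None
        self.enabled = True

    def _load(self) -> bool:
        # Import only once, and only when the tool is first called.
        if self._backend is None and self.enabled:
            try:
                self._backend = importlib.import_module(self.backend_module)
            except ImportError:
                self.enabled = False  # disable the tool instead of crashing the pipeline
        return self.enabled

    def __call__(self, **kwargs) -> dict:
        if not self._load():
            return {"tool": self.name, "ok": False,
                    "error": f"{self.backend_module} is not installed; tool disabled"}
        # A real tool would run tracking here; the sketch only reports the loaded backend.
        return {"tool": self.name, "ok": True, "backend": self._backend.__name__}


tracker = LazyTool("object_tracking", "sam3")  # nothing is imported yet
print(tracker())  # without SAM3 installed, this reports ok=False and the tool stays disabled
```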

Quick Start

1. Install dependencies

pip install -r requirements.txt

2. Run with a local vLLM server

# Start vLLM first:
# vllm serve Qwen/Qwen2.5-VL-7B-Instruct --port 8000

python run.py --video data/videos/video_name.mp4

3. Run with OpenAI

python run.py \
    --video data/videos/video_name.mp4 \
    --api-key $OPENAI_API_KEY \
    --api-base https://api.openai.com/v1 \
    --model gpt-4o \
    --game-name "GTA V"

4. Batch processing

Process all videos in a folder. Per-video reports and logs are written as usual; a consolidated batch_report.json is also saved.

python run.py \
    --video-dir data/videos/ \
    --game-name "GTA V" \
    --api-key $OPENAI_API_KEY \
    --api-base https://api.openai.com/v1 \
    --model gpt-4o

Output

Single video

The report is saved to {output_dir}/results/{video_name}_report.json:

{
  "video_name": "haj831",
  "game_name": "GTA V",
  "no_bugs": false,
  "bugs": [
    "A red sports car is floating above the road surface near the highway overpass, with no visible support or propulsion."
  ],
  "time_nodes": [
    [[12, 15], [23, 24]]
  ]
}

time_nodes[i] is a list of [start_sec, end_sec] intervals for bug i.
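With the default settings (4 fps, 8-frame windows, no overlap), window w spans frames 8w to 8w+7, i.e. two seconds of gameplay. The simplified sketch below shows how merging adjacent flagged windows and dividing frame indices by the FPS yields [start_sec, end_sec] intervals. It is an illustration under assumptions, not the Grounder or Summarizer code. The real Grounder also clusters results and refines boundaries bidirectionally. `window_overlap` is assumed to count frames shared by consecutive windows. Treating the interval end as the time of the frame after the last one is a choice made here. How the actual report rounds to whole seconds is not specified.

```python
from typing import List, Tuple


def flagged_windows_to_intervals(
    flagged: List[int],
    window_size: int = 8,
    window_overlap: int = 0,
    fps: float = 4.0,
) -> List[Tuple[float, float]]:
    """Merge runs of adjacent flagged window indices and convert them to seconds."""
    stride = window_size - window_overlap   # frames between consecutive window starts
    ranges: List[Tuple[int, int]] = []      # inclusive frame ranges
    for w in sorted(set(flagged)):
        start, end = w * stride, w * stride + window_size - 1
        if ranges and start <= ranges[-1][1] + 1:
            # Touching or overlapping the previous range: extend it.
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], end))
        else:
            ranges.append((start, end))
    # Frame f is sampled at f / fps seconds; the end uses the following frame's time.
    return [(s / fps, (e + 1) / fps) for s, e in ranges]


print(flagged_windows_to_intervals([6, 7, 11]))  # [(12.0, 16.0), (22.0, 24.0)]
```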

Batch

A consolidated report is saved to {output_dir}/results/batch_report.json as a JSON array of per-video reports:

[
  {
    "video_name": "clip_01",
    "game_name": "GTA V",
    "no_bugs": false,
    "bugs": ["..."],
    "time_nodes": [[[12, 15]]]
  },
  {
    "video_name": "clip_02",
    "game_name": "GTA V",
    "no_bugs": true,
    "bugs": [],
    "time_nodes": []
  }
]
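A short sketch of consuming batch_report.json, flattening each video's bugs into one row per interval. The function names are hypothetical. The consistency checks encode assumed invariants (one time_nodes entry per bug; no_bugs is true exactly when bugs is empty), inferred from the examples above rather than from a documented schema.

```python
import json
from pathlib import Path
from typing import Iterator, List, Tuple

Row = Tuple[str, str, float, float]  # (video_name, bug description, start_sec, end_sec)


def iter_bug_intervals(reports: List[dict]) -> Iterator[Row]:
    """Yield one row per (bug, interval) across all per-video reports."""
    for report in reports:
        bugs, nodes = report["bugs"], report["time_nodes"]
        if len(bugs) != len(nodes):
            raise ValueError(f"{report['video_name']}: bugs and time_nodes differ in length")
        if report["no_bugs"] != (len(bugs) == 0):
            raise ValueError(f"{report['video_name']}: no_bugs disagrees with bugs")
        for description, intervals in zip(bugs, nodes):
            for start, end in intervals:
                yield report["video_name"], description, start, end


def load_batch_report(path: str) -> List[dict]:
    """Read a batch report (a JSON array of per-video reports)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


sample = [
    {"video_name": "clip_01", "game_name": "GTA V", "no_bugs": False,
     "bugs": ["car floats"], "time_nodes": [[[12, 15]]]},
    {"video_name": "clip_02", "game_name": "GTA V", "no_bugs": True,
     "bugs": [], "time_nodes": []},
]
for row in iter_bug_intervals(sample):
    print(row)  # ('clip_01', 'car floats', 12, 15)
```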

LangGraph Flow

GliDe uses LangGraph's StateGraph to wire the pipeline together. Each stage is a node that reads from a shared BugAgentState TypedDict. Nodes do not mutate that state in place: each node returns only the keys it updates, and LangGraph merges those updates into the state passed to the next node.

The edge from scanner_node is conditional: if no glitches were found, the graph skips directly to summarizer_node, avoiding unnecessary analyzer and grounder calls.

preprocess_node → scanner_node
                       │
                       ├── (has glitches) ──► analyzer_node ──► grounder_node ──► summarizer_node
                       │
                       └── (no glitches) ────────────────────────────────────► summarizer_node
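The merge-and-route behavior above can be imitated without LangGraph. The following dependency-free sketch is not GliDe's graph.py. The state fields and stub nodes are hypothetical, and the node names are taken from the diagram. In LangGraph itself, a routing function like `route_after_scan` would typically be registered with `StateGraph.add_conditional_edges`.

```python
from typing import Callable, Dict, List, Optional, Tuple, TypedDict


class BugAgentState(TypedDict, total=False):
    """Illustrative subset of the shared state; these field names are hypothetical."""
    video_path: str
    windows: List[str]
    scan_results: List[dict]
    analysis: List[dict]
    grounded: List[dict]
    report: dict


Node = Callable[[BugAgentState], dict]

# Unconditional edges; the scanner's successor is decided by route_after_scan.
NEXT: Dict[str, Optional[str]] = {
    "preprocess_node": "scanner_node",
    "analyzer_node": "grounder_node",
    "grounder_node": "summarizer_node",
    "summarizer_node": None,
}


def route_after_scan(state: BugAgentState) -> str:
    """Conditional edge: analyze only if at least one window was flagged."""
    if any(r.get("has_glitch") for r in state.get("scan_results", [])):
        return "analyzer_node"
    return "summarizer_node"


def run_pipeline(nodes: Dict[str, Node], state: BugAgentState) -> Tuple[BugAgentState, List[str]]:
    """Imitate the compiled graph's control flow."""
    current: Optional[str] = "preprocess_node"
    visited: List[str] = []
    while current is not None:
        update = nodes[current](state)  # the node returns only the keys it changed
        state = {**state, **update}     # merge into a new dict; the caller's dict is untouched
        visited.append(current)
        current = route_after_scan(state) if current == "scanner_node" else NEXT[current]
    return state, visited


def make_stub_nodes(glitch: bool) -> Dict[str, Node]:
    """Stub stages standing in for the real LLM-backed nodes."""
    return {
        "preprocess_node": lambda s: {"windows": ["w0", "w1"]},
        "scanner_node": lambda s: {"scan_results": [{"has_glitch": glitch}, {"has_glitch": False}]},
        "analyzer_node": lambda s: {"analysis": [{"window": 0, "verdict": "glitch"}]},
        "grounder_node": lambda s: {"grounded": s["analysis"]},
        "summarizer_node": lambda s: {"report": {"no_bugs": not s.get("grounded")}},
    }


final, path = run_pipeline(make_stub_nodes(glitch=False), {"video_path": "clip.mp4"})
print(path)  # ['preprocess_node', 'scanner_node', 'summarizer_node']
```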

Configuration Reference

from config import BugAgentConfig

cfg = BugAgentConfig(
    output_dir="data",
    verbose=True,
    save_intermediate=True,   # saves scan/analysis/grounded JSONs to data/intermediate/
)

cfg.llm.api_key    = "EMPTY"
cfg.llm.api_base   = "http://localhost:8000/v1"
cfg.llm.model      = "Qwen/Qwen2.5-VL-7B-Instruct"
cfg.llm.temperature = 0.3
cfg.llm.max_tokens  = 1024
cfg.llm.timeout     = 120

cfg.preprocess.target_fps    = 4.0   # frames/sec to extract
cfg.preprocess.window_size   = 8     # frames per stitched window
cfg.preprocess.window_overlap = 0

cfg.scanner.temperature = 0.3
cfg.scanner.max_tokens  = 512

cfg.analyzer.max_iterations      = 5     # max Planner→Executor→Reflector cycles
cfg.analyzer.confidence_threshold = 0.70 # stop when Judge reaches this confidence

cfg.grounder.frames_per_window = 8  # must match preprocess.window_size

cfg.summarizer.fps = 4.0   # must match preprocess.target_fps
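Two settings above must be kept in sync by hand. A small check like the following can catch mismatches before a run. It is a hypothetical helper, not part of config.py. It relies only on the attribute paths shown in the Configuration Reference, so a real BugAgentConfig instance should work in place of the stand-in object.

```python
import math
from types import SimpleNamespace
from typing import List


def check_config(cfg) -> List[str]:
    """List violations of the cross-stage constraints noted in the config comments."""
    problems: List[str] = []
    if cfg.grounder.frames_per_window != cfg.preprocess.window_size:
        problems.append(
            f"grounder.frames_per_window={cfg.grounder.frames_per_window} "
            f"!= preprocess.window_size={cfg.preprocess.window_size}"
        )
    if not math.isclose(cfg.summarizer.fps, cfg.preprocess.target_fps):
        problems.append(
            f"summarizer.fps={cfg.summarizer.fps} "
            f"!= preprocess.target_fps={cfg.preprocess.target_fps}"
        )
    return problems


# Stand-in object with the same attribute paths as BugAgentConfig.
cfg = SimpleNamespace(
    preprocess=SimpleNamespace(target_fps=4.0, window_size=8),
    grounder=SimpleNamespace(frames_per_window=8),
    summarizer=SimpleNamespace(fps=2.0),
)
for problem in check_config(cfg):
    print("config mismatch:", problem)
```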

Evaluation

Evaluation compares a batch_report.json against a ground-truth file using LLM-based description scoring (0–5) and temporal IoU, then reports precision, recall, and F1 in both score-weighted and IoU-weighted forms.
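This README does not say how temporal IoU is computed when a bug spans several intervals. One reasonable reading, sketched below as an assumption rather than the evaluator's actual code, treats each bug's intervals as a set of covered time and divides the overlapping duration by the combined duration.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]


def _merge(intervals: Sequence[Sequence[float]]) -> List[Interval]:
    """Sort and merge overlapping [start, end] intervals."""
    merged: List[Interval] = []
    for start, end in sorted((float(s), float(e)) for s, e in intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def _length(intervals: List[Interval]) -> float:
    return sum(e - s for s, e in intervals)


def temporal_iou(pred: Sequence[Sequence[float]], gt: Sequence[Sequence[float]]) -> float:
    """IoU of the time covered by two interval lists (e.g. one bug's time_nodes entry)."""
    p, g = _merge(pred), _merge(gt)
    inter, i, j = 0.0, 0, 0
    while i < len(p) and j < len(g):  # two-pointer sweep over sorted, disjoint intervals
        lo, hi = max(p[i][0], g[j][0]), min(p[i][1], g[j][1])
        inter += max(0.0, hi - lo)
        if p[i][1] < g[j][1]:
            i += 1
        else:
            j += 1
    union = _length(p) + _length(g) - inter
    return inter / union if union > 0 else 0.0


print(temporal_iou([[12, 15], [23, 24]], [[13, 16]]))  # 0.4
```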

1. Start a scoring LLM

Any OpenAI-compatible server works. With a local vLLM:

CUDA_VISIBLE_DEVICES=2 vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8001 --max-model-len 8192

2. Run evaluation

python evaluation/run.py \
    --predictions data/results/batch_report.json \
    --groundtruth groundtruth.json \
    --api-base http://localhost:8001/v1 \
    --model meta-llama/Llama-3.1-8B-Instruct \
    --output data/results/eval.json

--output is optional; if provided, per-video scores and match details are saved to the specified JSON file.

Metrics

Metric	Description
`mean_score`	Average LLM description quality score (0–5) over matched pairs
`mean_iou`	Average temporal IoU over matched pairs
`precision / recall / f1`	Score-weighted detection metrics (max score = 5)
`precision_iou / recall_iou / f1_iou`	Same metrics further weighted by temporal IoU
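The exact formulas behind these metrics are not spelled out here. The sketch below shows one plausible formulation, and every part of it is an assumption. Each matched pair counts as score/5 of a true positive, divided by the number of predicted bugs (precision) or ground-truth bugs (recall). The IoU variants multiply each pair's contribution by its temporal IoU. How pairs are matched is outside the sketch.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MatchedPair:
    """One matched (prediction, ground-truth) bug pair."""
    score: float  # LLM description score in [0, 5]
    iou: float    # temporal IoU in [0, 1]


def _prf(hits: float, n_pred: int, n_gt: int) -> Dict[str, float]:
    precision = hits / n_pred if n_pred else 0.0
    recall = hits / n_gt if n_gt else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def detection_metrics(pairs: List[MatchedPair], n_pred: int, n_gt: int,
                      max_score: float = 5.0) -> Dict[str, float]:
    """Score-weighted and IoU-weighted precision/recall/F1 (assumed formulation)."""
    soft_hits = sum(p.score / max_score for p in pairs)          # fractional true positives
    iou_hits = sum(p.score / max_score * p.iou for p in pairs)   # further weighted by IoU
    metrics = _prf(soft_hits, n_pred, n_gt)
    metrics.update({f"{k}_iou": v for k, v in _prf(iou_hits, n_pred, n_gt).items()})
    metrics["mean_score"] = sum(p.score for p in pairs) / len(pairs) if pairs else 0.0
    metrics["mean_iou"] = sum(p.iou for p in pairs) / len(pairs) if pairs else 0.0
    return metrics


pairs = [MatchedPair(score=5.0, iou=1.0), MatchedPair(score=2.5, iou=0.5)]
m = detection_metrics(pairs, n_pred=4, n_gt=2)
print(m["precision"], m["recall"], m["f1"])  # 0.375 0.75 0.5
```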
