Skip to content

SandyyyZheng/BugAgent

Repository files navigation

GliDe - Glitch Detection in Gameplay Videos

A LangGraph-based multimodal LLM pipeline for automated video game glitch detection.


Architecture

Framework

GliDe processes a video through five sequential stages:

  • Preprocess — Extracts frames at a fixed FPS (default 4 fps) and stitches them into windows (default 8 frames per window) for downstream processing.
  • Scanner — Runs a fast initial screening over every window to produce a glitch hypothesis (has_glitch, category, confidence) and a game_context description used as a RAG-like knowledge base by later stages.
  • Analyzer — For windows flagged by the Scanner, runs an iterative investigation loop: a Planner selects the next tool, an Executor runs it, and a Reflector evaluates the result via an adversarial debate between an Advocate (game test engineer, argues for glitch), a Skeptic (game designer, argues for normal behavior), and a Judge (tech lead, makes the ruling).
  • Grounder — Clusters analysis results across windows, merges adjacent occurrences of the same glitch, and performs bidirectional temporal boundary refinement.
  • Summarizer — Converts grounded glitch records into the final report, translating frame indices to timestamps and using an LLM to produce clean, coherent descriptions.

Tools

Tool Status Description
vqa Active Visual QA on the full stitched window image via MLLM
zoom_in Active Crop and magnify a region of interest, then run VQA
object_tracking Optional Frame-by-frame SAM3 tracking + automatic physics analysis (requires SAM3 installation)

object_tracking is lazily initialized. SAM3 is only loaded on the first call, and the tool disables itself gracefully if SAM3 is not installed.


Quick Start

1. Install dependencies

pip install -r requirements.txt

2. Run with a local vLLM server

# Start vLLM first:
# vllm serve Qwen/Qwen2.5-VL-7B-Instruct --port 8000

python run.py --video data/videos/video_name.mp4

3. Run with OpenAI

python run.py \
    --video data/videos/video_name.mp4 \
    --api-key $OPENAI_API_KEY \
    --api-base https://api.openai.com/v1 \
    --model gpt-4o \
    --game-name "GTA V"

4. Batch processing

Process all videos in a folder. Per-video reports and logs are written as usual; a consolidated batch_report.json is also saved.

python run.py \
    --video-dir data/videos/ \
    --game-name "GTA V" \
    --api-key $OPENAI_API_KEY \
    --api-base https://api.openai.com/v1 \
    --model gpt-4o

Output

Single video

The report is saved to {output_dir}/results/{video_name}_report.json:

{
  "video_name": "haj831",
  "game_name": "GTA V",
  "no_bugs": false,
  "bugs": [
    "A red sports car is floating above the road surface near the highway overpass, with no visible support or propulsion."
  ],
  "time_nodes": [
    [[12, 15], [23, 24]]
  ]
}

time_nodes[i] is a list of [start_sec, end_sec] intervals for bug i.

Batch

A consolidated report is saved to {output_dir}/results/batch_report.json as a JSON array of per-video reports:

[
  {
    "video_name": "clip_01",
    "game_name": "GTA V",
    "no_bugs": false,
    "bugs": ["..."],
    "time_nodes": [[[12, 15]]]
  },
  {
    "video_name": "clip_02",
    "game_name": "GTA V",
    "no_bugs": true,
    "bugs": [],
    "time_nodes": []
  }
]

LangGraph Flow

GliDe uses LangGraph's StateGraph to wire the pipeline together. Each stage is a node that reads from and writes to a shared BugAgentState TypedDict. State is passed immutably between nodes — each node returns only the keys it updates.

The edge from scanner_node is conditional: if no glitches were found, the graph skips directly to summarizer_node, avoiding unnecessary analyzer and grounder calls.

preprocess_node → scanner_node
                       │
                       ├── (has glitches) ──► analyzer_node ──► grounder_node ──► summarizer_node
                       │
                       └── (no glitches) ────────────────────────────────────► summarizer_node

Configuration Reference

from config import BugAgentConfig

cfg = BugAgentConfig(
    output_dir="data",
    verbose=True,
    save_intermediate=True,   # saves scan/analysis/grounded JSONs to data/intermediate/
)

cfg.llm.api_key    = "EMPTY"
cfg.llm.api_base   = "http://localhost:8000/v1"
cfg.llm.model      = "Qwen/Qwen2.5-VL-7B-Instruct"
cfg.llm.temperature = 0.3
cfg.llm.max_tokens  = 1024
cfg.llm.timeout     = 120

cfg.preprocess.target_fps    = 4.0   # frames/sec to extract
cfg.preprocess.window_size   = 8     # frames per stitched window
cfg.preprocess.window_overlap = 0

cfg.scanner.temperature = 0.3
cfg.scanner.max_tokens  = 512

cfg.analyzer.max_iterations      = 5     # max Planner→Executor→Reflector cycles
cfg.analyzer.confidence_threshold = 0.70 # stop when Judge reaches this confidence

cfg.grounder.frames_per_window = 8  # must match preprocess.window_size

cfg.summarizer.fps = 4.0   # must match preprocess.target_fps

Evaluation

Evaluation compares a batch_report.json against a ground truth file using LLM-based description scoring (0–5) and temporal IoU, then reports precision, recall, and F1 in both raw and IoU-weighted forms.

1. Start a scoring LLM

Any OpenAI-compatible server works. With a local vLLM:

CUDA_VISIBLE_DEVICES=2 vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8001 --max-model-len 8192

2. Run evaluation

python evaluation/run.py --predictions data/results/batch_report.json --groundtruth groundtruth.json --api-base http://localhost:8001/v1 --model meta-llama/Llama-3.1-8B-Instruct --output data/results/eval.json

--output is optional; if provided, per-video scores and match details are saved to the specified JSON file.

Metrics

Metric Description
mean_score Average LLM description quality score (0–5) over matched pairs
mean_iou Average temporal IoU over matched pairs
precision / recall / f1 Score-weighted detection metrics (max score = 5)
precision_iou / recall_iou / f1_iou Same metrics further weighted by temporal IoU

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages