A structured multi-agent framework for extracting rich, contextual narratives from public event images
Paper | arXiv | Code | Dataset | Website
📊 Presentation Slides | 🎥 Video | 📄 Poster
GETReason is a novel framework that goes beyond surface-level image descriptions to infer deeper contextual meaning from publicly significant event images. Our approach uses a hierarchical multi-agent reasoning system to extract geospatial, temporal, and event-specific information, enabling comprehensive understanding of visual narratives.
- 🔍 Multi-Agent Architecture: Specialized agents for geospatial, temporal, and event reasoning
- 🔄 Cross-Generation: Collaborative validation between agents for enhanced accuracy
- 📊 GREAT Metric: Novel evaluation metric for reasoning quality assessment
- 🎯 Event Understanding: Focus on sociopolitical significance rather than just visual content
- 📈 Robust Performance: Substantial improvements over existing captioning and reasoning baselines
- 🌍 Geospatial Agent: Infers location, country, city, and geographic context
- ⏰ Temporal Agent: Extracts dates, periods, and historical context
- 🎭 Event Agent: Identifies events, political significance, and sociopolitical context
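Below is a minimal, illustrative sketch of how a hierarchical pipeline over these three agents could be wired up. All names here (the `call_vlm` helper, prompts, and agent functions) are placeholders for illustration only, not the project's actual implementation; see the code directory for the real workflow.

```python
# Illustrative sketch only: placeholder names and prompts, not GETReason's actual code.
from dataclasses import dataclass

@dataclass
class AgentOutput:
    summary: str

def call_vlm(prompt: str, image_path: str) -> str:
    """Placeholder for a vision-language model call (e.g., GPT-4o-mini or Gemini)."""
    raise NotImplementedError("Plug in your VLM client here.")

def geospatial_agent(image_path: str) -> AgentOutput:
    # Infers location, country, city, and geographic context.
    return AgentOutput(call_vlm("Describe the likely location shown in this image.", image_path))

def temporal_agent(image_path: str) -> AgentOutput:
    # Extracts dates, periods, and historical context.
    return AgentOutput(call_vlm("Describe the likely time period of this image.", image_path))

def event_agent(image_path: str, geo: AgentOutput, time: AgentOutput) -> AgentOutput:
    # Identifies the event and its sociopolitical significance,
    # conditioned on the geospatial and temporal agents' outputs.
    prompt = (
        f"Given the location context: {geo.summary} "
        f"and the temporal context: {time.summary}, "
        "explain the event depicted and its sociopolitical significance."
    )
    return AgentOutput(call_vlm(prompt, image_path))

def reason_about_image(image_path: str) -> str:
    geo = geospatial_agent(image_path)
    time = temporal_agent(image_path)
    event = event_agent(image_path, geo, time)
    return event.summary
```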
Our framework demonstrates significant improvements:
- Enhanced Accuracy: Better geospatial and temporal inference
- Reduced Hallucinations: Structured approach minimizes misleading information
- Improved Generalization: Robust performance across diverse event types
- Contextual Understanding: Deeper insights into event significance
getreason/
├── README.md # This file - Project overview
├── code/ # Implementation and experiments
│ ├── README.md # Detailed setup and usage instructions
│ ├── gpt_workbench.ipynb # GPT-4o-mini experiments
│ ├── gemini_workbench.ipynb # Gemini experiments
│ ├── dataset/ # Augmented datasets
│ │ ├── gold_tara.jsonl # TARA dataset (11,240 samples)
│ │ └── gold_wikitilo.jsonl # WikiTilo dataset (6,296 samples)
│ └── assets/ # Prompts, schemas, and data
└── Paper/ # Research paper implementation
└── GETReason/ # Single image workflow
For detailed setup and usage instructions, see the code directory:
# Clone the repository
git clone https://github.com/coral-lab-asu/getreason.git
cd getreason
# Navigate to code directory for implementation
cd code
# Follow the detailed setup instructions in code/README.md

We provide two augmented datasets with comprehensive annotations:
TARA (code/dataset/gold_tara.jsonl):
- Size: 11,240 samples
- Content: Rich event information with reasoning
- Coverage: News events from 2010-2021

WikiTilo (code/dataset/gold_wikitilo.jsonl):
- Size: 6,296 samples
- Content: Temporal and geospatial information
- Coverage: Historical events from 1826-2021
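Both files are in JSON Lines format (one JSON object per line). A minimal loading sketch is shown below; it assumes only the file paths from the directory tree above and does not assume any particular annotation field names.

```python
import json

def load_jsonl(path):
    """Read one JSON object per line from a .jsonl file."""
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

tara = load_jsonl("code/dataset/gold_tara.jsonl")          # 11,240 samples
wikitilo = load_jsonl("code/dataset/gold_wikitilo.jsonl")  # 6,296 samples

print(len(tara), len(wikitilo))
print(sorted(tara[0].keys()))  # inspect the available annotation fields
```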
GETReason addresses critical challenges in:
- 📰 Journalism: Automated event understanding for news analysis
- 📚 Education: Historical context extraction for educational content
- 🏛️ Archival Analysis: Systematic organization of event imagery
- 🔍 Fact-Checking: Reliable extraction of contextual information
If you use this work in your research, please cite:
@inproceedings{siingh-etal-2025-getreason,
title = "{GETR}eason: Enhancing Image Context Extraction through Hierarchical Multi-Agent Reasoning",
author = "Siingh, Shikhhar and
Rawat, Abhinav and
Baral, Chitta and
Gupta, Vivek",
booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.acl-long.1439/",
doi = "10.18653/v1/2025.acl-long.1439",
pages = "29779--29800"
}

- Shikhhar Siingh - Arizona State University
- Abhinav Rawat - Arizona State University
- Chitta Baral - Arizona State University
- Vivek Gupta - Arizona State University
For questions, issues, or collaboration inquiries:
- Shikhhar Siingh: ssiingh@asu.edu
- Abhinav Rawat: arawat21@asu.edu
This project is licensed under the MIT License - see the LICENSE file for details.
