MovieCORE is a comprehensive video question answering (VQA) dataset specifically designed to evaluate and probe deeper cognitive understanding of movie content. Unlike traditional VQA datasets that focus on surface-level visual understanding, MovieCORE challenges models to demonstrate sophisticated reasoning about narrative structures, character development, thematic elements, and complex temporal relationships within cinematic content.
The MovieCORE dataset builds upon video content from MovieChat. To get started:
1. Download the video files from MovieChat's HuggingFace repositories (a scripted download sketch follows these steps):
   - Training Data: MovieChat-1K Train
   - Test Data: MovieChat-1K Test
2. Access our annotations on HuggingFace:
   - MovieCORE Annotations: 🤗 HuggingFace Dataset
3. Extract and organize the data according to your model's requirements, then use our annotations for evaluation.
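If you prefer to script the download, here is a minimal sketch using the `huggingface_hub` library. The `repo_id` values and local paths below are illustrative placeholders, not confirmed repository names; substitute the actual MovieChat-1K and MovieCORE repositories linked above.

```python
# Minimal download sketch (requires: pip install huggingface_hub).
# NOTE: the repo_id values are placeholders; use the repositories linked above.
from huggingface_hub import snapshot_download

# Download the MovieChat-1K test videos (placeholder repo id).
snapshot_download(
    repo_id="MovieChat/MovieChat-1K-test",      # placeholder; verify on HuggingFace
    repo_type="dataset",
    local_dir="data/moviechat_test",
)

# Download the MovieCORE annotations (placeholder repo id).
snapshot_download(
    repo_id="MovieCORE/MovieCORE",              # placeholder; verify on HuggingFace
    repo_type="dataset",
    local_dir="data/moviecore_annotations",
)
```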
```bash
git clone https://github.com/joslefaure/MovieCORE.git
cd MovieCORE
```
We have provided a script to run HERMES (ICCV'25) on MovieCORE. Please check out the linked project.
MovieCORE employs a comprehensive multi-dimensional evaluation framework to assess model performance across different aspects of cognitive understanding:
| Dimension | Description |
|---|---|
| 🎯 Accuracy | Measures semantic similarity between predicted and ground truth answers |
| 📋 Comprehensiveness | Assesses coverage of all key aspects mentioned in the ground truth |
| 🧠 Depth | Evaluates level of reasoning and insight demonstrated in predictions |
| 🔍 Evidence | Checks quality and relevance of supporting evidence provided |
| 🔗 Coherence | Measures logical flow, organization, and clarity of responses |
Each dimension provides unique insights into different cognitive capabilities required for deep video understanding.
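Because the evaluation script requires an OpenAI API key (see below), these dimensions are plausibly scored by an LLM judge. The sketch below shows what one per-dimension scoring call might look like; the model name, prompt wording, and 0-5 scale are assumptions for illustration, not the script's actual implementation, so consult `evaluate_moviecore.py` for the real logic.

```python
# Hypothetical sketch of LLM-as-judge scoring for one dimension.
# Model name, prompt, and 0-5 scale are assumptions, not the script's actual logic.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def score_dimension(question: str, answer: str, pred: str, dimension: str) -> float:
    """Ask a judge model to rate `pred` against `answer` on one dimension (0-5)."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[
            {"role": "system",
             "content": f"Rate the candidate answer's {dimension} on a 0-5 scale, "
                        "comparing it against the ground truth. Reply with a single number."},
            {"role": "user",
             "content": f"Question: {question}\nGround truth: {answer}\nCandidate: {pred}"},
        ],
    )
    return float(response.choices[0].message.content.strip())
```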
Evaluate your model's performance on MovieCORE using our evaluation script:
```bash
export OPENAI_API_KEY='your_openai_api_key'
python evaluate_moviecore.py --pred_path path/to/your/predictions.json
```
Your predictions should follow this JSON structure:
```json
{
    "video_1.mp4": [
        {
            "question": "How does the video depict the unique adaptations of the species in the Sahara Desert, and what roles do these species play in their ecosystem?",
            "answer": "The ground truth answer.",
            "pred": "Your model's prediction.",
            "classification": "the question classification"
        },
        {
            "question": "The second question for video 1?",
            "answer": "The ground truth answer.",
            "pred": "Your model's prediction.",
            "classification": "the question classification"
        }
    ],
    "video_2.mp4": [
        {
            "question": "The only question for video 2",
            "answer": "The ground truth answer.",
            "pred": "Your model's prediction.",
            "classification": "the question classification"
        }
    ]
}
```
The evaluation script provides:
- Overall scores across all dimensions
- Classification-specific performance metrics
- Detailed breakdowns for comprehensive analysis
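To assemble a predictions file in the required format, a small sketch is shown below. The annotation file path is a placeholder, its layout is assumed to mirror the structure above, and `run_model` is a hypothetical stand-in for your own inference code.

```python
# Sketch: build predictions.json in the format expected by evaluate_moviecore.py.
# The annotation path and layout are assumptions; replace run_model with your model.
import json

def run_model(video_name: str, question: str) -> str:
    """Hypothetical stand-in for your model's inference; replace with real logic."""
    return "Your model's prediction."

with open("data/moviecore_annotations/moviecore.json") as f:  # placeholder path
    annotations = json.load(f)

predictions = {}
for video_name, qa_list in annotations.items():
    predictions[video_name] = [
        {
            "question": qa["question"],
            "answer": qa["answer"],
            "pred": run_model(video_name, qa["question"]),
            "classification": qa["classification"],
        }
        for qa in qa_list
    ]

with open("predictions.json", "w") as f:
    json.dump(predictions, f, indent=2)
```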
If you use MovieCORE in your research, please cite our paper:
```bibtex
@misc{faure2025moviecorecognitivereasoningmovies,
      title={MovieCORE: COgnitive REasoning in Movies},
      author={Gueter Josmy Faure and Min-Hung Chen and Jia-Fong Yeh and Ying Cheng and Hung-Ting Su and Yung-Hao Tang and Shang-Hong Lai and Winston H. Hsu},
      year={2025},
      eprint={2508.19026},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2508.19026},
}
```
We welcome contributions to MovieCORE! Please feel free to:
- Report issues or bugs
- Suggest improvements or new features
- Submit baseline implementations
- Provide feedback on the evaluation framework
This dataset is provided under the MIT License. See LICENSE for more details.
