End-to-End Multimedia RAG Framework (Retrieval, SFS, QA, and Meta-Aggregation)#18

Open
aravind-3105 wants to merge 3 commits into main from video_rag

Conversation


@aravind-3105 aravind-3105 commented Mar 2, 2026

Summary

This pull request introduces a reference implementation of a multimedia Retrieval-Augmented Generation (RAG) pipeline for long-form video understanding. It also adds structured environment management and dataset preprocessing utilities to support reproducible experimentation.

The implementation integrates multimodal retrieval (ImageBind) with multimodal reasoning (Qwen Omni), enabling segment-level audiovisual retrieval and QA over temporally segmented video corpora.
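To make the retrieval stage concrete: segment-level retrieval over a joint embedding space typically reduces to a cosine-similarity nearest-neighbour search between a query embedding and pre-computed segment embeddings. The sketch below assumes ImageBind-style embeddings are already available as arrays; the function and variable names are illustrative, not taken from this PR's `src/` code:

```python
import numpy as np

def retrieve_top_k(query_emb: np.ndarray, segment_embs: np.ndarray, k: int = 3) -> list[int]:
    """Return indices of the k segments most similar to the query.

    Embeddings are L2-normalized first, so the dot product equals
    cosine similarity. segment_embs has shape (num_segments, dim).
    """
    q = query_emb / np.linalg.norm(query_emb)
    s = segment_embs / np.linalg.norm(segment_embs, axis=1, keepdims=True)
    scores = s @ q                      # cosine similarity per segment
    return np.argsort(scores)[::-1][:k].tolist()  # highest-scoring first
```

The retrieved indices would then select the temporally segmented clips handed to the QA stage.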

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📝 Documentation update
  • 🔧 Refactoring (no functional changes)
  • ⚡ Performance improvement
  • 🧪 Test improvements
  • 🔒 Security fix

Changes Made

1. Project Documentation

  • Added a comprehensive README.md describing:

    • The multimedia RAG architecture and pipeline stages
    • Supported models (ImageBind, PyTorchVideo, Qwen Omni)
    • Dataset download and preprocessing instructions
    • Environment setup workflow
    • References to relevant benchmarks and datasets

This provides a complete entry point for setup and experimentation.

2. Environment & Dependency Management

  • Added pyproject.toml with two isolated dependency groups:

    • ref5-multimedia-rag-vlm (retrieval + embedding pipeline)
    • ref5-multimedia-rag-vlm-qa (QA + multimodal reasoning)
  • Explicit CUDA and package version specification for reproducibility.

  • Designed for clean environment separation between retrieval and QA stages.
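For orientation, isolated dependency groups in `pyproject.toml` follow the PEP 735 `[dependency-groups]` table. The fragment below is only a structural sketch — the actual package pins and CUDA specifiers live in the PR itself, and the listed packages are illustrative guesses based on the models named above:

```toml
[dependency-groups]
# Retrieval + embedding pipeline (illustrative entries only)
ref5-multimedia-rag-vlm = [
    "torch",
    "pytorchvideo",
]
# QA + multimodal reasoning stage (illustrative entries only)
ref5-multimedia-rag-vlm-qa = [
    "transformers",
]
```

Each group can then be synced in isolation (e.g. `uv sync --group ref5-multimedia-rag-vlm`), which is what gives the retrieval and QA stages separate environments.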

3. Source Code (src/)

  • Added a modular src/ package containing the core retrieval, segmentation, inference, meta-aggregation, and model components (AV-RAG, SFS, Qwen Omni), enabling a clean and extensible implementation of the multimedia RAG pipeline.

4. Notebook (multimedia_rag.ipynb)

  • Added an end-to-end experimental notebook demonstrating dataset preprocessing, multimodal retrieval, SFS frame selection, segment-level QA, and meta-agent aggregation within a reproducible research workflow.
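The notebook's stages chain together in the order listed above. A toy driver illustrating that control flow might look like the following — every identifier is hypothetical, and trivial text-based stand-ins replace the real ImageBind, SFS, and Qwen Omni components:

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Segment:
    start: float   # segment start time (seconds)
    end: float     # segment end time (seconds)
    text: str      # toy stand-in for the segment's audiovisual content

def retrieve(segments: list[Segment], question: str, k: int) -> list[Segment]:
    """Toy retrieval: rank segments by word overlap with the question.
    (The real pipeline scores ImageBind embeddings instead.)"""
    q_words = set(question.lower().split())
    return sorted(segments,
                  key=lambda s: -len(q_words & set(s.text.lower().split())))[:k]

def select_frames(seg: Segment) -> str:
    """Toy SFS stand-in: pass the segment content through unchanged."""
    return seg.text

def segment_qa(content: str, question: str) -> str:
    """Toy QA stand-in (real pipeline: Qwen Omni)."""
    return "yes" if any(w in content.lower() for w in question.lower().split()) else "no"

def answer_question(segments: list[Segment], question: str, k: int = 2) -> str:
    """Retrieval -> SFS -> segment-level QA -> meta-aggregation (majority vote)."""
    top = retrieve(segments, question, k)
    per_segment = [segment_qa(select_frames(s), question) for s in top]
    return Counter(per_segment).most_common(1)[0][0]
```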

Testing

  • Tests pass locally (uv run pytest tests/)
  • Type checking passes (uv run mypy <src_dir>)
  • Linting passes (uv run ruff check src_dir/)
  • Manual testing performed (describe below)

Manual testing details:

Screenshots/Recordings

Related Issues

Deployment Notes

Checklist

  • Code follows the project's style guidelines
  • Self-review of code completed
  • Documentation updated (if applicable)
  • No sensitive information (API keys, credentials) exposed

@aravind-3105 aravind-3105 added the enhancement New feature or request label Mar 2, 2026