This project explores the implementation of Retrieval-Augmented Generation (RAG) for video data retrieval using open-source libraries and models.
It tackles the challenge of applying RAG to a non-standard domain: video. This involves data pre-processing, multimodal embedding generation, and retrieval of relevant segments based on user queries.
- Data Creation & Transformation
  - Transcription:
    - YouTube videos: built-in transcripts are used directly.
    - Local videos (.mp4): the OpenAI Whisper model generates VTT transcripts for videos that lack them (see the transcription sketch below).
  - Context Extraction: a frame is extracted at the median timestamp of each video segment to provide visual context (see the frame-extraction sketch below).
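As a concrete reference, here is a minimal sketch of the local-video transcription step. It assumes the open-source `openai-whisper` package and a hypothetical input file `videos/lecture.mp4`; the actual pipeline may differ.

```python
# Minimal transcription sketch: Whisper -> WebVTT.
# Assumptions: `pip install openai-whisper`; the input path is illustrative.
import whisper

model = whisper.load_model("base")               # model size is an assumption
result = model.transcribe("videos/lecture.mp4")  # hypothetical local video

def to_vtt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS.mmm form WebVTT expects."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{int(h):02d}:{int(m):02d}:{s:06.3f}"

# Write each transcript segment as a WebVTT cue.
with open("videos/lecture.vtt", "w", encoding="utf-8") as f:
    f.write("WEBVTT\n\n")
    for seg in result["segments"]:
        f.write(f"{to_vtt_timestamp(seg['start'])} --> {to_vtt_timestamp(seg['end'])}\n")
        f.write(seg["text"].strip() + "\n\n")
```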
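And a sketch of the context-extraction step, assuming OpenCV (`opencv-python`) for frame grabbing; the function name and paths are illustrative, not the project's exact API.

```python
# Grab the frame at the median timestamp of a transcript segment.
# Assumptions: `pip install opencv-python`; paths are illustrative.
import cv2

def extract_median_frame(video_path: str, start: float, end: float, out_path: str) -> bool:
    """Save the frame at the midpoint of [start, end] (seconds) to out_path."""
    median_ts = (start + end) / 2.0
    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_MSEC, median_ts * 1000)  # seek in milliseconds
    ok, frame = cap.read()
    if ok:
        cv2.imwrite(out_path, frame)
    cap.release()
    return ok

# Hypothetical usage for one transcript segment:
extract_median_frame("videos/lecture.mp4", 12.0, 18.5, "frames/segment_0001.jpg")
```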
- Embedding & Storage
  - Embedding Generation: CLIP generates embeddings for both video frames and transcript text (see the embedding sketch below).
  - Data Storage: LanceDB stores the generated embeddings and serves vector similarity search.
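A minimal sketch of the embedding-and-storage step. It assumes the Hugging Face `transformers` implementation of CLIP (one of several CLIP packages the project could be using) and LanceDB's Python client; the model name, table name, and schema fields are illustrative.

```python
# Embed a frame and its transcript text with CLIP, then store rows in LanceDB.
# Assumptions: `pip install transformers torch pillow lancedb`;
# model/table/field names are illustrative, not the project's exact schema.
import lancedb
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_image(path: str) -> list[float]:
    """CLIP image embedding for one extracted frame."""
    inputs = processor(images=Image.open(path), return_tensors="pt")
    with torch.no_grad():
        return model.get_image_features(**inputs)[0].tolist()

def embed_text(text: str) -> list[float]:
    """CLIP text embedding, sharing the image embedding space."""
    inputs = processor(text=[text], return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        return model.get_text_features(**inputs)[0].tolist()

# One row per video segment: frame embedding plus segment metadata.
db = lancedb.connect("data/video_rag.lancedb")
rows = [{
    "vector": embed_image("frames/segment_0001.jpg"),
    "text": "…segment transcript text…",
    "video_path": "videos/lecture.mp4",
    "start": 12.0,
}]
table = db.create_table("video_segments", data=rows, mode="overwrite")
```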
- Retrieval & UI
  - Multimodal RAG System: LangChain is used to build a multimodal RAG system that retrieves the video segments most relevant to a user query (see the retrieval sketch below).
  - UI: Gradio provides a user-friendly interface. Users enter a query, and the system identifies the matching segment and plays the video from the corresponding timestamp (see the Gradio sketch below).
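For reference, a minimal retrieval sketch. The project routes retrieval through LangChain (which ships a LanceDB vector-store wrapper), but for illustration this queries the LanceDB table directly, reusing `embed_text` and the `video_segments` table from the storage sketch above (both assumptions):

```python
# Embed the user query with the CLIP text encoder and run a vector search.
query = "the part where the speaker explains attention"  # illustrative query
hits = table.search(embed_text(query)).limit(3).to_list()

for hit in hits:
    # Each hit carries the segment metadata stored alongside its vector.
    print(hit["video_path"], hit["start"], hit["text"][:80])
```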
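And a sketch of the Gradio front end, assuming a hypothetical `search(query)` helper that returns the best-matching video path and start time. Seeking the player to the matched timestamp depends on the Gradio version; this sketch simply reports the timestamp alongside the video.

```python
import gradio as gr

def answer(query: str):
    video_path, start = search(query)  # hypothetical retrieval helper
    return video_path, f"Best match starts at {start:.1f}s"

demo = gr.Interface(
    fn=answer,
    inputs=gr.Textbox(label="Ask about the videos"),
    outputs=[gr.Video(label="Matching video"), gr.Textbox(label="Timestamp")],
)

if __name__ == "__main__":
    demo.launch()
```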
Technologies used: OpenAI Whisper, CLIP, LanceDB, LangChain, Gradio, Llama 3.2 11B Vision.
This project is licensed under the Apache 2.0 License.