Flask-based AI app that summarizes surveillance videos using Whisper (audio), ViT-GPT2 (frame captions), and Groq LLM (narratives). Produces both general and law enforcement-style summaries.
Updated May 23, 2026 - Python
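The description above implies a pipeline in which an audio transcript and per-frame captions are merged into a single prompt for the LLM, with one variant per summary style. The repository's actual code is not shown here, so the following is only a minimal sketch of that merging step. The names `FrameCaption`, `STYLE_INSTRUCTIONS`, and `build_prompt`, and the instruction wording, are hypothetical; the Whisper, ViT-GPT2, and Groq calls are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class FrameCaption:
    """One caption for a sampled frame (hypothetical structure)."""
    timestamp_s: float
    text: str


# Assumed instruction text for the two summary styles named in the description.
STYLE_INSTRUCTIONS = {
    "general": "Summarize what happens in the video in plain language.",
    "law_enforcement": (
        "Write a factual, chronological incident report. Avoid speculation."
    ),
}


def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def build_prompt(transcript: str, captions: list, style: str = "general") -> str:
    """Combine a transcript and frame captions into one LLM prompt."""
    if style not in STYLE_INSTRUCTIONS:
        raise ValueError(f"unknown style: {style!r}")
    lines = [
        STYLE_INSTRUCTIONS[style],
        "",
        "Audio transcript:",
        transcript.strip() or "(no speech detected)",
        "",
        "Frame captions:",
    ]
    # Sort captions by time so the model sees events in order.
    for cap in sorted(captions, key=lambda c: c.timestamp_s):
        lines.append(f"[{format_timestamp(cap.timestamp_s)}] {cap.text}")
    return "\n".join(lines)
```

The resulting string would then be sent to whichever LLM client the app uses; keeping prompt construction separate from the model calls makes it easy to test without a GPU or an API key.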
A powerful Streamlit application that analyzes images using multiple vision models and responds to queries about visual content through conversational AI.
An AI-powered image captioning app built with Streamlit, using ViT-GPT2 for caption generation and YOLOv8 for object detection. The app provides enhanced captions by integrating detected objects into the generated text.
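One simple way to integrate detected objects into a generated caption is to append any detector labels the caption does not already mention. The sketch below illustrates that idea only; it is not taken from the repository. The function name `enhance_caption` and the output wording are hypothetical, and the inputs are assumed to be a caption string (for example from ViT-GPT2) and a list of class labels (for example from YOLOv8).

```python
import re
from collections import Counter


def enhance_caption(caption: str, detected_labels: list) -> str:
    """Append detected object labels that the caption does not mention."""
    base = caption.strip().rstrip(".")
    # Words already present in the caption, lowercased for comparison.
    caption_words = set(re.findall(r"[a-z]+", base.lower()))
    counts = Counter(label.lower() for label in detected_labels)

    extras = []
    for label, n in sorted(counts.items()):
        # Skip a label when every word of it already appears in the caption.
        # This is a naive check: it ignores plurals and synonyms.
        if all(word in caption_words for word in label.split()):
            continue
        extras.append(f"{label} (x{n})" if n > 1 else label)

    if not extras:
        return base + "."
    return f"{base}. Also detected: {', '.join(extras)}."


print(enhance_caption("a man walking a dog", ["dog", "car", "person", "person"]))
# prints: a man walking a dog. Also detected: car, person (x2).
```

A production version would likely also filter detections by confidence score and handle synonyms such as "man" and "person", which this sketch treats as different objects.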