Flask-based AI app that summarizes surveillance videos using Whisper (audio), ViT-GPT2 (frame captions), and Groq LLM (narratives). Produces both general and law enforcement-style summaries.
Updated Jul 14, 2025 - Python
AI-powered image captioning using InceptionV3+LSTM and ViT-GPT2 models. Trained on Flickr8k dataset with interactive Streamlit interface.
An AI-powered image captioning app built with Streamlit, using ViT-GPT2 for caption generation and YOLOv8 for object detection. The app provides enhanced captions by integrating detected objects into the generated text.
Developed an image captioning system using the BLIP model to generate detailed, context-aware captions. Achieved an average BLEU score of 0.72, providing rich descriptions that enhance accessibility and inclusivity.
A Chrome extension that takes input images and generates captions for them.
A powerful Streamlit application that analyzes images using multiple vision models and responds to queries about visual content through conversational AI.