This project is a content description application that analyzes frames from real-time video streams and describes them in multiple languages, with optional text-to-speech (TTS) playback. It uses AI models from Google to generate the descriptions and provides a user-friendly interface with customizable style options.
- Real-Time Video Streaming: View and manage video streams from different camera sources.
- Frame Description: AI-powered content description that analyzes and describes video frames.
- Multi-Language Support: Content descriptions in English, Turkish, German, and Arabic.
- Text-to-Speech Conversion: Listen to descriptions in the selected language.
- User-Friendly Interface: Intuitive and easy-to-use interface.
- Customizable Style: Personalize the application's appearance with style settings.
- content_description.py:
  - Contains the content description class and functions.
  - Provides user input handling, frame description, and text-to-speech functionality.
- main.py:
  - The main entry point of the application.
  - Sets up the user interface using Tkinter and applies style settings.
  - Initializes the video stream handler and content describer objects.
- video_stream.py:
  - Contains the class that manages video streaming.
  - Processes video streams from the camera and displays them on a Tkinter Canvas.
  - Provides functions to start, stop, and update the video stream.
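The start, stop, and update cycle described for video_stream.py can be sketched roughly as follows. This is a minimal illustration, not the project's actual class: `VideoStreamSketch`, `read_frame`, and `show_frame` are hypothetical names, and the camera and canvas are replaced by plain callables so the example runs without a camera or a display.

```python
from typing import Callable, Optional


class VideoStreamSketch:
    """Hypothetical start/stop/update lifecycle (not the project's real class)."""

    def __init__(
        self,
        read_frame: Callable[[], Optional[object]],
        show_frame: Callable[[object], None],
    ) -> None:
        self.read_frame = read_frame  # stand-in for reading from a camera
        self.show_frame = show_frame  # stand-in for drawing on a Tkinter Canvas
        self.running = False
        self.last_frame: Optional[object] = None

    def start(self) -> None:
        self.running = True

    def stop(self) -> None:
        self.running = False

    def update(self) -> bool:
        """Process one frame; return True if another update should be scheduled."""
        if not self.running:
            return False
        frame = self.read_frame()
        if frame is not None:  # a failed read keeps the previous frame
            self.last_frame = frame
            self.show_frame(frame)
        return True


# Usage with stand-in callables instead of a real camera and canvas.
frames = iter(["frame-1", None, "frame-2"])
shown = []
stream = VideoStreamSketch(lambda: next(frames, None), shown.append)
stream.start()
for _ in range(3):
    stream.update()
stream.stop()
```

In a Tkinter application, the update step would typically reschedule itself with the widget's `after(delay_ms, callback)` method while the stream is running. Keeping `last_frame` around is one way to let a "Describe the frame" action work on the most recent image.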
- Install Required Libraries:
  pip install -r requirements.txt
- Set Up Environment Variables: Add your Google API key to the .env file.
- Run the Application:
  python main.py
- Select Camera and Language: In the application interface, select a camera from the available options and choose the desired description language.
- Start Video Stream and Get Description:
  - Press the "Select Camera" button to start the video stream.
  - Press the "Describe the frame" button to get the description of the video frame.
  - Press the "Text-to-Speech" button to hear the description audibly.
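For the environment variable step, the sketch below shows one way an application could read an API key from a .env file using only the standard library. The variable name `GOOGLE_API_KEY` and the helper names are assumptions for illustration; check the project's code for the name it actually expects. Many projects use the python-dotenv package for this instead.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(path: str = ".env", name: str = "GOOGLE_API_KEY") -> str:
    # GOOGLE_API_KEY is an assumed name, not confirmed by the project.
    key = os.environ.get(name) or load_env_file(path).get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; add it to {path}")
    return key
```

A value already present in the process environment takes precedence over the file, so the key can also be supplied by the shell or a deployment environment.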
To see an example of the application in action, refer to the following steps:
- Select Camera: Choose the desired camera source from the dropdown menu.
- Select Language: Choose the language for the description from the language menu.
- Start Video Stream: Click "Select Camera" to start the video stream.
- Describe Frame: Click "Describe the frame" to get a textual description of the current video frame.
- Text-to-Speech: Click "Text-to-Speech" to hear the description spoken aloud.
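The language selected in the steps above affects both the description request and the speech output. A minimal sketch of how that choice might be carried through is shown below. The names `LANGUAGES`, `build_prompt`, and `tts_language_code` and the prompt wording are hypothetical; only the four supported languages come from this README, and the codes are their standard ISO 639-1 identifiers.

```python
# Supported languages mapped to ISO 639-1 codes, as a TTS engine might expect.
LANGUAGES = {
    "English": "en",
    "Turkish": "tr",
    "German": "de",
    "Arabic": "ar",
}


def build_prompt(language: str) -> str:
    """Build a description request for the selected language (assumed wording)."""
    if language not in LANGUAGES:
        raise ValueError(f"Unsupported language: {language}")
    return f"Describe the content of this image in {language}."


def tts_language_code(language: str) -> str:
    """Return the language code to pass to a text-to-speech engine."""
    if language not in LANGUAGES:
        raise ValueError(f"Unsupported language: {language}")
    return LANGUAGES[language]


selected = "Turkish"
prompt = build_prompt(selected)
code = tts_language_code(selected)
```

Keeping the display names and codes in a single mapping means the language menu, the description request, and the speech output cannot drift out of sync.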