Vision of Voice Application

This application lets users describe an image with their voice. It transcribes the recording to text with the OpenAI Whisper-1 model, then generates an image from that text with the DALL-E model. Users can also request a description of the generated image from Google's Gemini model.
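The voice-to-image flow described above can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the function names are hypothetical, and the "dall-e-3" model name and the image size are assumptions, since this README only says "DALL-E". The `client` argument is expected to be an `openai.OpenAI` instance.

```python
def transcribe_audio(client, audio_path):
    """Transcribe a recorded audio file to text with the Whisper-1 model."""
    with open(audio_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
        )
    return result.text


def generate_image_url(client, prompt, model="dall-e-3", size="1024x1024"):
    """Generate one image from a text prompt and return its URL.

    The model name and size are assumptions made for this sketch.
    """
    response = client.images.generate(model=model, prompt=prompt, n=1, size=size)
    return response.data[0].url


def voice_to_image(client, audio_path):
    """Run both steps in order: speech -> text -> image URL."""
    prompt = transcribe_audio(client, audio_path)
    return prompt, generate_image_url(client, prompt)
```

With the `openai` package installed, a call might look like `voice_to_image(OpenAI(api_key="..."), "recording.wav")`, where `recording.wav` is a placeholder file name.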

How does the app work?

Features

Whisper-1 Model: Utilizes the OpenAI Whisper-1 model to convert audio recordings into text.
DALL-E Model: Employs the DALL-E model to generate images from text.
Description Retrieval: Users can click the "Describe" button to obtain descriptions of their generated images.
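The description step could look something like the sketch below. It assumes the `google-generativeai` package listed under Libraries; the function names, the default prompt, and the "gemini-1.5-flash" model name are illustrative assumptions, not taken from the app.

```python
def describe_image(model, image, prompt="Describe this image in detail."):
    """Ask a Gemini model for a text description of an image.

    `model` is expected to behave like google.generativeai.GenerativeModel,
    and `image` can be a PIL.Image.Image.
    """
    response = model.generate_content([prompt, image])
    return response.text.strip()


def build_gemini_model(api_key, model_name="gemini-1.5-flash"):
    """Create a Gemini model object. The model name is an assumption."""
    # Imported lazily so describe_image can be used without the package.
    import google.generativeai as genai

    genai.configure(api_key=api_key)
    return genai.GenerativeModel(model_name)
```

Keeping the model as a parameter of `describe_image` means the same function works with any object that exposes `generate_content`, which makes it easy to try out without a real API key.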

Usage Instructions

API Keys: Enter your OpenAI and GoogleAI API keys in the left-side menu.
Record Audio: Use the application interface to record your voice.
Check Audio: Click the "Check" button to review your recorded audio.
Send to AI: You can either send the audio or use the direct send option.
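As a rough illustration of what a check on the recording might involve, the sketch below inspects audio in memory before it is sent. It assumes the recording is available as WAV-encoded bytes and uses Python's standard-library `wave` module; `wav_summary` and `make_silence` are hypothetical helpers, not functions from the app.

```python
import io
import wave


def wav_summary(wav_bytes):
    """Return basic facts about WAV-encoded audio held in memory."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_rate": rate,
            "duration_seconds": frames / rate,
        }


def make_silence(seconds, rate=16000):
    """Build a silent mono 16-bit WAV clip, used here only as sample input."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(rate)
        wav_file.writeframes(b"\x00\x00" * int(seconds * rate))
    return buffer.getvalue()


# Summarize a two-second silent clip as a stand-in for a real recording.
print(wav_summary(make_silence(2)))
```

A check like this can catch an empty or very short recording before any API call is made.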

Libraries

openai==1.48.0
streamlit==1.38.0
Wave==0.0.2
google-generativeai==0.8.2
streamlit-audiorec

Requirements

OpenAI API Key
GoogleAI API Key

Notes

This application was developed as a personal project and is not a commercial product.

Contributing

If you have suggestions or find errors, please feel free to reach out to me on LinkedIn.
