Vision of Voice Application
This application lets users describe an image aloud. It transcribes the recording to text with the OpenAI Whisper-1 model, then generates an image from that text with the DALL-E model. Users can also request a description of a generated image from Google's Gemini.
- Whisper-1 Model: Utilizes the OpenAI Whisper-1 model to convert audio recordings into text.
- DALL-E Model: Employs the DALL-E model to generate images from text.
- Description Retrieval: Click the "Describe" button to get a Gemini-generated description of a generated image.
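
The three features above can be sketched as plain functions. This is a minimal illustration, not the application's actual code: the function names, the `dall-e-3` model choice, the image size, and the description prompt are all assumptions. The clients are passed in as arguments so the functions can be tried with stand-in objects.

```python
from typing import Any, Tuple


def transcribe(openai_client: Any, wav_bytes: bytes) -> str:
    """Turn recorded WAV bytes into text with the Whisper-1 model."""
    result = openai_client.audio.transcriptions.create(
        model="whisper-1",
        # A (filename, content) tuple lets the API infer the audio format.
        file=("recording.wav", wav_bytes),
    )
    return result.text


def generate_image_url(openai_client: Any, prompt: str,
                       model: str = "dall-e-3") -> str:
    """Generate one image from the prompt and return its URL.

    The model name and size are assumptions; the README only says "DALL-E".
    """
    result = openai_client.images.generate(
        model=model, prompt=prompt, n=1, size="1024x1024"
    )
    return result.data[0].url


def describe_image(gemini_model: Any, image_bytes: bytes,
                   mime_type: str = "image/png") -> str:
    """Ask a Gemini model for a short description of the image bytes."""
    response = gemini_model.generate_content([
        "Describe this image in a few sentences.",  # hypothetical prompt
        {"mime_type": mime_type, "data": image_bytes},
    ])
    return response.text


def voice_to_image(openai_client: Any, wav_bytes: bytes) -> Tuple[str, str]:
    """Chain the first two steps: speech -> text -> image URL."""
    prompt = transcribe(openai_client, wav_bytes)
    return prompt, generate_image_url(openai_client, prompt)
```

With the pinned packages, the real clients would presumably be `openai.OpenAI(api_key=...)` and, after `google.generativeai.configure(api_key=...)`, a `google.generativeai.GenerativeModel(...)` instance; the Gemini model name to use is not stated here. `describe_image` expects raw image bytes, so an image returned as a URL would have to be downloaded first.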
- API Keys: Enter your OpenAI and GoogleAI API keys in the left-side menu.
- Record Audio: Use the application interface to record your voice.
- Check Audio: Click the "Check" button to review your recorded audio.
- Send to AI: You can either send the audio or use the direct send option.
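
The steps above might map onto Streamlit roughly as follows. This is a sketch under assumptions, not the application's source: `missing_keys`, `render`, and the `on_send` callback are hypothetical names, the "Send" button label is a guess, and the direct send option is not modeled because its behavior is not described here.

```python
from typing import Callable, List


def missing_keys(openai_key: str, google_key: str) -> List[str]:
    """Return the names of the API keys that have not been entered yet."""
    missing = []
    if not openai_key.strip():
        missing.append("OpenAI")
    if not google_key.strip():
        missing.append("GoogleAI")
    return missing


def render(on_send: Callable[[bytes, str, str], None]) -> None:
    """Draw the page. `on_send` is a hypothetical callback that receives
    (wav_bytes, openai_key, google_key) once the user sends the recording."""
    # Imported lazily so missing_keys() works without Streamlit installed.
    import streamlit as st
    from st_audiorec import st_audiorec  # provided by streamlit-audiorec

    # API Keys: entered in the left-side menu (the sidebar).
    openai_key = st.sidebar.text_input("OpenAI API Key", type="password")
    google_key = st.sidebar.text_input("GoogleAI API Key", type="password")

    # Record Audio: the widget returns WAV bytes, or None before recording.
    wav_bytes = st_audiorec()
    if wav_bytes is None:
        st.info("Record your voice to continue.")
        return

    # Check Audio: play the recording back for review.
    if st.button("Check"):
        st.audio(wav_bytes, format="audio/wav")

    # Send to AI: hand the recording over only when both keys are present.
    if st.button("Send"):
        missing = missing_keys(openai_key, google_key)
        if missing:
            st.warning("Missing API key(s): " + ", ".join(missing))
        else:
            on_send(wav_bytes, openai_key, google_key)
```

Saved as a script that ends with a call such as `render(my_handler)`, where `my_handler` is your own function, this would be started with `streamlit run <script>.py`.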
openai==1.48.0
streamlit==1.38.0
Wave==0.0.2
google-generativeai==0.8.2
streamlit-audiorec
- OpenAI API Key
- GoogleAI API Key
- This application was developed for personal use and experimentation and is not a commercial product.
If you have suggestions or want to report an error, please feel free to reach out to me on LinkedIn.