
Vision of Voice Application

This application lets users describe an image aloud. It converts the recorded audio into text with the OpenAI Whisper-1 model, then generates an image from that text with the DALL-E model. Users can also request a description of the generated image from Google's Gemini model.

How does the app work?

Features

Whisper-1 Model: Utilizes the OpenAI Whisper-1 model to convert audio recordings into text.
DALL-E Model: Employs the DALL-E model to generate images from text.
Description Retrieval: Click the "Describe" button to get a Gemini-generated description of the generated image.
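
The three features above map onto three API calls. The following is a minimal sketch, not the code in this repository: the function names, the `dall-e-3` model name, the image size, and the instruction text are assumptions made for illustration. The clients are passed in as arguments so the functions can be tried without network access.

```python
def transcribe(client, audio_bytes, filename="recording.wav"):
    """Send WAV bytes to Whisper-1 and return the transcript text."""
    result = client.audio.transcriptions.create(
        model="whisper-1",
        file=(filename, audio_bytes),  # (name, content) tuple accepted by the SDK
    )
    return result.text


def generate_image_url(client, prompt, model="dall-e-3", size="1024x1024"):
    """Ask DALL-E for one image and return its URL.

    The model name and size are assumptions; this README only says "DALL-E".
    """
    response = client.images.generate(model=model, prompt=prompt, n=1, size=size)
    return response.data[0].url


def describe_image(gemini_model, image, instruction="Describe this image."):
    """Ask a Gemini model to describe an image (for example a PIL.Image)."""
    response = gemini_model.generate_content([instruction, image])
    return response.text


# Real usage needs network access and valid keys, roughly:
#   from openai import OpenAI
#   import google.generativeai as genai
#   client = OpenAI(api_key=openai_key)
#   genai.configure(api_key=google_key)
#   gemini = genai.GenerativeModel("gemini-1.5-flash")  # model name is an assumption
#   text = transcribe(client, wav_bytes)
#   url = generate_image_url(client, text)
```

Passing the clients in, rather than creating them inside each function, fits an app where the keys are typed in at runtime.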

Usage Instructions

API Keys: Enter your OpenAI and GoogleAI API keys in the left-side menu.
Record Audio: Use the application interface to record your voice.
Check Audio: Click the "Check" button to review your recorded audio.
Send to AI: You can either send the audio or use the direct send option.
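
The "Check" step can be thought of as a sanity check on the recording before it is sent. The sketch below assumes the recorder hands back WAV data as bytes. The helper names and the half-second minimum are hypothetical and are not taken from `app.py`. It uses only Python's standard-library `wave` module.

```python
import io
import wave


def wav_summary(audio_bytes):
    """Return basic properties of in-memory WAV data, or raise wave.Error."""
    with wave.open(io.BytesIO(audio_bytes), "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_rate": rate,
            "duration_seconds": frames / rate if rate else 0.0,
        }


def is_usable_recording(audio_bytes, min_seconds=0.5):
    """True if the bytes parse as WAV and last at least min_seconds."""
    if not audio_bytes:
        return False
    try:
        return wav_summary(audio_bytes)["duration_seconds"] >= min_seconds
    except (wave.Error, EOFError):
        # Not a WAV file, truncated data, or an encoding `wave` cannot read.
        return False
```

A check like this can reject an empty or truncated recording before any API call is made.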

Libraries

openai==1.48.0
streamlit==1.38.0
Wave==0.0.2
google-generativeai==0.8.2
streamlit-audiorec
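
To confirm that an environment matches the pins above, a small check along these lines may help. It is a sketch using only the standard library. The function names are hypothetical, and the parser handles only plain `name==version` lines and bare names, not the full requirements-file syntax.

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Map package name -> pinned version (None when the line has no '==' pin)."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name, _, version = line.partition("==")
        pins[name.strip()] = version.strip() or None
    return pins


def check_installed(pins):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if wanted is not None and found != wanted:
            problems.append(f"{name}: found {found}, expected {wanted}")
    return problems


REQUIREMENTS = """\
openai==1.48.0
streamlit==1.38.0
Wave==0.0.2
google-generativeai==0.8.2
streamlit-audiorec
"""

if __name__ == "__main__":
    for problem in check_installed(parse_pins(REQUIREMENTS)):
        print(problem)
```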

Requirements

OpenAI API Key
GoogleAI API Key

Notes

This application was developed for personal use and experimentation and is not a commercial product.

Contributing

If you have suggestions or want to report an error, please feel free to reach out to me on LinkedIn.

Folders and files

media
LICENSE
README.md
app.py
audiotext.py
imagetext.py
requirements.txt
script.js
textimage.py

HuseyinBaytar/VisionOfVoice
