More details can be found on Hackster.io.
Vision2Audio is a web application designed to enhance the lives of visually impaired and blind individuals by enabling them to capture images, ask questions about them, and receive spoken answers using cutting-edge AI technologies.
The application leverages NVIDIA's Riva Automatic Speech Recognition (ASR) to convert spoken questions into text. This text, together with the captured image, is then fed into the LLaVA (Large Language-and-Vision Assistant) multimodal model through the llama.cpp server implementation, which generates a description of the image that answers the question. Finally, NVIDIA's Riva Text-to-Speech (TTS) technology converts the generated text into spoken audio, delivering the answer to the user in an accessible format.
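To make the data flow concrete, here is a rough end-to-end sketch of that pipeline in Python. It assumes the Riva server is reachable at localhost:50051 and the llama.cpp server at port 8080 (both started below); the file names, voice name, and sample rates are illustrative assumptions, not the project's actual code.

```python
# Hypothetical end-to-end sketch: spoken question + photo -> spoken answer.
import base64

import requests
import riva.client

auth = riva.client.Auth(uri="localhost:50051")
asr = riva.client.ASRService(auth)
tts = riva.client.SpeechSynthesisService(auth)

# 1. Riva ASR: convert the recorded question (16 kHz mono PCM WAV) to text.
with open("question.wav", "rb") as f:
    audio_bytes = f.read()
asr_config = riva.client.RecognitionConfig(
    encoding=riva.client.AudioEncoding.LINEAR_PCM,
    sample_rate_hertz=16000,  # must match the recording
    language_code="en-US",
    max_alternatives=1,
)
question = (
    asr.offline_recognize(audio_bytes, asr_config)
    .results[0].alternatives[0].transcript
)

# 2. LLaVA via the llama.cpp server: "[img-10]" in the prompt refers to the
#    base64-encoded image registered with id 10 in image_data.
with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()
reply = requests.post(
    "http://127.0.0.1:8080/completion",
    json={
        "prompt": f"USER:[img-10]{question}\nASSISTANT:",
        "image_data": [{"data": image_b64, "id": 10}],
        "n_predict": 256,
    },
    timeout=120,
)
answer = reply.json()["content"]

# 3. Riva TTS: synthesize the answer as raw 44.1 kHz PCM samples.
tts_response = tts.synthesize(
    answer,
    voice_name="English-US.Female-1",
    language_code="en-US",
    sample_rate_hz=44100,
)
with open("answer.pcm", "wb") as f:
    f.write(tts_response.audio)
```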
For simplicity, we will assume everything is already installed. Start the NVIDIA Riva server by running:
bash riva_start.sh
Once the Riva server is running, open another terminal and launch the LLaVA server via llama.cpp:
./bin/server -m models/llava1.5-13B/ggml-model-q4_k.gguf \
    --mmproj models/llava1.5-13B/mmproj-model-f16.gguf \
    --port 8080 \
    -ngl 35
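Here --mmproj points to the multimodal projector that pairs the vision encoder with the language model, and -ngl 35 offloads 35 model layers to the GPU. Before wiring the server into the app, you can verify it answers requests; a minimal sanity check, assuming the server's default /completion endpoint:

```python
# Minimal sanity check: POST a text-only prompt to the llama.cpp server's
# /completion endpoint and print the generated continuation.
import requests

r = requests.post(
    "http://127.0.0.1:8080/completion",
    json={"prompt": "Hello", "n_predict": 8},
    timeout=60,
)
r.raise_for_status()
print(r.json()["content"])
```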
You can download the models from here. Keep the server running in the background. Open another terminal and run:
python3 -m flask run --host=0.0.0.0 --debug
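This assumes the project's Flask entry point (e.g. an app.py, or whatever FLASK_APP points to) is in the current directory. As a hypothetical sketch of what such an app's structure could look like (the route and form-field names are illustrative assumptions, not the project's actual API):

```python
# app.py -- a hypothetical, minimal sketch of the web app's structure.
# The actual project wires the uploaded photo and recorded question through
# the Riva ASR -> LLaVA -> Riva TTS pipeline sketched earlier.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/ask", methods=["POST"])
def ask():
    photo = request.files["image"]           # image captured by the user
    question_audio = request.files["audio"]  # recorded spoken question
    # 1. Riva ASR: question_audio -> question text
    # 2. llama.cpp /completion: photo + question -> answer text
    # 3. Riva TTS: answer text -> audio returned to the browser
    return jsonify({"status": "ok"})  # placeholder response
```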
Open another terminal and start a cloudflared tunnel with the following command:
cloudflared tunnel --url http://127.0.0.1:5000
cloudflared prints a public HTTPS URL (a *.trycloudflare.com address) that forwards to the local Flask server, so the web app can be opened from a phone's browser.
The implementation of the project relies on:
- the llama.cpp project by @ggerganov, without which this would not have been possible
- the LLaVaVision project, a simple "Be My Eyes" web app with a llama.cpp/llava backend
I thank the original authors for open-sourcing their work.