EchoGuide: AI-Powered Navigation for the Visually Impaired

EchoGuide is a wearable assistive device that empowers visually impaired users with real-time audio feedback about their surroundings. Using an ESP32-CAM module and a Python Flask server running lightweight computer vision models (e.g., MobileNet SSD or Tiny YOLO), EchoGuide identifies objects, estimates distances, and delivers simple voice cues. It can also save and recognize faces, making it a valuable companion for those with visual impairments.

Table of Contents

  • Features
  • Technologies Used
  • Hardware Requirements
  • Software Requirements
  • System Architecture
  • Installation
  • Setting Up Variables
  • Running the Application
  • Special Notes

Features

  • Real-time object detection and distance estimation
  • Voice feedback for objects and faces
  • Face recognition for saved faces
  • Easy-to-use Python API for customization and integration
  • Compatible with ESP32-CAM modules

Technologies Used

  • ESP32-CAM
  • ESP32
  • Python
  • Flask
  • OpenCV
  • YOLO
  • Deepgram
  • face_recognition

Hardware Requirements

  • ESP32-CAM module
  • FTDI programmer (USB-to-serial adapter)
  • ESP32-CAM module Development Board (Optional)
  • ESP32
  • PAM8403 Amplifier
  • Any small speaker
  • Jumper Wires
  • Breadboard or PCB
  • Power Supply
  • Micro USB Cable
  • Arduino Cable
  • Capacitors and Resistors for noise reduction (Optional)

Software Requirements

  • Python 3.11.4
  • Arduino IDE with ESP32 board support
  • Python packages listed in requirements.txt (Flask, OpenCV, Ultralytics YOLO, face_recognition, among others)
  • A Deepgram API key

System Architecture

Workflow

  • The ESP32-CAM captures frames and streams them to the server via HTTP POST.
  • The Flask server receives the images and runs object detection and distance estimation.
  • The system microphone listens for voice commands; the server responds with a targeted analysis of the current frame.
  • Deepgram transcribes the command; the server sends the response text to the ESP32 for text-to-speech, which plays the audio feedback to the user through the speaker.

Installation

For Python Server

  1. Navigate to the root directory.
  2. Create a virtual environment named .venv with Python 3.11.4.
  3. Activate the virtual environment with .\.venv\Scripts\activate (Windows).
  4. Run the following commands one by one:
     pip install --upgrade pip
     pip install wheel setuptools
     pip install cmake==3.27.7
     pip install dlib==19.24.2
     pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cpu
     pip install -r requirements.txt
  5. If you encounter an error about cmake or dlib while installing face_recognition, install CMake manually and retry.
  6. If any package compatibility issue arises during installation, resolve it before continuing.

For ESP32

  1. Navigate to Speaker directory.

  2. Follow an ESP32 installation tutorial to add ESP32 board support to the Arduino IDE.

  3. Upload the code to your ESP32.

  4. Connect the speaker and the PAM8403 amplifier to the ESP32 as follows.

     | ESP32                              | PAM8403 Amplifier | Speaker               |
     |------------------------------------|-------------------|-----------------------|
     | 3.3 V (or 5 V, choose accordingly) | 5 V               |                       |
     | GND                                | GND               |                       |
     | GPIO 25                            | L IN              |                       |
     |                                    | L OUT +           | + (positive terminal) |
     |                                    | L OUT -           | - (negative terminal) |

For ESP32-CAM Module

  1. Navigate to ESP32_Camera directory.

  2. Connect the ESP32-CAM module to the FTDI adapter as follows.

     | ESP32-CAM Module | FTDI |
     |------------------|------|
     | 5 V              | VCC  |
     | GND              | GND  |
     | UOT              | RX   |
     | UOR              | TX   |
  3. Connect IO0 to GND on the ESP32-CAM module. (This puts the board in flashing mode and is only needed while uploading code.)

  4. Upload the code to your ESP32-CAM module.

  5. Disconnect the IO0 and GND of the ESP32-CAM module.

  6. Restart the ESP32-CAM module.

Setting Up Variables

  1. Connect the ESP32, the ESP32-CAM module, and your laptop (or PC) to the same network (a shared WiFi network or a mobile hotspot).
  2. Get a Deepgram API key from Deepgram's website.
  3. Paste the API key into DEEPGRAM_API_KEY (line 35) of EchoGuide.py.
  4. Open a terminal and run ipconfig. Note the IPv4 address and paste it into pythonServerIP (line 13) of Speaker.ino. You can check whether the Python server is reachable by running it and then visiting http://<YOUR_IPv4_ADDRESS>:5000. If that does not work, try a different IP address (your machine may have several network interfaces).
  5. Write your WiFi name and password into both the ESP32 and ESP32-CAM sketches before uploading.
  6. When uploading to the ESP32-CAM module, use Board: ESP32 Dev Module and, under Tools > Partition Scheme, Default 4 MB with spiffs (1.2 MB APP / 1.5 MB SPIFFS). If that does not work, use Huge APP (3 MB No OTA / 1 MB SPIFFS). If that also fails, search online and try again.
  7. After uploading the code to the respective boards, open the serial monitor and note the IP addresses they are running on.
  8. Paste the IP address of the ESP32-CAM module into esp32_ip (line 32) and that of the ESP32 into esp32_audio_ip (line 36) of EchoGuide.py.
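Taken together, the variables above end up looking something like this. The values are placeholders; only the variable names and line numbers come from this README.

```python
# In EchoGuide.py (names per the README; values are placeholders)
esp32_ip = "192.168.1.42"          # line 32: ESP32-CAM module's IP address
DEEPGRAM_API_KEY = "YOUR_KEY"      # line 35: your Deepgram API key
esp32_audio_ip = "192.168.1.43"    # line 36: ESP32 speaker board's IP address

# In Speaker.ino (line 13), pythonServerIP holds your laptop's IPv4 address,
# e.g. "192.168.1.10", so the ESP32 can reach http://192.168.1.10:5000.
```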

Running the Application

  1. After completing the steps above, connect the ESP32 and ESP32-CAM module to a power source or a laptop (or PC).
  2. Run the Python file; it should start the Flask server.
  3. On first run it will automatically download yolov8n.pt.
  4. The system detects only a limited set of everyday objects out of the box; you can extend it by consulting the YOLO documentation.
  5. Visit http://<YOUR_IP_ADDRESS>:5000 to see the basic web page. Clicking View Live Stream shows the live object-detection stream in your browser. Remember to close this tab afterwards so the application runs smoothly; it is for viewing only.
  6. Say, for example, "Find <Object>" and it will locate that object for you.
  7. If a face is in view and you say "Capture <Name>", it will save that face under the given name in a JSON file.
  8. If you say "Detect <Name>", it will look for that face.
  9. For each object, the system reports the name, the distance (in meters), and whether it is to the left, right, or center.
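The spoken cue in step 9 can be derived from a detection box with simple geometry. The sketch below is illustrative: the focal length and object widths are assumed constants, not the repository's calibrated values, and the distance comes from the standard pinhole-camera approximation d = f * W / w.

```python
# Hedged sketch of turning a detection box into a spoken cue:
# "<object>, <distance> meters, left/right/center".
KNOWN_WIDTH_M = {"person": 0.5, "chair": 0.45}  # assumed real-world widths (m)
FOCAL_LENGTH_PX = 600.0                         # assumed focal length (pixels)


def describe(label, box, frame_width):
    """box = (x1, y1, x2, y2) in pixels; frame_width in pixels."""
    x1, _, x2, _ = box
    box_w = x2 - x1
    center = (x1 + x2) / 2
    # Pinhole-camera distance estimate: d = f * W / w
    width_m = KNOWN_WIDTH_M.get(label, 0.5)
    distance = FOCAL_LENGTH_PX * width_m / box_w
    # Split the frame into thirds for left / center / right
    if center < frame_width / 3:
        side = "left"
    elif center > 2 * frame_width / 3:
        side = "right"
    else:
        side = "center"
    return f"{label}, {distance:.1f} meters, {side}"
```

For example, a 300-pixel-wide person box on the left edge of a 640-pixel frame yields "person, 1.0 meters, left" under these assumed constants.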

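The "Capture <Name>" and "Detect <Name>" flows amount to storing a face encoding under a name and later matching against the stored entries. A hedged sketch, with a hypothetical faces.json path and a Euclidean-distance match (the repository's actual file name and matching logic may differ):

```python
import json
from pathlib import Path

FACES_FILE = Path("faces.json")  # hypothetical path; the repo's may differ


def save_face(name, encoding):
    """Store a face encoding (a list of floats) under the given name."""
    faces = json.loads(FACES_FILE.read_text()) if FACES_FILE.exists() else {}
    faces[name] = encoding
    FACES_FILE.write_text(json.dumps(faces))


def match_face(encoding, tolerance=0.6):
    """Return the stored name closest to `encoding`, or None if no match."""
    if not FACES_FILE.exists():
        return None
    faces = json.loads(FACES_FILE.read_text())
    best_name, best_dist = None, tolerance
    for name, stored in faces.items():
        # Euclidean distance between encodings, as face_recognition uses
        dist = sum((a - b) ** 2 for a, b in zip(encoding, stored)) ** 0.5
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name
```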
Special Notes

  1. By default the system uses your computer's microphone, but you can use an external microphone (or microphone module) instead. This requires small code changes and possibly a separate function for microphone capture and speech-to-text.
  2. This build uses the PAM8403, an analog amplifier. An I2S amplifier (e.g., MAX98357) and an I2S microphone (or microphone module) are recommended for better compatibility with ESP boards, again with small code changes.
  3. Two boards are used because the ESP32-CAM module is not very powerful. A single more powerful (and more expensive) board should work with some code tweaks.
  4. The ESP32-CAM module can be laggy, slow, or even crash because of its limited power. If that happens, disconnect the power supply and try again.
  5. You can also add capacitors and resistors to the circuit to reduce noise.

PPT

Presentation slides 1-7 (images).
