This project implements a Visual Question Answering (VQA) system using a pre-trained model from Hugging Face or a locally stored model.
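As a rough sketch of what this involves (not the project's own code), the snippet below loads a visual-question-answering pipeline from Hugging Face and answers a question about a local image; the checkpoint name is only an assumed example and may differ from the model this project uses.

```python
# Minimal sketch, not the repository's code: load a Hugging Face VQA pipeline
# and answer a question about a local image. The checkpoint is an example only.
from transformers import pipeline
from PIL import Image

vqa = pipeline("visual-question-answering", model="dandelin/vilt-b32-finetuned-vqa")
image = Image.open("path_to_image.jpg")
result = vqa(image=image, question="What is the person doing?")
print(result[0]["answer"], result[0]["score"])
```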
- Clone the repository:

  ```bash
  git clone https://github.com/tedoaba/Vision-Question-Answering.git
  cd Vision-Question-Answering
  ```

- Create a virtual environment:

  ```bash
  python -m venv venv
  source venv/Scripts/activate  # On Windows (Git Bash); use venv\Scripts\activate in cmd, or source venv/bin/activate on Linux/macOS
  ```

- Install the dependencies:

  ```bash
  pip install -r requirements.txt
  ```
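To sanity-check the environment, a quick import test such as the one below can help; it assumes the requirements include PyTorch and the transformers library, which this README does not spell out.

```python
# Quick sanity check of the virtual environment.
# Assumes requirements.txt pulls in torch and transformers (not confirmed by this README).
import torch
import transformers

print("torch", torch.__version__)
print("transformers", transformers.__version__)
```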
Run the VQA system with an image and a question:

```bash
python scripts/run_vqa.py --image "path_to_image.jpg" --question "What is the person doing?"
```

Run the VQA system with a local model:

```bash
python scripts/run_vqa.py --image "path_to_image.jpg" --question "What is the color of the car?" --model_path "models/vqa/"
```

Run the VQA system with an image URL:

```bash
python scripts/run_vqa.py --image "https://example.com/image.jpg" --question "What is happening in the image?" --url
```
To run the Streamlit app, use the following command:

```bash
streamlit run app.py
```
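The repository's app.py is likewise not shown here; the following is a minimal sketch of how a Streamlit front end over the same kind of pipeline might look, with the checkpoint again only an assumed example.

```python
# Minimal Streamlit sketch (not the repository's app.py).
import streamlit as st
from PIL import Image
from transformers import pipeline


@st.cache_resource
def load_pipeline():
    # Example checkpoint; the project may ship or expect a different model.
    return pipeline("visual-question-answering", model="dandelin/vilt-b32-finetuned-vqa")


st.title("Visual Question Answering")
uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
question = st.text_input("Ask a question about the image")

if uploaded and question:
    image = Image.open(uploaded).convert("RGB")
    st.image(image)
    answer = load_pipeline()(image=image, question=question)[0]["answer"]
    st.write(f"**Answer:** {answer}")
```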
To run the tests locally, follow these steps:

- Install the dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Run the tests:

  ```bash
  python -m unittest discover -s tests
  ```
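For reference, unittest discovery picks up any test*.py module under tests/. A minimal sketch of such a module is shown below; the class and the placeholder call are hypothetical, not the project's actual tests.

```python
# tests/test_vqa_example.py -- hypothetical sketch, not the project's actual tests.
import unittest


class TestVQAAnswerFormat(unittest.TestCase):
    def test_answer_is_nonempty_string(self):
        # Placeholder for a call into the project's VQA code, e.g. a helper
        # that returns the predicted answer for an image/question pair.
        answer = "playing football"  # substitute with the real call
        self.assertIsInstance(answer, str)
        self.assertTrue(answer.strip())


if __name__ == "__main__":
    unittest.main()
```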
Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes.
This project is licensed under the MIT License. See the LICENSE file for details.