An image to text encoder decoder built using pytorch
You will need python 3.10 and poetry installed to run this project. I recommend that you run it in a devcontainer, though.
First, allocate an ubuntu (22.04) vm instace with gpu and ssh into it.
- create the working directory:
sudo mkdir /workspace && cd /workspace
- clone the repository
sudo git clone https://github.com/alita-moore/img-to-text && sudo chown -R ubuntu:ubuntu && cd img-to-text
- setup the vm
sudo bash .devcontainer/setup-vm.sh
(you'll need to press enter / yes during the process) - Restart your machine:
sudo shutdown -r now
(you'll need to ssh back into the system) - cd into the project directory
cd /workspace/img-to-text
- run
docker login
and login to docker, this is necessary to pull the relevant cuda image - build and launch the devcontainer
devcontainer up --workspace-folder . --remove-existing-container
- setup a local docker context and connect to the running devcontainer remotely via vscode (tutorial: https://www.doppler.com/blog/visual-studio-code-remote-dev-containers-on-aws)
- once inside of the container navigate to /workspaces/img-to-text and run
poetry install
You can test inference capabilities via the dev.py file which mimics a jupyter notebook. To collect torch trace logs you should run the model with the following command:
TORCH_TRACE=/logs poetry run python dev.py
If you wish to run this code locally instead, make sure you have at least cuda 12.4 installed and then run poetry install
. You can install poetry via pip install pipx && pipx install poetry && pipx ensurepath
if it's not already installed.
The text_decoder code was largely provided by https://github.com/EleutherAI/aria-amt