Powered by Streamlit + AWS Rekognition. Streamlit handles the user interface, AWS Rekognition performs the OCR, and the Python library URLExtract finds the URLs in the detected text.
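The OCR-then-URL-matching flow described above could look roughly like the following. This is a minimal sketch, not the app's actual code: the function names are hypothetical, and it assumes boto3 and urlextract are installed and AWS credentials are configured. Both libraries are imported lazily so the pure helper can be used without them.

```python
def lines_from_response(response):
    """Return the text of every LINE-level detection in a DetectText response."""
    return [
        detection["DetectedText"]
        for detection in response.get("TextDetections", [])
        if detection.get("Type") == "LINE"
    ]


def detect_text_lines(image_bytes, region_name=None):
    """Send raw image bytes to Rekognition and return the detected lines."""
    import boto3  # lazy import: only needed when actually calling AWS

    client = boto3.client("rekognition", region_name=region_name)
    response = client.detect_text(Image={"Bytes": image_bytes})
    return lines_from_response(response)


def extract_urls(lines):
    """Find URLs in the detected lines with URLExtract."""
    from urlextract import URLExtract  # lazy import: third-party dependency

    extractor = URLExtract()
    return extractor.find_urls("\n".join(lines))
```

In a Streamlit page, the bytes of an uploaded file could be passed to `detect_text_lines` and the result to `extract_urls`; how the real app wires these together is not shown here.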
- Requires AWS Account with access to Rekognition service (AWS Tutorial)
- Copy or rename .env.example as .env.dev and fill in the AWS Access Key, Secret Key, and Region for your Rekognition account
mv .env.example .env.dev
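Before starting the containers, it can help to confirm that the env file is actually filled in. The sketch below is an illustration only: the variable names are an assumption (the standard ones read by boto3 and the AWS CLI), so check .env.example for the names this project really uses.

```python
# Assumed names: the standard variables read by boto3 / the AWS CLI.
# The project's .env.example may use different ones.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def parse_env_file(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(values, required=REQUIRED_VARS):
    """Return the required names that are absent or empty, in order."""
    return [name for name in required if not values.get(name)]


# Example usage (reads the file created in the step above):
#   with open(".env.dev") as handle:
#       print(missing_vars(parse_env_file(handle.read())))
```

An empty list means every assumed variable has a value.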
Requires docker-compose to be installed (this comes with Docker Desktop).
docker-compose up
# Open localhost:8501
Use -d to detach from logs.
Use --build on subsequent runs to rebuild dependencies / docker image.
# Linting
docker-compose run streamlit-app nox.sh -s lint
# Unit Testing
docker-compose run streamlit-app nox.sh -s test
# Both
docker-compose run streamlit-app nox.sh
# As needed:
docker-compose build
# E2E Testing
docker-compose up -d --build
# Replace screenshots
docker-compose exec streamlit-app nox -s test -- -m e2e --visual-baseline
# Compare to visual baseline screenshots
docker-compose exec streamlit-app nox -s test -- -m e2e
# Turn off / tear down
docker-compose down
For code completion / linting / developing / etc.
python -m venv venv
. ./venv/bin/activate
# .\venv\Scripts\activate for Windows
python -m pip install -r ./streamlit_app/requirements.dev.txt
pre-commit install
# Linting / Static Checking / Unit Testing
python -m black streamlit_app
python -m isort --profile=black streamlit_app
python -m flake8 --config=./streamlit_app/.flake8 streamlit_app
- Containerization with Docker
- Dependency installation with Pip
- Test automation with Nox
- Linting with pre-commit and Flake8
- Code formatting with Black
- Testing with pytest
- Code coverage with Coverage.py
This version sends binary image data to AWS Rekognition, which accepts at most 5 MB per image. To account for this, uploads larger than 5 MB are downscaled before being sent to AWS.
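One way to implement this kind of downscaling is sketched below. It is not necessarily how this app does it: it assumes Pillow is installed, treats 5 MB as 5 * 1024 * 1024 bytes, and uses the rough heuristic that encoded size scales with pixel count. The function names are hypothetical.

```python
import io
import math

MAX_BYTES = 5 * 1024 * 1024  # assumed byte value of the 5 MB limit


def target_size(width, height, n_bytes, max_bytes=MAX_BYTES):
    """Estimate dimensions whose encoded size should fit under max_bytes."""
    if n_bytes <= max_bytes:
        return width, height
    # Heuristic: bytes scale roughly with pixel count, so scale each side
    # by the square root of the byte ratio, with a 5% safety margin.
    scale = math.sqrt(max_bytes / n_bytes) * 0.95
    return max(1, int(width * scale)), max(1, int(height * scale))


def shrink_to_limit(image_bytes, max_bytes=MAX_BYTES):
    """Re-encode the image at smaller sizes until it fits under max_bytes."""
    from PIL import Image  # lazy import: Pillow is a third-party dependency

    data = image_bytes
    while len(data) > max_bytes:
        image = Image.open(io.BytesIO(data))
        new_size = target_size(image.width, image.height, len(data), max_bytes)
        buffer = io.BytesIO()
        # Keep the original format (e.g. JPEG or PNG) when re-encoding.
        image.resize(new_size).save(buffer, format=image.format or "PNG")
        data = buffer.getvalue()
    return data
```

The loop re-checks the encoded size after each pass because the heuristic is only an estimate; images already under the limit are returned unchanged.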
Rekognition's text detection is limited to 100 words per image. Images containing more words than that may benefit from AWS Textract.
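A simple way to notice when an image has probably hit the word limit is to count the WORD-level detections in the response. This is a sketch with hypothetical helper names; it only relies on the `TextDetections` and `Type` fields of a DetectText response.

```python
WORD_LIMIT = 100  # the per-image limit stated above


def word_count(response):
    """Count WORD-level detections in a DetectText response."""
    return sum(
        1
        for detection in response.get("TextDetections", [])
        if detection.get("Type") == "WORD"
    )


def may_be_truncated(response, limit=WORD_LIMIT):
    """True when the response holds as many words as the limit allows."""
    return word_count(response) >= limit
```

When `may_be_truncated` returns True, some text may be missing from the results, and the image could be a candidate for a different OCR backend.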
- Makefile
- 5 MB limit without S3 version
- Option for Textract OCR backend
- 5 MB limit with S3 version (X from env / config)
- X MB limit from env / config
- FastAPI backend
- API key access to backend without Streamlit
- Option for Tesseract / non-AWS OCR backend