gerardrbentley/streamlit-url-scanner

URL Scan

Powered by Streamlit + AWS Rekognition. Streamlit handles the user interaction, AWS Rekognition performs the OCR, and the Python library URLExtract finds the URLs in the detected text.
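
The OCR-then-match flow described above could look roughly like the sketch below. It is not the app's actual code: the function names `detected_lines`, `urls_in_lines`, and `scan_image` are hypothetical, and the response layout (`TextDetections` entries with `Type` and `DetectedText`) follows the Rekognition `DetectText` API. `boto3` and `urlextract` are imported only inside `scan_image`, which needs valid AWS credentials to run.

```python
from __future__ import annotations

from typing import Callable, Iterable


def detected_lines(response: dict) -> list[str]:
    """Pull line-level text out of a Rekognition DetectText response."""
    return [
        d["DetectedText"]
        for d in response.get("TextDetections", [])
        if d.get("Type") == "LINE"
    ]


def urls_in_lines(
    lines: Iterable[str], find_urls: Callable[[str], list[str]]
) -> list[str]:
    """Run a URL finder over each line, keeping first-seen order, no duplicates."""
    seen: dict[str, None] = {}
    for line in lines:
        for url in find_urls(line):
            seen.setdefault(url, None)
    return list(seen)


def scan_image(image_bytes: bytes) -> list[str]:
    """Hypothetical end-to-end helper; needs AWS credentials and network access."""
    import boto3  # third-party
    from urlextract import URLExtract  # third-party

    client = boto3.client("rekognition")
    response = client.detect_text(Image={"Bytes": image_bytes})
    return urls_in_lines(detected_lines(response), URLExtract().find_urls)
```

Passing the URL finder in as a callable keeps the parsing step testable without AWS access or the URLExtract dependency.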

Local Run

Update AWS connection secrets

  • Requires an AWS account with access to the Rekognition service (AWS Tutorial)
  • Copy or rename .env.example to .env.dev and fill in the AWS Access Key, Secret Key, and Region for your Rekognition account
mv .env.example .env.dev
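
Before starting the app, a quick check that the credentials are set can save a confusing Rekognition error later. The sketch below is an assumption, not part of the repo: the variable names in .env.example are not shown here, so it uses the standard names that boto3 reads from the environment (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_DEFAULT_REGION`), and `missing_aws_settings` is a hypothetical helper.

```python
from __future__ import annotations

import os
from typing import Mapping

# Assumed names: the standard environment variables read by boto3.
# The names used in .env.example may differ.
REQUIRED = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def missing_aws_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or blank."""
    return [name for name in REQUIRED if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_aws_settings()
    if missing:
        print("Missing AWS settings:", ", ".join(missing))
```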

Run with Docker

Requires docker-compose to be installed (this comes with Docker Desktop).

docker-compose up
# Open localhost:8501

Use -d to detach from logs.

Use --build on subsequent runs to rebuild dependencies / docker image.

Lint, Check, Test with Docker

# Linting
docker-compose run streamlit-app nox.sh -s lint
# Unit Testing
docker-compose run streamlit-app nox.sh -s test
# Both
docker-compose run streamlit-app nox.sh
# As needed:
docker-compose build

# E2E Testing
docker-compose up -d --build
# Replace screenshots
docker-compose exec streamlit-app nox -s test -- -m e2e --visual-baseline
# Compare to visual baseline screenshots
docker-compose exec streamlit-app nox -s test -- -m e2e
# Turn off / tear down
docker-compose down

Local Python environment

For code completion / linting / developing / etc.

python -m venv venv
. ./venv/bin/activate
# .\venv\Scripts\activate for Windows
python -m pip install -r ./streamlit_app/requirements.dev.txt
pre-commit install

# Formatting / Import Sorting / Linting
python -m black streamlit_app
python -m isort --profile=black streamlit_app
python -m flake8 --config=./streamlit_app/.flake8 streamlit_app

Features

Rekognition Limitations

This version sends binary image data to AWS Rekognition, which accepts at most 5 MB per image when sent this way. To account for this, uploads larger than 5 MB are downscaled before being sent to AWS.
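
One way to do this kind of downscaling is sketched below. It is an illustration under assumptions, not the app's implementation: `downscale_factor` and `shrink_to_limit` are hypothetical names, the 10% safety margin is arbitrary, and the heuristic assumes file size grows roughly with pixel count. Pillow is imported only inside `shrink_to_limit`.

```python
import io
import math

# Rekognition's limit for images passed as raw bytes
MAX_IMAGE_BYTES = 5 * 1024 * 1024


def downscale_factor(
    size_bytes: int, limit: int = MAX_IMAGE_BYTES, margin: float = 0.9
) -> float:
    """Per-side scale factor expected to bring the file under the limit."""
    if size_bytes <= limit:
        return 1.0
    # File size scales roughly with pixel count, i.e. the square of side length.
    return math.sqrt(limit * margin / size_bytes)


def shrink_to_limit(image_bytes: bytes, limit: int = MAX_IMAGE_BYTES) -> bytes:
    """Re-encode at smaller dimensions until the image fits under the limit."""
    from PIL import Image  # third-party: Pillow

    data = image_bytes
    while len(data) > limit:
        img = Image.open(io.BytesIO(data))
        fmt = img.format or "PNG"  # keep the original format when known
        factor = downscale_factor(len(data), limit)
        new_size = (
            max(1, int(img.width * factor)),
            max(1, int(img.height * factor)),
        )
        buf = io.BytesIO()
        img.resize(new_size).save(buf, format=fmt)
        data = buf.getvalue()
    return data
```

The loop is there because the size estimate is only a heuristic; one pass may not be enough for images that compress poorly.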

Rekognition's text detection returns at most 100 words per image. Images with more text than that may benefit from AWS Textract.
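
A simple way to notice when an image has probably hit this cap is to count the word-level detections in the response. The helper below is a hypothetical sketch; it assumes the `DetectText` response layout, where each entry in `TextDetections` has a `Type` of either `LINE` or `WORD`.

```python
WORD_LIMIT = 100  # word cap stated above for Rekognition text detection


def may_be_truncated(response: dict, limit: int = WORD_LIMIT) -> bool:
    """True when the word count reaches the cap, so some text may be missing."""
    words = [
        d for d in response.get("TextDetections", []) if d.get("Type") == "WORD"
    ]
    return len(words) >= limit
```

A `True` result is only a hint that the image is dense with text, which is the case where a Textract backend would be worth trying.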

Next Steps / Ideas

  • makefile
  • 5 MB limit without S3 version
  • Option for Textract OCR backend
  • 5 MB limit with S3 version (X from env / config)
  • X MB limit from env / config
  • FastAPI backend
  • API_key access to backend without streamlit
  • Option for Tesseract / non-aws OCR backend
