alttexter


Overview

Diagram of the system architecture of the alttexter microservice, showing its integration with a GitHub client

LLM wrapper service (currently gpt-4-vision-preview) to batch-generate alt text and title attributes for images defined in markdown-formatted text.

It exists to abstract the LLM and LangSmith APIs behind a single interface for clients, for example alttexter-ghclient.

See the OpenAPI specification for the service here. All credit to @BCapinha for the original idea behind this service and for being a great collaborator in its development.

Why?

via gov.uk:

When uploading images and visuals online, or in documents shared digitally, adding alt text can help people using assistive technologies to 'hear' those visuals. We aim to make sure that anyone using alt text through assistive technologies can get the same information from the description of an image as someone who relies on the visuals. Alt text often assists visually impaired people but is also used for search engine optimisation and for making sense of an image if it isn't visible or doesn't load.

Usage

  1. Clone the repo.

  2. Copy .env-example to .env and fill in the required environment variables.

  3. Optionally edit config.json to customize CORS and logging.

  4. Run docker-compose up (v1) or docker compose up (v2) to build and start the service.

  5. Run python client-example.py example/apis.ipynb to test. Expected output:

    $ python client-example.py example/apis.ipynb
    Enter endpoint URL (eg. https://alttexter-prod.westeurope.cloudapp.azure.com:9100/alttexter):
    Enter ALTTEXTER_TOKEN:
    INFO [30-12-2023 07:32:33] File read successfully.
    INFO [30-12-2023 07:32:33] Unsupported image type: https://colab.research.google.com/assets/colab-badge.svg
    INFO [30-12-2023 07:32:33] Encoded image: api_use_case.png
    INFO [30-12-2023 07:32:33] Encoded image: api_function_call.png
    INFO [30-12-2023 07:32:33] Payload Summary:
    INFO [30-12-2023 07:32:33] Total local images (encoded): 2
    INFO [30-12-2023 07:32:33] Total image URLs: 2
    INFO [30-12-2023 07:32:33] Image URLs: ['https://github.com/langchain-ai/langchain/blob/b9636e5c987e1217afcdf83e9c311568ad50c304/docs/static/img/api_chain.png?raw=true', 'https://github.com/langchain-ai/langchain/blob/b9636e5c987e1217afcdf83e9c311568ad50c304/docs/static/img/api_chain_response.png?raw=true']
    INFO [30-12-2023 07:32:33] Sending payload to alttexter...
    INFO [30-12-2023 07:32:46] Response received at 30-12-2023 07:32:46
    {"images":[{"name":"api_use_case.png","title":"API Use Case Diagram","alt_text":"Diagram illustrating the use case of an LLM interacting with an external API."},{"name":"api_function_call.png","title":"API Function Call Process","alt_text":"Flowchart showing the process of an LLM formulating an API call based on a user query."},{"name":"https://github.com/langchain-ai/langchain/blob/b9636e5c987e1217afcdf83e9c311568ad50c304/docs/static/img/api_chain.png?raw=true","title":"API Request Chain Trace","alt_text":"Screenshot of a LangSmith trace showing the API request chain for generating an API URL."},{"name":"https://github.com/langchain-ai/langchain/blob/b9636e5c987e1217afcdf83e9c311568ad50c304/docs/static/img/api_chain_response.png?raw=true","title":"API Response Chain Trace","alt_text":"Screenshot of a LangSmith trace showing the API response chain for providing a natural language answer."}],"run_url":"https://smith.langchain.com/public/7596e591-559d-4ba4-b35e-58f93db6d25d/r"}
  6. This is a very basic client. Check alttexter-ghclient to integrate the service into your docs-as-code pipeline. A rough sketch of what the client does under the hood follows this list.
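For orientation, the core of such a client boils down to something like the sketch below. The payload field names (text, images, image_urls), the local image path, and the bearer-token authentication are illustrative assumptions, not taken from the OpenAPI specification; check the spec and client-example.py for the authoritative request schema.

    # Minimal client sketch. Field names, paths, and auth scheme are assumptions;
    # consult the OpenAPI spec and client-example.py for the real schema.
    import base64
    import requests

    ENDPOINT = "https://alttexter-prod.westeurope.cloudapp.azure.com:9100/alttexter"  # example endpoint from above
    TOKEN = "..."  # your ALTTEXTER_TOKEN

    # Read the markdown/notebook file whose images need alt text.
    with open("example/apis.ipynb", "r", encoding="utf-8") as f:
        text = f.read()

    # Base64-encode a local image referenced in the file (hypothetical path).
    with open("example/api_use_case.png", "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")

    payload = {
        "text": text,
        "images": {"api_use_case.png": encoded},            # local images, base64-encoded
        "image_urls": ["https://example.com/diagram.png"],   # remotely hosted images
    }

    resp = requests.post(
        ENDPOINT,
        json=payload,
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=120,
    )
    resp.raise_for_status()
    # Expected shape per the output above:
    # {"images": [{"name": ..., "title": ..., "alt_text": ...}, ...], "run_url": ...}
    print(resp.json())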

Features

  • Uses LangChain's Pydantic output parser as the foundation for the system prompt, so the model reliably returns JSON in the expected format (function calling will be even cooler). A rough sketch of the approach follows this list.
  • Optionally integrates with LangSmith to serve a trace URL for each generation.
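As a rough illustration of the first point: the response returned by the service (see the expected output above) maps onto a Pydantic model, and LangChain's PydanticOutputParser can generate the format instructions embedded in the system prompt and validate the model's output. The class and field names below are inferred from the response JSON shown above, not copied from alttexter's source.

    # Sketch: a Pydantic schema plus LangChain's PydanticOutputParser driving the
    # system prompt and validating the LLM's JSON output. Names are inferred from
    # the example response, not taken from alttexter's code.
    from typing import List

    from langchain.output_parsers import PydanticOutputParser
    from pydantic import BaseModel, Field

    class ImageAltText(BaseModel):
        name: str = Field(description="Image file name or URL as it appears in the markdown")
        title: str = Field(description="Concise title attribute for the image")
        alt_text: str = Field(description="Descriptive alt text for the image")

    class AltTexterResponse(BaseModel):
        images: List[ImageAltText]

    parser = PydanticOutputParser(pydantic_object=AltTexterResponse)

    # Injected into the system prompt so the model knows the exact JSON shape to return.
    print(parser.get_format_instructions())

    # After the LLM call, the raw completion is parsed back into the typed model:
    # result = parser.parse(llm_output)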

TODO

  • Better error handling
  • Unit tests
  • Special handling for large files and images
  • Rate limiting at the service level
  • Explore extending to multimodal models beyond OpenAI
  • Option to use Azure OpenAI Service
