GitHub - stackmodel/llm-pydantic-json: Unlock Precise JSON Outputs from Multimodal LLMs with Pydantic

LLM -> JSON (using Pydantic)

This is a simple Streamlit app that uses the Google Gemini Flash multimodal LLM to extract text from a receipt image and output the result as structured JSON using Pydantic.

Features

Upload a receipt image (JPG, JPEG, PNG).
Extract readable text from the image using the Google Gemini Flash multimodal LLM.
Present the extracted content as structured JSON.
Use Pydantic models to check that the output has the expected fields and types.
Easy-to-use Streamlit interface for interacting with the app.
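
As a rough illustration of the Pydantic step in the feature list above, here is a minimal sketch of validating LLM output against a schema. The `Receipt` and `LineItem` models, their field names, and the `parse_receipt` helper are hypothetical and are not taken from app.py; the sample JSON strings are made up for the example.

```python
import json
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class LineItem(BaseModel):
    # Hypothetical fields; the real app may define different ones.
    description: str
    quantity: int = 1
    price: float


class Receipt(BaseModel):
    # Hypothetical top-level schema for a parsed receipt.
    merchant: str
    date: Optional[str] = None
    items: List[LineItem]
    total: float


def parse_receipt(raw_json: str) -> Optional[Receipt]:
    """Return a validated Receipt, or None if the text does not fit the schema."""
    try:
        return Receipt(**json.loads(raw_json))
    except (json.JSONDecodeError, TypeError, ValidationError):
        return None


# Made-up model outputs: one that fits the schema, one that does not.
good = '{"merchant": "Corner Cafe", "items": [{"description": "Latte", "price": "4.50"}], "total": 4.5}'
bad = '{"merchant": "Corner Cafe", "items": "n/a", "total": "unknown"}'

receipt = parse_receipt(good)   # numeric strings are coerced to float
rejected = parse_receipt(bad)   # fails validation, so None is returned
```

Returning `None` on failure keeps the sketch short; an app could instead surface the validation errors to the user or retry the LLM call.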

Setup Instructions

Clone the repository

git clone https://github.com/stackmodel/llm-pydantic-json.git
cd llm-pydantic-json

Install dependencies:
- Make sure you have Python 3.7 or higher installed. Then create a virtual environment and install the dependencies:
```
python -m venv env
source env/bin/activate      # Linux/macOS
# .\env\Scripts\activate     # Windows: run this instead of the line above
pip install -r requirements.txt
```
Rename .env.example to .env and fill in your Google Gemini API key. You can obtain an API key from Google AI Studio.
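
For readers curious how a key in .env might be picked up at runtime, the following is a minimal standard-library sketch. The variable name `GOOGLE_API_KEY` and the helpers `load_env_file` and `get_api_key` are assumptions for illustration; check .env.example for the name the app actually expects. The app itself may rely on a package such as python-dotenv rather than code like this.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(path: str = ".env", var_name: str = "GOOGLE_API_KEY") -> str:
    """Look up the key; a real environment variable wins over the file."""
    key = os.environ.get(var_name) or load_env_file(path).get(var_name, "")
    if not key:
        raise RuntimeError(f"{var_name} is missing; set it in {path}")
    return key
```

Failing early with a clear message makes a missing or misnamed key easier to diagnose than an authentication error from the API.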
Run the app with `streamlit run app.py`. This launches the app in your browser, where you can upload the sample.png file.

