Extract structured menu information from images into JSON using a fine-tuned E2E model or LLM.
The following information is currently extracted from menu images:
- Restaurant Name
- Business Hours
- Address
- Phone Number
- Dish Information
  - Name
  - Price
For the JSON schema, see the tools directory.
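As a rough illustration of the kind of JSON the fields above could map to, here is a minimal sketch. The class names (`Menu`, `Dish`), the field names, and the helper `menu_to_json` are hypothetical and chosen only for this example; the authoritative schema is the one in the tools directory and may differ.

```python
import json
from dataclasses import dataclass, field, asdict


# NOTE: all names below are illustrative assumptions, not the project's schema.
@dataclass
class Dish:
    name: str
    price: str  # kept as text here, since menus print prices in many formats


@dataclass
class Menu:
    restaurant_name: str = ""
    business_hours: str = ""
    address: str = ""
    phone_number: str = ""
    dishes: list[Dish] = field(default_factory=list)


def menu_to_json(menu: Menu) -> str:
    """Serialize a Menu (including nested dishes) to a JSON string."""
    return json.dumps(asdict(menu), ensure_ascii=False, indent=2)


example = Menu(
    restaurant_name="Example Cafe",
    dishes=[Dish(name="Espresso", price="3.00")],
)
print(menu_to_json(example))
```

Fields that are not found on a menu simply stay empty in this sketch; the real schema may treat missing values differently.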
- Donut (Document Parsing Task) - Base model by Clova AI (ECCV ’22)
- Google Gemini API
- OpenAI GPT API
Use uv to set up the development environment:
uv sync
If that runs into problems, install the dependencies with pip instead:
pip install -r requirements.txt
Please refer to train.ipynb. Use Jupyter Notebook for training:
uv run jupyter-notebook
For VS Code users, install the Jupyter extension, then select .venv/bin/python as your kernel.
uv run python app.py