This is a script made for finetuning a Donut model on a batch of receipts in Croatian, AFTER finetuning the base Donut model on the SROIE dataset using HuggingFace's Transformers library. The script is adapted (well, swiped) from Phil Schmid: https://www.philschmid.de/fine-tuning-donut .
The Donut model consists of a text transformer decoder (BART) plus a Vision Transformer encoder (Swin). Luckily the text part is multilingual, so it does OK with Croatian.
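For orientation, here is a minimal sketch (assuming the public naver-clova-ix/donut-base checkpoint, which is not necessarily the exact checkpoint used in this project) of loading Donut with Transformers and peeking at its two halves:

```python
# Minimal sketch: load the public Donut base checkpoint and inspect
# its two parts (Swin vision encoder + BART-style text decoder).
from transformers import DonutProcessor, VisionEncoderDecoderModel

processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base")
model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base")

print(type(model.encoder).__name__)  # e.g. DonutSwinModel
print(type(model.decoder).__name__)  # e.g. MBartForCausalLM
```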
The task we're looking at in this example is Document Visual Question Answering (DocVQA for short).
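To give an idea of what DocVQA-style prompting looks like with Donut, here is a rough inference sketch using the publicly released DocVQA checkpoint; the checkpoint name, image path and question are illustrative, not this project's exact setup:

```python
# Rough sketch of DocVQA-style inference with a public Donut checkpoint.
from transformers import DonutProcessor, VisionEncoderDecoderModel
from PIL import Image

checkpoint = "naver-clova-ix/donut-base-finetuned-docvqa"  # public DocVQA model
processor = DonutProcessor.from_pretrained(checkpoint)
model = VisionEncoderDecoderModel.from_pretrained(checkpoint)

image = Image.open("receipt.jpg").convert("RGB")  # placeholder image path
pixel_values = processor(image, return_tensors="pt").pixel_values

# The question is wrapped in Donut's DocVQA prompt format; the model then
# generates the answer tokens directly, with no separate OCR step.
question = "What is the total amount?"
prompt = f"<s_docvqa><s_question>{question}</s_question><s_answer>"
decoder_input_ids = processor.tokenizer(
    prompt, add_special_tokens=False, return_tensors="pt"
).input_ids

outputs = model.generate(
    pixel_values,
    decoder_input_ids=decoder_input_ids,
    max_length=512,
)
print(processor.batch_decode(outputs)[0])
```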
The SROIE dataset consists of about 1000 receipt images (624 usable in the end), each paired with a key-value list of the fields we want the model to extract. No OCR is run beforehand and no bounding boxes need to be drawn; we simply tell the model which pieces of information appear on the receipts and train it to produce their values for a given receipt, as sketched below.
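To make the "no OCR, no bounding boxes" part concrete, here is a simplified sketch (in the spirit of Phil Schmid's tutorial, with made-up field names) of how one receipt's key-value labels get flattened into the target sequence Donut learns to generate:

```python
# Simplified sketch: turn a receipt's key-value labels into the flat
# token sequence Donut is trained to generate.
def json2token(obj):
    if isinstance(obj, dict):
        return "".join(
            f"<s_{k}>{json2token(v)}</s_{k}>" for k, v in obj.items()
        )
    return str(obj)

# Example annotation for one receipt (field names are illustrative only).
label = {"company": "STORE d.o.o.", "date": "2022-07-14", "total": "123.45"}
target_sequence = json2token(label) + "</s>"
print(target_sequence)
# <s_company>STORE d.o.o.</s_company><s_date>2022-07-14</s_date><s_total>123.45</s_total></s>
```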
The hard part was collecting store receipts (and some similar documents: receipts for non-store services, toll receipts...) in Croatian, and of course labeling them. The collecting was done over the summer with generous help from fellow students, and of the roughly 250 collected receipts I ended up using around 130.
"Labeling" I did solo, and it ended up taking considerable time.
So there are two scripts in the repository. The first one processes the SROIE dataset, finetunes the base Donut model on that data, and uploads the result to HuggingFace. The second one takes the model produced by the first, processes our dataset of Croatian receipts, finetunes it one step further, and uploads the final model checkpoint to HuggingFace; a rough sketch of that second stage follows.
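The sketch below assumes a hypothetical stage-one checkpoint name and an already preprocessed `train_dataset` (both placeholders, not the repo's actual names), and uses the Seq2SeqTrainer setup from Phil Schmid's tutorial:

```python
# Rough sketch of stage two. Assumes you are logged in to the HuggingFace Hub
# and that `train_dataset` yields dicts with "pixel_values" and "labels".
from transformers import (
    DonutProcessor,
    VisionEncoderDecoderModel,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

stage_one = "your-username/donut-base-finetuned-sroie"  # hypothetical stage-one repo
processor = DonutProcessor.from_pretrained(stage_one)
model = VisionEncoderDecoderModel.from_pretrained(stage_one)

train_dataset = ...  # placeholder: preprocessed Croatian receipts dataset

args = Seq2SeqTrainingArguments(
    output_dir="donut-croatian-receipts",  # hypothetical local/Hub repo name
    num_train_epochs=3,
    per_device_train_batch_size=2,
    learning_rate=2e-5,
    push_to_hub=True,
)

trainer = Seq2SeqTrainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
trainer.push_to_hub()  # uploads the final checkpoint to HuggingFace
```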
The end result can of course be found on HuggingFace, keeping everything open source in spirit: https://huggingface.co/oxioxi/donut-base-sroie-v1.5 .
The Python notebook files are helpers for a presentation I did at the end of the project, using the nbconvert package, which can "serve" notebooks as webpages.
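For reference, nbconvert can also be driven from Python; here is a minimal sketch of turning a notebook into a standalone HTML page (the filenames are placeholders, not files from this repo):

```python
# Minimal sketch: export a notebook to HTML with nbconvert's Python API.
import nbformat
from nbconvert import HTMLExporter

nb = nbformat.read("presentation.ipynb", as_version=4)  # placeholder filename
body, _resources = HTMLExporter().from_notebook_node(nb)

with open("presentation.html", "w", encoding="utf-8") as f:
    f.write(body)
```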