
Finetuning Donut transformer on a batch of store receipts in Croatian. No OCR needed. You can define your own key-value extraction pairs.


matejsarlija/donut_hr


DONUT_HR

This is a script for finetuning a Donut model on a batch of receipts in Croatian, AFTER first finetuning the base Donut model on the SROIE dataset using HuggingFace's Transformers library. The script is adapted (well, swiped) from Phil Schmid's tutorial: https://www.philschmid.de/fine-tuning-donut

The Donut model consists of a vision transformer encoder (Swin) plus a text transformer decoder (BART). Luckily the text part is multilingual, so it does OK with the Croatian language.

The task we're looking at in this example is Document Visual Question Answering (DocVQA for short).

The SROIE dataset consists of about 1000 images (624 in the end), each paired with a key-value list of the items on the receipt that we want to train our model on. So no OCR is done beforehand and no bounding boxes need to be drawn: we just tell the model that certain pieces of info are on the receipts and that we want to find their values on a specific receipt.
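To make the key-value idea concrete, here is a minimal sketch of how a label like this can be flattened into the tagged text sequence a Donut-style decoder is trained to produce. It is loosely modeled on the approach in the tutorial linked above, not copied from the scripts in this repository. The function name `json2token`, the `<s_key>` tag format as written here, and the sample receipt values are assumptions made up for illustration.

```python
def json2token(obj, sort_keys=True):
    """Flatten a nested key-value label into a tagged target string.

    Illustrative helper (an assumption, not this repo's code):
    each key becomes an opening/closing tag pair around its value.
    """
    if isinstance(obj, dict):
        keys = sorted(obj) if sort_keys else list(obj)
        return "".join(
            f"<s_{k}>{json2token(obj[k], sort_keys)}</s_{k}>" for k in keys
        )
    if isinstance(obj, list):
        # List items are joined with a separator tag.
        return "<sep/>".join(json2token(item, sort_keys) for item in obj)
    return str(obj)


# Hypothetical label for one receipt image; the values are invented.
receipt = {
    "company": "EXAMPLE STORE d.o.o.",
    "date": "2023-07-15",
    "total": "12,50",
}

target = json2token(receipt)
print(target)
```

Defining your own extraction pairs then amounts to changing the keys in the label dictionaries; in a real training setup each new tag would typically also have to be added to the tokenizer as a special token.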

The hard part was collecting store receipts in Croatian (plus some similar documents: receipts for non-store services, toll receipts...), and of course labeling them. The collecting was done over the summer with generous help from fellow students, and of the roughly 250 receipts collected I ended up using around 130.

"Labeling" I did solo, and it ended up taking considerable time.

So we have two scripts in the repository. The first one processes the SROIE dataset, finetunes the base Donut model on that data, and uploads the result to HuggingFace. The second one takes the model produced by the first script and finetunes it one step further on our dataset of receipts in Croatian, then uploads the final model checkpoint to HuggingFace.

The end result can be found on HuggingFace of course, keeping it all open source in spirit: https://huggingface.co/oxioxi/donut-base-sroie-v1.5

The Python notebook files are helpers for a presentation I did at the end of the project, using the nbconvert package, which can "serve" notebooks as webpages.
