Document AI with Hugging Face Transformers

Document AI is a term that has become popular over the last three years. It refers to machine learning models, tasks, and techniques that classify, parse, and extract information from documents in digital and print form, such as invoices, receipts, licenses, contracts, and business reports.

This repository contains examples and tutorials on how to get started with Document AI and Transformers. Below you can also find a compendium of available models, tasks, datasets, and other resources.

Training

Inference

Data-processing

Convert FUNSD to the Donut format for document VQA
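The conversion step can be sketched as follows. This is a minimal illustration, not the repository's notebook. It assumes the raw FUNSD annotation layout, where each page has a `form` list whose entries carry `id`, `label` ("question", "answer", "header", "other"), `text`, and `linking` (pairs of entity ids). It also assumes Donut's DocVQA tag format `<s_docvqa><s_question>...</s_question><s_answer>...</s_answer>`. Both helper names are hypothetical.

```python
from typing import Dict, List


def funsd_form_to_qa_pairs(form: List[Dict]) -> List[Dict[str, str]]:
    """Hypothetical helper: turn linked FUNSD question/answer entities into QA pairs."""
    by_id = {entity["id"]: entity for entity in form}
    pairs = []
    for entity in form:
        if entity["label"] != "question":
            continue
        # A link [source_id, target_id] pointing from this question to an answer
        # entity is treated as one question/answer pair.
        for source_id, target_id in entity.get("linking", []):
            target = by_id.get(target_id)
            if source_id == entity["id"] and target is not None and target["label"] == "answer":
                pairs.append({"question": entity["text"], "answer": target["text"]})
    return pairs


def to_donut_docvqa_target(pair: Dict[str, str]) -> str:
    """Hypothetical helper: serialize one pair in Donut's DocVQA tag format (assumed)."""
    return (
        f"<s_docvqa><s_question>{pair['question']}</s_question>"
        f"<s_answer>{pair['answer']}</s_answer>"
    )


form = [
    {"id": 0, "label": "question", "text": "Date:", "linking": [[0, 1]]},
    {"id": 1, "label": "answer", "text": "12/03/98", "linking": [[0, 1]]},
    {"id": 2, "label": "header", "text": "MEMO", "linking": []},
]
for pair in funsd_form_to_qa_pairs(form):
    print(to_donut_docvqa_target(pair))
```

Headers and unlinked entities are skipped. Each page image therefore yields one training example per linked question.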

Demos/Spaces

Community:

Popular models include LayoutLM... and Donut, which we will use today to get a first impression of how to build your own Document AI system with Hugging Face Transformers.
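A Donut inference call for document visual question answering might look roughly like the sketch below. It follows the usage pattern shown in the Transformers Donut documentation: `DonutProcessor`, `VisionEncoderDecoderModel`, and the public `naver-clova-ix/donut-base-finetuned-docvqa` checkpoint. The helper names and the default generation settings are illustrative assumptions. `transformers` is imported inside the function, so the prompt helper can be used without downloading anything.

```python
import re

# Task prompt used by the DocVQA-finetuned Donut checkpoint (per the Transformers docs).
DOCVQA_PROMPT = "<s_docvqa><s_question>{question}</s_question><s_answer>"


def build_docvqa_prompt(question: str) -> str:
    """Fill the user question into the Donut DocVQA task prompt."""
    return DOCVQA_PROMPT.format(question=question)


def answer_question(image, question: str,
                    checkpoint: str = "naver-clova-ix/donut-base-finetuned-docvqa") -> dict:
    """Sketch: run Donut on a PIL image and return the parsed answer as a dict."""
    # Imported lazily so the module can be loaded without transformers installed.
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    processor = DonutProcessor.from_pretrained(checkpoint)
    model = VisionEncoderDecoderModel.from_pretrained(checkpoint)

    # The prompt is fed to the decoder; the image goes through the vision encoder.
    decoder_input_ids = processor.tokenizer(
        build_docvqa_prompt(question), add_special_tokens=False, return_tensors="pt"
    ).input_ids
    pixel_values = processor(image, return_tensors="pt").pixel_values

    outputs = model.generate(
        pixel_values,
        decoder_input_ids=decoder_input_ids,
        max_length=model.decoder.config.max_position_embeddings,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
    )

    # Strip special tokens and the leading task token, then parse tags into JSON.
    sequence = processor.batch_decode(outputs)[0]
    sequence = sequence.replace(processor.tokenizer.eos_token, "").replace(
        processor.tokenizer.pad_token, ""
    )
    sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()
    return processor.token2json(sequence)
```

Loading the processor and model on every call keeps the sketch short. In a real service you would load them once and reuse them.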

Machine Learning Models (Transformers)

Below you can find a table of the currently available Transformers models that achieve state-of-the-art performance on Document AI tasks.

Model	Paper	License	Checkpoints
Donut	arxiv	MIT	huggingface
LiLT	arxiv	MIT	huggingface
LayoutLM	arxiv	MIT	huggingface
LayoutXLM	arxiv	CC BY-NC-SA 4.0	huggingface
LayoutLMv2	arxiv	CC BY-NC-SA 4.0	huggingface
LayoutLMv3	arxiv	CC BY-NC-SA 4.0	huggingface
DiT	arxiv	CC BY-NC-SA 4.0	huggingface
TrOCR	arxiv	MIT	huggingface
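The LayoutLM-style models in this table consume word-level bounding boxes alongside the text. These boxes are expected on a 0-1000 scale, independent of the page's pixel size. The sketch below converts pixel-space `(x0, y0, x1, y1)` boxes, for example from an OCR engine, to that scale. The helper name `normalize_bbox` is hypothetical. The clamping to `[0, 1000]` is a defensive choice, not a library requirement.

```python
from typing import List, Sequence


def normalize_bbox(bbox: Sequence[float], width: float, height: float) -> List[int]:
    """Hypothetical helper: map a pixel box (x0, y0, x1, y1) to the 0-1000 grid."""
    x0, y0, x1, y1 = bbox
    scaled = [
        1000 * x0 / width,
        1000 * y0 / height,
        1000 * x1 / width,
        1000 * y1 / height,
    ]
    # Truncate to integers and clamp OCR boxes that spill past the page edges.
    return [max(0, min(1000, int(value))) for value in scaled]


# A US-letter page rendered at 72 dpi (612 x 792 pixels).
print(normalize_bbox([61.2, 79.2, 306, 396], 612, 792))
```

If you let the LayoutLMv2/LayoutLMv3 processors run OCR themselves (`apply_ocr=True`), they also return normalized boxes. In that case a manual step like this is only needed for your own OCR output.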

Tasks

Document AI includes the following use cases and tasks:

document classification (image-classification)
document parsing (form understanding & information extraction)
visual question answering
table detection/layout analysis
optical character recognition (OCR)
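For document parsing with a token-classification model, predictions typically come back as one BIO tag per word. FUNSD-style labels such as `B-QUESTION` or `I-ANSWER` are assumed here. The sketch below groups those tags into entities. It is a generic BIO decoder, not code from this repository, and the function name is hypothetical. A stray `I-` tag that does not continue the current entity is treated as the start of a new one.

```python
from typing import List, Optional, Tuple


def bio_to_entities(words: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Hypothetical helper: group per-word BIO tags into (label, text) entities."""
    entities: List[Tuple[str, str]] = []
    current_label: Optional[str] = None
    current_words: List[str] = []

    def flush() -> None:
        # Emit the entity collected so far, if any.
        if current_label is not None:
            entities.append((current_label, " ".join(current_words)))

    for word, tag in zip(words, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_label):
            flush()
            current_label, current_words = tag[2:], [word]
        elif tag.startswith("I-"):
            current_words.append(word)
        else:  # "O": outside any entity
            flush()
            current_label, current_words = None, []
    flush()
    return entities


words = ["Date:", "12/03/98", "To:", "John", "Smith", "cc"]
tags = ["B-QUESTION", "B-ANSWER", "B-QUESTION", "B-ANSWER", "I-ANSWER", "O"]
print(bio_to_entities(words, tags))
```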

Datasets

Dataset	Task	Hugging Face Datasets
SROIE	document parsing	darentang/sroie
RVL-CDIP	document classification	rvl_cdip
XFUND	document parsing	ranpox/xfund
FUNSD	document parsing	nielsr/funsd
CORD	information extraction/parsing	naver-clova-ix/cord-v2
DocVQA	visual question answering	load manually
WildReceipt	document parsing	Theivaprakasham/wildreceipt
TableBank	table detection/layout analysis	load manually
DocBank	table detection/layout analysis	load manually
ReadingBank	table detection/layout analysis	load manually
EATEN	document parsing	load manually
PubLayNet	table detection/layout analysis	jordanparker6/publaynet
ICDAR2019_cTDaR	table detection/layout analysis	load manually
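To fine-tune Donut on a parsing dataset such as CORD, each JSON annotation is serialized into a tag sequence the decoder learns to generate: `<s_key>value</s_key>`, with list items separated by `<sep/>`. The recursive serializer below is a simplified, hypothetical stand-in for Donut's own conversion. The original implementation may differ in details such as key ordering. The field names in the example are illustrative.

```python
from typing import Any


def json_to_donut_tokens(obj: Any) -> str:
    """Hypothetical helper: serialize nested dicts/lists into Donut-style tags."""
    if isinstance(obj, dict):
        # Each key becomes an opening/closing tag pair around its serialized value.
        return "".join(
            f"<s_{key}>{json_to_donut_tokens(value)}</s_{key}>"
            for key, value in obj.items()
        )
    if isinstance(obj, list):
        # List items are joined with a separator tag.
        return "<sep/>".join(json_to_donut_tokens(item) for item in obj)
    # Leaf values are emitted as plain text.
    return str(obj)


print(json_to_donut_tokens({"menu": [{"nm": "Tea"}, {"nm": "Cake"}]}))
```

The generated tag strings (`<s_menu>`, `</s_menu>`, and so on) usually have to be added to the tokenizer's vocabulary as special tokens before fine-tuning.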

philschmid/document-ai-transformers

Folders and files

Latest commit

History

Repository files navigation

Document AI with Hugging Face Transformers

Machine Learning Models (Transformers)

Tasks

Datasets

APIs and existing Solutuions

Other Tools

Resources

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages