GitHub - limitmhw/Document-Layout-Analysis: Object Detection Model for Scanned Documents

Layout Analysis of Scanned Documents

Document Layout Analysis using YOLOv8

Table of Contents

Updates
About The Project
- Built With
Getting Started
- Prerequisites
- Installation
Works Cited
Acknowledgments

Updates

In this project, I provide one object detection model trained from the existing YOLOv8 weights. It is uploaded to the project's Hugging Face Space. If you use or fine-tune the model in any part of your work, please cite this repository. Thank you, and don't forget to give this repo a 🌟!
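As a rough illustration of using the model for inference, the sketch below loads a weights file with the ultralytics package and counts the detected layout elements per class. The file name best.pt, the function names, and the 0.25 confidence threshold are assumptions made for this example, not values taken from the repository; substitute the weights you downloaded.

```python
from pathlib import Path


def summarize_detections(class_ids, confidences, names, min_conf=0.25):
    """Count detections per class label, skipping boxes below min_conf."""
    counts = {}
    for cls_id, conf in zip(class_ids, confidences):
        if conf < min_conf:
            continue
        label = names[int(cls_id)]
        counts[label] = counts.get(label, 0) + 1
    return counts


def detect_layout(image_path, weights_path="best.pt", min_conf=0.25):
    """Run the detector on one page image and return per-class counts.

    weights_path is a placeholder (assumption); point it at your own file.
    """
    # Imported lazily so summarize_detections works without ultralytics.
    from ultralytics import YOLO

    model = YOLO(weights_path)
    result = model.predict(source=str(Path(image_path)), conf=min_conf)[0]
    return summarize_detections(
        result.boxes.cls.tolist(),   # class index of each box
        result.boxes.conf.tolist(),  # confidence score of each box
        result.names,                # mapping from class index to label
        min_conf,
    )
```

Calling detect_layout("page.png") on a scanned page would return a dictionary mapping layout labels to the number of boxes found, where "page.png" is again a hypothetical path.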

About The Project

Due to limited computational resources, I trained only on the Doclaynet-base dataset, which contains 6910 training images, 648 validation images, and 499 test images. Even so, the model performs relatively well, which further supports the strength of the YOLOv8 model.


Built With


Prerequisites

python 3
ultralytics
numpy
opencv-python

Installation

Clone the repo

git clone https://github.com/LynnHaDo/Document-Layout-Analysis.git

Install packages

pip install ultralytics
pip install numpy
pip install opencv-python

Download the Doclaynet-base dataset and save it as datasets/doclaynet-base
(Optional) Download pretrained YOLOv8s weights
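Once the steps above are done, fine-tuning could look roughly like the following sketch. It assumes the repository's doclaynet.yaml points at datasets/doclaynet-base and that the pretrained weights are available as yolov8s.pt. The helper names and the epoch, image size, and batch values are illustrative assumptions, not the settings used to train the published model.

```python
from pathlib import Path


def dataset_ready(root="datasets/doclaynet-base"):
    """Return True if the dataset folder from the download step exists."""
    return Path(root).is_dir()


def build_train_args(data_yaml="doclaynet.yaml", epochs=50, imgsz=640, batch=16):
    """Collect training settings; the default values are assumptions."""
    return {"data": data_yaml, "epochs": epochs, "imgsz": imgsz, "batch": batch}


def fine_tune(weights="yolov8s.pt", root="datasets/doclaynet-base", **overrides):
    """Fine-tune pretrained YOLOv8s weights on the downloaded dataset."""
    if not dataset_ready(root):
        raise FileNotFoundError(f"Dataset folder not found: {root}")

    # Imported lazily so the helpers above work without ultralytics.
    from ultralytics import YOLO

    model = YOLO(weights)
    model.train(**build_train_args(**overrides))
    return model
```

For a quick smoke test on modest hardware, fine_tune(epochs=1, batch=4) keeps the run short before committing to a longer training session.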


Works Cited

Ultralytics YOLOv8

authors:
 - family-names: Jocher
   given-names: Glenn
   orcid: "https://orcid.org/0000-0001-5950-6979"
 - family-names: Chaurasia
   given-names: Ayush
   orcid: "https://orcid.org/0000-0002-7603-6750"
 - family-names: Qiu
   given-names: Jing
   orcid: "https://orcid.org/0000-0003-3783-7069"
title: "YOLO by Ultralytics"
version: 8.0.0
date-released: 2023-1-10
license: AGPL-3.0
url: "https://github.com/ultralytics/ultralytics"

Doclaynet-base dataset

@article{doclaynet2022,
 title = {DocLayNet: A Large Human-Annotated Dataset for Document-Layout Segmentation},
 doi = {10.1145/3534678.3539043},
 url = {https://doi.org/10.1145/3534678.3539043},
 author = {Pfitzmann, Birgit and Auer, Christoph and Dolfi, Michele and Nassar, Ahmed S and Staar, Peter W J},
 year = {2022},
 isbn = {9781450393850},
 publisher = {Association for Computing Machinery},
 address = {New York, NY, USA},
 booktitle = {Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
 pages = {3743–3751},
 numpages = {9},
 location = {Washington DC, USA},
 series = {KDD '22}
 }

Contact

Linh Do - do24l@mtholyoke.edu/dohalinh2303@gmail.com (personal)

Project Link: https://github.com/LynnHaDo/Document-Layout-Analysis

LinkedIn: https://linkedin.com/in/Linh Do


