Document Layout Analysis using YOLOv8
In this project, I provide one object detection model fine-tuned from pretrained YOLOv8 weights. It is uploaded to the project's Hugging Face Space. If you use or fine-tune the model in any part of your work, please cite this repository. Thank you, and don't forget to give this repo a 🌟!
Due to limited computational resources, I trained the model only on the DocLayNet-base dataset, which contains 6910 training images, 648 validation images, and 499 test images. Even with this small subset, the model performs relatively well, which suggests that YOLOv8 is a strong baseline for document layout analysis.
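The training setup described above can be sketched roughly as follows. This is a minimal sketch, not the exact script used for this project: the sub-folder layout under `datasets/doclaynet-base`, the class order, the helper names `build_data_config` and `to_yaml`, and the `epochs` and `imgsz` values are all assumptions for illustration. The 11 class names are the DocLayNet layout categories.

```python
# Sketch only: builds an Ultralytics-style dataset config and starts fine-tuning.
# Folder layout, class order, and hyperparameters are assumptions, not the
# settings actually used in this repository.

DOCLAYNET_CLASSES = [
    "Caption", "Footnote", "Formula", "List-item", "Page-footer", "Page-header",
    "Picture", "Section-header", "Table", "Text", "Title",
]


def build_data_config(root="datasets/doclaynet-base"):
    """Return a dataset description in the shape Ultralytics expects."""
    return {
        "path": root,
        "train": "images/train",  # assumed sub-folder names
        "val": "images/val",
        "test": "images/test",
        "names": dict(enumerate(DOCLAYNET_CLASSES)),
    }


def to_yaml(cfg):
    """Serialize the config to YAML text without needing a YAML library."""
    lines = [f"{key}: {cfg[key]}" for key in ("path", "train", "val", "test")]
    lines.append("names:")
    lines += [f"  {idx}: {name}" for idx, name in cfg["names"].items()]
    return "\n".join(lines) + "\n"


def train(data_yaml="doclaynet.yaml", weights="yolov8s.pt", epochs=50, imgsz=640):
    """Fine-tune pretrained YOLOv8 weights on the dataset described by data_yaml."""
    from ultralytics import YOLO  # imported lazily; requires `pip install ultralytics`

    model = YOLO(weights)
    return model.train(data=data_yaml, epochs=epochs, imgsz=imgsz)


if __name__ == "__main__":
    with open("doclaynet.yaml", "w", encoding="utf-8") as f:
        f.write(to_yaml(build_data_config()))
    train()
```

Keeping the class list in one place means the label indices in the YOLO annotation files and the names shown at prediction time stay in sync, provided the annotations were exported in the same order.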
- python 3
- ultralytics
- numpy
- opencv-python
- Clone the repo
git clone https://github.com/LynnHaDo/Document-Layout-Analysis.git
- Install packages
pip install ultralytics
pip install numpy
pip install opencv-python
- Download the DocLayNet-base dataset and save it as
datasets/doclaynet-base
- (Optional) Download pretrained YOLOv8s weights
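Once the packages are installed, running a model on a page image might look like the sketch below. It is illustrative only: the checkpoint path `path/to/best.pt`, the image name `page.png`, the confidence threshold, and the helper names `run_layout_detection` and `count_by_label` are hypothetical and not part of this repository. Note that the stock `yolov8s.pt` weights are not trained on layout classes, so a checkpoint fine-tuned on DocLayNet is assumed here.

```python
# Sketch only: runs a YOLOv8 checkpoint on one page image and summarizes the
# detected layout elements. Paths and the threshold are placeholders.
from collections import Counter


def run_layout_detection(image_path, weights, conf=0.25):
    """Return a list of (label, confidence, (x1, y1, x2, y2)) for one image."""
    from ultralytics import YOLO  # imported lazily; requires `pip install ultralytics`

    model = YOLO(weights)
    result = model.predict(source=image_path, conf=conf, verbose=False)[0]
    boxes = result.boxes
    detections = []
    for cls_id, score, xyxy in zip(
        boxes.cls.tolist(), boxes.conf.tolist(), boxes.xyxy.tolist()
    ):
        detections.append((result.names[int(cls_id)], float(score), tuple(xyxy)))
    return detections


def count_by_label(detections):
    """Count how many boxes were found for each layout class."""
    return Counter(label for label, _, _ in detections)


if __name__ == "__main__":
    dets = run_layout_detection("page.png", weights="path/to/best.pt")
    for label, n in count_by_label(dets).most_common():
        print(f"{label}: {n}")
```

The boxes are returned in pixel coordinates, so they can be passed to `cv2.rectangle` from opencv-python to draw the detected regions on the page.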
- Ultralytics YOLOv8
  authors:
    - family-names: Jocher
      given-names: Glenn
      orcid: "https://orcid.org/0000-0001-5950-6979"
    - family-names: Chaurasia
      given-names: Ayush
      orcid: "https://orcid.org/0000-0002-7603-6750"
    - family-names: Qiu
      given-names: Jing
      orcid: "https://orcid.org/0000-0003-3783-7069"
  title: "YOLO by Ultralytics"
  version: 8.0.0
  date-released: 2023-1-10
  license: AGPL-3.0
  url: "https://github.com/ultralytics/ultralytics"
- DocLayNet-base dataset
  @article{doclaynet2022,
    title = {DocLayNet: A Large Human-Annotated Dataset for Document-Layout Segmentation},
    doi = {10.1145/3534678.3539043},
    url = {https://doi.org/10.1145/3534678.3539043},
    author = {Pfitzmann, Birgit and Auer, Christoph and Dolfi, Michele and Nassar, Ahmed S and Staar, Peter W J},
    year = {2022},
    isbn = {9781450393850},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    booktitle = {Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
    pages = {3743–3751},
    numpages = {9},
    location = {Washington DC, USA},
    series = {KDD '22}
  }
Linh Do - do24l@mtholyoke.edu/dohalinh2303@gmail.com (personal)
Project Link: https://github.com/LynnHaDo/Document-Layout-Analysis
LinkedIn: https://linkedin.com/in/Linh Do