sidmishraw/docpruner

DocPruner

Prunes the bad PDFs (probably scanned images of IEEE documents from IEEE Xplore) by moving them out of the input_pdfs folder. It also moves the pdf_jsons and pdf_grouped_jsons folders out of the cs267_project folder, so that the PDF-to-JSON generation process can be restarted from scratch.

The executable jar (the build artifact) is located here

Usage:

java -jar path_to_DocPruner.jar <path-to-pdfprocessor.log> <path-to-pdf_jsons> <path-to-pdf_grouped_jsons>
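
The folder-moving step described above can be sketched in Java as follows. This is a minimal illustration and not DocPruner's actual source: the class name `DocPrunerSketch`, the helper `moveInto`, and the destination directory argument are all hypothetical. The log format of pdfprocessor.log is not documented here, so the sketch does not attempt to parse it and only shows how a file or folder could be moved aside.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch; names and behavior are assumptions, not DocPruner's real code.
class DocPrunerSketch {

    // Moves `source` (a file or a folder) into `destDir`, keeping its name.
    // Fails if an entry with the same name already exists in `destDir`.
    // Note: moving a non-empty folder across file systems is not supported
    // by Files.move; this sketch assumes both paths are on the same one.
    static Path moveInto(Path source, Path destDir) {
        try {
            Files.createDirectories(destDir);
            Path target = destDir.resolve(source.getFileName());
            return Files.move(source, target);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Assumed invocation: java DocPrunerSketch <path>... <destination-dir>
    public static void main(String[] args) {
        if (args.length < 2) {
            System.out.println("usage: DocPrunerSketch <path>... <destination-dir>");
            return;
        }
        Path destDir = Paths.get(args[args.length - 1]);
        for (int i = 0; i < args.length - 1; i++) {
            Path moved = moveInto(Paths.get(args[i]), destDir);
            System.out.println("moved to " + moved);
        }
    }
}
```

With this sketch, passing the pdf_jsons and pdf_grouped_jsons paths followed by a destination directory would move both folders into that directory. Where the real tool places the moved folders is not stated in this README.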

In case of concerns, contact: sidharth.mishra@sjsu.edu

About

DocPruner is a utility for pruning bad PDFs for the CS 267 project and its PDF processor