This is a PDF scraping program written in Python.
- Python version: 3.6+
$ pip3 install PyPDF2
$ pip3 install pdf2image
$ pip3 install tabula-py
- $ git clone https://github.com/abasu17/scraping.git
- $ cd scraping
$ python3 scraping.py Absolute_PDF_File_Path Header_String OCR_Mode_On/Off
$ python3 scraping.py "/home/myDesktop/ACC.pdf" "Management Discussion and Analysis" 0
- Keep in mind: if OCR mode is enabled, processing takes noticeably longer.
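The README does not spell out how the three positional arguments are interpreted or how the header string is matched against the PDF. The sketch below is a hypothetical illustration, not the code in `scraping.py`. It assumes that the OCR flag is written as `1`/`on` (enabled) or `0`/`off` (disabled), matching the `0` in the example above. It also assumes that the page text has already been extracted (for instance with PyPDF2, or with OCR when that mode is on) into a list of strings. `ScrapeArgs`, `parse_args`, and `find_header_page` are illustrative names only.

```python
from typing import List, NamedTuple, Optional


class ScrapeArgs(NamedTuple):
    """Hypothetical container for the three command-line arguments."""
    pdf_path: str
    header: str
    ocr: bool


def parse_args(argv: List[str]) -> ScrapeArgs:
    """Parse: Absolute_PDF_File_Path Header_String OCR_Mode_On/Off.

    Assumption: the OCR flag is "1"/"on" to enable and "0"/"off" to disable.
    Pass sys.argv[1:] when calling this from a script.
    """
    if len(argv) != 3:
        raise ValueError(
            "usage: scraping.py Absolute_PDF_File_Path Header_String OCR_Mode_On/Off"
        )
    pdf_path, header, ocr_flag = argv
    flag = ocr_flag.strip().lower()
    if flag in ("1", "on"):
        ocr = True
    elif flag in ("0", "off"):
        ocr = False
    else:
        raise ValueError("invalid OCR mode {!r}; expected 1/0 or on/off".format(ocr_flag))
    return ScrapeArgs(pdf_path, header, ocr)


def _normalize(text: str) -> str:
    # Lower-case and collapse all runs of whitespace (including line breaks)
    # into single spaces, because extracted PDF text often wraps headings.
    return " ".join(text.lower().split())


def find_header_page(page_texts: List[str], header: str) -> Optional[int]:
    """Return the 0-based index of the first page containing the header, or None."""
    target = _normalize(header)
    for index, text in enumerate(page_texts):
        if target in _normalize(text):
            return index
    return None
```

For the example command above, `parse_args(["/home/myDesktop/ACC.pdf", "Management Discussion and Analysis", "0"])` yields `ocr=False`. Normalizing whitespace and case before matching means that a heading split across lines, such as `MANAGEMENT\nDISCUSSION and analysis`, is still found. The trade-off is that matching becomes looser than an exact string comparison.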