scraping

This project is a PDF scraping program written in Python.

Python version: 3.6+

Set Up Environment

Install PyPDF2

$ pip3 install PyPDF2

Install pdf2image

$ pip3 install pdf2image

Install tabula-py

$ pip3 install tabula-py
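To confirm that the three packages installed correctly, a small check like the one below can be run with python3. It is a minimal sketch and not part of this repository. It assumes only that each package is importable under its usual module name; note that tabula-py is imported as "tabula". The function name check_dependencies is hypothetical.

```python
import importlib.util

# pip package name -> module name used in "import ..."
# (tabula-py installs the "tabula" module, so the names differ there)
REQUIRED_MODULES = {
    "PyPDF2": "PyPDF2",
    "pdf2image": "pdf2image",
    "tabula-py": "tabula",
}


def check_dependencies(modules=REQUIRED_MODULES):
    """Return a dict mapping each pip package name to True if its module can be found."""
    status = {}
    for package, module in modules.items():
        # find_spec returns None when a top-level module is not installed
        status[package] = importlib.util.find_spec(module) is not None
    return status


if __name__ == "__main__":
    for package, found in check_dependencies().items():
        hint = "OK" if found else "MISSING (pip3 install " + package + ")"
        print("{:<10} {}".format(package, hint))
```

Using find_spec instead of a bare import keeps the check side-effect free: it reports whether a module is present without executing its import-time code.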

Set Up Project

Clone Git Repository

$ git clone https://github.com/abasu17/scraping.git

$ cd scraping

Run Project

$ python3 scraping.py Absolute_PDF_File_Path Header_String OCR_Mode_On/Off
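The command takes three positional arguments. The sketch below shows one way such a command line could be parsed and validated with the standard argparse module. It is a hypothetical illustration, not the code in scraping.py: the function name parse_args, the attribute names, and the assumption that OCR mode is given as 1 (on) or 0 (off) are all inferred from the usage line and the example, not confirmed by the source.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the three positional arguments from the usage line (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="Scrape a section of a PDF by its header.")
    parser.add_argument("pdf_path", help="absolute path to the PDF file")
    parser.add_argument("header", help="header string that marks the section to extract")
    # Assumption: OCR mode is passed as 1 (on) or 0 (off), matching the 0 in the example.
    parser.add_argument("ocr_mode", type=int, choices=[0, 1], help="1 = OCR on, 0 = OCR off")
    args = parser.parse_args(argv)
    # The usage line asks for an absolute path, so reject relative ones early.
    if not os.path.isabs(args.pdf_path):
        parser.error("pdf_path must be an absolute path")
    args.ocr = bool(args.ocr_mode)
    return args
```

With the arguments from the example, parse_args(["/home/myDesktop/ACC.pdf", "Management Discussion and Analysis", "0"]) yields a namespace whose ocr attribute is False. Quoting the header string on the shell command line matters because it contains spaces.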

Example

$ python3 scraping.py "/home/myDesktop/ACC.pdf" "Management Discussion and Analysis" 0

Keep in mind: if OCR mode is enabled, processing takes longer.
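The header string identifies which part of the PDF to scrape. As a rough illustration of how a header could be located, the sketch below searches a list of already-extracted page texts for the first page that contains it. This is an assumption about the general approach, not a description of scraping.py: the function names are hypothetical, text extraction (for example with PyPDF2, or OCR when enabled) is left out, and the case and whitespace normalization is a design choice of the sketch.

```python
import re
from typing import List, Optional


def normalize(text: str) -> str:
    """Collapse runs of whitespace and lowercase, so a header split across lines still matches."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_header_page(pages: List[str], header: str) -> Optional[int]:
    """Return the 0-based index of the first page whose text contains header, else None."""
    target = normalize(header)
    for index, text in enumerate(pages):
        if target in normalize(text):
            return index
    return None


# Illustrative page texts only (not taken from ACC.pdf).
pages = [
    "Annual Report",
    "Directors' Report",
    "Management Discussion\nand Analysis\nIndustry overview",
]
print(find_header_page(pages, "Management Discussion and Analysis"))  # prints 2
```

Normalizing whitespace matters because extracted PDF text often breaks a heading across lines; a plain substring test on the raw text would miss the match on the third page above.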
