Commit 0dc71a2

Create Readme.md
1 parent c272c28 commit 0dc71a2

1 file changed, +24 -0 lines changed: BasicPythonScripts/Text Extractor from PDF

@@ -0,0 +1,24 @@
# Extract Text from PDF using Python Module

## Modules Required
- PyPDF2: a Python library for PDF-related operations

## How it Works?
* Import the PyPDF2 module to:
    * Read the PDF into the program for further processing
    * Count the number of pages in the PDF
    * Extract the text from a single PDF page
* Initialize an empty string
* A for loop iterates over each page
* The `extractText()` method (named `extract_text()` in newer PyPDF2 releases) extracts the text from the current page
* The extracted text is appended to the empty string initialized earlier
* After all pages are processed, the string holding the extracted text is written to a new file named **extracted_text.txt** using basic file handling in Python
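The steps above can be sketched roughly as follows. This is a minimal illustration, not the script from this folder: the function names `join_page_text` and `pdf_to_text_file` are hypothetical, and it assumes a PyPDF2 release that provides `PdfReader`, `reader.pages`, and `page.extract_text()` (older releases spell these `PdfFileReader` and `extractText()`). Calling `pdf_to_text_file("sample.pdf")` on a PDF of your own would write its text to **extracted_text.txt**.

```python
def join_page_text(pages):
    """Concatenate the text of every page, in order."""
    text = ""  # the empty string the steps above start from
    for page in pages:
        # extract_text() is the newer PyPDF2 name; older releases use extractText()
        text += page.extract_text()
    return text


def pdf_to_text_file(pdf_path, out_path="extracted_text.txt"):
    """Read a PDF, report its page count, and write its text to out_path."""
    # Imported here so join_page_text can be reused without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_path)
    print(f"Pages found: {len(reader.pages)}")
    text = join_page_text(reader.pages)
    # Basic file handling: "w" creates the file or overwrites an existing one.
    with open(out_path, "w", encoding="utf-8") as out_file:
        out_file.write(text)
    return text
```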

----------------------------------------------------------------------------------
## PyPDF2:
A pure-Python library built as a PDF toolkit.
To know more: [PyPDF2 Docs](https://pythonhosted.org/PyPDF2/)

## File handling
Python has built-in functions for handling files and performing operations such as reading and writing.
Read more about them: [File Handling Docs](https://www.geeksforgeeks.org/reading-writing-text-files-python/)
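As a small illustration of the reading and writing mentioned above, the sketch below writes a string to a file and reads it back with the built-in `open()`. The helper name `write_then_read` and the temporary-directory path are assumptions made for the example, not part of the original script.

```python
import os
import tempfile


def write_then_read(text, path):
    """Write text to path, then read the file back and return its contents."""
    # "w" creates the file or overwrites an existing one
    with open(path, "w", encoding="utf-8") as out_file:
        out_file.write(text)
    # "r" opens the same file for reading
    with open(path, "r", encoding="utf-8") as in_file:
        return in_file.read()


if __name__ == "__main__":
    # Hypothetical location: a file in the system temp directory
    demo_path = os.path.join(tempfile.gettempdir(), "extracted_text.txt")
    print(write_then_read("text pulled from a PDF", demo_path))
```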
