BasicPythonScripts/Text Extractor from PDF/README.md
12 additions & 4 deletions
@@ -3,15 +3,18 @@
 ## Modules Required
 - PyPDF2 - It is used in Python for PDF related operations

-## How it Works?
+## AIM
+To build a Python Script using the PyPDF2 Module which can extract text from a PDF file.
+
+## COMPILATION STEPS
 * Import PyPDF2 Module to:-
 * Read the pdf into the program to further manipulate it
 * Count the number of pages in the PDF
 * Extract the text from a single PDF page
-* Initialize an empty string
-* A for loop parses through each page
+* Initialize an empty string which will store the text being extracted from the PDF file
+* A for loop is made to parse through each page
 * The extractText() function is used to extract text from the parsed PDF page
-* The extracted text is added to the empty string initialized
+* The extracted text is added to the empty string initialized using simple string concatenation
 * After parsing is done, the string in which the extracted text is stored is written in a new file named **extracted_text.txt** using basic File Handling in Python
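
The steps listed above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the script from this PR: the function names `extract_text_from_pages` and `pdf_to_text_file` are hypothetical, and the file-level helper assumes a PyPDF2 release (2.x or later) that provides `PdfReader` and `reader.pages`; older releases used `PdfFileReader` instead. The page method is looked up as `extract_text()` first, with a fallback to the `extractText()` name mentioned above.

```python
def extract_text_from_pages(pages):
    """Concatenate the text of every page-like object in `pages`."""
    extracted = ""  # empty string that accumulates the extracted text
    for page in pages:
        # Newer PyPDF2 releases name the method extract_text();
        # older ones used extractText(). Use whichever the page offers.
        extract = getattr(page, "extract_text", None) or getattr(page, "extractText")
        extracted += extract() or ""  # simple string concatenation
    return extracted


def pdf_to_text_file(pdf_path, out_path="extracted_text.txt"):
    """Extract all text from `pdf_path` and write it to `out_path`.

    Returns the number of pages that were read.
    """
    # Imported here so the helper above works without PyPDF2 installed.
    # Assumption: PyPDF2 2.x or later, which exposes PdfReader.
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_path)
    text = extract_text_from_pages(reader.pages)
    with open(out_path, "w", encoding="utf-8") as out_file:
        out_file.write(text)
    return len(reader.pages)
```

A call such as `pdf_to_text_file("sample.pdf")` (the input file name is a placeholder) would write the result to **extracted_text.txt**. Keeping the loop in its own function means the concatenation logic can be checked without a real PDF.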

 ## PDF FILE WITH TEXT
@@ -32,3 +35,8 @@ To know more: [PyPDF2 Docs](https://pythonhosted.org/PyPDF2/)
 ## File handling
 Python has some built-in methods to handle files and perform operations like reading and writing.
 Read about them: [File Handling Docs](https://www.geeksforgeeks.org/reading-writing-text-files-python/)
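
As a small illustration of the reading and writing mentioned here, the sketch below uses only Python's built-in `open()`. The helper names `write_text` and `read_text` are hypothetical and are not part of the README or the script.

```python
def write_text(path, text):
    """Write `text` to `path`, replacing any existing file."""
    # The with-statement closes the file even if the write fails.
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


def read_text(path):
    """Return the full contents of the text file at `path`."""
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()
```

For example, `write_text("extracted_text.txt", text)` would store a string and `read_text("extracted_text.txt")` would load it back.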