A framework made for converting an image file to PDF file.
Usage of just 4 packages of Python.
- First, this will search for the text inside the image file(most probably .jpg/.PNG file).
- Then, it'll fetch the text & store it in a variable.
- And then, writes it into a docx file(So that if it extracts wrong text, you can manually type the correct text).
- And finally, convert the word file hence created into a pdf file.
multipleImageToPdfConverter.mp4
- You need to download tesseract.
- You need following packages of python:
- PIL
- pytesseract
- docx
- docx2pdf
- IDLE (PyCharm Community Edition preferrable)
- Go to main file.
- Just check the following lines in the code:
text = extractTextFromImg(i) # - extract text from image file
appendDataToWord(j, text) # - append data into word file
convertDocxToPdf(j, k) # - convert it into pdf
- You don't need to change anything in this file. The changes which you need to make is in variables file.
- Just change the value of following pattern variable in variables:
pattern = str(i)
- Also, you need to put value of total files inside the imgFiles directory in the variable noOfFiles variable.
noOfFiles = 3 # Put the total no of files present.
- pattern is nothing but how image files are arranged in the directory
- Let's say the sequence in which filename is present
- For eg:- If the filename is img1.PNG, put
pattern = 'img' +str(i)+
![]() |
---|
Nitin Kumar |
![]() ![]() ![]() |