Streamlit project with Tesseract OCR running on Streamlit Cloud.
- Upload an image with text on it
- Select the language
- Select the image preprocessing options (if needed) and check the result in the preview
- Crop the image to the text area (if needed)
- Run the OCR and check the result in the text preview
- Adjust the settings or image preprocessing and run the OCR again (if needed)
- Download the result as a text file or copy from the text preview
Installed languages for Tesseract OCR
Streamlit application is working - 04.06.2024
- Change layout of the app
- Change checkboxes to toggle buttons
- Add cropping functionality: https://github.com/turner-anderson/streamlit-cropper
- Add more CSS styling
- Clean up the Python app and repository
- Use Pillow for image preprocessing instead of OpenCV
  - any advantages?
- Add Ace Editor for text preview
  - any advantages?
- Add other OCR engines and test them
  - Add `easyocr` and test it
  - Try `tesserocr` instead of `pytesseract`
  - Add `PyMuPDF` and test it
  - Add `ocrmypdf` and test it
  - Add `PaddleOCR` and test it
  - Add `keras-ocr` and test it
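For the Pillow-vs-OpenCV question in the list above, a rough sketch of Pillow counterparts to common preprocessing steps may help with the comparison. It assumes a recent Pillow; the function name `preprocess_with_pillow` and its default values are illustrative, not taken from the app.

```python
from PIL import Image, ImageEnhance, ImageOps


def preprocess_with_pillow(
    image: Image.Image,
    contrast: float = 1.5,
    brightness: float = 1.0,
    autocontrast_cutoff: float = 1.0,
) -> Image.Image:
    """Hypothetical Pillow-only chain: grayscale -> autocontrast -> contrast/brightness."""
    # Convert to 8-bit grayscale ("L" mode)
    gray = ImageOps.grayscale(image)
    # Stretch the histogram, ignoring the darkest/lightest `autocontrast_cutoff` percent of pixels
    stretched = ImageOps.autocontrast(gray, cutoff=autocontrast_cutoff)
    # A factor of 1.0 leaves the image unchanged; values > 1.0 increase contrast/brightness
    adjusted = ImageEnhance.Contrast(stretched).enhance(contrast)
    return ImageEnhance.Brightness(adjusted).enhance(brightness)
```

The result is a regular `PIL.Image.Image`, so it could be passed on to the OCR step without converting to a NumPy array first.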
- Tesseract Documentation
- pytesseract Documentation
- OCR with Tesseract
OpenCV is used for image preprocessing before running OCR with Tesseract.
- OpenCV Image Processing Documentation
- OpenCV Python Tutorial
- OCR in Python Tutorials
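```python
import cv2
import numpy as np

image = cv2.imread("input.png")  # placeholder path; OpenCV loads images in BGR order

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# or
coefficients = [1, 0, 0]  # gives the blue channel all the weight (weights are in B, G, R order)
# for standard gray conversion, coefficients = [0.114, 0.587, 0.299]
m = np.array(coefficients, dtype=np.float32).reshape((1, 3))
blue = cv2.transform(image, m)
```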
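The `cv2.transform` variant computes, per pixel, a weighted sum of the B, G and R values. A NumPy-only sketch of the same arithmetic (the helper name `weighted_gray` is hypothetical) makes the channel order explicit; with the standard coefficients it should agree with `cv2.cvtColor(..., cv2.COLOR_BGR2GRAY)` up to rounding.

```python
import numpy as np


def weighted_gray(image_bgr: np.ndarray, coefficients=(0.114, 0.587, 0.299)) -> np.ndarray:
    """Weighted channel sum for a BGR image: gray = wB*B + wG*G + wR*R."""
    weights = np.asarray(coefficients, dtype=np.float64)
    # (H, W, 3) @ (3,) -> (H, W); weights are applied in B, G, R order
    gray = image_bgr.astype(np.float64) @ weights
    # Round and clip back to the 8-bit range
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)
```

For a pure blue BGR pixel `(255, 0, 0)`, the standard weights give `round(0.114 * 255) = 29`, while `coefficients=(1, 0, 0)` gives 255, i.e. only the blue channel survives.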
- CLAHE (Contrast Limited Adaptive Histogram Equalization)
- https://www.tutorialspoint.com/how-to-change-the-contrast-and-brightness-of-an-image-using-opencv-in-python
- https://stackoverflow.com/questions/50474302/how-do-i-adjust-brightness-contrast-and-vibrance-with-opencv-python
- https://stackoverflow.com/questions/32609098/how-to-fast-change-image-brightness-with-python-opencv
- https://github.com/milahu/document-photo-auto-threshold
- https://stackoverflow.com/questions/56905592/automatic-contrast-and-brightness-adjustment-of-a-color-photo-of-a-sheet-of-pape
- https://stackoverflow.com/questions/39308030/how-do-i-increase-the-contrast-of-an-image-in-python-opencv
- https://stackoverflow.com/questions/63243202/how-to-auto-adjust-contrast-and-brightness-of-a-scanned-image-with-opencv-python
Methods to rotate an image with different libraries.
https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.rotate
```python
from PIL import Image

with Image.open("hopper.jpg") as im:
    # Rotate the image by 60 degrees counter-clockwise
    theta = 60
    white = (255, 255, 255)
    # expand=True enlarges the output so the whole rotated image is kept (non-destructive)
    im_rotated = im.rotate(angle=theta, resample=Image.Resampling.BICUBIC, expand=True, fillcolor=white)
```
With OpenCV: destructive rotation. The output keeps the original size, so the corners that rotate outside the frame are lost.
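```python
import cv2

angle = 30  # degrees; positive values rotate counter-clockwise
(h, w) = image.shape[:2]
center = (w // 2, h // 2)
M = cv2.getRotationMatrix2D(center, angle, 1.0)  # 1.0 = no scaling
# the output size stays (w, h), so corners rotated outside the frame are cut off
rotated = cv2.warpAffine(image, M, (w, h))
```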
With imutils: non-destructive rotation. The canvas is enlarged so all image data is kept.
```python
import imutils

rotated = imutils.rotate_bound(image, angle)  # positive angles rotate clockwise
```
With SciPy: destructive or non-destructive rotation, chosen by the `reshape` parameter.
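```python
from scipy.ndimage import rotate as rotate_image

# use keywords: positionally, the third parameter is `axes`, not `reshape`
rotated_img1 = rotate_image(image, angle, reshape=True, mode="constant", cval=255)
```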
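The destructive/non-destructive distinction is easiest to see in the output sizes. A small sketch comparing Pillow's `expand` with SciPy's `reshape` on a hypothetical 100 x 50 white test image rotated by 90 degrees (note that Pillow reports `(width, height)` while NumPy shapes are `(height, width, channels)`):

```python
import numpy as np
from PIL import Image
from scipy.ndimage import rotate as rotate_image

# Hypothetical test image: 100 px wide, 50 px high
im = Image.new("RGB", (100, 50), (255, 255, 255))
arr = np.asarray(im)  # shape (50, 100, 3)

# Pillow: size is (width, height)
print(im.rotate(90, expand=False).size)  # (100, 50): frame kept, content cropped
print(im.rotate(90, expand=True).size)   # (50, 100): frame grown to fit

# SciPy: shape is (height, width, channels)
print(rotate_image(arr, 90, reshape=False).shape)  # (50, 100, 3)
print(rotate_image(arr, 90, reshape=True).shape)   # (100, 50, 3)
```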