Requires the packages tesseract >= 5.4.1, imagemagick >= 7.1.1-36 and bc >= 1.07.1.
Example installation command (Manjaro):
pamac install tesseract imagemagick bc
The accepted extensions are jpg, jpeg and png.
Run to extract text:
./scanner.py /path/to/images
A txt file with the extracted text is generated in the same directory as the images passed to the command. An image with the suffix _converted is also generated; this is the image after it has been processed for better reading.
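As a rough illustration of the file handling described above, here is a minimal sketch of how the accepted images in a directory could be selected and how the output names could be derived. It is not the code in scanner.py: the names find_images and output_paths are hypothetical, and it assumes one txt per image, case-insensitive extension matching, and that the _converted image keeps the original extension.

```python
from pathlib import Path

# Extensions listed in this README; matching them case-insensitively is an assumption.
ACCEPTED_EXTENSIONS = {".jpg", ".jpeg", ".png"}
CONVERTED_SUFFIX = "_converted"


def find_images(directory):
    """Return the accepted images in `directory`, skipping earlier _converted outputs."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file()
        and p.suffix.lower() in ACCEPTED_EXTENSIONS
        and not p.stem.endswith(CONVERTED_SUFFIX)
    )


def output_paths(image):
    """Return the (txt, converted image) paths placed next to `image`.

    Assumption: one txt per image, and the converted image keeps the extension.
    """
    image = Path(image)
    txt = image.with_suffix(".txt")
    converted = image.with_name(image.stem + CONVERTED_SUFFIX + image.suffix)
    return txt, converted
```

Under these assumptions, /path/to/images/page.png would map to page.txt and page_converted.png in the same directory.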
To clean the directory (remove the txt files and the images with the _converted suffix):
./scanner.py /path/to/images --clear
More information about the textcleaner script can be found at this link.
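The cleanup performed by --clear could look roughly like the sketch below. The function clear_outputs and its dry_run parameter are hypothetical and not part of scanner.py; the sketch assumes that every txt file in the directory, and every accepted image whose name ends in _converted, counts as generated output.

```python
from pathlib import Path

# Extensions listed in this README; used to recognize _converted images.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def clear_outputs(directory, dry_run=True):
    """Return the generated files in `directory`; delete them unless dry_run is True."""
    targets = sorted(
        p for p in Path(directory).iterdir()
        if p.is_file()
        and (
            p.suffix.lower() == ".txt"  # assumption: all txt files are generated output
            or (
                p.suffix.lower() in IMAGE_EXTENSIONS
                and p.stem.endswith("_converted")
            )
        )
    )
    if not dry_run:
        for p in targets:
            p.unlink()
    return targets
```

Defaulting to dry_run=True means a first call only lists what would be removed, which is a cautious choice when the directory may hold unrelated txt files.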