Optical character recognition (OCR) is the technique that allows a computer to read static images of text and convert them into editable, searchable data.
Current OCR models, such as Google Cloud Vision, perform well on text recognition, but they cannot provide a correct reading order. Textract utilizes image processing techniques for layout analysis and determines the reading order with a topological ordering. The results show that applying our layout analysis improves the Levenshtein similarity of the Google OCR output by 20%.
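As a rough illustration of the reading-order idea (a minimal sketch, not the package's actual algorithm), text blocks returned by an OCR engine can be topologically sorted using "reads-before" relations derived from their bounding boxes:

```python
# Minimal sketch: derive a reading order from text-block bounding boxes with a
# topological sort (Kahn's algorithm). Illustrative only; the block format and
# relations here are assumptions, not Textract's internal representation.
from collections import defaultdict, deque

def reading_order(blocks):
    """blocks: list of dicts with 'id' and bounding box 'x0', 'y0', 'x1', 'y1'."""
    edges = defaultdict(set)
    indeg = {b["id"]: 0 for b in blocks}
    for a in blocks:
        for b in blocks:
            if a is b:
                continue
            above = a["y1"] <= b["y0"]                        # a is entirely above b
            overlap = not above and not (b["y1"] <= a["y0"])  # vertical overlap
            left_of = overlap and a["x1"] <= b["x0"]          # same line, a left of b
            if (above or left_of) and b["id"] not in edges[a["id"]]:
                edges[a["id"]].add(b["id"])
                indeg[b["id"]] += 1
    # Kahn's algorithm: repeatedly emit blocks with no unread predecessor.
    queue = deque(sorted(i for i, d in indeg.items() if d == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in sorted(edges[node]):
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                queue.append(nxt)
    return order

blocks = [
    {"id": "title", "x0": 0, "y0": 0, "x1": 10, "y1": 2},
    {"id": "left",  "x0": 0, "y0": 3, "x1": 4,  "y1": 10},
    {"id": "right", "x0": 6, "y0": 3, "x1": 10, "y1": 10},
]
print(reading_order(blocks))  # ['title', 'left', 'right']
```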
Furthermore, this package also provides another deep learning model, CRNN, for text recognition. The original CRNN paper is "An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition"; thanks also go to the GitHub repo that provided the TensorFlow version of the CRNN model.
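For context, the CRNN follows the CNN → bidirectional RNN → CTC pipeline described in that paper. The snippet below is a minimal Keras sketch of that idea; the layer sizes and the build_crnn helper are illustrative assumptions, not the model shipped in ./textract/model.

```python
# Minimal CRNN sketch: convolutional feature extractor, bidirectional LSTMs over
# image columns, and a per-timestep softmax (with an extra class for the CTC
# blank token). Layer sizes here are assumptions for illustration.
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_crnn(img_height=32, img_width=128, num_classes=36):
    inputs = layers.Input(shape=(img_height, img_width, 1))
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(128, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D((2, 2))(x)
    # Treat each image column as one time step for the recurrent layers.
    x = layers.Permute((2, 1, 3))(x)
    x = layers.Reshape((img_width // 4, (img_height // 4) * 128))(x)
    x = layers.Bidirectional(layers.LSTM(256, return_sequences=True))(x)
    x = layers.Bidirectional(layers.LSTM(256, return_sequences=True))(x)
    # num_classes characters + 1 class for the CTC blank.
    outputs = layers.Dense(num_classes + 1, activation="softmax")(x)
    return Model(inputs, outputs)

build_crnn().summary()
```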
You can install all Python dependencies with either Anaconda or pip.
> conda env create -f conda_env.yml
(If you want to use tensorflow-gpu, replace conda_env.yml with conda_env_gpu.yml.) This will create an Anaconda environment named textract.
or
> pip install -r requirements.txt
(If you want to use tensorflow-gpu, replace requirements.txt with requirements_gpu.txt.)
Please download the pretrained weights and the model, and put both files in ./textract/model.
Please follow the instructions in the Google Vision API How-to Guide to set up your Google API services. Remember to download the service account key (a .json file) and set it as an environment variable on your computer.
> export GOOGLE_APPLICATION_CREDENTIALS=~/path/to/your/service_account_key.json
For more detail, you can watch this YouTube video: Setting up API and Vision Intro - Google Cloud Python Tutorials p.2.
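Once the credentials are set, a quick way to verify the setup is to call the Vision API directly from Python. This is a standalone sanity-check sketch, separate from app.py; the image path is just an example, and it assumes google-cloud-vision >= 2.0.

```python
# Quick sanity check for the Google Cloud Vision setup. The client picks up the
# GOOGLE_APPLICATION_CREDENTIALS environment variable set above.
import io
from google.cloud import vision

client = vision.ImageAnnotatorClient()

with io.open("./test/images/sample.jpg", "rb") as f:  # example image path
    image = vision.Image(content=f.read())

response = client.document_text_detection(image=image)
print(response.full_text_annotation.text)
```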
You can easily run a simple test by executing the command below in the terminal. The output text files will be generated in your output folder.
> python app.py --img_dir ./path/to/image/folder --out_dir ./path/to/output/folder
For example,
> python app.py --img_dir ./test/images --out_dir ./test/output
Then, you will get your OCR text files in your output folder.
If you want to run the similarity test on a batch of images, you can use evaluate.py. The image folder should be organized with the structure below (the folder names can be arbitrary).
- your images folder
- your groundtruth file folder
Then, run the command below in your terminal.
> python evaluate.py --img_dir path/to/image/folder --gd_dir path/to/groundtruth/folder --out_dir path/to/output/folder
For example,
> python evaluate.py --img_dir ./evaluate/images --gd_dir ./evaluate/groundtruth --out_dir ./evaluate/output
The results will then be written to the generated result folder (the output folder you specified).
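For reference, the Levenshtein similarity reported above is the edit distance normalized by the longer string's length, subtracted from 1. Below is a rough sketch of that metric; evaluate.py may compute it differently.

```python
# Minimal sketch of Levenshtein similarity: 1 - edit_distance / max_length.
def levenshtein_distance(a, b):
    # Classic dynamic-programming edit distance, computed row by row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def levenshtein_similarity(a, b):
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / max(len(a), len(b))

print(levenshtein_similarity("optical character", "0ptical charaxter"))  # ~0.88
```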