This repo gives the steps needed to install the Tesseract OCR engine 3.04.01 (the latest release at the time of writing) on an AWS EC2 virtual machine.
Alternatively, copy the tess-deploy.sh script to the machine and run it once:
sudo bash tess-deploy.sh
😁
ssh <environment_name>
sudo yum update
sudo yum install autoconf automake    # aclocal is provided by the automake package
sudo yum install libtool
sudo yum install libjpeg-devel libpng-devel libtiff-devel zlib-devel
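Before compiling anything, it can help to confirm that the build tools actually ended up on the PATH, since a missing tool otherwise surfaces later as a confusing configure or make error. The following is a minimal sketch; the helper name missing_tools is hypothetical, and checking for gcc and g++ is an assumption, because the commands above do not install a compiler explicitly.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): report which of the given commands are not on PATH.
# Prints one "missing: <name>" line per absent tool and returns 1 if any are missing.
missing_tools() {
  local tool
  local missing=()
  for tool in "$@"; do
    # command -v succeeds only if the name resolves to an executable/builtin
    command -v "$tool" >/dev/null 2>&1 || missing+=("$tool")
  done
  if [ "${#missing[@]}" -gt 0 ]; then
    printf 'missing: %s\n' "${missing[@]}"
    return 1
  fi
  return 0
}
```

For example, missing_tools autoconf aclocal automake libtool gcc g++ prints nothing and returns 0 when everything is installed.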
mkdir -p ~/libs && cd ~/libs
mkdir leptonica && cd leptonica
wget http://www.leptonica.com/source/leptonica-1.73.tar.gz
tar -zxvf leptonica-1.73.tar.gz
rm leptonica-1.73.tar.gz
cd leptonica-1.73
./configure
make
sudo make install
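Tesseract's configure step needs Leptonica's headers and shared library, so a quick check after sudo make install can save a failed build later. This sketch assumes configure's default prefix /usr/local and that Leptonica places its headers under include/leptonica/ (for example allheaders.h) and its library as liblept.so* under lib/; the helper name leptonica_installed is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): succeed only if Leptonica's main header and
# at least one liblept.so* library exist under the given prefix.
leptonica_installed() {
  local prefix="${1:-/usr/local}"   # assumed default ./configure prefix
  local lib
  [ -f "$prefix/include/leptonica/allheaders.h" ] || return 1
  # An unmatched glob stays literal, so the -e test fails and we fall through
  for lib in "$prefix"/lib/liblept.so*; do
    [ -e "$lib" ] && return 0
  done
  return 1
}
```

Run leptonica_installed (or leptonica_installed /some/prefix if you passed --prefix to ./configure) and continue only if it succeeds.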
cd ~
mkdir tesseract && cd tesseract
wget https://github.com/tesseract-ocr/tesseract/archive/3.04.01.tar.gz
tar -zxvf 3.04.01.tar.gz
rm 3.04.01.tar.gz
cd tesseract-3.04.01
./autogen.sh
./configure
make
sudo make install
sudo ldconfig
cd /usr/local/share/tessdata
sudo wget http://tesseract-ocr.googlecode.com/files/tesseract-ocr-3.02.eng.tar.gz
sudo tar xvf tesseract-ocr-3.02.eng.tar.gz
sudo rm tesseract-ocr-3.02.eng.tar.gz
export TESSDATA_PREFIX=/usr/local/share/
sudo mv tesseract-ocr/tessdata/* .
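In Tesseract 3.x, TESSDATA_PREFIX names the directory that contains tessdata (here /usr/local/share/), not the tessdata directory itself, and the engine looks for <lang>.traineddata inside it. The sketch below checks that the English data landed where that setting points; the helper name has_traineddata is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): check that <lang>.traineddata exists in
# "$TESSDATA_PREFIX/tessdata/", falling back to /usr/local/share/ as above.
has_traineddata() {
  local lang="${1:-eng}"
  local prefix="${TESSDATA_PREFIX:-/usr/local/share/}"
  # ${prefix%/} drops one trailing slash so the path has no double slash
  [ -f "${prefix%/}/tessdata/${lang}.traineddata" ]
}
```

With the steps above applied, has_traineddata eng should succeed.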
To make the variable permanent, open your profile:
nano ~/.bash_profile
and append this line, then save:
export TESSDATA_PREFIX=/usr/local/share/
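If you would rather not edit the profile by hand (for instance inside a deploy script), an idempotent append does the same job and is safe to re-run. This is a sketch; the helper name ensure_line is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): append LINE to FILE unless an identical line
# is already present, so repeated runs do not duplicate the export.
ensure_line() {
  local file="$1" line="$2"
  touch "$file"
  # If the file is non-empty and lacks a trailing newline, add one first
  if [ -s "$file" ] && [ -n "$(tail -c1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  # -x: whole-line match, -F: fixed string (no regex), -q: quiet
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

Usage: ensure_line ~/.bash_profile 'export TESSDATA_PREFIX=/usr/local/share/'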
Finally, check the installation; run without arguments, tesseract prints its usage text:
tesseract
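To check the exact version rather than just that the binary runs, you can compare the first line of tesseract --version with the release built above. This sketch assumes that line reads "tesseract <version>" (for example "tesseract 3.04.01"); it redirects stderr into stdout so the banner is captured whichever stream this build prints it to. The helper name tesseract_version_ok is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): succeed if `tesseract --version` starts with
# "tesseract <want>". Assumes that first-line format; 2>&1 catches either stream.
tesseract_version_ok() {
  local want="${1:-3.04.01}"
  local name ver rest
  # read consumes only the first line of the banner
  read -r name ver rest < <(tesseract --version 2>&1)
  [ "$name" = "tesseract" ] && [ "$ver" = "$want" ]
}
```

If tesseract is not on the PATH, the shell's "command not found" message becomes the first line and the check fails.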
(1) To get the training files for all languages rather than English only, use grab-train-langs.sh, or customize it to your needs.
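When customizing which languages to fetch, it helps to know which ones are still missing. The sketch below lists every requested language code that has no .traineddata file in the given directory. Codes follow Tesseract's file naming (eng, deu, fra, ...); the helper name missing_langs is hypothetical, and the directory is assumed to be /usr/local/share/tessdata as set up above.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): print each language code (one per line)
# that has no <code>.traineddata file in DIR.
missing_langs() {
  local dir="$1"
  shift
  local lang
  for lang in "$@"; do
    [ -f "$dir/$lang.traineddata" ] || echo "$lang"
  done
}
```

Usage: missing_langs /usr/local/share/tessdata eng deu fra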
- Alan Gunning, author of the original blog post.
- shantanusingh, author of Tesseract-Amazon-AMI gist.
- Abdullah Barrak, for the upgrade and shell scripts.