
nishisahlot/tesseract-on-aws

 
 


Deploy Tesseract to AWS Elastic Beanstalk.

Introduction

This repo gives the steps needed to set up the Tesseract OCR engine (3.04.01, the latest release when this was written) on an AWS EC2 virtual machine. Alternatively, copy the tess-deploy.sh script to the instance and run it once: sudo bash tess-deploy.sh 😁


[1] SSH to your EC2 instance

ssh <environment_name>
sudo yum update

[2] Dependencies

# aclocal is a command shipped with automake, not a separate yum package
sudo yum install autoconf automake
sudo yum install libtool
sudo yum install libjpeg-devel libpng-devel libtiff-devel zlib-devel
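
To confirm that step [2] worked, you can check that the build tools are on PATH before moving on. This is only a minimal sketch: missing_tools is a hypothetical helper name, not something shipped in this repo, and it assumes the libtool package provides a libtool command.

```shell
# Print the names of any commands that are not on PATH.
# missing_tools is a hypothetical helper, not part of this repo.
missing_tools() {
  local missing="" tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  # Strip the leading space left by the first append.
  echo "${missing# }"
}

missing=$(missing_tools autoconf aclocal automake libtool)
if [ -n "$missing" ]; then
  echo "Still missing: $missing"
else
  echo "All build tools found"
fi
```

If anything is reported as missing, rerun the matching yum install line before building Leptonica.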

[3] Install Leptonica

mkdir -p ~/libs && cd ~/libs
mkdir leptonica && cd leptonica
wget http://www.leptonica.com/source/leptonica-1.73.tar.gz
tar -zxvf leptonica-1.73.tar.gz
rm leptonica-1.73.tar.gz
cd leptonica-1.73
./configure
make
sudo make install

[4] Install Tesseract

cd ~
mkdir tesseract && cd tesseract
wget https://github.com/tesseract-ocr/tesseract/archive/3.04.01.tar.gz
tar -zxvf 3.04.01.tar.gz
rm 3.04.01.tar.gz
cd tesseract-3.04.01
./autogen.sh
./configure
make
sudo make install
sudo ldconfig

[5] Tesseract Training Data.

cd /usr/local/share/tessdata
sudo wget http://tesseract-ocr.googlecode.com/files/tesseract-ocr-3.02.eng.tar.gz
sudo tar xvf tesseract-ocr-3.02.eng.tar.gz
sudo rm tesseract-ocr-3.02.eng.tar.gz
export TESSDATA_PREFIX=/usr/local/share/
sudo mv tesseract-ocr/tessdata/* .

[6] Source TESSDATA_PREFIX

nano ~/.bash_profile

Then copy this line to the end of the file:

export TESSDATA_PREFIX=/usr/local/share/
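
If you would rather not edit the file by hand, the same line can be appended from the shell. The sketch below adds it only when it is not already present, so running it twice is harmless. add_tessdata_prefix is a hypothetical helper name used for illustration.

```shell
# Append the TESSDATA_PREFIX export to a profile file, once.
# add_tessdata_prefix is a hypothetical helper, not part of this repo.
add_tessdata_prefix() {
  local profile="$1"
  local line='export TESSDATA_PREFIX=/usr/local/share/'
  touch "$profile"
  # -x matches the whole line, -F treats the pattern as a fixed string.
  grep -qxF "$line" "$profile" || echo "$line" >> "$profile"
}
```

Call it as add_tessdata_prefix ~/.bash_profile, then run source ~/.bash_profile so the variable is set in the current session too.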

[7] Verify

tesseract
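
Running tesseract with no arguments only prints the usage text. For a slightly stronger check, the sketch below also prints the version and looks for the English training data from step [5]. check_tesseract is a hypothetical helper name, and the fallback path assumes the default /usr/local install location used in this guide.

```shell
# Check that tesseract is on PATH and that eng.traineddata is in place.
# check_tesseract is a hypothetical helper, not part of this repo.
check_tesseract() {
  if ! command -v tesseract >/dev/null 2>&1; then
    echo "tesseract not found on PATH"
    return 1
  fi
  # Merge stderr into stdout in case the version banner is written there.
  tesseract --version 2>&1 | head -n 1

  local prefix="${TESSDATA_PREFIX:-/usr/local/share/}"
  local data_dir="${prefix%/}/tessdata"
  if [ -f "$data_dir/eng.traineddata" ]; then
    echo "eng.traineddata found in $data_dir"
  else
    echo "eng.traineddata missing in $data_dir"
    return 1
  fi
}

check_tesseract || echo "verification failed"
```

If the binary is found but the training data is reported missing, recheck step [5] and the TESSDATA_PREFIX value from step [6].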

Notes

(1) - Use grab-train-langs.sh to download all language training files, or customize it to fit your needs.


Credits
