distributed_ocr

distributed_ocr applies OCR to images in a distributed manner. The server runs on AWS EC2 instances, and clients request jobs from it through SQS queues. The results are then uploaded to S3.
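
The flow above can be illustrated without any AWS dependency. The following is a minimal sketch, not the project's code: a `BlockingQueue` stands in for the SQS job queue, a `ConcurrentHashMap` stands in for the S3 bucket, and `fakeOcr` is a hypothetical placeholder for the real Tesseract call. All class and method names here are assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative only: in-memory stand-ins for the SQS queue and the S3 bucket.
class OcrFlowSketch {
    // Stand-in for the SQS queue that carries image URLs from client to server.
    static final BlockingQueue<String> jobQueue = new LinkedBlockingQueue<>();
    // Stand-in for the S3 bucket: image URL -> recognized text.
    static final Map<String, String> resultStore = new ConcurrentHashMap<>();

    // Hypothetical placeholder for the OCR step (the real worker uses Tesseract).
    static String fakeOcr(String imageUrl) {
        return "text extracted from " + imageUrl;
    }

    // Client side: request a job by enqueueing an image URL.
    static void submit(String imageUrl) {
        jobQueue.add(imageUrl);
    }

    // Server side: drain the queue, run OCR on each URL, and store each result.
    static int processAll() {
        int processed = 0;
        String url;
        while ((url = jobQueue.poll()) != null) {
            resultStore.put(url, fakeOcr(url));
            processed++;
        }
        return processed;
    }

    public static void main(String[] args) {
        submit("http://example.com/a.png");
        submit("http://example.com/b.png");
        System.out.println("processed: " + processAll());
        System.out.println(resultStore.get("http://example.com/a.png"));
    }
}
```

In the real system the queue and the store are remote services, so the client and server run as separate processes; the sketch only shows the order of the steps.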

Usage

You need to create a Worker AMI and a Manager AMI and supply both to the local client. Each AMI is an Ubuntu instance image with the respective JAR uploaded to the home directory and Java installed; the Worker AMI also needs Tesseract installed.

An ARN string also needs to be provided to give the manager the necessary AWS permissions.

To run the client, enter the following command: java -jar yourjar.jar inputFileName outputFileName n managerAmId workerAmId terminate

where:

inputFileName is a file with URLs to the images you would like to apply OCR to
outputFileName is the HTML-formatted output file containing the strings returned by the OCR app
n is the number of links you'd like a single worker to process (on average)
managerAmId and workerAmId are the IDs of the Manager and Worker AMIs described above
terminate (can be any string) signals to the manager that this is the last job it will receive. To run multiple jobs, omit this argument until the last job you want to run.
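
As a rough illustration of how these arguments fit together, here is a minimal sketch of a parser that follows the documented argument order. It is not the project's actual code: the class name `ClientArgsSketch`, its field names, and the sample AMI IDs are hypothetical. The `workersNeeded` helper assumes the manager starts one worker per `n` links, rounded up, which the description of `n` suggests but does not state.

```java
// Illustrative only: mirrors the documented argument order, not the project's actual parser.
class ClientArgsSketch {
    final String inputFileName;
    final String outputFileName;
    final int linksPerWorker;  // the "n" argument
    final String managerAmiId;
    final String workerAmiId;
    final boolean terminate;   // true when a sixth argument (any string) is present

    ClientArgsSketch(String[] args) {
        if (args.length < 5 || args.length > 6) {
            throw new IllegalArgumentException(
                "usage: inputFileName outputFileName n managerAmId workerAmId [terminate]");
        }
        inputFileName = args[0];
        outputFileName = args[1];
        linksPerWorker = Integer.parseInt(args[2]);
        if (linksPerWorker <= 0) {
            throw new IllegalArgumentException("n must be a positive integer");
        }
        managerAmiId = args[3];
        workerAmiId = args[4];
        terminate = args.length == 6;
    }

    // Assumed policy: one worker per n links, rounded up.
    static int workersNeeded(int totalLinks, int linksPerWorker) {
        return (totalLinks + linksPerWorker - 1) / linksPerWorker;
    }

    public static void main(String[] args) {
        // Placeholder values; real AMI IDs come from your own AWS account.
        String[] sample = {"images.txt", "out.html", "10", "ami-manager", "ami-worker", "terminate"};
        ClientArgsSketch parsed = new ClientArgsSketch(sample);
        System.out.println("terminate after this job: " + parsed.terminate);
        System.out.println("workers for 25 links: " + workersNeeded(25, parsed.linksPerWorker));
    }
}
```

Treating the mere presence of a sixth argument as the terminate signal matches the note that it can be any string.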

artyom-ar/distributed_ocr

Folders and files

Latest commit

History

Repository files navigation

distributed_ocr

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages