artyom-ar/distributed_ocr

distributed_ocr

distributed_ocr applies OCR to images in a distributed manner. The server side runs on AWS EC2 instances; clients submit jobs to it through SQS queues, and the results are uploaded to S3.

Usage

You need to create a Worker AMI and a Manager AMI and supply both to the local client. Each is an Ubuntu image with Java installed and the respective JAR uploaded to the home directory; the Worker AMI also needs Tesseract installed.

An ARN string also needs to be provided to give the manager the necessary AWS permissions.

To run the client, enter the following command: java -jar yourjar.jar inputFileName outputFileName n managerAmId workerAmId terminate

where:

  • inputFileName is a file with URLs to the images you would like to apply OCR to
  • outputFileName is the HTML-formatted output file containing the strings returned by the OCR app
  • n is the number of links you'd like a single worker to process (on average)
  • managerAmId and workerAmId identify the Manager and Worker AMIs described above
  • terminate (can be any string) signals to the manager that this is the last job it will receive. To run multiple jobs, omit the terminate argument until the last job you want to run.
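
The argument list above can be sketched in Java as follows. This is a minimal illustration, not the client's actual code: the class name ClientArgs, its field names, and the workersNeeded helper are hypothetical. It also assumes that "n links per worker on average" means the manager rounds the worker count up, which the README does not state.

```java
// Hypothetical sketch of parsing the client's command-line arguments.
// None of these names come from the distributed_ocr source.
final class ClientArgs {
    final String inputFileName;   // file with URLs of the images to OCR
    final String outputFileName;  // HTML file that receives the OCR results
    final int linksPerWorker;     // "n": links a single worker handles on average
    final String managerAmiId;    // ID of the Manager AMI
    final String workerAmiId;     // ID of the Worker AMI
    final boolean terminate;      // true when a sixth argument (any string) is present

    private ClientArgs(String in, String out, int n,
                       String managerAmi, String workerAmi, boolean terminate) {
        this.inputFileName = in;
        this.outputFileName = out;
        this.linksPerWorker = n;
        this.managerAmiId = managerAmi;
        this.workerAmiId = workerAmi;
        this.terminate = terminate;
    }

    // Expects five required arguments plus the optional terminate string.
    static ClientArgs parse(String[] args) {
        if (args.length < 5 || args.length > 6) {
            throw new IllegalArgumentException(
                "usage: inputFileName outputFileName n managerAmId workerAmId [terminate]");
        }
        int n;
        try {
            n = Integer.parseInt(args[2]);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("n must be an integer: " + args[2], e);
        }
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive: " + n);
        }
        return new ClientArgs(args[0], args[1], n, args[3], args[4], args.length == 6);
    }

    // Assumption: the worker count is the number of URLs divided by n, rounded up.
    static int workersNeeded(int urlCount, int linksPerWorker) {
        if (urlCount < 0 || linksPerWorker <= 0) {
            throw new IllegalArgumentException("urlCount must be >= 0 and linksPerWorker > 0");
        }
        // Widen to long so the addition cannot overflow for large inputs.
        return (int) (((long) urlCount + linksPerWorker - 1) / linksPerWorker);
    }
}
```

Under that rounding assumption, an input file with 10 URLs and n = 3 would call for 4 workers. Passing any sixth argument sets the terminate flag, and leaving it out keeps the manager available for further jobs.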
