ocr-test

Prerequisites

Drop the scanned images for a particular state into the training/input_images folder.
Run ./preprocess.sh to generate the training images.
Open the jTessBoxEditorFX Java program and correct the box files for the images in the training_images/processed folder. After editing the box values, use Save As and overwrite the existing file.
cd into training and edit the last line of train.sh so it points to where the finalized trained language file should be moved. This should be the tessdata folder of wherever Tesseract (4.1) is installed. The script is hardcoded to use receipt_nh as the output language name; you can change that in train.sh.
Run ./train.sh and you're done! The new language file is copied to tessdata and is then available to the tesseract command via the -l flag.
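
The last two steps can be sanity-checked with a few small shell helpers. This is only a sketch: the function names (find_tessdata, lang_installed, ocr_receipt) are hypothetical and not part of this repo, and it assumes the trained file follows Tesseract's usual `<lang>.traineddata` naming and that, on Tesseract 4.x, TESSDATA_PREFIX (if set) points at the tessdata folder itself.

```shell
#!/bin/sh
# Sketch only: these helper names are illustrative and do not exist in this repo.

# Print the first existing tessdata directory: $TESSDATA_PREFIX first (if set),
# then any candidate directories passed as arguments. Returns 1 if none exist.
find_tessdata() {
  for dir in "${TESSDATA_PREFIX:-}" "$@"; do
    if [ -n "$dir" ] && [ -d "$dir" ]; then
      printf '%s\n' "${dir%/}"
      return 0
    fi
  done
  return 1
}

# Succeed if <lang>.traineddata exists in the given tessdata directory.
# Usage: lang_installed <tessdata_dir> <lang>
lang_installed() {
  [ -f "$1/$2.traineddata" ]
}

# OCR one image to stdout with the trained language (defaults to receipt_nh).
# Usage: ocr_receipt <image> [lang]
ocr_receipt() {
  tesseract "$1" stdout -l "${2:-receipt_nh}"
}
```

For example, `tessdata=$(find_tessdata /usr/share/tesseract-ocr/4.00/tessdata /usr/local/share/tessdata)` followed by `lang_installed "$tessdata" receipt_nh && ocr_receipt scan.png` would run OCR only if the language file is in place. The two candidate paths are common install locations, not something this repo guarantees, and scan.png is a placeholder image name.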

Happy OCRing
