- The main motivation behind this project was that we often had to retype content by hand, because it could not be copy-pasted from an existing document or image that is not in typed (machine-readable) format.
- Hence, a text extractor that simply scans a file and extracts its content would save a lot of time and greatly reduce the chance of typographical errors introduced by retyping.
- Desktop Version (for images and PDFs): http://18.222.220.89:5000/ (the AWS instance is currently terminated)
- Mobile Version (downloadable APK package, images only): https://drive.google.com/file/d/1JLeMwO4LZjzQcbgoA5ARX-z3L8OsZ4-M/view?usp=sharing
- Our system takes a scanned image or document from the user as input.
- It then applies image pre-processing techniques such as scaling, binarization and noise removal.
- Finally, it runs Optical Character Recognition (OCR) with the Tesseract engine to extract the text.
- The link http://18.222.220.89:5000/ lands on this page, where you can submit the file from which you want to extract text.
- After uploading and submitting a file, the result appears as shown in the image. Click Copy To Clipboard to copy the text and use it as you want.
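
For readers curious how such an upload flow might look on the server side, here is a small Flask sketch. It is an assumption-laden illustration rather than the deployed app: the route `/extract`, the form field name `file`, the `create_app` factory and the injected `ocr` callable are all hypothetical names chosen for this example.

```python
from typing import Callable

from flask import Flask, jsonify, request


def create_app(ocr: Callable[[bytes], str]) -> Flask:
    """Build a Flask app whose OCR backend is supplied by the caller."""
    app = Flask(__name__)

    @app.route("/extract", methods=["POST"])  # hypothetical route name
    def extract():
        upload = request.files.get("file")  # hypothetical form field name
        if upload is None or upload.filename == "":
            return jsonify(error="no file uploaded"), 400
        # Hand the raw bytes to the OCR backend and return the text as JSON.
        return jsonify(text=ocr(upload.read()))

    return app
```

Passing the OCR routine in as a parameter keeps the web layer independent of Tesseract, so the endpoint can be exercised with a stand-in function. In a real deployment the `ocr` callable would decode the uploaded bytes and run whatever recognition routine the app uses.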
- After downloading the APK package from this link, install it on your device and start the app.
- Upload an image and the results appear as follows. Then simply copy the text and use it as per your requirement.
- Flask
- Tesseract OCR Engine
- TensorFlow, OpenCV
- Flutter
- AWS (Deployment)