Easily add OCR to scanned book and document PDFs using Google Colab. Supports many languages.

MossabDiae/colab-book-ocr


Google Colab PDF OCR

Add OCR to your PDF books and documents easily using Google Colab. This lets you search the text content of PDFs made from scanned images, or copy text from them.

How to use:

  1. Open the notebook in Google Colab.

  2. Set variables in the first Cell

    • Make sure original_pdf matches the PDF's file name.
    • Set the correct lang_code (ara = Arabic, eng = English, jpn = Japanese, etc.); more codes here.
    • (Optional) Set the first and last pages to OCR only a range of pages, such as a single chapter.
  3. Upload the PDF, or uncomment # !wget in the second cell and set the correct URL. The wget command saves the downloaded PDF under the correct file name.

  4. Done! Run all cells (Runtime > Run all), or run them one by one without skipping any.
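
As a rough illustration of step 2, the settings and a few sanity checks could look like the sketch below. Only original_pdf and lang_code are names taken from the steps above; first_page, last_page, the check_settings helper, and the placeholder values are assumptions made for this example and may not match the notebook's actual first cell.

```python
from pathlib import Path
from typing import List, Optional

# Settings as described in step 2. All values here are placeholders.
original_pdf = "my_book.pdf"        # must match the uploaded file's name exactly
lang_code = "eng"                   # e.g. "ara", "eng", "jpn"
first_page: Optional[int] = None    # hypothetical name; None means "from the start"
last_page: Optional[int] = None     # hypothetical name; None means "to the end"


def check_settings(pdf_name: str, lang: str,
                   first: Optional[int] = None,
                   last: Optional[int] = None) -> List[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    # The PDF must already be in the working directory (uploaded or downloaded).
    if not Path(pdf_name).is_file():
        problems.append(f"File not found: {pdf_name!r}. Check the name in original_pdf.")
    # Only a non-empty check: valid codes depend on the installed language data.
    if not lang.strip():
        problems.append("lang_code is empty.")
    # Page numbers are assumed to be 1-based in this sketch.
    if first is not None and first < 1:
        problems.append("The first page must be 1 or greater.")
    if first is not None and last is not None and last < first:
        problems.append("The last page must not come before the first page.")
    return problems


for problem in check_settings(original_pdf, lang_code, first_page, last_page):
    print("Check failed:", problem)
```

Running a check like this before starting OCR surfaces a mistyped file name or an inverted page range immediately, rather than partway through a long run.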

Features

  • Easy setup.
  • Basic error checking.
  • Shows progress.
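
To give an idea of what progress reporting over a page range can look like, here is a minimal sketch. It is not the notebook's code: ocr_pages and fake_ocr are hypothetical names, and the real OCR step is replaced by a stand-in function so the example runs without any OCR software installed.

```python
from typing import Callable, Iterable, List


def ocr_pages(pages: Iterable[int], ocr_page: Callable[[int], str]) -> List[str]:
    """Run ocr_page on each page number and print progress after each one."""
    page_list = list(pages)
    total = len(page_list)
    results = []
    for done, page in enumerate(page_list, start=1):
        results.append(ocr_page(page))
        percent = done * 100 // total
        print(f"Page {page} done ({done}/{total}, {percent}%)")
    return results


def fake_ocr(page: int) -> str:
    # Stand-in for a real OCR call; returns dummy text for the given page.
    return f"text of page {page}"


# Process pages 3 to 5 inclusive, as if only one chapter had been selected.
texts = ocr_pages(range(3, 6), fake_ocr)
```

Passing the OCR step in as a function keeps the progress loop independent of whichever OCR tool is actually used.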

Credit:

Google Colab, Tesseract
