yuandere/screenshotscribe

Screenshot Scribe

Perform batch OCR processing on your images and screenshots, and gather the text into one formatted file with Gemini 1.5 Flash.

About

Ever take lots of screenshots or save bookmarks to reference later, but never actually go back to them? This tool is intended to make extracting the text from those images and screenshots much easier.

Gemini 1.5 Flash was chosen for the job because it greatly outperforms open-source OCR solutions like Tesseract and EasyOCR, while being quicker and free or cheaper relative to many other multimodal LLMs.

Features

  • Batch OCR processing of images
  • Easy-to-customize system prompt
  • .json output by default, with optional formatted .md and .docx file types
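
As a rough illustration of the features above, here is a minimal sketch of a batch loop and two output formatters. It is not the project's actual code: `ocr_batch`, `to_json`, `to_markdown`, the injected `ocr_fn` callable, the extension list, and the filename-to-text output shape are all assumptions made for this example. In a real run, `ocr_fn` would be whatever sends one image plus the system prompt to Gemini 1.5 Flash and returns the extracted text.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Iterable

# Hypothetical set of accepted extensions; the project's real list may differ.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def ocr_batch(paths: Iterable[Path], ocr_fn: Callable[[Path], str]) -> Dict[str, str]:
    """Run ocr_fn on every image path and map file name -> extracted text."""
    results: Dict[str, str] = {}
    for path in sorted(paths):
        # Skip anything that does not look like an image.
        if path.suffix.lower() in IMAGE_SUFFIXES:
            results[path.name] = ocr_fn(path).strip()
    return results


def to_json(results: Dict[str, str]) -> str:
    """Serialize the results as pretty-printed JSON (assumed schema)."""
    return json.dumps(results, indent=2, ensure_ascii=False)


def to_markdown(results: Dict[str, str]) -> str:
    """Emit one heading per image, followed by that image's text."""
    sections = [f"## {name}\n\n{text}\n" for name, text in results.items()]
    return "\n".join(sections)
```

Passing the OCR step in as a callable keeps the batching and formatting logic independent of the model, so a stub can stand in for the API call while the output format is being adjusted.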

Getting Started

Prerequisites

You will need to have the following:

  • uv, which is used to run the project
  • A Gemini API key

Installation

  1. Clone the repo
    git clone https://github.com/yuandere/screenshotscribe
  2. Create a .env file in the project directory and add your API key
    GEMINI_API_KEY = XXXXXX
    
  3. Move any images you want processed into the folder /images_to_process
  4. Run
    uv run screenshotscribe
    
    Optionally, add the -t flag to specify the output file type: (j)son, (m)arkdown, or (d)ocx
    uv run screenshotscribe -t m
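
To make steps 2 and 4 more concrete, the following is a minimal sketch of how a `.env` line and the `-t` flag could be interpreted. It only illustrates the idea and is not taken from the project: `parse_env`, `parse_args`, `output_suffix`, and the `OUTPUT_TYPES` mapping are hypothetical names, and the project may well rely on a library for `.env` loading instead. The default of JSON follows the Features list; only the single-letter values shown above are assumed to be valid.

```python
import argparse
from typing import Dict, List, Optional

# Assumed mapping from the -t shorthand to an output file extension.
OUTPUT_TYPES = {"j": ".json", "m": ".md", "d": ".docx"}


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY = VALUE lines, tolerating spaces around '='."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Ignore blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the command line; -t defaults to JSON output."""
    parser = argparse.ArgumentParser(prog="screenshotscribe")
    parser.add_argument(
        "-t",
        dest="file_type",
        choices=sorted(OUTPUT_TYPES),
        default="j",
        help="output file type: (j)son, (m)arkdown, or (d)ocx",
    )
    return parser.parse_args(argv)


def output_suffix(argv: Optional[List[str]] = None) -> str:
    """Return the file extension selected by the -t flag."""
    return OUTPUT_TYPES[parse_args(argv).file_type]
```

With this sketch, `parse_env("GEMINI_API_KEY = XXXXXX")` yields a dictionary holding the key, and `output_suffix(["-t", "m"])` selects the Markdown extension.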
    

Contributing

Contributions are welcome. Feel free to create a pull request or submit an issue for new features, bug fixes, or documentation improvements.

License

Distributed under the MIT License. See LICENSE.txt for more information.
