This application uses Tesseract and Google's Gemini API to convert images of code into clean Python. It then launches an interactive session to answer your questions about the generated code.

CodeScribe AI: From Whiteboard to Working Code

Open In Colab License: MIT Python Status

CodeScribe AI is a powerful proof-of-concept tool that bridges the physical and digital worlds of programming. It transforms handwritten pseudocode or algorithms from an image into clean, executable Python code. Leveraging a sophisticated pipeline of computer vision and a Large Language Model (LLM), it not only generates the code but also allows you to interactively ask questions about its logic, complexity, and functionality.

Turn your whiteboard sketches and notebook scribbles into reality!


🚀 The Core Idea

In the world of software development, ideas often begin on a whiteboard or in a notebook. Translating these handwritten thoughts into functional code is a manual, time-consuming, and error-prone process. This project automates that translation.

Before: The Handwritten Mess

A typical input is a photo of handwritten code, which may be messy, have inconsistent formatting, and contain non-code elements.

*(Screenshot: handwritten code input)*

After: The AI-Generated Result

The script outputs clean, commented, and functional Python code, followed by an interactive Q&A session to explain it.

*(Screenshot: generated code and Q&A output)*


⚙️ Project Workflow & Architecture

The magic happens in a four-stage pipeline. Each stage prepares the data for the next, culminating in an intelligent, interactive experience.

```mermaid
graph TD
    A[📷 Image Input] -->|image.png| B(Stage 1: Image Pre-processing);
    B -->|Grayscale & Binary Image| C(Stage 2: OCR with Tesseract);
    C -->|Messy Extracted Text| D(Stage 3: Code Generation with Gemini);
    D -->|Clean Python Code| E(Stage 4: Interactive Q&A with Gemini);

    subgraph "Computer Vision"
        B
        C
    end

    subgraph "Generative AI"
        D
        E
    end

    style A fill:#f9f,stroke:#333,stroke-width:2px
    style E fill:#ccf,stroke:#333,stroke-width:2px
```
  1. Image Pre-processing (with OpenCV): The input image is first processed to be machine-readable. This involves:

    • Reading the Image: Loading the image file into memory.
    • Grayscaling: Converting the image from color to grayscale, as color information is not needed for OCR.
    • Binarization: Applying an Otsu and Binary Inverse threshold. This crucial step converts the image to pure black and white, making the characters stand out sharply from the background, which significantly improves OCR accuracy.
  2. Text Extraction (with Tesseract OCR): The pre-processed image is fed into the Tesseract engine.

    • Tesseract scans the image and performs Optical Character Recognition (OCR), converting the pixels representing characters into a raw text string.
    • As seen in the sample output, this raw text is often imperfect and contains recognition errors (e.g., `rl att ee sean!` or `a{j-1) > afj)) ¢`).
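A sketch of this stage: `image_to_string` is the real Pytesseract call, while the whitespace tidy-up helper is our own illustrative addition (and is why the import happens lazily, so the helper works even without Tesseract installed):

```python
def tidy_raw_text(raw: str) -> str:
    """Drop blank lines and trailing whitespace from Tesseract's raw output."""
    return "\n".join(
        line.rstrip() for line in raw.splitlines() if line.strip()
    )

def extract_text(binary_image) -> str:
    """Run OCR on a pre-processed (binary) image and lightly tidy the result."""
    # Imported here so the text helper above stays usable without Tesseract.
    import pytesseract
    from PIL import Image

    raw = pytesseract.image_to_string(Image.fromarray(binary_image))
    return tidy_raw_text(raw)
```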
  3. Code Generation (with Google Gemini): This is where the LLM's power shines.

    • The messy, OCR-extracted text is sent to the Gemini API with a carefully crafted prompt.
    • The prompt instructs the model to act as a code interpreter, recognize that the input is OCR'd text with errors, and translate the underlying logic into a clean, single Python method.
    • Gemini doesn't just translate; it infers the original intent, corrects syntax errors, formats the code correctly, and even adds comments explaining how it interpreted the garbled sections.
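A sketch of this stage, split into a prompt builder and the API call (the prompt wording is illustrative, not the project's exact prompt; the model name comes from the section below):

```python
import os

def build_generation_prompt(ocr_text: str) -> str:
    """Build the code-generation prompt sent to Gemini (wording is illustrative)."""
    return (
        "You are a code interpreter. The following text was extracted by OCR "
        "from a photo of handwritten code and contains recognition errors.\n"
        "Infer the intended logic and rewrite it as a single, clean, commented "
        "Python method. Note any garbled sections you had to interpret.\n\n"
        f"OCR text:\n{ocr_text}"
    )

def generate_code(ocr_text: str) -> str:
    """Send the prompt to Gemini and return the generated Python code."""
    # Imported here so the prompt builder is usable without the SDK installed.
    import google.generativeai as genai

    genai.configure(api_key=os.environ["API_KEY"])
    model = genai.GenerativeModel("gemini-2.5-flash")
    return model.generate_content(build_generation_prompt(ocr_text)).text
```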
  4. Interactive Q&A (with Google Gemini): After generating the code, the script enters a conversational loop.

    • The user can ask questions in natural language about the generated code.
    • For each question, a new prompt is sent to Gemini, providing the generated code as context. This allows the model to give accurate, context-aware answers about the algorithm's logic, time complexity, or specific lines of code.
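The conversational loop above can be sketched like this (a minimal sketch; delimiters and prompt wording are our own illustrative choices):

```python
def build_question_prompt(code: str, question: str) -> str:
    """Bundle the generated code with the user's question as context for Gemini."""
    return (
        "Answer the question about the following Python code.\n\n"
        f"--- code ---\n{code}\n--- end code ---\n\n"
        f"Question: {question}"
    )

def qa_loop(code: str) -> None:
    """Interactive Q&A: each question is answered with the code as context."""
    import google.generativeai as genai  # lazy import, as in the earlier sketch

    model = genai.GenerativeModel("gemini-2.5-flash")
    while True:
        question = input("Ask about the code (or 'quit'): ").strip()
        if question.lower() in {"quit", "exit"}:
            break
        reply = model.generate_content(build_question_prompt(code, question))
        print(reply.text)
```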


🛠️ Key Technologies & Libraries

This project stands on the shoulders of several powerful open-source libraries and APIs.

  • OpenCV (cv2): The cornerstone of the image pre-processing stage. It provides the tools necessary to filter and transform the image to make it optimal for OCR.
  • Pytesseract: A Python wrapper for Google's Tesseract-OCR Engine. It serves as the bridge between our pre-processed image and the raw, extracted text.
  • Pillow (PIL): Used to handle image data and pass it between OpenCV and Pytesseract seamlessly.
  • Google Gemini (google-generativeai): The intelligent core of the application. The gemini-2.5-flash model is used for its speed and powerful reasoning capabilities. It excels at:
    • Error Correction: Fixing gibberish from the OCR process.
    • Contextual Understanding: Inferring programming logic (like loops and swaps) from incomplete pseudocode.
    • Code Generation: Producing syntactically correct and idiomatic Python.
    • Natural Language Processing: Powering the final interactive Q&A session.

🏁 Getting Started

Follow these steps to get the project running on your local machine.

1. Prerequisites

  • Python 3.8+
  • Tesseract-OCR Engine: This is a system dependency, not a Python package. You must install it separately.
    • Windows: Download and run the installer from the Tesseract at UB Mannheim page. Make sure to note the installation path.
    • macOS: brew install tesseract
    • Linux (Debian/Ubuntu): sudo apt-get install tesseract-ocr

2. Clone the Repository

git clone <your-repository-url>
cd <repository-directory>

3. Install Python Dependencies

Create a requirements.txt file with the following content:

opencv-python
numpy
pytesseract
Pillow
google-generativeai

Then, install them using pip:

pip install -r requirements.txt

4. Configuration

  • Tesseract Path (Windows Only): If you are on Windows, you may need to point Pytesseract at your Tesseract installation within your Python script:

    import pytesseract
    pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
  • Google Gemini API Key: The script will require a Gemini API key. It's recommended to set this up as an environment variable (API_KEY). You can get a free API key from Google AI Studio.
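Reading the key from the `API_KEY` environment variable can look like this (a sketch; the SDK import is lazy so the error path works even before `google-generativeai` is installed):

```python
import os

def configure_gemini():
    """Read the API key from the environment and configure the Gemini SDK."""
    api_key = os.environ.get("API_KEY")
    if not api_key:
        raise RuntimeError("Set the API_KEY environment variable first.")
    import google.generativeai as genai  # imported lazily

    genai.configure(api_key=api_key)
    return genai
```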

5. Usage

  1. Run the application.
  2. Upload an image of handwritten code (or place one in the project's root directory).
  3. Click "Generate Code".
  4. Review the generated code and interact with the Q&A bot!

✍️ Author

Vaibhav Shikhar Singh
