Sample Python scripts for converting PDF/DOCX documents to Markdown with enhanced image processing capabilities using Docling.
- Adds a feature to the original Docling library to use Gemini models for describing images.
- PDF to Markdown conversion with image extraction
- Two processing modes:
  - Local VLM (Vision Language Model) processing
  - Remote VLM processing via API
- Automatic image captioning and description generation
- Support for database schema visualization and description
- High-quality image scaling and processing
- CUDA acceleration support for faster processing
- Python 3.x
- CUDA-capable GPU (recommended)
- Required Python packages:
  - docling
  - docling_core
  - flask (for remote processing)
  - requests
  - python-dotenv
  - flash-attn (FlashAttention, required for local VLM processing)
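
Assuming the packages come straight from PyPI (this repository may pin different names or versions), a typical installation looks like:

```bash
pip install docling docling_core flask requests python-dotenv
# FlashAttention usually needs torch and a CUDA toolchain installed first.
pip install flash-attn --no-build-isolation
```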
For remote processing, you need to set the following environment variables in your `.env` file:

```
GEMINI_API_KEY=your_api_key_here
GEMINI_MODEL_NAME=gemini_model_name
```

as well as this default value (also available in `.env.example`):

```
GEMINI_URL=https://generativelanguage.googleapis.com/v1/models/{model_name}:generateContent
```
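
The scripts presumably read these values with python-dotenv; a minimal sketch of how the Gemini endpoint can be resolved from the environment (variable names match the `.env` above, everything else is illustrative):

```python
import os

from dotenv import load_dotenv

load_dotenv()  # read .env from the working directory

api_key = os.environ["GEMINI_API_KEY"]
model_name = os.environ["GEMINI_MODEL_NAME"]
# GEMINI_URL contains a {model_name} placeholder that is filled in at runtime.
gemini_url = os.environ["GEMINI_URL"].format(model_name=model_name)
```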
- Place your PDF files in the `input/` directory.
- Open `local_vlm_pdf_to_md.py` and modify the `input_doc_paths` list to include your PDF files:

  ```python
  input_doc_paths = [
      Path("input/your-file.pdf"),
      # Add more files as needed
  ]
  ```

- Run the local processing script:

  ```bash
  python local_vlm_pdf_to_md.py
  ```

The processed files will be saved in the `output/` directory.
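
For reference, here is a minimal sketch of the kind of conversion loop such a script can implement with docling's standard API; the pipeline settings shown are illustrative assumptions, not necessarily what `local_vlm_pdf_to_md.py` does (the actual script additionally wires up the local Granite Vision model for image descriptions):

```python
from pathlib import Path

from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

input_doc_paths = [Path("input/your-file.pdf")]

pipeline_options = PdfPipelineOptions()
pipeline_options.images_scale = 2.0              # higher scale -> higher-quality extracted images
pipeline_options.generate_picture_images = True  # keep picture crops for embedding and captioning

converter = DocumentConverter(
    format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)}
)

for pdf_path in input_doc_paths:
    result = converter.convert(pdf_path)
    out_path = Path("output") / f"{pdf_path.stem}.md"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(result.document.export_to_markdown(), encoding="utf-8")
```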
- Start the proxy server:

  ```bash
  python gemini_proxy.py
  ```

  Note: `gemini_proxy.py` is a Flask application that translates OpenAI-compatible payloads from docling into Gemini-compatible requests.
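
  As an illustration only, the translation the proxy performs looks roughly like the sketch below; the route path, port, and field handling are assumptions, not the actual contents of `gemini_proxy.py`:

  ```python
  import os

  import requests
  from dotenv import load_dotenv
  from flask import Flask, jsonify, request

  load_dotenv()
  app = Flask(__name__)

  @app.route("/v1/chat/completions", methods=["POST"])
  def chat_completions():
      payload = request.get_json()

      # Convert OpenAI-style message parts (text + base64 image_url) into Gemini "parts".
      parts = []
      for message in payload.get("messages", []):
          content = message.get("content", [])
          if isinstance(content, str):
              parts.append({"text": content})
              continue
          for part in content:
              if part.get("type") == "text":
                  parts.append({"text": part["text"]})
              elif part.get("type") == "image_url":
                  # Data URL: "data:image/png;base64,<data>"
                  header, data = part["image_url"]["url"].split(",", 1)
                  mime_type = header.split(";")[0].removeprefix("data:")
                  parts.append({"inline_data": {"mime_type": mime_type, "data": data}})

      url = os.environ["GEMINI_URL"].format(model_name=os.environ["GEMINI_MODEL_NAME"])
      response = requests.post(
          url,
          params={"key": os.environ["GEMINI_API_KEY"]},
          json={"contents": [{"parts": parts}]},
          timeout=120,
      )
      response.raise_for_status()
      text = response.json()["candidates"][0]["content"]["parts"][0]["text"]

      # Wrap the Gemini answer back into an OpenAI-style chat completion.
      return jsonify({"choices": [{"message": {"role": "assistant", "content": text}}]})

  if __name__ == "__main__":
      app.run(port=8000)
  ```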
- Place your PDF files in the `input/` directory.
- Open `remote_vlm_pdf_to_md.py` and modify the `input_doc_paths` list to include your PDF files:

  ```python
  input_doc_paths = [
      Path("input/your-file.pdf"),
      # Add more files as needed
  ]
  ```

- Run the remote processing script:

  ```bash
  python remote_vlm_pdf_to_md.py
  ```

The processed files will be saved in the `output/` directory.
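
Internally, the remote script presumably points docling's picture-description enrichment at the proxy. A sketch along these lines, assuming docling's `PictureDescriptionApiOptions` and a proxy listening on localhost port 8000 (URL, port, prompt file, and parameters are illustrative, not the script's actual values):

```python
from pathlib import Path

from docling.datamodel.pipeline_options import (
    PdfPipelineOptions,
    PictureDescriptionApiOptions,
)

pipeline_options = PdfPipelineOptions()
pipeline_options.enable_remote_services = True  # allow calls to an external endpoint
pipeline_options.do_picture_description = True
pipeline_options.picture_description_options = PictureDescriptionApiOptions(
    url="http://localhost:8000/v1/chat/completions",             # the gemini_proxy endpoint (assumed)
    prompt=Path("prompt_templates/tmf_images.txt").read_text(),  # VLM instructions from the template
    params={"model": "gemini"},                                  # forwarded inside the OpenAI-style payload
    timeout=120,
)
```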
NOTE: The same process can be applied to convert Word (DOCX) files to Markdown using the `remote_vlm_word_to_md.py` script.
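
Word input goes through the same docling converter interface; a minimal, generic sketch (not the actual contents of `remote_vlm_word_to_md.py`):

```python
from docling.document_converter import DocumentConverter

result = DocumentConverter().convert("input/your-file.docx")
print(result.document.export_to_markdown())
```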
The processor generates Markdown files with:
- Extracted text content
- Embedded images
- Image captions
- Detailed image descriptions
- Database schema analysis (when applicable)
```
docling_processor/
├── input/                   # Input PDF files
├── output/                  # Generated Markdown files
├── prompt_templates/        # Prompts used by the VLM
├── __init__.py
├── local_vlm_pdf_to_md.py   # Local processing
├── remote_vlm_pdf_to_md.py  # Remote processing
├── gemini_proxy.py          # Gemini proxy
├── README.md
└── .env
```
- You can modify the VLM instructions (prompts) in the `prompt_templates/tmf_images.txt` file. The default prompt is optimized for TMForum (TMF) documents.
- The local processing mode uses the IBM Granite Vision model.
- The remote processing mode requires a running Gemini proxy server that translates OpenAI-compatible payloads into Gemini-compatible requests.
- Image processing quality can be adjusted via the `images_scale` parameter.
- Processing time may vary based on document size and complexity.
- FlashAttention is required for optimal performance with local VLM processing.