mattweg/pdf-form-filler

PDF Form Filler

License: MIT | Python 3.8+

Automated tool for filling insurance claim forms and other PDF documents. It analyzes a PDF form, detects where its fields are located, and fills those fields with the data you provide.

Features

  • Automatic Field Detection: Uses computer vision to detect form fields in PDFs
  • Template Management: Save and reuse field templates for consistent form filling
  • Multiple Filling Methods: Choose between the image overlay method and the ReportLab method
  • Google Drive Integration: Store and retrieve forms from Google Drive
  • Batch Processing: Fill multiple forms with different data sets
  • Command Line Interface: Easy-to-use CLI for all operations

Installation

  1. Install system dependencies:
sudo apt-get update
sudo apt-get install poppler-utils tesseract-ocr
  2. Install Python packages:
cd pdfform
pip install -r requirements.txt
  3. Make the CLI executable:
chmod +x pdfform.py

Quick Start

1. Analyze a PDF Form

First, analyze your PDF to detect fields and create a template:

./pdfform.py analyze path/to/form.pdf -o templates/

This creates a template file with detected field coordinates.
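The template schema is not documented in this README, so the sketch below is only a guess at what a file of detected field coordinates might look like, plus a small sanity check you could run before filling. Every key name here (`fields`, `name`, `page`, `x`, `y`, `width`, `height`, `type`) is an assumption; compare it against the file that `analyze` actually writes.

```python
import json

# Hypothetical template layout; key names are assumptions, not the tool's schema.
template = {
    "fields": [
        {"name": "field_1", "page": 1, "x": 120, "y": 340,
         "width": 400, "height": 36, "type": "text"},
        {"name": "field_2", "page": 1, "x": 120, "y": 420,
         "width": 36, "height": 36, "type": "checkbox"},
    ],
}

def validate_template(tpl):
    """Return a list of problems found in a template dict (empty means OK)."""
    problems = []
    required = {"name", "page", "x", "y", "width", "height"}
    seen = set()
    for i, field in enumerate(tpl.get("fields", [])):
        missing = required - field.keys()
        if missing:
            problems.append(f"field {i}: missing {sorted(missing)}")
        name = field.get("name")
        if name in seen:
            problems.append(f"field {i}: duplicate name {name!r}")
        seen.add(name)
    return problems

print(json.dumps(template, indent=2))
print(validate_template(template))
```

A check like this catches duplicate or incomplete entries early, which is cheaper than discovering them in a filled PDF.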

2. Edit Template (Optional)

Review and name the detected fields:

./pdfform.py edit-template templates/form_template.json

3. Create Data File

Generate a data template from your form template:

./pdfform.py create-data templates/form_template.json -o data/my_data.json

Edit the JSON file to add your actual data.
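To illustrate the idea behind `create-data`, here is a minimal sketch that maps each field name in a template to an empty placeholder. It is not the tool's implementation: the template layout (`fields`, `name`, `type`) and the choice of `False` for checkboxes are assumptions made for the example.

```python
import json

def blank_data_from_template(template):
    """Map each field name in a template dict to an empty placeholder value."""
    data = {}
    for field in template.get("fields", []):
        # Assumed convention: checkboxes default to False, other fields to "".
        data[field["name"]] = False if field.get("type") == "checkbox" else ""
    return data

# Hypothetical template; the real schema may differ.
template = {
    "fields": [
        {"name": "policy_number", "type": "text"},
        {"name": "is_new_claim", "type": "checkbox"},
    ]
}

print(json.dumps(blank_data_from_template(template), indent=2))
```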

4. Fill the Form

Fill the PDF with your data:

./pdfform.py fill path/to/form.pdf templates/form_template.json data/my_data.json -o output/filled_form.pdf

Usage Examples

Quick Fill (Auto-detect fields)

./pdfform.py quick-fill form.pdf data.json --auto-detect

Batch Processing

for data in data/*.json; do
    # Quote "$data" so file names containing spaces are handled correctly
    output="output/$(basename "$data" .json)_filled.pdf"
    ./pdfform.py fill form.pdf template.json "$data" -o "$output"
done
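The same loop can be driven from Python if you want to collect failures instead of stopping at the first one. This is a sketch that shells out to the `fill` command shown above; the helper names `build_fill_commands` and `run_batch` are hypothetical and not part of the tool.

```python
import subprocess
from pathlib import Path

def build_fill_commands(form, template, data_dir, output_dir):
    """Return one `pdfform.py fill` argument list per JSON file in data_dir."""
    commands = []
    for data_file in sorted(Path(data_dir).glob("*.json")):
        output = Path(output_dir) / f"{data_file.stem}_filled.pdf"
        commands.append(
            ["./pdfform.py", "fill", str(form), str(template),
             str(data_file), "-o", str(output)]
        )
    return commands

def run_batch(commands):
    """Run each command and return those that exited with a non-zero status."""
    failed = []
    for cmd in commands:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            failed.append(cmd)
    return failed
```

Calling `run_batch(build_fill_commands("form.pdf", "template.json", "data", "output"))` from the repository directory would mirror the shell loop and return the commands that failed.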

Google Drive Integration

Upload forms to Google Drive:

./pdfform.py drive upload output/filled_form.pdf --folder "Insurance_Claims"

List forms in Drive:

./pdfform.py drive list --folder "Insurance_Claims"

Download a form:

./pdfform.py drive download FILE_ID output/downloaded_form.pdf

Configuration

Google Drive Credentials

For Google Drive integration, place your service account credentials in one of:

  • ~/.config/gcloud/application_default_credentials.json
  • ~/google_credentials.json
  • config/google_credentials.json
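If Drive commands fail, it can help to confirm which of these files actually exists. The sketch below checks the three locations listed above and reports the first match; the search order is an assumption, since this README does not say which location takes precedence.

```python
from pathlib import Path

# Candidate locations from the list above; the search order is an assumption.
CANDIDATES = [
    "~/.config/gcloud/application_default_credentials.json",
    "~/google_credentials.json",
    "config/google_credentials.json",
]

def find_credentials(candidates=CANDIDATES):
    """Return the first candidate path that exists as a file, or None."""
    for candidate in candidates:
        path = Path(candidate).expanduser()
        if path.is_file():
            return path
    return None

found = find_credentials()
print(found if found else "No credentials file found")
```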

Commands

  • analyze - Analyze PDF and extract field coordinates
  • fill - Fill PDF using template and data
  • quick-fill - Quick fill with optional auto-detection
  • create-data - Create data template from form template
  • edit-template - Interactively edit field template
  • drive upload - Upload PDF to Google Drive
  • drive download - Download PDF from Google Drive
  • drive list - List PDFs in Google Drive
  • setup - Check dependencies and configuration

Workflow for Insurance Claims

  1. Initial Setup (one-time per form type):

    ./pdfform.py analyze insurance_claim_form.pdf
    ./pdfform.py edit-template templates/insurance_claim_form_template.json
  2. Monthly Claims:

    # Create data file for this month
    cp data/sample_insurance_data.json data/claim_2024_12.json
    # Edit with current month's data
    nano data/claim_2024_12.json
    # Fill the form
    ./pdfform.py fill insurance_claim_form.pdf templates/insurance_claim_form_template.json data/claim_2024_12.json -o output/claim_2024_12_filled.pdf
    # Upload to Drive
    ./pdfform.py drive upload output/claim_2024_12_filled.pdf
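The monthly file names above follow a `claim_YYYY_MM` pattern. As a small convenience sketch, a helper can derive both paths from a date so the names stay consistent; `claim_paths` is a hypothetical helper, not something the tool provides.

```python
from datetime import date

def claim_paths(day):
    """Return the data and output paths for the month containing `day`."""
    stem = f"claim_{day.year}_{day.month:02d}"
    return {
        "data": f"data/{stem}.json",
        "output": f"output/{stem}_filled.pdf",
    }

paths = claim_paths(date(2024, 12, 1))
print(paths["data"])    # data/claim_2024_12.json
print(paths["output"])  # output/claim_2024_12_filled.pdf
```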

Advanced Features

Custom Field Types

The tool supports different field types:

  • text: Regular text fields (default)
  • checkbox: For checkboxes (places X or checkmark)
  • signature: For signature fields

Fine-tuning Detection

Adjust detection parameters in the analyze command:

./pdfform.py analyze form.pdf --dpi 400 --interactive

A higher DPI improves detection accuracy but slows processing.
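To get a feel for the cost, the following sketch computes the size of a rendered page at a few DPI values. It assumes a US Letter page (8.5 x 11 inches) and says nothing about how the tool itself rasterizes pages.

```python
def page_pixels(width_in, height_in, dpi):
    """Return (width, height) in pixels for a page rendered at `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)

# US Letter is 8.5 x 11 inches.
for dpi in (200, 300, 400):
    w, h = page_pixels(8.5, 11, dpi)
    print(f"{dpi} dpi: {w} x {h} px, {w * h / 1e6:.1f} megapixels")
```

Pixel count grows with the square of the DPI, so going from 200 to 400 DPI quadruples the amount of image data to process.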

Troubleshooting

  • Fields not detected: Try a higher DPI (for example, --dpi 400) or use interactive mode (--interactive)
  • Text misaligned: Adjust coordinates in template JSON
  • Google Drive errors: Check credentials and permissions
  • Missing dependencies: Run ./pdfform.py setup to check
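When adjusting coordinates by hand, a common source of misalignment is mixing image pixels with PDF points. The sketch below converts between the two, assuming detected coordinates are pixels with a top-left origin and the PDF uses points (72 per inch) with a bottom-left origin. Whether the template stores pixels or points is not stated here, so check your template before applying it.

```python
def pixels_to_points(x_px, y_px, dpi, page_height_pt):
    """Convert image pixels (top-left origin) to PDF points (bottom-left origin)."""
    x_pt = x_px * 72.0 / dpi
    # Flip the vertical axis: image y grows downward, PDF y grows upward.
    y_pt = page_height_pt - y_px * 72.0 / dpi
    return x_pt, y_pt

# US Letter is 612 x 792 points. A pixel at (300, 300) in a 300 dpi render
# is one inch from the left edge and one inch from the top edge.
print(pixels_to_points(300, 300, 300, 792))  # (72.0, 720.0)
```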

Data Privacy

  • All processing is done locally on your machine
  • Google Drive integration is optional
  • No data is sent to external services unless you use the Google Drive commands
