Skip to content

SohamTolwala/Complex-PDF-Table-Extractor

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Complex-PDF-Table-Extractor

Use Case

Handling PDFs with complex structures, especially those containing multiple tables, bills, or unstructured data, can be challenging for simple Retrieval-Augmented Generation (RAG) models. Standard RAG approaches often fail when dealing with complex tables or non-textual elements. This repository presents a solution that accurately detects and extracts table data from such PDFs, leveraging advanced multimodal vision models to ensure precise and structured output.

Approach

Our method follows these steps:

1. PDF Parsing:

Parse the PDF to identify individual pages.

2. Table Detection:

Utilize a transformer-based table detection model to locate tables on each page.

3. Table Extraction:

Crop the detected tables from the PDF pages. Use multimodal vision models (OpenAI Vision, Claude, Google Gemini Vision Pro) to extract structured data from these cropped table images.

4. Data Structuring:

Convert the extracted table data into a structured JSON format. Optionally, concatenate the JSON data into CSV or XLSX files for further analysis.

Future Integrations

This solution can be further enhanced by integrating it directly into a Multimodal RAG pipeline, allowing for seamless handling of both text and visual data within PDFs. Potential integrations include:

OpenAI Assistant:

  • Directly querying structured data files for insights and analysis.
  • Enhancing the accuracy of responses based on complex table data.
  • Developing an end-to-end automated workflow that combines table detection, data extraction, and advanced querying into a single pipeline.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published