AI-powered PDF to data converter designed to extract information from PDF documents and convert it into structured data.

maheshsathe07/AI-powered-PDF-to-Data-Converter

AI-Powered PDF to Data Converter

Convert PDF files into structured data using AI-powered extraction and generate insights from the extracted data.

Overview

This project converts PDF files into structured data formats such as JSON using LLM-based extraction. It supports generalized data extraction, schema-based extraction, and insight generation from the extracted data.

Functionalities

  1. Generalized Data Extraction: Extract all data from PDF files in JSON format.
  2. Schema-Based Extraction: Extract data based on predefined schemas for specific document types.
  3. Generate Insights: Upload the extracted data in CSV format and generate insights by plotting graphs based on user queries.
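
The insights step takes CSV input, while extraction produces JSON, so a conversion is needed in between. The following is a minimal sketch of one way to do that with the Python standard library. The function names (`flatten`, `records_to_csv`) and the sample records are hypothetical and are not taken from this repository; the JSON the app actually emits may have a different shape.

```python
import csv
import io
import json


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into a single level with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def records_to_csv(records):
    """Serialize a list of (possibly nested) dicts to CSV text."""
    rows = [flatten(r) for r in records]
    # Union of all keys in first-seen order, so no column is dropped.
    fieldnames = list(dict.fromkeys(k for row in rows for k in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # missing keys are written as empty cells
    return buffer.getvalue()


# Hypothetical extraction output; the real app's JSON shape may differ.
extracted = json.loads(
    '[{"invoice_no": "A-1", "total": 120.5, "vendor": {"name": "Acme"}},'
    ' {"invoice_no": "A-2", "total": 80.0}]'
)
csv_text = records_to_csv(extracted)
```

Nested objects become dotted column names (for example `vendor.name`), which keeps the CSV flat enough for a tabular tool to load.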

Dependencies

The project utilizes the following libraries and tools:

  • LangChain: For AI-powered PDF data extraction.
  • Gemini Pro: For converting extracted data into JSON format.
  • Pydantic: For schema-based data extraction.
  • Streamlit: For building the interactive web application.
  • PandasAI: For data visualization.
  • Groq with Mixtral: For query-based insights from CSV data.
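
To illustrate what schema-based extraction means in practice, the sketch below validates model output against a fixed set of typed fields. It deliberately uses standard-library dataclasses as a stand-in for Pydantic so that it runs without extra installs; it is not the project's code. `InvoiceSchema`, its fields, and `parse_with_schema` are hypothetical names chosen for this example.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class InvoiceSchema:
    # Hypothetical fields for an invoice-like document.
    invoice_no: str
    vendor: str
    total: float


def parse_with_schema(raw_json, schema):
    """Keep only the schema's fields and check each value's type."""
    data = json.loads(raw_json)
    kwargs = {}
    for field in fields(schema):
        if field.name not in data:
            raise ValueError(f"missing field: {field.name}")
        value = data[field.name]
        # Accept whole numbers where a float is expected (but not booleans).
        if field.type is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        if not isinstance(value, field.type):
            raise ValueError(
                f"field {field.name!r} expected {field.type.__name__}, "
                f"got {type(value).__name__}"
            )
        kwargs[field.name] = value
    return schema(**kwargs)


# Hypothetical model output; keys outside the schema are ignored.
raw = '{"invoice_no": "A-1", "vendor": "Acme", "total": 120, "notes": "n/a"}'
invoice = parse_with_schema(raw, InvoiceSchema)
```

With Pydantic, a model class plays the role of `InvoiceSchema` and performs validation and type coercion itself, so a hand-written check like `parse_with_schema` is not needed.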

Screenshots

  1. Dashboard

  2. Extract Data

  3. Extracted Data Output

  4. Schema-Based Data Extraction

  5. Schema-Based Data Output

  6. Generate Insights

  7. Generated Insights Output

  8. Pie Chart

How to Run

  1. Clone the repository:

    git clone https://github.com/maheshsathe07/AI-powered-PDF-to-Data-Converter.git
  2. Navigate to the src directory inside the cloned repository:

    cd AI-powered-PDF-to-Data-Converter/src
  3. Install the required dependencies:

    pip install -r requirements.txt
  4. Run the Streamlit app:

    streamlit run main.py
  5. Access the app in your browser at http://localhost:8501.

License

This project is licensed under the MIT License - see the LICENSE file for details.
