Convert PDF files into structured data using AI-powered extraction and generate insights from the extracted data.
This project converts PDF files into structured data formats, such as JSON, using large language models. It supports generalized data extraction, schema-based extraction, and insight generation from the extracted data.
- Generalized Data Extraction: Extract all data from PDF files in JSON format.
- Schema-Based Extraction: Extract data based on predefined schemas for specific document types.
- Generate Insights: Upload the extracted data in CSV format and generate insights by plotting graphs based on user queries.
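As a rough illustration of schema-based extraction, the sketch below checks model-produced JSON against a predefined schema and keeps only the declared fields. The project itself uses Pydantic for this; the sketch swaps in standard-library dataclasses so it runs without extra dependencies. The `Invoice` schema, its field names, and the `parse_invoice` helper are hypothetical and do not come from the repository.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Invoice:
    """Hypothetical schema for an invoice-like document (not from the repo)."""
    invoice_number: str
    vendor: str
    total: float


def parse_invoice(raw_json: str) -> Invoice:
    """Keep only the schema's fields and coerce each value to its declared type."""
    data = json.loads(raw_json)
    kwargs = {}
    for f in fields(Invoice):
        if f.name not in data:
            raise ValueError(f"missing field: {f.name}")
        # f.type is the annotated class (str or float), used here as a converter.
        kwargs[f.name] = f.type(data[f.name])
    return Invoice(**kwargs)


# Example input: JSON as an extraction step might return it (made-up values).
raw = '{"invoice_number": "INV-001", "vendor": "Acme", "total": "125.50", "notes": "x"}'
invoice = parse_invoice(raw)
print(invoice)
```

Fields outside the schema (here `notes`) are dropped, and a missing field raises `ValueError`. A Pydantic model would give the same guarantees with richer error reporting.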
The project utilizes the following libraries and tools:
- LangChain: For AI-powered PDF data extraction.
- Gemini Pro: For converting extracted data into JSON format.
- Pydantic: For schema-based data extraction.
- Streamlit: For building the interactive web application.
- PandasAI: For data visualization.
- Groq with Mixtral: For generating query-based insights from CSV data.
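To give a feel for the kind of question the insights step answers, the sketch below computes a per-group total from CSV data, the sort of aggregation a query like "total per vendor" would need before plotting. In the project this work is delegated to PandasAI and Groq with Mixtral; the sketch uses only the standard library and omits the plot. The column names, the sample rows, and the `sum_by` helper are assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict


def sum_by(csv_text: str, group_col: str, value_col: str) -> dict:
    """Sum value_col for each distinct value of group_col."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[group_col]] += float(row[value_col])
    return dict(totals)


# Made-up CSV standing in for data extracted from PDFs.
sample = """vendor,total
Acme,100.0
Globex,40.0
Acme,25.5
"""

totals = sum_by(sample, "vendor", "total")
print(totals)
```

The resulting dictionary maps each vendor to its summed total and could be passed to any plotting library to draw a bar chart.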
- Clone the repository:
  git clone https://github.com/maheshsathe07/AI-powered-PDF-to-Data-Converter.git
- Navigate to the project directory:
  cd src
- Install the required dependencies:
  pip install -r requirements.txt
- Run the Streamlit app:
  streamlit run main.py
- Access the app in your browser at http://localhost:8501.
This project is licensed under the MIT License - see the LICENSE file for details.







