Skip to content

kumar045/DocumentsReader

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

25 Commits
 
 
 
 
 
 
 
 

Repository files navigation

# DocumentsReader

# Document to JSON Converter

This project provides a script to extract data in json format from various types of documents (PDF, DOCX, TXT, PPT, Excel, HTML) using a pre-trained model.

## Requirements

Before you begin, ensure you have the following software installed:

- Python 3.10+
- pip (Python package installer)

## Installation

1. Install Poppler (required for `pdf2image`):

    - Ubuntu/Debian:
      ```sh
      sudo apt-get install poppler-utils
      ```
    - macOS (using Homebrew):
      ```sh
      brew install poppler
      ```
2. Install the packages
      ```sh
      pip install -r requirements.txt
      ```
## Example

1. The script will load document, apply the pre-trained model, and generate JSON responses.

- Documents Processing:
  - The script loads a pre-trained model using the `transformers` library.
  - It generates a JSON response using the model.
  - The responses can be printed or saved for further analysis.

This README provides an overview of how to set up and use the scripts for to generating JSON data from these documents using a pre-trained model.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages