PDF Chatter

Question Answering over PDFs using Nougat-OCR and GPT-4.

Getting Started

Prerequisites

Python 3.9 or later
a NVIDIA GPU with CUDA support
environment variable OPENAI_API_KEY set to your OpenAI API key

Installation

pip install pdf-chatter

Usage

pdf-chatter path/to/pdf

which opens a REPL where you can ask questions, and GPT-4 will answer them based on the content of the PDF.

Note: pdf-chatter will save a .mmd (multi-markdown) next to the target pdf. This contains the extracted text from the PDF, and is used as a cache so the same PDF doesn't need to be re-processed every time you run pdf-chatter.

Additionally you can run the summarize command to get a summary of the PDF before entering the REPL.

pdf-summarize path/to/pdf

Example

Tips & Notes

Nougat-OCR doesn't extract images, so any questions about images in the document will not be answered
Nougart-OCR works best on documents similar to scientific papers, reports, etc.

How it works

Extract text from the PDF using Nougat-OCR
The entire document is fed to GPT-4 as part of its chat history via the OpenAI API
A simple REPL collects the user's questions and feeds them to GPT-4, which streams the answer back.

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
.github/workflows		.github/workflows
assets		assets
pdf_chatter		pdf_chatter
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PDF Chatter

Getting Started

Prerequisites

Installation

Usage

Example

Tips & Notes

How it works

About

Uh oh!

Releases

Packages

Uh oh!

Languages

License

david-andrew/pdf-chatter

Folders and files

Latest commit

History

Repository files navigation

PDF Chatter

Getting Started

Prerequisites

Installation

Usage

Example

Tips & Notes

How it works

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages