Basic PDF Summarization and Query System

Create a basic script that extracts text from a PDF by chapter, summarizes each chapter, stores these summaries in an SQLite database, and answers questions about these chapters using OpenAI's API.

Running the script

You just need to create an .env file and place there the OPENAPI api key.

To run execute the script:

python -m task

Approach

I did the task in the simplest way possible by covering the functionality asked in the assignment
Used PyMuPDF library to read and process the PDF
Used openai async client
Used Chat completions API to generate text summaries and ask questions about them

Assumptions

There is an "easy" way to split the pdf file in chapters by executing a regex expression here. It can be tweaked depending of the PDF structure.
The PDF to work with is called test.pdf. It can be changed here

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
task		task
.env.example		.env.example
.flake8		.flake8
.gitignore		.gitignore
.isort.cfg		.isort.cfg
README.md		README.md
mypy.ini		mypy.ini
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Basic PDF Summarization and Query System

Running the script

Approach

Assumptions

About

Releases

Packages

Languages

adricu/openai-api-test

Folders and files

Latest commit

History

Repository files navigation

Basic PDF Summarization and Query System

Running the script

Approach

Assumptions

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages