Skip to content

A tool that converts PDF documents to Markdown format using OpenAI's vision model

License

Notifications You must be signed in to change notification settings

AbyssSkb/pdf2md

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

4 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸ“„ PDF to Markdown Converter

Python License OpenAI

A tool that converts PDF documents to Markdown format using OpenAI's vision model for accurate text extraction and formatting.

✨ Features

  • πŸ“ Convert PDF files to Markdown format
  • βž— Maintain mathematical equations formatting
  • 🈺 Support for Chinese text
  • πŸ–ΌοΈ Automatic image processing and handling
  • πŸ“Š Progress bar for conversion status

πŸš€ Installation

  1. Clone the repository:
git clone https://github.com/AbyssSkb/pdf2md.git
cd pdf2md
  1. Install dependencies:
pip install -r pyproject.toml
  1. Create a .env file with your OpenAI configuration:
OPENAI_API_KEY=your_api_key
OPENAI_BASE_URL=your_base_url (optional)
OPENAI_LLM_MODEL=your_preferred_model (default: gpt-4o)

πŸ“– Usage

The converter supports command-line arguments for input and output files:

python main.py input.pdf --output output.md

Or simply:

python main.py input.pdf

The converted markdown will be saved to the specified output file (defaults to output.md).

πŸ› οΈ Requirements

  • Python 3.10 or higher
  • OpenAI API access
  • pdf2image
  • python-dotenv

Note: pdf2image requires additional system dependencies:

  • On Windows: Install poppler
  • On Linux: apt-get install poppler-utils
  • On macOS: brew install poppler

For detailed installation instructions, please check pdf2image documentation.

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

🀝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

About

A tool that converts PDF documents to Markdown format using OpenAI's vision model

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages