A tool that converts PDF documents to Markdown format using OpenAI's vision model for accurate text extraction and formatting.
- π Convert PDF files to Markdown format
- β Maintain mathematical equations formatting
- πΊ Support for Chinese text
- πΌοΈ Automatic image processing and handling
- π Progress bar for conversion status
- Clone the repository:
git clone https://github.com/AbyssSkb/pdf2md.git
cd pdf2md- Install dependencies:
pip install -r pyproject.toml- Create a
.envfile with your OpenAI configuration:
OPENAI_API_KEY=your_api_key
OPENAI_BASE_URL=your_base_url (optional)
OPENAI_LLM_MODEL=your_preferred_model (default: gpt-4o)The converter supports command-line arguments for input and output files:
python main.py input.pdf --output output.mdOr simply:
python main.py input.pdfThe converted markdown will be saved to the specified output file (defaults to output.md).
- Python 3.10 or higher
- OpenAI API access
- pdf2image
- python-dotenv
Note: pdf2image requires additional system dependencies:
- On Windows: Install poppler
- On Linux:
apt-get install poppler-utils- On macOS:
brew install popplerFor detailed installation instructions, please check pdf2image documentation.
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.