pdf2md_llm is a Python package that converts PDF files to Markdown using a local Large Language Model (LLM).
The package leverages the pdf2image library to convert PDF pages to images and a vision language model to generate Markdown text from these images.
- Convert PDF files to images.
- Generate Markdown text from images using a local LLM.
- Keep your data private. No third-party file uploads.
You need a CUDA-compatible GPU to run local LLMs with vLLM.
You can use pip to install the package:
```
pip install pdf2md-llm
```

You can use the pdf2md_llm package via the command line interface (CLI).
To convert a PDF file to Markdown, run the following command:
```
pdf2md_llm <pdf_file> [options]
```

- `pdf_file`: Path to the PDF file to convert.
- `--model`: Name of the model to use (default: `Qwen/Qwen2.5-VL-3B-Instruct-AWQ`).
- `--dtype`: Data type for the model weights and activations (default: `None`).
- `--max_model_len`: Maximum model context length (default: `7000`).
- `--prompt`: Custom prompt for the LLM (default: `None`).
- `--size`: Image size as a tuple (default: `(700, None)`).
- `--dpi`: DPI of the images (default: `200`).
- `--fmt`: Image format (default: `jpeg`).
- `--output_folder`: Folder to save the output Markdown file (default: `./out`).
```
pdf2md_llm example.pdf --model "Qwen/Qwen2.5-VL-3B-Instruct-AWQ" --output_folder "./output"
```

Currently, the following Qwen2.5-VL models are supported:
- `Qwen/Qwen2.5-VL-3B-Instruct`
- `Qwen/Qwen2.5-VL-3B-Instruct-AWQ`
- `Qwen/Qwen2.5-VL-7B-Instruct`
- `Qwen/Qwen2.5-VL-7B-Instruct-AWQ`
- `Qwen/Qwen2.5-VL-72B-Instruct`
- `Qwen/Qwen2.5-VL-72B-Instruct-AWQ`
If you want to use a different model, feel free to add a vLLM-compatible model to the factory function `llm_model()` in `llm.py`.
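The internals of `llm_model()` are not shown in this README, so the following is only an illustrative sketch of a factory with a model registry — the names `SUPPORTED_MODELS` and `make_llm_config` are assumptions, not the actual contents of `llm.py`:

```python
# Hypothetical sketch of a model factory like llm_model(); the real llm.py may differ.
SUPPORTED_MODELS = {
    "Qwen/Qwen2.5-VL-3B-Instruct": {"quantization": None},
    "Qwen/Qwen2.5-VL-3B-Instruct-AWQ": {"quantization": "awq"},
}


def make_llm_config(model: str, dtype: str = "half", max_model_len: int = 7000) -> dict:
    """Build the keyword arguments that a factory could pass to vllm.LLM."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"Unsupported model: {model}. Add it to SUPPORTED_MODELS.")
    cfg = {"model": model, "dtype": dtype, "max_model_len": max_model_len}
    cfg.update(SUPPORTED_MODELS[model])  # merge model-specific settings (e.g. quantization)
    return cfg


print(make_llm_config("Qwen/Qwen2.5-VL-3B-Instruct-AWQ")["quantization"])  # awq
```

Adding a new model under this kind of pattern would be a one-line registry entry, with a clear error for names that are not registered.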
You can use the pdf2md_llm package via the Python API.
Basic usage:
```python
from vllm import SamplingParams

from pdf2md_llm.llm import llm_model
from pdf2md_llm.pdf2img import PdfToImg

# Convert each PDF page to an image
pdf2img = PdfToImg(size=(700, None), output_folder="./out")
img_files = pdf2img.convert("example.pdf")

llm = llm_model(
    model="Qwen/Qwen2.5-VL-3B-Instruct-AWQ",  # Name of the Hugging Face model
    dtype="half",  # Model data type
)

sampling_params = SamplingParams(
    temperature=0.1,
    min_p=0.1,
    max_tokens=8192,
    stop_token_ids=[],
)

# Append all pages to one output Markdown file
for img_file in img_files:
    # Convert the page image to Markdown with the LLM
    markdown_text = llm.generate(img_file, sampling_params=sampling_params)
    with open("example.md", "a", encoding="utf-8") as myfile:
        myfile.write(markdown_text)
```

For a full example, see `example_api.py`.
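The `size=(700, None)` argument follows pdf2image's convention: a fixed dimension plus `None` means the other dimension is scaled to preserve the page's aspect ratio. A minimal sketch of that arithmetic (the helper name is illustrative, not part of the package):

```python
def scaled_size(orig_w: int, orig_h: int, size: tuple) -> tuple:
    """Resolve a pdf2image-style (width, height) pair, where None preserves aspect ratio."""
    w, h = size
    if w is not None and h is None:
        h = round(orig_h * w / orig_w)  # derive height from the fixed width
    elif h is not None and w is None:
        w = round(orig_w * h / orig_h)  # derive width from the fixed height
    return (w, h)


# A US-letter page rendered at 200 DPI is 1700x2200 px; fixing the width at 700
# keeps the page's aspect ratio.
print(scaled_size(1700, 2200, (700, None)))  # (700, 906)
```

Smaller images reduce the number of vision tokens the LLM has to process per page, at the cost of fine detail in dense layouts.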
This project is licensed under the MIT License. See the LICENSE file for details.
- pdf2image for converting PDF files to images.
- Qwen2.5-VL for the vision language model.
- vLLM for efficient LLM model inference.