What is this?
A tool that translates PDF files using pre-trained language models. It supports various models, such as those from Helsinki-NLP.
➜ input_pdf_path=media/test.pdf # Path to input PDF file
➜ output_pdf_path=media/translated.pdf # Path to output PDF file
➜ translation_model=Helsinki-NLP/opus-mt-de-en # https://huggingface.co/Helsinki-NLP
$ pip3 install -r requirements.txt
$ python3 main.py "$input_pdf_path" "$output_pdf_path" "$translation_model"
How does it work?
- Extract text from a PDF file.
- Split text into chunks (sentences).
- Translate sentences using a pre-trained model.
- Save translated sentences to a new PDF file.
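The middle two steps (splitting into sentences and translating them) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in main.py: `split_into_sentences`, `translate_text` and `translate_batch` are hypothetical names, the regex splitter is a naive stand-in for a real sentence tokenizer, and the "model" is a dummy callable so the flow can be run without downloading anything.

```python
import re
from typing import Callable, List

# Naive splitter (an assumption, not the tool's method): break after
# '.', '!' or '?' when followed by whitespace. Abbreviations and
# hyphenated line breaks in real PDFs would need a proper tokenizer.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_into_sentences(text: str) -> List[str]:
    """Collapse whitespace (PDF text often has hard line breaks), then split."""
    normalized = " ".join(text.split())
    if not normalized:
        return []
    return _SENTENCE_END.split(normalized)


def translate_text(
    text: str,
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 8,
) -> List[str]:
    """Split text into sentences and translate them in fixed-size batches.

    `translate_batch` is a hypothetical stand-in for the model: it takes a
    list of sentences and returns exactly one translation per sentence.
    """
    sentences = split_into_sentences(text)
    translated: List[str] = []
    for start in range(0, len(sentences), batch_size):
        batch = sentences[start:start + batch_size]
        results = translate_batch(batch)
        if len(results) != len(batch):
            raise ValueError("translate_batch must return one result per sentence")
        translated.extend(results)
    return translated


if __name__ == "__main__":
    page_text = "Hallo Welt. Wie geht es dir?\nMir geht es gut!"
    # Dummy "model" that upper-cases each sentence, just to exercise the flow.
    fake_model = lambda batch: [s.upper() for s in batch]
    print(translate_text(page_text, fake_model, batch_size=2))
    # prints ['HALLO WELT.', 'WIE GEHT ES DIR?', 'MIR GEHT ES GUT!']
```

Batching keeps memory use bounded on long documents while still letting the model process several sentences per call. If the tool is built on Hugging Face Transformers (an assumption), a real `translate_batch` could wrap `translator = pipeline("translation", model="Helsinki-NLP/opus-mt-de-en")` as `lambda batch: [r["translation_text"] for r in translator(batch)]`.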
The tool automatically uses the GPU for translation if available; otherwise, it falls back to the CPU.
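A GPU-or-CPU fallback like the one described above is commonly written as a single availability check. The sketch below assumes a PyTorch backend, which this README does not confirm; `select_device` is a hypothetical helper that returns a device index in the convention Transformers pipelines accept (0 for the first CUDA GPU, -1 for the CPU), and it also falls back to the CPU when PyTorch is not installed at all.

```python
def select_device() -> int:
    """Return 0 (first CUDA GPU) if one is usable, otherwise -1 (CPU)."""
    try:
        import torch  # assumed backend; not confirmed by this README
    except ImportError:
        return -1  # no PyTorch installed: the CPU is the only option
    return 0 if torch.cuda.is_available() else -1


if __name__ == "__main__":
    device = select_device()
    print("Using", "GPU" if device >= 0 else "CPU")
```

The returned value could then be passed as the `device` argument when building a Transformers translation pipeline.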