ghostscript

Ghostscript

Compress PDFs

This command writes a compressed copy of "input.pdf" to "output.pdf":

gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dDownsampleColorImages=true -dColorImageResolution=150 -dNOPAUSE  -dBATCH -sOutputFile=output.pdf input.pdf

Note: this also works for documents that do not allow text highlighting, markup, etc.
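The one-liner above can be wrapped in a small shell function so that the image resolution becomes a parameter. This is only a sketch: the name compress_pdf and its argument order are made up for illustration, while the gs flags are exactly the ones used above.

```shell
# compress_pdf INPUT OUTPUT [DPI]
# Hypothetical helper: rewrites INPUT with the pdfwrite device and
# downsamples colour images to DPI (default 150, as in the command above).
compress_pdf() (
    input=$1
    output=$2
    dpi=${3:-150}

    # Refuse to overwrite the file that is being read.
    if [ "$input" = "$output" ]; then
        echo "compress_pdf: input and output must differ" >&2
        exit 1   # leaves only the function's subshell
    fi

    gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
       -dDownsampleColorImages=true -dColorImageResolution="$dpi" \
       -dNOPAUSE -dBATCH \
       -sOutputFile="$output" "$input"
)
```

Usage would look like `compress_pdf scan.pdf scan-small.pdf 110`.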

Other pdf tools

Requirements

gs (sudo apt-get install ghostscript)
pdftk (sudo apt-get install pdftk)
pdfjam (sudo apt-get install pdfjam)
pdftocairo (sudo apt-get install poppler-utils)

Repair broken PDFs

First try: Ghostscript (also useful for PDF compression)

gs -o output.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress input.pdf 
#                                                ^^^^^^
#                                    = quality (prepress = highest)
# ----------------------------------------------------------------------------
# Set quality:
# -dPDFSETTINGS=/screen   (screen-view-only quality, 72 dpi images)
# -dPDFSETTINGS=/ebook    (medium quality, 150 dpi images)
# -dPDFSETTINGS=/printer  (high quality, 300 dpi images)
# -dPDFSETTINGS=/prepress (high quality, color preserving, 300 dpi imgs)
# -dPDFSETTINGS=/default  (general-purpose output, possibly a larger file)
# ----------------------------------------------------------------------------
# More fine grained reductions:
# -dDownsampleColorImages=true -dColorImageResolution=110
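
Which preset gives the best trade-off depends on the document, so it can help to write the file once per preset and compare the sizes. A rough sketch, assuming the input name ends in .pdf; the function name compare_presets and the output naming scheme are invented here:

```shell
# compare_presets INPUT
# Hypothetical helper: writes INPUT once per -dPDFSETTINGS preset next to the
# original (report.pdf -> report.screen.pdf, ...) and prints each size.
compare_presets() (
    input=$1
    base=${input%.pdf}

    for preset in screen ebook printer prepress; do
        out="$base.$preset.pdf"
        # -o sets the output file and implies -dBATCH -dNOPAUSE.
        gs -o "$out" -sDEVICE=pdfwrite -dPDFSETTINGS="/$preset" "$input" \
            >/dev/null || continue
        printf '%s\t%s bytes\n' "$preset" "$(wc -c < "$out" | tr -d ' ')"
    done
)
```

Calling `compare_presets report.pdf` prints one line per preset; the generated files still need a visual check, since a smaller file may have visibly degraded images.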

Second try: poppler-utils

If the rescue with Ghostscript fails but a poppler-based viewer such as Evince displays the file correctly (even if, for example, Adobe Reader does not), give poppler-utils a try:

pdftocairo -pdf input.pdf output.pdf
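
The two attempts can be chained so that pdftocairo only runs when Ghostscript reports a failure. This is a sketch under one big assumption: that the exit status is a usable signal. Ghostscript may exit with status 0 and still write a damaged file, so the result should always be opened and checked. The function name repair_pdf is made up.

```shell
# repair_pdf INPUT OUTPUT
# Hypothetical helper: try Ghostscript first, then fall back to pdftocairo.
repair_pdf() (
    input=$1
    output=$2

    if gs -o "$output" -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress "$input" \
            >/dev/null 2>&1; then
        echo "repaired with ghostscript"
    elif pdftocairo -pdf "$input" "$output"; then
        echo "repaired with pdftocairo"
    else
        echo "repair_pdf: both tools failed on $input" >&2
        exit 1   # leaves only the function's subshell
    fi
)
```

Usage: `repair_pdf broken.pdf fixed.pdf`.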

Unify page size

Use pdfjam to give every page the same size, either with --papersize (explicit dimensions with units) or with --paper (a named format, e.g. --paper a4paper):

pdfjam --outfile output.pdf --papersize '{5.5in,8.5in}' input.pdf
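
Since the two options differ only in how the size is written, a thin wrapper can pick the right one from the argument. A minimal sketch; the function name unify_pages is invented, and the rule "a value in braces means --papersize" is a convention of this example, not of pdfjam:

```shell
# unify_pages INPUT OUTPUT SIZE
# Hypothetical helper. SIZE is either a named format such as a4paper
# or explicit dimensions in braces such as '{5.5in,8.5in}'.
unify_pages() (
    input=$1
    output=$2
    size=$3

    case $size in
        '{'*'}') pdfjam --outfile "$output" --papersize "$size" "$input" ;;
        *)       pdfjam --outfile "$output" --paper "$size" "$input" ;;
    esac
)
```

Usage: `unify_pages input.pdf output.pdf a4paper` or `unify_pages input.pdf output.pdf '{5.5in,8.5in}'`.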

Cut out pages / Slicing

Using pdftk with the cat option:

pdftk longPdf.pdf cat 12-15 60 65-end output outfile_p12-15+p60+65-lastPage.pdf
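
The same cat option can split a long PDF into chunks of a fixed number of pages. The sketch below reads the page count from pdftk's dump_data output and generates the ranges itself; the function names page_ranges and split_pdf and the output file naming are assumptions of this example.

```shell
# page_ranges TOTAL SIZE
# Prints one pdftk page range per line, e.g. 1-10, 11-20, 21-25.
page_ranges() (
    total=$1
    size=$2
    [ "$size" -ge 1 ] || exit 1   # a chunk needs at least one page
    start=1
    while [ "$start" -le "$total" ]; do
        end=$((start + size - 1))
        if [ "$end" -gt "$total" ]; then end=$total; fi
        echo "$start-$end"
        start=$((end + 1))
    done
)

# split_pdf INPUT SIZE
# Hypothetical helper: writes INPUT in chunks of SIZE pages,
# each output file named after its page range.
split_pdf() (
    input=$1
    size=$2
    # dump_data prints a line such as "NumberOfPages: 25".
    total=$(pdftk "$input" dump_data | awk '/^NumberOfPages:/ {print $2}')

    page_ranges "$total" "$size" | while read -r range; do
        pdftk "$input" cat "$range" output "${input%.pdf}_p$range.pdf" </dev/null
    done
)
```

Usage: `split_pdf longPdf.pdf 10` writes longPdf_p1-10.pdf, longPdf_p11-20.pdf and so on.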

OCR

The sad story: it is really hard to produce PDFs with an OCR-generated text overlay (i.e. searchable PDFs).

Tesseract

From version 4 onwards (which uses neural nets), tesseract produces quite good results. Version 4 is currently an alpha release, and compiling it from source needs some hacks, e.g. commenting out the if statement that searches for leptonica so that only cmake's find_package() is used.

tesseract test-onepager.tif -l deu out pdf
#         ^^^^^^^^^^^^^^^^  ^^^^^      ^^
#    input: (must be tiff)   ||        | 
#                       language     output a pdf instead of an image

Depending on the language, you may need to download additional tessdata files into the 'tessdata' directory (probably located in /usr/share/tesseract-ocr/tessdata or /usr/share/tessdata).

With the pdf option shown above, tesseract exports PDFs with a text overlay.
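
For a multi-page scanned PDF, the tools from the sections above can be combined: pdftocairo rasterises each page to TIFF, tesseract adds the text layer per page, and pdftk merges the pages again. This is only a sketch. The function name ocr_pdf, the 300 dpi resolution and the default language deu are assumptions, and it relies on pdftocairo zero-padding the page numbers in its output file names so that the glob sorts in page order.

```shell
# ocr_pdf INPUT OUTPUT [LANG]
# Hypothetical helper: rasterise each page, OCR it, merge the results.
ocr_pdf() (
    input=$1
    output=$2
    lang=${3:-deu}
    tmp=$(mktemp -d) || exit 1
    trap 'rm -rf "$tmp"' EXIT   # clean up when the function's subshell ends

    # One TIFF per page (page-1.tif, page-2.tif, ...); 300 dpi is an assumption.
    pdftocairo -tiff -r 300 "$input" "$tmp/page" || exit 1

    for tif in "$tmp"/page-*.tif; do
        # Writes ${tif%.tif}.pdf containing the image plus a text overlay.
        tesseract "$tif" "${tif%.tif}" -l "$lang" pdf || exit 1
    done

    # Assumes the glob expands in page order (zero-padded page numbers).
    pdftk "$tmp"/page-*.pdf cat output "$output"
)
```

Usage: `ocr_pdf scan.pdf scan-searchable.pdf eng`.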

PDF-XChange Editor

OK, this is not the best option (it is a Windows tool and NOT open source), but it works quite well. There is also a portable version, and it should work under Linux using Wine or PlayOnLinux (not tested by me).

PDF-XChange Editor has an integrated OCR engine which works well, but there is no Tesseract integration.

It exports PDFs with a text overlay.

Adobe Acrobat

In my experience the best option, but non-free.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License *.

Code (snippets) are licensed under a MIT License *.

* Unless stated otherwise
