Description
At this point WKHTMLtoPDF is deprecated: the source repos are archived and it is drifting out of OS repos (Alpine, for example).
Would be good to allow an alternative to fill the non-dompdf gap, and then deprecate and remove WKHTMLtoPDF-specific support in the future.
Instead of supporting specific export options, we could maybe support a generic interface that can be adapted to different options where desired. Something like a configurable command path, with a placeholder parameter for the location where BookStack writes out the HTML for conversion.
Research
- Chromium - Can't see a way to completely disable networking & JS.
- Firefox - Can't find a direct way for PDF capture. Does have a --no-remote option which may disable networking?
For browsers, there's a WebDriver standard in progress which may open up possibilities in this area via a standard API.
- TCPDF - LGPL
- New version being built here, existing stable version is support-only. New version is slow to develop (looks like it's been 6 years in the making).
- Need to test output against BookStack HTML.
- Need to assess options (disable network).
- WeasyPrint - BSD
- Need to test output against BookStack HTML.
- Looks pretty good.
- Need to assess options (disable network).
- Not specifically built for untrusted content, though should be able to disable network calls via python wrapper script to define custom URL fetcher.
- Pandoc - GPL (More of an abstraction layer to other libs)
- Need to test output against BookStack HTML.
- Need to assess options (disable network).
- Tested using PagedJS-CLI below; works fine. Options can be passed to the underlying lib used, so output and security depend on that lib.
- PagedJS-CLI
- Does a fair job at output, but fails on some common CSS properties used in the example page. Really feels like it's intended for print use more than matching web output.
- Has good controls to prevent fetching.
There are also commercial offerings, which may or may not be better (PDFreactor, PrinceXML).
Still makes sense to me to create a generic command-line option rather than supporting specific libraries in this case, as it's a moving area. Allowing a user to call their own wrapper script means it can be built upon, for example to customize generation options, and it allows flexibility in the solution used without overcomplicating our support. There could be other options to think about too (for example, running Chrome in a network-limited container).
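The wrapper-script idea, combined with the WeasyPrint note above about defining a custom URL fetcher, could look roughly like the sketch below. It is untested against BookStack output. The is_local_url helper and the allow-list policy (data: URIs plus files under the input HTML's directory) are assumptions made for illustration, and POSIX-style paths are assumed. It relies on WeasyPrint's url_fetcher argument and its default_url_fetcher helper; newer WeasyPrint releases may prefer a different fetcher mechanism.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper: HTML to PDF via WeasyPrint, local resources only."""
import sys
from pathlib import Path
from urllib.parse import unquote, urlparse


def is_local_url(url: str, base_dir: Path) -> bool:
    """Allow data: URIs and file: URLs that resolve inside base_dir."""
    parsed = urlparse(url)
    if parsed.scheme == "data":
        return True
    if parsed.scheme != "file":
        # http, https, ftp and anything else would need network access.
        return False
    target = Path(unquote(parsed.path)).resolve()
    return target.is_relative_to(base_dir.resolve())


def convert(input_html: str, output_pdf: str) -> None:
    # Imported lazily so the URL policy above works without WeasyPrint installed.
    from weasyprint import HTML, default_url_fetcher

    base_dir = Path(input_html).resolve().parent

    def fetcher(url, *args, **kwargs):
        # Refuse anything that is not a local file or inline data.
        if not is_local_url(url, base_dir):
            raise ValueError(f"Blocked non-local resource: {url}")
        return default_url_fetcher(url, *args, **kwargs)

    HTML(filename=input_html, url_fetcher=fetcher).write_pdf(output_pdf)


# Expects exactly two arguments: the input HTML path and the output PDF path.
if __name__ == "__main__" and len(sys.argv) == 3:
    convert(sys.argv[1], sys.argv[2])
```

A script like this would take the HTML path and PDF path as its two arguments, so any further generation options could be added inside convert without BookStack needing to know about them.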
Implementation
- Add new EXPORT_PDF_COMMAND env option.
- Supports placeholders:
  - {input_html_path} - Path to input HTML file to convert.
  - {output_pdf_path} - Path that the output PDF file should be written to.
Notes: Should update existing LDAP_USER_FILTER env option to support this format.
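As a rough illustration of how an EXPORT_PDF_COMMAND value with these placeholders might be expanded and run, here is a minimal Python sketch. BookStack itself is PHP, so this shows only the idea, not the implementation. The function names build_command and export_pdf, the temp file names, and the 15 second timeout are all assumptions. The template is split into arguments before substitution so that paths containing spaces stay single arguments, and no shell is invoked.

```python
import shlex
import subprocess
import sys
import tempfile
from pathlib import Path


def build_command(template: str, input_html_path: str, output_pdf_path: str) -> list[str]:
    """Split the template into arguments, then fill placeholders per argument."""
    replacements = {
        "{input_html_path}": input_html_path,
        "{output_pdf_path}": output_pdf_path,
    }
    args = []
    for arg in shlex.split(template):
        for placeholder, value in replacements.items():
            arg = arg.replace(placeholder, value)
        args.append(arg)
    return args


def export_pdf(template: str, html: str, timeout: int = 15) -> bytes:
    """Write HTML to a temp file, run the command, return the produced bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        input_path = Path(tmp) / "input.html"
        output_path = Path(tmp) / "output.pdf"
        input_path.write_text(html, encoding="utf-8")
        command = build_command(template, str(input_path), str(output_path))
        # Argument list is passed directly to the program, with no shell involved.
        subprocess.run(command, check=True, timeout=timeout)
        return output_path.read_bytes()
```

With this shape, a value such as `weasyprint {input_html_path} {output_pdf_path}` or a call to a user's own wrapper script would both work without any tool-specific support.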