
Find and remove watermarks in PDF file #1855


The watermarks in your example file are stored as so-called marked-content /Artifacts.
PyMuPDF has no dedicated high-level function for dealing with these object types.
But you can use PyMuPDF's low-level interface to locate and remove them if you follow a strict procedure.
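Here is a minimal sketch of such a procedure. It assumes each watermark is a `/Artifact <<... /Subtype /Watermark ...>> BDC ... EMC` sequence written directly in the page's content stream. The helper skips from the opening `BDC` to its matching `EMC`, counting nested `BDC`/`BMC` pairs along the way, and writes the result back with `Document.update_stream()`. The function names are hypothetical. The regex-based operator scan only approximates a real content-stream tokenizer, so a text string that literally contains `EMC` could mislead it.

```python
import re

# Start of a watermark artifact: /Artifact <<... /Subtype /Watermark ...>> BDC
# Assumption: the property dictionary is inline and contains no nested "<< >>".
_WM_START = re.compile(
    rb"/Artifact\s*<<(?:(?!>>).)*?/Subtype\s*/Watermark(?:(?!>>).)*?>>\s*BDC",
    re.DOTALL,
)
# Marked-content operators; the lookbehind skips names such as /EMC.
_MC_OP = re.compile(rb"(?<![\w/])(BDC|BMC|EMC)(?!\w)")


def strip_watermark_artifacts(content: bytes) -> bytes:
    """Remove every /Watermark artifact sequence (its BDC up to the matching EMC)."""
    out = bytearray()
    pos = 0
    while (start := _WM_START.search(content, pos)) is not None:
        depth = 1  # we are inside the watermark's own BDC
        end = None
        for op in _MC_OP.finditer(content, start.end()):
            depth += -1 if op.group(1) == b"EMC" else 1
            if depth == 0:
                end = op.end()
                break
        if end is None:  # unbalanced sequence: leave the rest untouched
            break
        out += content[pos:start.start()]  # keep everything before the artifact
        pos = end  # skip the artifact, including any nested sequences
    out += content[pos:]
    return bytes(out)


def remove_watermarks(src: str, dst: str) -> None:
    """Hypothetical helper: save a copy of src without marked-content watermarks."""
    import pymupdf  # recent PyMuPDF; older releases use "import fitz"

    doc = pymupdf.open(src)
    for page in doc:
        page.clean_contents()  # leaves a single, normalized /Contents object
        xrefs = page.get_contents()
        if not xrefs:
            continue
        old = page.read_contents()
        new = strip_watermark_artifacts(old)
        if new != old:
            doc.update_stream(xrefs[0], new)
    doc.save(dst, garbage=3, deflate=True)
    doc.close()
```

For example, `remove_watermarks("input.pdf", "output.pdf")` would process every page. Other artifact types, such as `/Pagination` headers or page numbers, are deliberately left in place because their property dictionary lacks `/Subtype /Watermark`.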

1. Determine presence of marked-content watermarks

First, standardize the page's /Contents objects. This produces a predictable source code structure and also repairs any potential syntax issues. Afterwards, exactly one /Contents object is left.
Then confirm the presence of this watermark type.

page.clean_contents()  # normalize and merge the page's /Contents into one object
xref = page.get_contents()[0]  # get xref of resulting /Contents object
cont = bytearray(page.re…
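Here is a sketch of one way to run the presence check. The function searches a content stream for an /Artifact marked-content sequence whose property dictionary contains `/Subtype /Watermark`, and it tolerates the optional whitespace that PDF syntax allows. It assumes the dictionary is written inline rather than referenced as a named /Properties resource. `has_marked_content_watermark` and `pages_with_watermarks` are hypothetical names. `Page.clean_contents()` and `Page.read_contents()` are existing PyMuPDF methods.

```python
import re

# Opening of a marked-content watermark, e.g.
#   /Artifact <</Subtype /Watermark /Type /Pagination>> BDC
# Assumption: the property dictionary is inline and has no nested "<< >>".
_WATERMARK_BDC = re.compile(
    rb"/Artifact\s*<<(?:(?!>>).)*?/Subtype\s*/Watermark(?:(?!>>).)*?>>\s*BDC",
    re.DOTALL,
)


def has_marked_content_watermark(content: bytes) -> bool:
    """Return True if a content stream contains a /Watermark artifact."""
    return _WATERMARK_BDC.search(content) is not None


def pages_with_watermarks(path: str) -> list[int]:
    """Hypothetical helper: 0-based numbers of pages carrying such a watermark."""
    import pymupdf  # recent PyMuPDF; older releases use "import fitz"

    found = []
    with pymupdf.open(path) as doc:
        for page in doc:
            page.clean_contents()  # normalize /Contents into a single stream
            if has_marked_content_watermark(page.read_contents()):
                found.append(page.number)
    return found
```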

Answer selected by Jason-XII