-
Hi, I have some PDF files that mix portrait and landscape pages. To exclude some headers/footers that are not removed automatically, I want to set margins. However, the margins differ depending on the page orientation.
-
You need to invoke the method to_markdown page-wise, i.e. use the parameter pages=[page.number]. Then you can give each page some special treatment and/or adjust the invocation parameters. The one thing to take care of is to execute the section header identification only once and provide that result to each invocation.
Here is a snippet.

    import pymupdf, pymupdf4llm

    doc = pymupdf.open("input.pdf")
    hdr_info = pymupdf4llm.IdentifyHeaders(doc)  # perform a one-time scan of font sizes
    mdtext = ""
    for page in doc:
        rect = page.rect  # the page rectangle
        if rect.width > rect.height:  # do some computations for margins
            margins = (...)  # landscape values
        else:
            margins = (...)  # portrait values
        mdtext += pymupdf4llm.to_markdown(doc, pages=[page.number], margins=margins, hdr_info=hdr_info)
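For the margin computation itself, here is a minimal sketch of how the orientation check could be factored into a small helper. The helper name pick_margins and the numeric values are placeholders for illustration only, not values from this thread; measure your own headers/footers. It assumes the 4-value form of margins, which I understand to be (left, top, right, bottom) in points.

```python
# Hypothetical margin values in points, ordered (left, top, right, bottom).
# These numbers are placeholders; replace them with values measured on your PDFs.
PORTRAIT_MARGINS = (0, 50, 0, 50)
LANDSCAPE_MARGINS = (0, 36, 0, 36)


def pick_margins(width, height):
    """Return the margins tuple matching the page orientation.

    A page counts as landscape only when it is strictly wider than tall,
    which is the same test as `rect.width > rect.height` in the snippet.
    Square pages therefore fall back to the portrait values.
    """
    if width > height:
        return LANDSCAPE_MARGINS
    return PORTRAIT_MARGINS
```

Inside the page loop you would then write margins = pick_margins(page.rect.width, page.rect.height) and pass the result to to_markdown as before.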