Pymupdf output different in google colab vs jupyter notebook #773

The text lines you mention are image metadata, which this getText output variant produces for each image it encounters on the page.
This can happen only if

  1. the page actually contains images, and
  2. the flags parameter of page.getText() requests that images of the page be included, i.e. flags & fitz.TEXT_PRESERVE_IMAGES is nonzero. This option is set by default for "blocks".

Different behaviour must therefore come from either a different flags setting or documents that are not in fact identical.
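To compare the two environments, it may help to pass flags explicitly rather than rely on whatever default the installed PyMuPDF applies. The following is only a minimal sketch: `flag_is_set`, `split_blocks` and `text_blocks_only` are hypothetical helper names, and the sketch assumes the usual "blocks" tuple layout `(x0, y0, x1, y1, text, block_no, block_type)`, where `block_type` is 1 for image blocks.

```python
def flag_is_set(flags, bit):
    """Return True if `bit` is switched on in the `flags` bit field."""
    return (flags & bit) != 0


def split_blocks(blocks):
    """Separate text blocks from image blocks.

    Assumes each item is a "blocks" tuple:
    (x0, y0, x1, y1, text, block_no, block_type), block_type 1 = image.
    """
    text_blocks = [b for b in blocks if b[6] == 0]
    image_blocks = [b for b in blocks if b[6] == 1]
    return text_blocks, image_blocks


def text_blocks_only(path, page_number=0):
    """Hypothetical helper: extract blocks with image output switched off."""
    import fitz  # PyMuPDF; imported lazily so the helpers above work without it

    doc = fitz.open(path)
    page = doc[page_number]
    # Build a known flag set instead of relying on the installed default.
    flags = fitz.TEXT_PRESERVE_LIGATURES | fitz.TEXT_PRESERVE_WHITESPACE
    assert not flag_is_set(flags, fitz.TEXT_PRESERVE_IMAGES)
    # Older releases spell the method getText, newer ones get_text.
    get_text = getattr(page, "get_text", None) or page.getText
    return get_text("blocks", flags=flags)
```

Running `text_blocks_only` on the same file in both Colab and Jupyter, and printing `fitz.version` in each, should show whether the difference comes from the flags or from the document. If the image metadata lines are wanted, adding `fitz.TEXT_PRESERVE_IMAGES` to `flags` and passing the result through `split_blocks` keeps them apart from the real text.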

Answer selected by JorjMcKie