BUG: Prevent deduplication of PageObject (#1105)
Make sure that a PageObject is not deduplicated unless it is exactly the same page object.
Adobe Reader/Acrobat does not handle it well when the same page is referenced more than once.
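
To illustrate the idea, here is a minimal, self-contained sketch of hash-based deduplication and of how mixing `id(self)` into the hash input keeps distinct page instances apart. `Node`, `PageNode`, `hash_value` and `deduplicate` are hypothetical stand-ins written for this example, not PyPDF2's actual classes or its deduplication code. Only the body of `hash_value_data` mirrors the change in this commit.

```python
import hashlib
from typing import Dict, List


class Node:
    """Hypothetical stand-in for an object that is hashed by its content."""

    def __init__(self, content: bytes) -> None:
        self.content = content

    def hash_value_data(self) -> bytes:
        # Bytes that describe the object's content.
        return self.content

    def hash_value(self) -> bytes:
        # Assumption: the hash is taken over hash_value_data().
        return hashlib.sha1(self.hash_value_data()).digest()


class PageNode(Node):
    """Hypothetical page: adds id(self) to the hash input, as in the diff."""

    def hash_value_data(self) -> bytes:
        data = super().hash_value_data()
        data += b"%d" % id(self)  # differs for every live instance
        return data


def deduplicate(objects: List[Node]) -> List[Node]:
    """Keep one object per distinct hash value (first occurrence wins)."""
    seen: Dict[bytes, Node] = {}
    for obj in objects:
        seen.setdefault(obj.hash_value(), obj)
    return list(seen.values())


# Two equal-content plain nodes collapse into one.
plain = deduplicate([Node(b"same"), Node(b"same")])

# Two equal-content pages stay separate because their ids differ.
pages = deduplicate([PageNode(b"same"), PageNode(b"same")])

# The very same page instance listed twice is still treated as one object.
page = PageNode(b"same")
same_page_twice = deduplicate([page, page])

print(len(plain), len(pages), len(same_page_twice))
```

In this sketch, two pages with identical content remain two objects, while the same instance listed twice still counts as one, which matches the behaviour the commit message describes.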

Closes #1102

Co-authored-by: Harry Karvonen <harry.karvonen@onebyte.fi>
Hatell and Harry Karvonen authored Jul 16, 2022
1 parent 9bbe827 commit dd2d69a
Showing 1 changed file with 8 additions and 1 deletion.
9 changes: 8 additions & 1 deletion PyPDF2/_page.py
@@ -244,6 +244,11 @@ def __init__(
         self.pdf: Optional[PdfReader] = pdf
         self.indirect_ref = indirect_ref
 
+    def hash_value_data(self) -> bytes:
+        data = super().hash_value_data()
+        data += b"%d" % id(self)
+        return data
+
     @staticmethod
     def create_blank_page(
         pdf: Optional[Any] = None,  # PdfReader
@@ -1287,7 +1292,9 @@ def process_operation(operator: bytes, operands: List) -> None:
                         )
                         if isinstance(cmap[0], str):
                             try:
-                                t = tt.decode(cmap[0], "surrogatepass")  # apply str encoding
+                                t = tt.decode(
+                                    cmap[0], "surrogatepass"
+                                )  # apply str encoding
                             except Exception:  # the data does not match the expectation, we use the alternative ; text extraction may not be good
                                 t = tt.decode(
                                     "utf-16-be" if cmap[0] == "charmap" else "charmap",
