Add support for /Kids and /Limits in page labels #2560
Comments
Maintaining/validating example images inside a PR is complicated. Rather use the existing issue #2560 if there are new findings.
For an example file and the corresponding data, see #2561 (comment). The corresponding docs are "Table 37 – Entries in a number tree node dictionary" and "Table 159 – Entries in a page label dictionary".
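To make the page label dictionary entries from Table 159 more concrete, here is a minimal sketch of how a single label could be rendered from `/S` (style), `/P` (prefix) and `/St` (start value). It uses plain Python dicts as stand-ins for PDF dictionary objects; `to_roman`, `to_letters` and `format_label` are hypothetical helper names for illustration, not pypdf API.

```python
from typing import Any, Dict


def to_roman(n: int) -> str:
    """Lowercase roman numeral for a positive integer."""
    pairs = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    parts = []
    for value, numeral in pairs:
        count, n = divmod(n, value)
        parts.append(numeral * count)
    return "".join(parts)


def to_letters(n: int) -> str:
    """a to z for 1..26, aa to zz for 27..52, and so on."""
    if n < 1:
        return ""
    count, offset = divmod(n - 1, 26)
    return chr(ord("a") + offset) * (count + 1)


def format_label(label: Dict[str, Any], start_index: int, index: int) -> str:
    """Render the label for page `index` (0-based).

    `label` is the page label dictionary whose range starts at
    page `start_index` (its key in the number tree).
    """
    number = index - start_index + label.get("/St", 1)
    style = label.get("/S")
    if style == "/D":
        body = str(number)
    elif style == "/r":
        body = to_roman(number)
    elif style == "/R":
        body = to_roman(number).upper()
    elif style == "/a":
        body = to_letters(number)
    elif style == "/A":
        body = to_letters(number).upper()
    else:
        body = ""  # no /S entry: the label consists of the prefix only
    return label.get("/P", "") + body
```

For instance, `format_label({"/S": "/D", "/P": "A-", "/St": 5}, 10, 12)` numbers the third page of a range that starts at page index 10 and counts from 5.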
I just gave it a try, and it seems like the following patch is sufficient to generate the correct page numbers for the aforementioned document:
diff --git a/pypdf/_page_labels.py b/pypdf/_page_labels.py
index 6f41067..3a43f2a 100644
--- a/pypdf/_page_labels.py
+++ b/pypdf/_page_labels.py
@@ -57,7 +57,7 @@ a Lowercase letters (a to z for the first 26 pages,
aa to zz for the next 26, and so on)
"""
-from typing import Iterator, Optional, Tuple, cast
+from typing import Iterator, List, Optional, Tuple, cast
from ._protocols import PdfCommonDocProtocol
from ._utils import logger_warning
@@ -131,7 +131,8 @@ def index2label(reader: PdfCommonDocProtocol, index: int) -> str:
     if "/PageLabels" not in root:
         return str(index + 1)  # Fallback
     number_tree = cast(DictionaryObject, root["/PageLabels"].get_object())
-    if "/Nums" in number_tree:
+
+    def handle_nums(dictionary_object: DictionaryObject) -> str:
         # [Nums] shall be an array of the form
         # [ key 1 value 1 key 2 value 2 ... key n value n ]
         # where each key_i is an integer and the corresponding
@@ -139,7 +140,7 @@ def index2label(reader: PdfCommonDocProtocol, index: int) -> str:
         # The keys shall be sorted in numerical order,
         # analogously to the arrangement of keys in a name tree
         # as described in 7.9.6, "Name Trees."
-        nums = cast(ArrayObject, number_tree["/Nums"])
+        nums = cast(ArrayObject, dictionary_object["/Nums"])
         i = 0
         value = None
         start_index = 0
@@ -165,16 +166,18 @@ def index2label(reader: PdfCommonDocProtocol, index: int) -> str:
         start = value.get("/St", 1)
         prefix = value.get("/P", "")
         return prefix + m[value.get("/S")](index - start_index + start)
-    if "/Kids" in number_tree or "/Limits" in number_tree:
-        logger_warning(
-            (
-                "/Kids or /Limits found in PageLabels. "
-                "Please share this PDF with pypdf: "
-                "https://github.com/py-pdf/pypdf/pull/1519"
-            ),
-            __name__,
-        )
-    # TODO: Implement /Kids and /Limits for number tree
+
+    if "/Nums" in number_tree:
+        return handle_nums(number_tree)
+
+    if "/Kids" in number_tree:
+        kids: List[DictionaryObject] = number_tree["/Kids"]
+        for kid in kids:
+            limits: List[int] = kid["/Limits"]
+            if limits[0] <= index <= limits[1]:
+                return handle_nums(kid)
+
+    logger_warning(f"Could not reliably determine page label for {index}.", __name__)
     return str(index + 1)  # Fallback if /Nums is not in the number_tree
This is more or less the same as before, only looking into the
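As a possible refinement of the patch above, here is a rough sketch of a lookup that also descends into nested `/Kids`. Per Table 37, `/Limits` holds the least and greatest keys of a subtree, so a page index can lie past a kid's upper limit and still belong to that kid's last entry. The sketch therefore picks the last kid whose lower limit is not greater than the index. Plain dicts and lists stand in for pypdf objects (real nodes may be indirect references that need resolving first), and `find_entry` is a hypothetical name, not existing pypdf API.

```python
from typing import Any, Dict, Optional, Tuple

Node = Dict[str, Any]  # hypothetical stand-in for a number tree node


def find_entry(node: Node, index: int) -> Optional[Tuple[int, Any]]:
    """Return the (key, value) pair with the greatest key <= index.

    Returns None if no key in the tree is <= index.
    """
    if "/Nums" in node:
        # Leaf (or flat root): [key_1, value_1, key_2, value_2, ...]
        nums = node["/Nums"]
        best = None
        for i in range(0, len(nums) - 1, 2):
            if nums[i] > index:
                break
            best = (nums[i], nums[i + 1])
        return best

    # Intermediate node: kids are sorted by key, so the entry we want
    # lives in the last kid whose least key is <= index.
    candidate = None
    for kid in node.get("/Kids", []):
        if kid["/Limits"][0] > index:
            break
        candidate = kid
    if candidate is None:
        return None
    return find_entry(candidate, index)
```

With two kids covering keys 0 to 4 and 10 to 10, page index 7 falls between both `/Limits` ranges, yet it resolves to the entry with key 4 in the first kid.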
Currently, /Kids and /Limits are not supported for page labels. Some examples for the actual implementation might be found at #1519.