-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Add page label support to PdfWriter #1558
Conversation
How did you test it? Please add the test script |
Thank you a lot! This week I'm pretty busy, but I will try to address everything as soon as possible. At the latest, I think, next week I will be able to work on it. |
I have added the tests and changed a bit the behavior to manage correctly when ranges of page labels overlap. The three |
Co-authored-by: pubpub-zz <4083478+pubpub-zz@users.noreply.github.com>
Thank you for the mypy suggestions! |
|
Codecov ReportBase: 91.87% // Head: 91.84% // Decreases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## main #1558 +/- ##
==========================================
- Coverage 91.87% 91.84% -0.04%
==========================================
Files 33 33
Lines 6130 6202 +72
Branches 1210 1228 +18
==========================================
+ Hits 5632 5696 +64
- Misses 322 326 +4
- Partials 176 180 +4
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are nearly there.
I wanted your opinion about the following: @MartinThoma @pubpub-zz
pypdf/_writer.py
Outdated
def set_page_label( | ||
self, | ||
page_number_from: int, | ||
page_number_to: int, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about page ranges indexes.
Now set_page_label
requires page numbers starting with 1 while _set_page_label
requires them starting with 0 (the parameters name change accordingly from page_number
to page_index
)
Also in both cases extremes are included.
I did it this way in the public interface because I think it is more natural this way for the user that probably is setting these watching a pdf (where page numbers start from 1), but I understand if you don't agree, especially considering that the private interface is 0 based.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to use page_index
everywhere and hence starting from 0.
Don't forget that pypdf users are Python developers. Starting a list of pages with index 0 is a lot more natural than starting with 1.
My suggestion would be to rename the parameters to page_index_from
and page_index_to
and letting it start with 0.
@pubpub-zz / @MasterOdin What's your opinion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with your approach
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hehehe 1 based indexing is never the answer. The more I think about it the more I agree with you. I'll wait a bit for MasterOdin answer and then I'll change that. Unfortunately, I will have to change all the indexes in the test but I knew what I was risking when I did it:sweat_smile:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would agree with making it 0-based indexed for the simple reason that all other functions that refer to pages uses a 0-based index.
I agree with @MartinThoma that using page_index
is the more "proper" way to refer to this as people may conflate page_number
with a 1-based index scheme, whereas page_index
, in my opinion, really only refers to 0-based index schemes in the context of Python.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! I'll do that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last commit changes everything to 0-based indexing. Are you convinced by including both extremes or should we do lower included and upper excluded?
I still need to go over a lot of details, but I wanted to give you a small heads-up: Excellent work 🎉 ❤️ |
Overall it looks good to me. There are some minor docstring improvements I suggested (the first line should always be a summary of the function) + the page_number/page_index question needs to be clarified before we merge. I'm confident that this can be merged + released until Sunday :-) @lorenzomanini If you want, you can add yourself to the https://pypdf.readthedocs.io/en/latest/meta/CONTRIBUTORS.html ( |
Co-authored-by: Martin Thoma <info@martin-thoma.de>
Thank you!:blush: I'm having a lot of fun. And thank you for being so welcoming!:heart: |
Cool! Thank you! |
@lorenzomanini Looks good from my side - let me know when I can merge :-) |
Co-authored-by: Martin Thoma <info@martin-thoma.de>
Good, I think it is ready! Thank you @MartinThoma and @pubpub-zz for the help and for being so nice. |
I'm happy to hear that you feel welcome 🤗 Your contribution is now in the Really good work! |
New Features (ENH): - Add page label support to PdfWriter (#1558) - Accept inline images with space before EI (#1552) - Add circle annotation support (#1556) - Add polygon annotation support (#1557) - Make merging pages produce a deterministic PDF (#1542, #1543) Bug Fixes (BUG): - Fix error in cmap extraction (#1544) - Remove erroneous assertion check (#1564) - Fix dictionary access of optional page label keys (#1562) Robustness (ROB): - Set ignore_eof=True for read_until_regex (#1521) Documentation (DOC): - Paper size (#1550) Developer Experience (DEV): - Fix broken combination of dependencies of docs.txt - Annotate tests appropriately (#1551) [Full Changelog](3.2.1...3.3.0)
Introduce a new PdfWriter interface
set_page_labels
that sets page labels to a page interval (as mentioned in #1554).It should support setting multiple page lables for different page intervals.
I've done some local testing and it seems to be working.
Pending: tests to be added
It's my first time contributing to an open-source project and I have little experience with python so my code will certainly need corrections, so I would like to know what you think about it.
Also, I've no idea how to set up correctly tests for my code but I'll look into it as soon as I will have some spare time.