bpo-21475: Support the Sitemap extension in robotparser #6883
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA). Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA (this might be simply due to a missing "GitHub Name" entry in your b.p.o account settings). This is necessary for legal reasons before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue. When your account is ready, please add a comment in this pull request. Thanks again for your contribution and we look forward to looking at it!
I signed the CLA |
Thanks for the PR. I have several comments (quite nitpicky) but the rest looks good.
In addition, I would suggest adding both your name and Peter's to the Misc/ACKS file.
Doc/library/urllib.robotparser.rst
Outdated
|
||
.. versionadded:: 3.8 | ||
|
||
|
Sorry for being so picky, but we only need two extra spaces between ".. versionadded" and "The following example...". So please remove the extra lines.
@@ -0,0 +1,2 @@ | |||
Added support for optional Site Map extension to urllib robotparser. Patch | |||
by Lady Red |
Please end the sentence with a period. In addition, since it was based on another person's patch, it would be good to also mention "Based on patch by Peter Wirtz."
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
If I don't end up merging this, I'd suggest that whichever core dev merges it remember to add "Co-authored by: Peter Wirtz" to the commit message, since it seems this was based on Peter's patch.
I have made the requested changes; please review again. |
Thanks for making the requested changes! @Mariatta: please review the changes made to this pull request. |
Yes please give credit to Peter in the commit message. PS this is my first contribution to cpython! \o/ |
Lib/test/test_robotparser.py
Outdated
""" | ||
good = ['/', '/test.html'] | ||
bad = ['/cyberworld/map/index.html'] | ||
site_maps = ["http://www.gstatic.com/s2/sitemaps/profiles-sitemap.xml", |
Style nit: Please use single quotes.
Lib/test/test_robotparser.py
Outdated
@@ -292,7 +313,7 @@ def setUp(self): | |||
# Short poll interval to make the test finish quickly. | |||
# Time between requests is short enough that we won't wake | |||
# up spuriously too many times. | |||
kwargs={'poll_interval':0.01}) | |||
kwargs={'poll_interval': 0.01}) |
Please don't make unrelated cosmetic changes.
oh drat, it's my pep8 autoformatter doing that automatically. will remove
Lib/test/test_robotparser.py
Outdated
@@ -353,5 +374,6 @@ def test_read_404(self): | |||
self.assertIsNone(parser.crawl_delay('*')) | |||
self.assertIsNone(parser.request_rate('*')) | |||
|
|||
if __name__=='__main__': | |||
|
|||
if __name__ == '__main__': |
Please don't make unrelated cosmetic changes.
@@ -189,6 +196,11 @@ def request_rate(self, useragent): | |||
return entry.req_rate | |||
return self.default_entry.req_rate | |||
|
|||
def site_maps(self): | |||
if not self.sitemaps: |
Could you also add a test for this branch?
I believe this branch is tested by test_site_maps in all the other robotparser tests: each of them checks that the result is None, except for my single class that tests the positive case.
Oops, you're correct. I didn't click the expand button and didn't notice that the test_site_maps method is part of BaseRobotTest.
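For readers following along, here is a minimal sketch of the pattern being described: a shared base class defines test_site_maps once, so every subclass exercises the "no sitemaps" branch unless it overrides the expected value. This is not the actual code from Lib/test/test_robotparser.py; the subclass names, the robots.txt contents, and the example.com URL are assumptions made up for illustration.

```python
import unittest
import urllib.robotparser


class BaseRobotTest:
    # Simplified, hypothetical stand-in for the shared base class.
    robots_txt = ''
    site_maps = None  # expected value; None when there are no Sitemap lines

    def setUp(self):
        self.parser = urllib.robotparser.RobotFileParser()
        self.parser.parse(self.robots_txt.splitlines())

    def test_site_maps(self):
        # Inherited by every subclass, so the None branch is covered
        # by each test class that does not override site_maps.
        self.assertEqual(self.parser.site_maps(), self.site_maps)


class NoSitemapTest(BaseRobotTest, unittest.TestCase):
    # Hypothetical negative case: no Sitemap line, so None is expected.
    robots_txt = """\
User-agent: *
Disallow: /private/
"""


class SitemapTest(BaseRobotTest, unittest.TestCase):
    # Hypothetical positive case: overrides the expected value.
    robots_txt = """\
User-agent: *
Disallow: /private/
Sitemap: http://example.com/sitemap.xml
"""
    site_maps = ['http://example.com/sitemap.xml']
```

Because the base class does not inherit from unittest.TestCase itself, the test runner only collects the concrete subclasses.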
@@ -0,0 +1,2 @@ | |||
Added support for optional Site Map extension to urllib robotparser. Patch by |
It would be nice to add some links to the new site_maps (:meth:`RobotFileParser.site_maps() <urllib.robotparser.RobotFileParser.site_maps>` -- untested, you'll need to try it locally :)) method or to the urllib.robotparser (:mod:`urllib.robotparser`) module.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
…to robotparser_site_maps Conflicts: Misc/NEWS.d/next/Library/2018-05-15-15-03-48.bpo-28612.E9dz39.rst
@berkerpeksag I added the link to the news as you suggested but I can't find any documentation in the dev guide that explains how to do whatever build step I need to do to build the news to evaluate that link. Mind pointing me to the right place? |
I have made the requested changes; please review again. |
Thanks for making the requested changes! @Mariatta, @berkerpeksag: please review the changes made to this pull request. |
Oh, I think I figured out how to build the news: I just have to build the documentation, right?
Successfully Tested! The news link works |
@mcscope I assume you've found https://devguide.python.org/committing/#what-s-new-and-news-entries but I will share it anyway in case someone else wonders how to build the docs :) |
LGTM, thank you for helping to finish bpo-21475. This was on my TODO list along with other urllib.robotparser issues but I couldn't find the time to work on them.
Congratulations on your first cpython PR, @mcscope! |
@berkerpeksag Are there any other items on your TODO list that I could take care of? I'm looking at bpo but having a hard time finding ones that require code, instead of requiring developer consensus.
This ticket has been open for 3 years just because it was awaiting tests. I took the existing patch and added a test.
https://bugs.python.org/issue21475
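
As a rough illustration of what the feature looks like from the caller's side (assuming a Python version that includes this change, i.e. 3.8 or later), the sketch below parses an in-memory robots.txt and reads back its Sitemap entries. The robots.txt content and the example.com URLs are placeholders, not taken from the patch.

```python
import urllib.robotparser

# Hypothetical robots.txt content; the URLs are placeholders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Sitemap: http://example.com/sitemap.xml
Sitemap: http://example.com/news-sitemap.xml
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())
# site_maps() returns the Sitemap URLs in the order they appear.
sitemaps = parser.site_maps()

# With no Sitemap lines at all, site_maps() returns None rather than [].
empty = urllib.robotparser.RobotFileParser()
empty.parse(['User-agent: *', 'Disallow: /private/'])
no_sitemaps = empty.site_maps()
```

Callers that iterate over the result should handle the None case explicitly, for example with `parser.site_maps() or []`.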