[MRG+1] Added: Removing comments before extracting base URLs. Not a solution to #70, but does help in some cases. #77

starrify · 2016-10-25T06:37:29Z

Helps resolve the issue in cases like the one below, where a commented-out `<base>` tag is picked up. This happens on several websites:

>>> from w3lib import html
>>> html.get_base_url("""<!-- <base href="http://example.com/" /> -->""")
'http://example.com/'

Fixes #70 (since the original #70 report is about this scenario; for other scenarios, we should have separate issues)
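
For illustration, the idea of removing comments before looking for a `<base>` tag can be sketched with the standard library alone. This is a rough sketch and not the actual w3lib code: the function name `sketch_get_base_url`, the `page_url` parameter, and both regular expressions are assumptions made for this example, and the regex-based comment matching is deliberately simplistic.

```python
import re
from urllib.parse import urljoin

# Hypothetical patterns for illustration only; not w3lib's actual regexes.
_COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
_BASE_RE = re.compile(
    r"<base\s[^>]*href\s*=\s*[\"']\s*([^\"'\s]+)\s*[\"']",
    re.IGNORECASE,
)


def sketch_get_base_url(text, page_url=""):
    """Return the first <base href> found outside HTML comments.

    Falls back to page_url when no uncommented <base> tag is present.
    """
    # Drop comments first so a commented-out <base> tag cannot match.
    uncommented = _COMMENT_RE.sub("", text)
    match = _BASE_RE.search(uncommented)
    if match:
        # Resolve a possibly relative href against the page URL.
        return urljoin(page_url, match.group(1))
    return page_url
```

With this sketch, a page whose only `<base>` tag sits inside a comment falls back to the page URL, while an uncommented `<base>` tag elsewhere in the document is still honoured.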

codecov-io · 2016-10-25T06:39:37Z

Current coverage is 94.10% (diff: 100%)

Merging #77 into master will increase coverage by 0.01%

@@             master        #77   diff @@
==========================================
  Files             7          7          
  Lines           406        407     +1   
  Methods           0          0          
  Messages          0          0          
  Branches         84         84          
==========================================
+ Hits            382        383     +1   
  Misses           16         16          
  Partials          8          8

Powered by Codecov. Last update 03c28d2...11b5d26

redapple · 2016-10-25T09:27:06Z

Can you add tests for this?
Can you provide example websites showing this issue?

starrify · 2016-10-26T07:21:38Z

Thanks for the notice, @redapple. A test has been added.

Here's a sample site which triggers this issue: http://planweb01.rother.gov.uk/OcellaWeb/planningSearch

Gallaecio · 2019-09-17T06:20:42Z

@kmike Could you have a look?

Gallaecio · 2019-09-17T12:26:38Z

This is related to #70

yozachar · 2022-07-20T06:44:30Z

Bumping to close outdated PR.

Added: Removing comments before extracting base URLs. Not a solution to scrapy#70, but does help in some cases.

11b5d26

starrify force-pushed the remove-comments-for-base-url branch from 93a90c5 to 11b5d26 on October 26, 2016 07:18

Gallaecio approved these changes Aug 8, 2019

Gallaecio changed the title ~~Added: Removing comments before extracting base URLs. Not a solution to #70, but does help in some cases.~~ [MRG+1] Added: Removing comments before extracting base URLs. Not a solution to #70, but does help in some cases. Aug 8, 2019

Gallaecio mentioned this pull request Aug 8, 2019

get_base_url: Ignore comments #130

Closed

Felipe Boff Nunes added 4 commits November 4, 2022 13:56

conflicts

a2234e9

small refactor

4d605f7

add unit_tests

95e4e97

black

a013663

Gallaecio requested review from kmike and wRAR November 4, 2022 17:57

Gallaecio approved these changes Nov 4, 2022

wRAR approved these changes Nov 7, 2022

wRAR merged commit fb70566 into scrapy:master Nov 7, 2022
