HTML SAX parsing works when using a string but fails when using a file #1577

Closed
gregors opened this issue Jan 5, 2017 · 2 comments

gregors commented Jan 5, 2017

I noticed that the HTML SAX parser fails when reading HTML from a file, even though it parses the same HTML correctly from a string. When the input comes from an IO source, it chokes on any badly formed tag, such as `<br >`.

I've added a test to highlight the issue in #1576

If anyone could point me in a general direction, I would be more than happy to try to fix this.

@flavorjones
Member

Thanks for reporting this, and for PRing the failing test. This is a weird one, looking into it now.

@flavorjones
Member

Yeeeeaaaaaah, this is a weird one. The TL;DR is that `HTML::SAX::Parser` is using `XML::ParserContext` instead of `HTML::ParserContext` when parsing from an IO, so the input gets handled by XML rules rather than the more forgiving HTML ones. This bug has been present since ... the dawn of time.

SOOOOoooo ... how does it feel to be the first person in the universe to SAX-parse an HTML file through an IO object? Must feel good, right? Right?

Sigh.

I'll have a fix as soon as I work through the API inconsistencies between the XML and HTML SAX parsers.
