Skip to content

Conversation

@myndzi
Copy link
Contributor

@myndzi myndzi commented Nov 30, 2012

Still doesn't pass npm test, but it came that way

…quoted attribute values.

Require self-closing tags to be void
…g the attributes count. Here's a different way to accomplish the same thing.
@fb55
Copy link
Owner

fb55 commented Dec 1, 2012

Yes, the failing test is my fault.

Could you please give a reason why you think this needs to be done? And also, please don't change the indentation.

@myndzi
Copy link
Contributor Author

myndzi commented Dec 2, 2012

Apologies for the indentation, it was unintentional. As for why, that depends on how correctly you want to parse html. It caused me a problem so I fixed it, you are welcome to incorporate the change if you like.

@fb55
Copy link
Owner

fb55 commented Dec 2, 2012

I only need to understand what problem it triggered. Just show me the error. (I prefer a test, but that's not absolutely necessary.)

@myndzi
Copy link
Contributor Author

myndzi commented Dec 2, 2012

Oh, sure. I'll do something up on Monday. The situation being incorrectly processed is an unsuited attribute value of a tag ending in a / being treated as a self closing tag:

<a href=http://www.foo.com/>text<a>

Because the self-closing tag test just checks the last character before the closing angle bracket, the parser incorrectly emits a close tag before the text node. I read through the html5 spec and determined that unquoted tags are illegal in all cases that allow self closing tags; in xhtml and svg/mathml attributes must be quoted; in html, but self closing syntax is valid only for certain classes of tag, but they are all void (no attributes). Therefore, the simplest solution was to avoid interpreting a tag as self-closing if it contains any attributes.

fb55 and others added 9 commits December 4, 2012 15:19
This reverts commit 181c31b.
…close is implied by other tags being opened, and these are closed when those tags are opened. This helps correctly parse things like lists and tables with unterminated LI or TD tags.
@fb55
Copy link
Owner

fb55 commented Feb 15, 2013

Could you please add test cases for your changes? Also, please don't change the whitespace in the benchmark script.

@myndzi
Copy link
Contributor Author

myndzi commented Feb 15, 2013

I tried to keep your whitespace straight, it doesn't appear to have changed in the benchmark (dunno why the diff is like that).

Since the tests don't pass currently, I have no idea how to add more. I don't have the time to make sense of all that and I don't currently understand it. The two updates are for my sake, you can take them or leave them or be informed by them, whichever you want.

The first case prevents the code from interpreting a tag such as <a href=http://www.google.com/> as a self-closing tag, and the second case closes certain tags to make parsing more robust in cases such as:

<ol>
    <li>foo
    <li>bar
    <table>
        <tr>
            <td>this is annoying
            <td>but even google does it
    </table>
</ol>

.. if that helps you to knock out a quick test then it will take you 2 minutes to do what might take me 2 hours.

@fb55
Copy link
Owner

fb55 commented Feb 15, 2013

As of today, the tests pass again, and mostly I'd like an additional commit
to have Travis run them again :)

Tests are pretty easy to write, just create a .json file in tests/Events
containing the expected events (just have a look at another file, should be
easy to understand). Shouldn't take you longer than a couple of minutes.

@myndzi
Copy link
Contributor Author

myndzi commented Feb 16, 2013

Okay, I wasn't sure where what belonged. Is there any way to 'debug' the parsing, e.g. run the test script over some data and look at what it produces?

Edit: I just hacked the test runner. This could be easier.

lib/Parser.js Outdated
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to avoid parsing attributes at this point, but I see the issue. This should also be fixable via something like

if((!this._options.xmlMode && name in emptyTags) || (data.substr(-1) === "/" && data.replace(_reAttrib, "").substr(-1) === "/")){

(This solution would also allow self-closing tags to contain attributes, which is okay for XML.)

@myndzi
Copy link
Contributor Author

myndzi commented Feb 16, 2013

Yes, it's not a parsing solution -- it's just a partial solution. Your code isn't structured in a way that makes it easy to properly implement HTML5 parsing rules. I appreciate your devotion to code consistency, and while I've tried to write like the code that exists, I can't help but feel like you're spending more time telling me about your preferences than it would take for you to simply edit them in. As I stated originally, I'm not here to become a part of the project, I just encountered a couple edge cases that were causing me problems. You can incorporate the code or not, the ideas or not, or just treat this as a bug report, but I'm too busy to go back and forth with you over formatting issues.

@fb55
Copy link
Owner

fb55 commented Feb 16, 2013

I know of the issues with the code and I was working on a solution for some time now. The codebase is still based on the general ideas of the htmlparser module, although most of the tree-building parts were moved out of the parser.

Coding style has a single priority, and that's consistency. That's a lesson everybody has to learn, and correcting the style myself would send the wrong message. Plus, I currently don't have access to my Mac and my phone isn't such a great device for coding.

@myndzi
Copy link
Contributor Author

myndzi commented Feb 16, 2013

Do us all a favor next time and simply reject the pull request then. What sends the wrong message in my view here is a pull request first submitted 3 months ago with more comments than changed lines of code. If coding style is important to you, then don't wait for people to submit changes and then harangue them about spacing -- make a style guide and make it obvious to people. If you see a problem with the code submitted that it takes you longer to comment on than to edit, just edit it -- I can pull upstream commits as easily as you can. Or, maybe even look at the code before you demand unit tests on your broken and inconvenient test system so that all changes can be made at once.

If what you want is to teach a coding class, get a job at an institution. If what you want is the most effective solution for the task, you'd do well to work with people instead of against them. I don't expect I'll contribute to this project again.

fb55 added a commit that referenced this pull request Feb 16, 2013
basic support for implied close tags, bugfix for attribute values containing a slash at the end being recognized as self-closing tags.
@fb55 fb55 merged commit 33d55cd into fb55:master Feb 16, 2013
@fb55
Copy link
Owner

fb55 commented Feb 16, 2013

Please, calm!

As you didn't mention which problems you are fixing, I needed the unit tests mainly to figure that out. That's also the reason I didn't prioritized it when you opened the pull request.

A style guide would be a nice thing to have, but I'd spend hours on it, while a lot of code I open-sourced lately lacks basic documentation. For now, the style guide is "how it was done above". At least for tabs vs. spaces, that should help.

Sadly, contributing to open source software is nothing that's taught in schools. My hope was, if I'd ask you to change the code yourself, every further pull request would already be consistent with the coding style. Because in the end, that'd safe me time.

I'm seriously sorry if that offended you.

@fb55
Copy link
Owner

fb55 commented Feb 16, 2013

Thanks a lot for your contribution, btw :)

@myndzi
Copy link
Contributor Author

myndzi commented Feb 18, 2013

Peace! I'm not holding any grudge. Your requests were reasonable, and I complied with them. The manner in which they came about was frustrating. Consider it a bug report/feature request on the process.

You're not the only one with valuable time...

fb55 added a commit that referenced this pull request Oct 21, 2018
basic support for implied close tags, bugfix for attribute values containing a slash at the end being recognized as self-closing tags.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants