Attributes that have no value get their name as their value #17
Comments
Thanks for the test case @simbabque!
It might be helpful to add this as a TODO test as well.
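A TODO test along these lines could record the expected behavior without breaking the suite. This is only a sketch, not the test attached to the issue: the test descriptions and the TODO message are made up for illustration, and it assumes the desired result for a valueless attribute is an empty string.

```perl
use strict;
use warnings;
use Test::More tests => 2;
use HTML::TokeParser;

# Same markup as in the issue: "value" has no "=" and no value.
my $html = '<input type="text" name="abc123" value>';

# get_tag returns [ $tag, \%attr, \@attrseq, $text ] for a start tag.
my $attr = HTML::TokeParser->new( \$html )->get_tag('input')->[1];

is( $attr->{name}, 'abc123', 'attribute with a value is parsed as written' );

TODO: {
    # Assumption: an empty string is the result we eventually want.
    local $TODO = 'valueless attributes get their name as their value (#17)';
    is( $attr->{value}, '', 'valueless attribute parses to an empty string' );
}
```

Because the second assertion is inside a TODO block, a failure is reported but does not fail the run.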
I've also run into this issue a couple of times - digging deeper into HTML::Parser reveals that setting the value to the name is intentional! There is an option to control what value is "parsed" when an attribute has no value.
; perl -MHTML::TokeParser -MDDP -lE 'my $p = HTML::TokeParser->new( doc => \qq{<input type="text" name="abc123" value>}, boolean_attribute_value=>"no value!")->get_tag->[1]; p $p'
{
    name    "abc123",
    type    "text",
    value   "no value!"
}
The current design of "return the name" doesn't seem sensible to me - having the default setting for the option be
Here's where setting the value to the name happens in the C code.
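As a possible workaround on the caller's side, the same option can be used to tell valueless attributes apart from ones whose value happens to equal their name. A minimal sketch, assuming a sentinel string that never appears as a real attribute value; `$NO_VALUE` and `valueless_attributes` are hypothetical names, not part of HTML::Parser:

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sentinel; assumed never to occur as a real attribute value.
my $NO_VALUE = '__NO_VALUE__';

# Return the sorted names of attributes on the first start tag
# that were written without a value.
sub valueless_attributes {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(
        doc                     => \$html,
        boolean_attribute_value => $NO_VALUE,
    );
    my $tag = $p->get_tag or return;
    my $attr = $tag->[1];    # hash ref of attribute name => value
    return sort grep { $attr->{$_} eq $NO_VALUE } keys %$attr;
}

my @bare = valueless_attributes('<input type="text" name="abc123" value>');
my @none = valueless_attributes('<input type="text" value="value">');
print "valueless: @bare\n";
```

With the default setting, both inputs would report `value => "value"`, so the two cases could not be distinguished.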
This part of the parser specifically mentions 'boolean' - I believe it's referring to this: https://html.spec.whatwg.org/multipage/common-microsyntaxes.html#boolean-attributes
and
I think what this means is that HTML::Parser needs to be aware of the types of the attributes it's parsing, which makes it seem like the fix won't be so easy?
This blog post states that there are 25 attributes which are boolean. https://meiert.com/en/blog/boolean-attributes-of-html/ If that's correct, they could be special-cased, but from my quick digging I didn't find a definitive list elsewhere, so I'm not confident in this yet.
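To make the special-casing idea concrete, here is a rough sketch done outside the parser. It assumes a sentinel value as the marker for "no value" and uses a deliberately partial set of boolean attributes for illustration, not the full list from the blog post. `%IS_BOOLEAN`, `$NO_VALUE` and `normalized_attrs` are hypothetical names.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sentinel; assumed never to occur as a real attribute value.
my $NO_VALUE = '__NO_VALUE__';

# Illustrative subset only; a real fix would need a definitive list.
my %IS_BOOLEAN = map { $_ => 1 }
    qw(checked disabled multiple readonly required selected);

# Parse the first start tag and normalize valueless attributes:
# boolean ones keep the name as value, all others become ''.
sub normalized_attrs {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(
        doc                     => \$html,
        boolean_attribute_value => $NO_VALUE,
    );
    my $tag = $p->get_tag or return {};
    my %attr = %{ $tag->[1] };
    for my $name ( keys %attr ) {
        next unless $attr{$name} eq $NO_VALUE;
        $attr{$name} = $IS_BOOLEAN{$name} ? $name : '';
    }
    return \%attr;
}

my $attrs = normalized_attrs('<input type="checkbox" checked value>');
```

Keeping the name as the value for known boolean attributes preserves the current behavior for them, while `value` and other non-boolean attributes come back empty.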
Thanks for the info about boolean attributes - the "return the name" behavior makes sense now. Does a user of HTML::Parser care about differentiating between
… attribute name - but see libwww-perl#17 discussion
When investigating libwww-perl/WWW-Mechanize#125 I noticed that the following HTML parses weirdly.
According to the HTML spec, on an input element a value attribute that's not followed by an equals sign (=) should be empty, so we should be parsing it to an empty string. Instead of making it empty, we set it to "value".
I've looked into it and got as far as finding that get_tag returns a data structure that contains the wrong value:
Unfortunately I am out of my depth with the actual C code for the parser. But I think we should be returning an empty string for the value attribute, as well as for all other empty attributes.
I wrote the following test to demonstrate the problem.