-
-
Notifications
You must be signed in to change notification settings - Fork 16
Add support for task list checkboxes outside p
#81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for task list checkboxes outside p
#81
Conversation
Previously, list items with leading checkboxes would only be treated as task list items if the checkbox was wrapped in a `<p>` element, like: <li><p><input type="checkbox"> Other content</p></li> This adds support for detecting task lists even when the checkbox is not nested in a `<p>`: <li><input type="checkbox"> Other content</li> This does *not* add support for some other complex scenarios, such as the checkbox being inside other phrasing content, being more deeply nested, or having comments in front of it: <li><strong><input type="checkbox"> Other content</strong></li> <li><p><strong><input type="checkbox"> Other content</strong></p></li> <li><!-- Comment --><input type="checkbox"> Other content</li> Fixes syntax-tree#80.
lib/handlers/li.js
Outdated
if ( | ||
checkbox && | ||
checkbox.type === 'element' && | ||
checkbox.tagName === 'input' && | ||
checkbox.properties && | ||
(checkbox.properties.type === 'checkbox' || | ||
checkbox.properties.type === 'radio') | ||
) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since Node 16+ is required, this could probably be stremlined a bit with optional chaining (e.g. checkbox?.properties?.type === 'checkbox'
). I didn't see that in use anywhere else, though, so I left the full check here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for such a good PR!
lib/handlers/li.js
Outdated
clone = {...node, children: [...node.children]} | ||
if (checkboxParent === node) { | ||
clone.children.splice(0, 1) | ||
} else { | ||
clone.children.splice(0, 1, { | ||
...checkboxParent, | ||
children: checkboxParent.children.slice(1) | ||
}) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a) some comments are useful
b) operating on the clone reads a bit complex, to me. Maybe that’s just me. What do you think about this?
clone = {...node, children: [...node.children]} | |
if (checkboxParent === node) { | |
clone.children.splice(0, 1) | |
} else { | |
clone.children.splice(0, 1, { | |
...checkboxParent, | |
children: checkboxParent.children.slice(1) | |
}) | |
} | |
// The checkbox is the head of `li`: | |
if (checkboxParent === node) { | |
clone = {...node, children: node.children.slice(1)} | |
} | |
// The checkbox is in a paragraph (which is `checkboxParent`): | |
else { | |
clone = { | |
...node, | |
children: [ | |
{...checkboxParent, children: checkboxParent.children.slice(1)}, | |
...checkboxParent.children.slice(1) | |
] | |
} | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! Yes, I definitely should have added some comments here. I tried both of these styles and liked on the one I committed slightly more, but I don’t have strong feelings. Will change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: will wait for feedback on the below item (about the checkbox with no text after it) before making the changes here, since there might be more to adjust depending on what the right output should be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since I gather the main concern here is readability, I wound up extracting the code for finding and removing the checkbox into a separate function. I also wound up rewriting this recursively, which I think makes the logic much clearer to read (if you don’t think so, I can just roll back to f0acebf, which is still a separate function but has your recommended implementation and some more comments and description).
<li><input type="checkbox"> India</li> | ||
<li> | ||
<input type="checkbox"> Juliet | ||
</li> | ||
</ol> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a test without anything after it! Particularly without whitespace, like the current new India one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, that’s an interesting edge case I didn’t think about! What would you expect for output here? The current code creates an empty task list item, e.g:
{
type: "listItem",
checked: false,
children: []
}
…which renders like a normal list item, without even a checkbox (adding the Juliet item before it just so it’s clear):
* [ ] Juliet
*
(The checkbox disappears because there’s no paragraph
node at the start of the children array, and that’s required to render it in mdast-util-task-list-gfm-item.)
Is that OK? This is kind of a weird situation (maybe like the things I was thinking there could be warnings about). I’m not sure thare’s any really reasonable result here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the wait. Yes, I think that’s fine. I think no checkbox is rendered because a paragraph is required by GFM. And those have to be filled with something? As far as I remember?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, great, will leave this case functioning as it currently does and get the rest of this done this week. 👍
I think no checkbox is rendered because a paragraph is required by GFM. And those have to be filled with something? As far as I remember?
Yeah, that’s what’s happening. FWIW, what I was trying to get at was whether you’d want to output a different tree in this case because the above current behavior doesn’t render/outputs an invalid mdast tree. For example, putting in a non-breaking space:
* [ ] Juliet
* [ ]
Or a comment:
* [ ] Juliet
* [ ] <!-- No text -->
Or just backing out and treating it like a checkbox in the middle of text gets treated, rather than as a task list item:
* [ ] Juliet
* \[ ]
The first two will successfully render as a checklist item when converting back to HTML; the last one does not, but matches behavior elsewhere in this module. There are lots of creative options along these lines.
BUT all of these are doing some kind of funny behavior to preserve the intent of the input. I definitely understand there are reasons you might consider the current garbage-in-garbage-out behavior better or more straightforward.
As for tests: if I am personally investigating something small, I copy/paste the example from the readme in an Typically tests are smaller, less fixture bases than this project. Then adding some console.logs somewhere in there is what I do. For these big fixture projects, you can also pass |
The input here can be arbitrary HTML. It could be any unsafe page on the web. Perhaps we could start thinking about warnings where we can’t make a good choice. But I imagine that there are too many, and human interaction is needed if that goal, scraping the web and turning it into markdown, can’t be reached. What’s the reason you are using this project? |
Co-authored-by: Titus <tituswormer@gmail.com> Signed-off-by: Rob Brackett <rob@robbrackett.com>
This also applies to the smaller, non-fixture tests here! Personally, I sometimes use a debugger with tests and sometimes use - test('core', async function (t) {
+ describe('core', async function () {
- await t.test('should expose the public api', async function () {
+ it('should expose the public api', async function () {
I don’t think that would have helped me a whole lot here (I wrote the tests before the code), but good to know!
For sure! I guess I was thinking about warnings in very specific cases where it’s clear what might be intended, even though you won’t get it because of Markdown’s rules (or at least Mdast’s rules; “Markdown” is pretty flavor-based when you get to the details), or because the input produces an invalid/malformed/whatever output tree. For example, it’s easy to detect a situation like the following because the Mdast tree we convert to cannot be correctly rendered (it produces a task list item node that does not start with a <!-- Renders like a perfectly normal checklist item in HTML, but the checkbox will just disappear in Markdown.
Where did it go? Why? I imagine it would be pretty opaque from an end-user perspective. -->
<li><input type="checkbox"> <!-- comment -->Kilo</li>
For this, my main usage is https://github.com/Mr0grog/google-docs-to-markdown. (I’ve used other parts of Mdast for other things, but not the HTML → MD stuff here.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your patience! I think there’s just this one point?
<li><input type="checkbox"> India</li> | ||
<li> | ||
<input type="checkbox"> Juliet | ||
</li> | ||
</ol> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the wait. Yes, I think that’s fine. I think no checkbox is rendered because a paragraph is required by GFM. And those have to be filled with something? As far as I remember?
This adds some more explanatory comments and moves the checkbox-related code into its own function, so the complex stuff is better encapsulated.
The main goal here is to make the logic here easier to read and follow. It also makes it easier to update to handle more complex scenarios if desired in the future. This does make the code slightly more vulnerable to getting slowed down by weird trees (that aren't valid HTML) with `<p>` elements deeply nested inside each other.
@wooorm I think this should be good to go now! I added a list item with just a checkbox to both the I wound up extracting the code for finding and removing the checkbox into a separate function, since I think isolating it and giving it a name helps make things easier to follow. I also rewrote it in a recursive form that I think makes the logic much clearer to read (if you don’t think so, I can just roll back to f0acebf, which is still a separate function but has your recommended implementation and some more comments and description). Benchmarking on my machine, it doesn't seem to make a notable performance difference, although it does technically make the code more vulnerable to slowdowns from weird (not valid in HTML, and will never be returned by <li>
<p>
<p>
<p>
<!-- and so on many levels deep -->
<input type="checkbox"> Text
</p>
</p>
</p>
</li> Please let me know if that's ok, whether there's anything else I need to do here, and if you need me to squash the commits. |
p
elementsp
This comment has been minimized.
This comment has been minimized.
Thanks so much! 10.1.0! |
Initial checklist
Description of changes
Previously, list items with leading checkboxes would only be treated as task list items if the checkbox was wrapped in a
<p>
element, like:This adds support for detecting task lists even when the checkbox is not nested in a
<p>
:This does not add support for some other complex scenarios, such as the checkbox being inside other phrasing content, being more deeply nested, or having comments in front of it, since the implementation gets much more complex:
See discussion on #80 for more about that. There is a prototype of support for deeper nesting, but not comments, at Mr0grog@9a386bc, which I am happy to add here (or clean up some) if you like it.
Fixes #80.
Side Questions
I thought about writing some kind of warning messages for situations that are probably malformed, e.g. there is a leading checkbox, but followed by non-phrasing content (so you’d wind up with a Mdast tree that is in some sense invalid, or at least doesn’t stringify correctly). I was thinking of the error messages in remark-parse’s
emitErrors
option (obviously not exactly the same since this is not a Unified plugin). Anyway, I didn’t see tooling for anything like this, and wanted to make sure I wasn’t missing it.I couldn’t find an easy/obvious way to narrow down which tests were running, which would have been helpful while debugging some issues (instead, I wound up hacking around in the
test/index.js
file). Is there a way to do this that I’m missing? Otherwise it seems like the Node.js test runner has support, but the way tests are run and written here doesn’t work with it:node --test
, notnode test/index.js
(i.e. don’t run an actual test file), then you can use--test-name-pattern
to filter tests.describe()
andit()
instead oftest()
with subtests. I think the issue with the latter is that it can’t know in advance whether you are defining subtests, so it can’t filter based on them.^^ If those are useful changes, I’m happy to do a separate, quick PR for them.