Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve handling of tight lists #23

Closed
wants to merge 9 commits into from

Conversation

Hamms
Copy link
Contributor

@Hamms Hamms commented Aug 27, 2018

Context

In markdown, "tight" lists are supposed to render text without paragraph tags:

- item one
- item two

becomes:

<ul>
<li>item one</li>
<li>item two</li>
</ul>

In remark, this usually works, but it works in kind of a weird way. The original text is actually parsed to an MDAST tree that contains paragraph nodes:

root[1] (1:1-2:11, 0-21)
└─ list[2] (1:1-2:11, 0-21) [ordered=false][loose=false]
   ├─ listItem[1] (1:1-1:11, 0-10) [loose=false]
   │  └─ paragraph[1] (1:3-1:11, 2-10)
   │     └─ text: "item one" (1:3-1:11, 2-10)
   └─ listItem[1] (2:1-2:11, 11-21) [loose=false]
      └─ paragraph[1] (2:3-2:11, 13-21)
         └─ text: "item two" (2:3-2:11, 13-21)

And those tags are then removed by mdast-util-to-hast:

if (
(!parent || !parent.loose) &&
children.length === 1 &&
head.type === 'paragraph'
) {
single = true
}

root[1] (1:1-2:11, 0-21)
└─ element[5] (1:1-2:11, 0-21) [tagName="ul"]
   ├─ text: "\n"
   ├─ element[1] (1:1-1:11, 0-10) [tagName="li"]
   │  └─ text: "item one" (1:3-1:11, 2-10)
   ├─ text: "\n"
   ├─ element[1] (2:1-2:11, 11-21) [tagName="li"]
   │  └─ text: "item two" (2:3-2:11, 13-21)
   └─ text: "\n"

The Problem

This usually results in correct HTML output, even though it's a bit weird to have "paragraph" nodes that don't then become "p" tags. However! This only works if the tight list item has a single child, which means that in cases like

- item one
  - item two

we would expect to render to

<ul>
<li>item one
<ul>
<li>item two</li>
</ul>
</li>
</ul>

but instead get

<ul>
<li>
<p>item1</p>
<ul>
<li>item2</li>
</ul>
</li>
</ul>

Because the node in the intermediate MDAST doesn't have just a single child:

root[1] (1:1-2:13, 0-23)
└─ list[1] (1:1-2:13, 0-23) [ordered=false][loose=false]
   └─ listItem[2] (1:1-2:13, 0-23) [loose=false]
      ├─ paragraph[1] (1:3-1:11, 2-10)
      │  └─ text: "item one" (1:3-1:11, 2-10)
      └─ list[1] (2:3-2:13, 13-23) [ordered=false][loose=false]
         └─ listItem[1] (2:3-2:13, 13-23) [loose=false]
            └─ paragraph[1] (2:5-2:13, 15-23)
               └─ text: "item two" (2:5-2:13, 15-23)

The Solution

There are a couple of potential solutions here; one of them would be to avoid creating those unnecessary "paragraph" nodes in the first place, but another is toupgrade mdast-util-to-hast to be able to deal with this case as well. This latter solution is outlined in this PR.

I still need to fix the tests that are affected by this change and write new ones for this new functionality, but before I invest that time I want to get your thoughts on the validity of pursuing this fix as opposed to the other one.

…item handler

Since otherwise, we don't correctly handle tight lists containing
multiple children, as in the case of

    - item one
      - item two
@Hamms
Copy link
Contributor Author

Hamms commented Aug 27, 2018

@wooorm could I get your thoughts on the relative merits of the fix proposed here vs the possible fix for the same problem at the layer of the parser?

@wooorm
Copy link
Member

wooorm commented Aug 27, 2018

@Hamms Interesting! I’m just now rewriting the mdast spec, and am finding that tight / loose on lists and on items are pretty messy. They’re idea is that they’re useful for HTML, mostly, but as we’re not really using it here, that’s just weird!

I saw that commonmark does not render the <p> in this case, so I guess that should be removed.

I think that the loose / tight problem should be fixed in remark-parse.
I feel that it makes sense to make the logic here simpler, so that we could only check the loose property here

These are just half thoughts, what do based on my ideas?

@Hamms
Copy link
Contributor Author

Hamms commented Aug 27, 2018

Yeah, I wasn't sure how to feel about whether the problem should be fixed at the level of "paragraph" nodes or "p" tags; seems like it comes down to the question of whether "paragraph" nodes have some use beyond just being blindly converted into "p" tags.

Fixing the problem here in the converter is saying that yes they do, and the handler (in this case, mdast-util-to-hast) should also be smart enough to know what tight lists are and know when they should and should not generate "p" tags.

Fixing it in remark-parse would be saying that no they do not; they just exist to be turned into "p" tags and so the parser should be smart enough to know when to and to know inject them into the tree.

I actually originally fixed this at the parser the layer, mostly because I didn't feel that I had a good sense for the role of the paragraph node in the greater remark ecosystem, and because the existing implementation puts that responsibility on mdast-util-to-hast and not on remark-parse, so I just went with the current direction. But if you're down with the idea of me changing things on a deeper level, I do I think prefer the "stupid" paragraph nodes philosophy.

@Hamms
Copy link
Contributor Author

Hamms commented Aug 28, 2018

Upon further investigation, it looks like this solution actually introduces a problem with loose lists, because in the case of

- item one

- item two

remark-parse actually does not recognize the second item as being loose:

root[1] (1:1-3:11, 0-22)
└─ list[2] (1:1-3:11, 0-22) [ordered=false][loose=true]
   ├─ listItem[1] (1:1-2:1, 0-11) [loose=true]
   │  └─ paragraph[1] (1:3-1:11, 2-10)
   │     └─ text: "item one" (1:3-1:11, 2-10)
   └─ listItem[1] (3:1-3:11, 12-22) [loose=false]
      └─ paragraph[1] (3:3-3:11, 14-22)
         └─ text: "item two" (3:3-3:11, 14-22)

@Hamms
Copy link
Contributor Author

Hamms commented Aug 28, 2018

Given that any list item being loose should make the whole list behave as loose - which is a "going backwards" approach that seems pretty inconsistent with remark-parse's entire premise - it seems like this fix will require changes to both the parse and transform stages.

First, parse should be updated to consistently create "paragraph" tags while still bubbling the loose flag up as it currently does. Then, transform should be updated to purge paragraph tags in loose lists as it currently does, but to also do so for list items that contain more than one child.

@wooorm
Copy link
Member

wooorm commented Aug 28, 2018

@Hamms Maybe, but it’s based on commonmark! I dunno, I’m not sure of a better approach?

@Hamms
Copy link
Contributor Author

Hamms commented Aug 28, 2018

I think it's possible to keep the approach consistent, it just requires embracing the idea of paragraph nodes as being something more semantically significant than just placeholders for p tags

@wooorm
Copy link
Member

wooorm commented Aug 28, 2018

I agree with that idea, it’s what we’re doing now, and it works for me.
It’s similar to the idea of a paragraph in HTML (not just a <p> tag either), in most cases <p> can be omitted.

@Hamms
Copy link
Contributor Author

Hamms commented Sep 10, 2018

@wooorm Please take another look!

I believe I have the ultimate functionality where we want it to be; all the various ways of combining looseness, tightness, and nesting work as I expect them to. However, I'm not sure how I feel about the actual approach; rendering the results then modifying them like I'm doing feels inconsistent with the way subtrees are handled everywhere else in the codebase, but I'm not sure of a better way to do it. I'd love to hear your thoughts!

Thanks!

@Hamms
Copy link
Contributor Author

Hamms commented Sep 17, 2018

@wooorm Could you take a look at this when you get a chance?

@wooorm
Copy link
Member

wooorm commented Oct 6, 2018

Hey, sorry for the slow response! I thought about it some more, see remarkjs/remark#364 for my current thinking. I think we can use the code here as a base on those changes in remark quite easily.

@Hamms
Copy link
Contributor Author

Hamms commented Oct 8, 2018

Very exciting! I'll take a look as soon as I can

wooorm added a commit to remarkjs/remark that referenced this pull request Oct 14, 2018
* remark-parse: remove `loose` fields from `list`, `listItem`
* remark-parse: add support for empty list-items in any non-pedantic mode
* remark-parse: add `spread` to `list`, to signal that its items should have blank
  lines between them, set to true if any blank line exists between items
* remark-parse: add `spread` to `listItem`, to signal that its children should have
  blank lines between them
* remark-parse: fix bug where two lists, if they are both ordered or both not
  ordered, was seen in commonmark as one list
* remark-stringify: add support for missing `ordered` field
* remark-stringify: add support for missing `start` field
* remark-stringify: add support for `list.spread` (default: false)
* remark-stringify: add support for `listItem.spread` (default: true)
* remark-stringify: fix bug where two lists, if they are both ordered or both not
  ordered, and in commonmark, were not properly stringified, resulting in a
  new parse seeing them as one list
* remark-stringify: fix bug where a list followed by an indented code block, in
  commonmark, was not correctly stringified, resulting in a new parse seeing
  the code as content of the list
* remark-stringify: remove trailing white-space on empty list-items
* remark-stringify: add support for stringifying list-items without parent

Related to syntax-tree/mdast#4.
Related to syntax-tree/mdast-util-to-hast#23.
Related to GH-349.

Closes GH-332.
Closes GH-350.
Closes GH-358.
Closes GH-364.

Reviewed-by: Elijah Hamovitz <elijahhamovitz@gmail.com>
Reviewed-by: Ika <ikatyang@gmail.com>
@wooorm wooorm closed this in 06361e2 Nov 11, 2018
@wooorm wooorm added ⛵️ status/released 🐛 type/bug This is a problem 🗄 area/interface This affects the public interface 🧑 semver/major This is a change labels Aug 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🗄 area/interface This affects the public interface 💪 phase/solved Post is done 🧑 semver/major This is a change 🐛 type/bug This is a problem
Development

Successfully merging this pull request may close these issues.

2 participants