List Items separated by tab character not parsed correctly #89

Closed
butuzov opened this issue Oct 1, 2019 · 3 comments
Labels
bug, has-workaround (a bug that has a workaround)

Comments


butuzov commented Oct 1, 2019

List items that follow the scheme list_bullet|tab|item (please ignore the | characters) are not parsed as list items.

Example test.md

title
*	tabbed item long line

title
* spaced item

Test Code

from mistletoe import Document, HTMLRenderer

# Parse test.md and render it to HTML.
with open('test.md', 'r') as fin:
    with HTMLRenderer() as renderer:
        rendered = renderer.render(Document(fin))
print(rendered)

Expected Output

<p>title</p>
<ul>
<li>tabbed item long line</li>
</ul>
<p>title</p>
<ul>
<li>spaced item</li>
</ul>

Actual Output

<p>title
*       tabbed item long line</p>
<p>title</p>
<ul>
<li>spaced item</li>
</ul>
Author

butuzov commented Oct 1, 2019

If the first item of the list has a space between * and the text, and the next item is tabbed, the output is as follows (note the missing first letter of the tabbed item):

<p>title</p>
<ul>
<li>spaced item</li>
<li>abbed item long line</li>
</ul>
<p>title</p>
<ul>
<li>spaced item</li>
</ul>

pbodnar changed the title from "List Items separated by tab character" to "List Items separated by tab character not parsed correctly" on Sep 18, 2021
Collaborator

pbodnar commented Sep 18, 2021

Thanks for the report.

Workarounds until fixed:

  • Use space instead of tab.
  • Put a blank line after the previous block element (the title here). Note that the 1st letter of a "tab" list item is still removed in that case.
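
The first workaround can also be applied automatically before the text is handed to the parser. The following is only a rough sketch: `normalize_list_tabs` is a hypothetical helper, not part of mistletoe, and it assumes that the only tabs to rewrite are those directly after a bullet or ordered list marker at the start of a line.

```python
import re

# Hypothetical pre-processing helper (not part of mistletoe).
# Matches a list marker at the start of a line (up to 3 leading spaces,
# then -, * or +, or 1-9 digits followed by . or )) that is
# immediately followed by a tab.
_TABBED_MARKER = re.compile(r'^( {0,3}(?:[-*+]|\d{1,9}[.)]))\t', re.MULTILINE)


def normalize_list_tabs(markdown_text: str) -> str:
    """Replace the tab after a list marker with a single space."""
    return _TABBED_MARKER.sub(r'\1 ', markdown_text)


sample = "title\n*\ttabbed item long line\n\ntitle\n* spaced item\n"
print(normalize_list_tabs(sample))
```

One caveat: the pattern does not know about code blocks, so a line that looks like a tabbed list item inside a fenced or indented code block would be rewritten too.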

pbodnar added the bug and has-workaround labels on Sep 18, 2021
anderskaplan added a commit to anderskaplan/mistletoe that referenced this issue Sep 23, 2022
…d correctly.

Also fixes failing examples 312 and 313 in the CommonMark 0.30 spec.
The problem was that tabs were expanded to four spaces, not to tab stops as specified in the spec.
The solution was to expand tabs at the beginning of lines.
However, this meant that plain string indexing could no longer be used. The extraction of line
content was therefore moved into the parse_marker and parse_continuation methods.
anderskaplan added a commit to anderskaplan/mistletoe that referenced this issue Oct 29, 2022
…d correctly.

Also fixes failing examples 312 and 313 in the CommonMark 0.30 spec due to the way leading space is now checked for list items.

The problem was that tabs were expanded to four spaces, not to tab stops as specified in the spec.
The solution was to expand tabs at the beginning of lines.
However, this meant that plain string indexing could no longer be used. The extraction of line
content was therefore moved into the parse_marker and parse_continuation methods.
anderskaplan added a commit to anderskaplan/mistletoe that referenced this issue Oct 30, 2022
…d correctly.

Also fixes failing examples 312 and 313 in the CommonMark 0.30 spec, due to the way leading space
is now checked for list items.

The direct cause of the reported bug was that only spaces and not tabs were considered
valid separators for list item markers. Another problem was that the implemented tab expansion,
where tabs were always expanded to four spaces, did not work according to the spec, which states
that tabs should be expanded to the nearest tab stop (of width 4).

This fix uses `expandtabs()` to implement the tab stops correctly and moves extraction
of content into the `parse_marker()` and `parse_continuation()` methods. This lets
us implement use cases like "list interrupts a paragraph" and "list item continuation"
in a less error-prone way.
pbodnar pushed a commit that referenced this issue Oct 30, 2022
…ly. (#164)

Also fixes failing examples 312 and 313 in the CommonMark 0.30 spec due to the way leading space is now checked for list items.

The 1st problem was that `# check if next_line starts List` considered only markers followed
by a space and not by a tab. Another problem was that the implemented tab expansion
to tab stops (of width 4 as per spec) could work only in a limited scope.

This fix uses `expandtabs()` to implement the tab stops correctly and moves extraction
of content into the `parse_marker()` and `parse_continuation()` methods. This lets
us implement use cases like "list interrupts a paragraph" and "list item continuation"
in a less error-prone way.
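
To illustrate the difference between "always four spaces" and "next tab stop" that the commit messages describe, here is a small sketch. `str.expandtabs()` is the standard Python method named in the commit message; the two helper functions are made up for this illustration and are not the code from #164.

```python
def expand_naive(line: str) -> str:
    """Replace every tab with four spaces, regardless of its column."""
    return line.replace("\t", "    ")


def expand_to_tab_stops(line: str) -> str:
    """Pad each tab up to the next column that is a multiple of 4."""
    return line.expandtabs(4)


for line in ["*\titem", "  -\titem"]:
    print(repr(expand_naive(line)), repr(expand_to_tab_stops(line)))
# '*    item' '*   item'
# '  -    item' '  - item'
```

With tab stops, the content always starts at a column that is a multiple of 4, so the number of spaces a tab stands for depends on where the marker sits in the line.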
Collaborator

pbodnar commented Oct 30, 2022

Both reported problems, and a few more, were fixed by #164.
