List Items separated by tab character not parsed correctly #89

butuzov · 2019-10-01T13:43:09Z

List items that follow the scheme list_bullet|tab|item (please ignore the |) are not parsed as list items.

Example `test.md`

title
*	tabbed item long line

title
* spaced item

Test Code

from mistletoe import Document, HTMLRenderer

with open('test.md', 'r') as fin:
    with HTMLRenderer() as renderer:
        rendered = renderer.render(Document(fin))
print(rendered)

Expected Output

<p>title</p>
<ul>
<li>tabbed item long line</li>
</ul>
<p>title</p>
<ul>
<li>spaced item</li>
</ul>

Actual Output

<p>title
*       tabbed item long line</p>
<p>title</p>
<ul>
<li>spaced item</li>
</ul>

butuzov · 2019-10-01T13:48:41Z

If the first item of the list has a space between * and the text, and the next item is tabbed, the output is as follows:

<p>title</p>
<ul>
<li>spaced item</li>
<li>abbed item long line</li>
</ul>
<p>title</p>
<ul>
<li>spaced item</li>
</ul>

pbodnar · 2021-09-18T06:45:27Z

Thanks for the report.

Workarounds until fixed:

- Use a space instead of a tab.
- Put a blank line after the previous block element (title here). However, this still removes the 1st letter of a "tab" list item.
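For the first workaround, the input can be preprocessed before parsing. The following is only a minimal sketch: `tabs_to_spaces_after_markers` and its regular expression are hypothetical helpers, not part of mistletoe. It also assumes the tab sits directly after the list marker, and it would rewrite matching lines inside code blocks too.

```python
import re

# Hypothetical helper, not part of mistletoe.
# Matches up to 3 leading spaces, a bullet or ordered list marker, then a tab.
_MARKER_TAB = re.compile(r"^( {0,3}(?:[-*+]|\d{1,9}[.)]))\t")


def tabs_to_spaces_after_markers(lines):
    """Return the lines with a tab directly after a list marker replaced by one space."""
    return [_MARKER_TAB.sub(r"\1 ", line) for line in lines]


source = [
    "title\n",
    "*\ttabbed item long line\n",
    "\n",
    "title\n",
    "* spaced item\n",
]
fixed = tabs_to_spaces_after_markers(source)
print("".join(fixed), end="")
```

Note that replacing the tab with a single space changes the column where the item content starts, so continuation lines of such an item may need their indentation adjusted as well.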

…d correctly. Also fixes failing examples 312 and 313 in the CommonMark 0.30 spec due to the way leading space is now checked for list items. The problem was that tabs were expanded to four spaces, not to tab stops as specified in the spec. The solution was to expand tabs at the beginning of lines. However, this meant that plain string indexing could no longer be used. The extraction of line content was therefore moved into the parse_marker and parse_continuation methods.

…d correctly. Also fixes failing examples 312 and 313 in the CommonMark 0.30 spec, due to the way leading space is now checked for list items. The direct cause of the reported bug was that only spaces and not tabs were considered valid separators for list item markers. Another problem was that the implemented tab expansion, where tabs were always expanded to four spaces, did not work according to the spec, which states that tabs should be expanded to the nearest tab stop (of width 4). This fix uses `expandtabs()` to implement the tab stops correctly and moves extraction of content into the `parse_marker()` and `parse_continuation()` methods. This lets us implement use cases like "list interrupts a paragraph" and "list item continuation" in a less error-prone way.

…ly. (#164) Also fixes failing examples 312 and 313 in the CommonMark 0.30 spec due to the way leading space is now checked for list items. The 1st problem was that `# check if next_line starts List` considered only markers followed by a space and not by a tab. Another problem was that the implemented tab expansion to tab stops (of width 4 as per spec) could work only in a limited scope. This fix uses `expandtabs()` to implement the tab stops correctly and moves extraction of content into the `parse_marker()` and `parse_continuation()` methods. This lets us implement use cases like "list interrupts a paragraph" and "list item continuation" in a less error-prone way.
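The difference between "always four spaces" and "next tab stop of width 4" can be seen with plain Python strings. This is only an illustration of the two expansion strategies described above, not the code from #164; the variable names are made up for the example.

```python
line = "*\ttabbed item long line"

# Naive expansion: every tab becomes four spaces, whatever column it is in.
naive = line.replace("\t", "    ")

# Tab-stop expansion: each tab advances to the next column that is a multiple of 4.
stops = line.expandtabs(4)

print(repr(naive))  # '*    tabbed item long line'
print(repr(stops))  # '*   tabbed item long line'
```

With tab stops, the marker occupies column 0 and the tab only fills columns 1 to 3, so the item content starts at column 4 rather than column 5.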

pbodnar · 2022-10-30T19:15:20Z

Both reported problems, and some more, were fixed by #164.

pbodnar changed the title ~~List Items separated by tab character~~ List Items separated by tab character not parsed correctly Sep 18, 2021

pbodnar added the bug and has-workaround labels Sep 18, 2021

pbodnar mentioned this issue Oct 28, 2022

Fix for #89, List Items separated by tab character not parsed correctly. #164

Merged

pbodnar closed this as completed Oct 30, 2022

pbodnar mentioned this issue Jan 7, 2023

Finalize version 1.0.0 #174

Merged
