
Default text wrap breaks long words in table cell into two words #9001

@rgaiacs


I first noticed this issue while converting DOCX to Markdown.

Given a minimal working example, mwe.docx, that contains the table in the screenshot below,

[Screenshot of minimal DOCX working example]

`pandoc --from docx+styles --to markdown mwe.docx` produces

+----------------------+-----------+-----------+-----------+-----------+
| ::: {c               | nullam    | diam      | tellus    | massa     |
| ustom-style="Quote"} | non nisi  | donec     | rutrum    | massa     |
| > non quam lacus     | est       | a         | tellus    | ultricies |
| > suspendisse        |           | dipiscing | pel       | mi        |
| :::                  |           | tristique | lentesque |           |
+======================+===========+===========+===========+===========+
+----------------------+-----------+-----------+-----------+-----------+

Note that `adipiscing` and `pellentesque` are each broken in two in the output. This causes problems for any pipeline that parses the Markdown again. For example, `pandoc --from docx+styles --to markdown mwe.docx | pandoc --from markdown --to html` produces

<table style="width:96%;">
<colgroup>
<col style="width: 31%" />
<col style="width: 16%" />
<col style="width: 16%" />
<col style="width: 16%" />
<col style="width: 16%" />
</colgroup>
<thead>
<tr class="header">
<th><div class="{c">
<p>ustom-style=“Quote”} &gt; non quam lacus &gt; suspendisse</p>
</div></th>
<th>nullam non nisi est</th>
<th>diam donec a dipiscing tristique</th>
<th>tellus rutrum tellus pel lentesque</th>
<th>massa massa ultricies mi</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

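Note that instead of `adipiscing` and `pellentesque` we now have `a dipiscing` and `pel lentesque`. The same split turns `{custom-style="Quote"}` into `{c` and `ustom-style="Quote"}`, so the div gets the class `{c` and the rest of the attribute ends up in the paragraph text.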

This issue does not occur when text wrapping is disabled. For example, `pandoc --from docx+styles --to markdown --wrap=none mwe.docx` produces

+------------------------------+---------------------+---------------------------------+-----------------------------------+--------------------------+
| ::: {custom-style="Quote"}   | nullam non nisi est | diam donec adipiscing tristique | tellus rutrum tellus pellentesque | massa massa ultricies mi |
| > non quam lacus suspendisse |                     |                                 |                                   |                          |
| :::                          |                     |                                 |                                   |                          |
+==============================+=====================+=================================+===================================+==========================+
+------------------------------+---------------------+---------------------------------+-----------------------------------+--------------------------+

Would it be possible to implement one of the following options?

  1. Disable text wrapping for tables in Markdown output by default, and add an option such as --force-table-wrap to turn it back on
  2. Print a warning when a word might be split across lines
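Option 2 hinges on detecting when a word cannot fit in its column. Below is a minimal sketch of such a check in Python. It is not pandoc's (Haskell) code, the function names `words_wider_than` and `split_word_warnings` are hypothetical, and it assumes a grid-table column's usable width is its dash count minus one padding space on each side, which matches the wrapped table above.

```python
# Hypothetical check (not part of pandoc): flag words that are wider than a
# grid-table column's content area, i.e. words a wrapper would have to split.


def words_wider_than(cell_text: str, width: int) -> list[str]:
    """Return the whitespace-separated words in cell_text longer than width."""
    # len() counts code points, not display width; a real check would also
    # need to handle wide (e.g. CJK) characters.
    return [word for word in cell_text.split() if len(word) > width]


def split_word_warnings(cells: list[str], widths: list[int]) -> list[str]:
    """Build one warning message per word that cannot fit in its column."""
    warnings = []
    for column, (text, width) in enumerate(zip(cells, widths), start=1):
        for word in words_wider_than(text, width):
            warnings.append(
                f"column {column}: {word!r} ({len(word)} chars) is wider than "
                f"{width} and would be split across lines"
            )
    return warnings


if __name__ == "__main__":
    # Header cells and content widths read off the wrapped grid table above
    # (each column's dash count minus one padding space on each side).
    header = [
        '::: {custom-style="Quote"}\n> non quam lacus suspendisse\n:::',
        "nullam non nisi est",
        "diam donec adipiscing tristique",
        "tellus rutrum tellus pellentesque",
        "massa massa ultricies mi",
    ]
    widths = [20, 9, 9, 9, 9]
    for message in split_word_warnings(header, widths):
        print("warning:", message)
```

Run on the header row of the example, this flags `{custom-style="Quote"}` (22 characters in a 20-character column), `adipiscing` and `pellentesque`, but not `ultricies`, which is exactly 9 characters and fits.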
