DOCX reader should handle table caption created in non-English Microsoft Word #9518

Closed
@rgaiacs

As pointed out by @jgm, the DOCX reader should,

instead of checking the styleId, look up the style id and check the style's <w:name> element to see if it is "caption".

This was previously discussed in #9515.
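To make the suggestion concrete, here is a minimal Python sketch of the lookup. It is not pandoc's implementation (pandoc is written in Haskell). The `STYLES_XML` excerpt is a hypothetical `word/styles.xml` fragment of the kind a German Word would presumably write, and the helper names `style_names` and `is_caption_style` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix in document.xml and styles.xml).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical excerpt of word/styles.xml (assumption: the real file from the
# issue is not shown). The styleId is localized, the w:name value is not.
STYLES_XML = f"""
<w:styles xmlns:w="{W}">
  <w:style w:type="paragraph" w:default="1" w:styleId="Standard">
    <w:name w:val="Normal"/>
  </w:style>
  <w:style w:type="paragraph" w:styleId="Beschriftung">
    <w:name w:val="caption"/>
  </w:style>
</w:styles>
""".strip()


def style_names(styles_xml):
    """Map each w:styleId to the w:val of its w:name child element."""
    root = ET.fromstring(styles_xml)
    names = {}
    for style in root.iter(f"{{{W}}}style"):
        style_id = style.get(f"{{{W}}}styleId")
        name = style.find(f"{{{W}}}name")
        if style_id is not None and name is not None:
            names[style_id] = name.get(f"{{{W}}}val", "")
    return names


def is_caption_style(style_id, names):
    """Decide by the style's name, not by its (possibly localized) id."""
    return names.get(style_id, "").lower() == "caption"


names = style_names(STYLES_XML)
print(is_caption_style("Beschriftung", names))
print(is_caption_style("Standard", names))
```

With this approach the paragraph's `w:pStyle` value is only used as a key into the style table, so a localized id such as `Beschriftung` no longer matters.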

My Microsoft Word interface is set to German, and my document language is English.

I create a table using the Microsoft Word built-in interface.

And I add a caption using the Microsoft Word built-in dialogue window.

Because my document is in English, Word automatically sets the caption label to "Table".

The final minimal working example is mwe-using-german-word.docx.

When I run pandoc --from docx --to html mwe-using-german-word.docx, the output is

<p>Lorem ipsum</p>
<p>Table 1 Example</p>
<table>
<colgroup>
<col style="width: 50%" />
<col style="width: 50%" />
</colgroup>
<thead>
<tr class="header">
<th>A</th>
<th>B</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>C</td>
<td>D</td>
</tr>
</tbody>
</table>

instead of

<p>Lorem ipsum</p>
<table>
<caption><p>Example</p></caption>
<colgroup>
<col style="width: 50%" />
<col style="width: 50%" />
</colgroup>
<thead>
<tr class="header">
<th>A</th>
<th>B</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>1</td>
<td>2</td>
</tr>
</tbody>
</table>

which is what the same command (pandoc --from docx --to html) produces with mwe-using-english-word.docx as input.

XML of non-English Document

The caption is

    <w:p w14:paraId="1FADD07B" w14:textId="3660CC9A" w:rsidR="00917377" w:rsidRDefault="00917377" w:rsidP="00917377">
      <w:pPr>
        <w:pStyle w:val="Beschriftung"/>
        <w:keepNext/>
      </w:pPr>
      <w:r>
        <w:t xml:space="preserve">Table </w:t>
      </w:r>
      <w:r>
        <w:fldChar w:fldCharType="begin"/>
      </w:r>
      <w:r>
        <w:instrText xml:space="preserve"> SEQ Table \* ARABIC </w:instrText>
      </w:r>
      <w:r>
        <w:fldChar w:fldCharType="separate"/>
      </w:r>
      <w:r>
        <w:rPr>
          <w:noProof/>
        </w:rPr>
        <w:t>1</w:t>
      </w:r>
      <w:r>
        <w:fldChar w:fldCharType="end"/>
      </w:r>
      <w:r>
        <w:t xml:space="preserve"> </w:t>
      </w:r>
      <w:proofErr w:type="spellStart"/>
      <w:r>
        <w:t>Example</w:t>
      </w:r>
      <w:proofErr w:type="spellEnd"/>
    </w:p>

XML of English Document

    <w:p w14:paraId="5DE3A68F" w14:textId="153D5F3C" w:rsidR="000E6255" w:rsidRDefault="000E6255" w:rsidP="000E6255">
      <w:pPr>
        <w:pStyle w:val="Caption"/>
        <w:keepNext/>
      </w:pPr>
      <w:r>
        <w:t xml:space="preserve">Table </w:t>
      </w:r>
      <w:fldSimple w:instr=" SEQ Table \* ARABIC ">
        <w:r>
          <w:rPr>
            <w:noProof/>
          </w:rPr>
          <w:t>1</w:t>
        </w:r>
      </w:fldSimple>
      <w:r>
        <w:t xml:space="preserve"> Example</w:t>
      </w:r>
    </w:p>
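The two paragraphs above differ in the `w:pStyle` value. The following Python sketch extracts that value from simplified versions of both paragraphs, to show why a comparison against a fixed id would miss the German document. The literal `"Caption"` comparison is assumed here for illustration and is not taken from pandoc's source. `paragraph_style_id` and `make_paragraph` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def make_paragraph(style_id):
    """Build a stripped-down caption paragraph with only its w:pPr part."""
    return (
        f'<w:p xmlns:w="{W}"><w:pPr>'
        f'<w:pStyle w:val="{style_id}"/><w:keepNext/>'
        f"</w:pPr></w:p>"
    )


def paragraph_style_id(p_xml):
    """Return the w:val of w:pPr/w:pStyle, or None if the paragraph has none."""
    p = ET.fromstring(p_xml)
    p_style = p.find(f"{{{W}}}pPr/{{{W}}}pStyle")
    return None if p_style is None else p_style.get(f"{{{W}}}val")


german = paragraph_style_id(make_paragraph("Beschriftung"))
english = paragraph_style_id(make_paragraph("Caption"))

# A check on the id alone matches only the English document.
print(german, german == "Caption")
print(english, english == "Caption")
```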
