
DOCX reader should handle table caption created in non-English Microsoft Word #9518




Instead of checking the styleId directly, the DOCX reader should look up the style id and check the style's <w:name> element to see if it is "caption", as pointed out by @jgm.
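The lookup described above can be sketched in Python as follows. This is only an illustration of the idea, not pandoc's actual Haskell implementation. The `STYLES_XML` excerpt is a hypothetical stand-in for the `word/styles.xml` of a document written with German Word, and the helper names `style_names` and `is_caption` are made up for this example.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical excerpt of word/styles.xml (assumption, not taken from the attached file).
STYLES_XML = f"""
<w:styles xmlns:w="{W}">
  <w:style w:type="paragraph" w:styleId="Beschriftung">
    <w:name w:val="caption"/>
  </w:style>
  <w:style w:type="paragraph" w:styleId="Standard">
    <w:name w:val="Normal"/>
  </w:style>
</w:styles>
"""

# Simplified caption paragraph that refers to the localized styleId.
PARAGRAPH_XML = f"""
<w:p xmlns:w="{W}">
  <w:pPr><w:pStyle w:val="Beschriftung"/></w:pPr>
  <w:r><w:t xml:space="preserve">Table </w:t></w:r>
</w:p>
"""


def style_names(styles_root):
    """Map each styleId to the value of its <w:name> element."""
    names = {}
    for style in styles_root.findall("w:style", NS):
        style_id = style.get(f"{{{W}}}styleId")
        name = style.find("w:name", NS)
        if style_id is not None and name is not None:
            names[style_id] = name.get(f"{{{W}}}val", "")
    return names


def is_caption(paragraph, names):
    """Resolve the paragraph's pStyle through the map and compare the style name."""
    p_style = paragraph.find("w:pPr/w:pStyle", NS)
    if p_style is None:
        return False
    style_id = p_style.get(f"{{{W}}}val")
    return names.get(style_id, "").lower() == "caption"


names = style_names(ET.fromstring(STYLES_XML))
paragraph = ET.fromstring(PARAGRAPH_XML)
print(is_caption(paragraph, names))  # True
```

Comparing the resolved style name rather than the raw styleId means the check no longer depends on the interface language of the Word installation that produced the file.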

Previously discussed in #9515.

My Microsoft Word interface is set to German, and my document language is English.

Screenshot 2024-02-26 170602

I create a table using the Microsoft Word built-in interface.

Screenshot 2024-02-26 170752

And I add a caption using the Microsoft Word built-in dialogue window.

Screenshot 2024-02-26 170852

Because my document is in English, Word automatically sets the caption label to "Table".

The final minimal working example is mwe-using-german-word.docx.

When I run pandoc --from docx --to html mwe-using-german-word.docx, the output is

<p>Lorem ipsum</p>
<p>Table 1 Example</p>
<col style="width: 50%" />
<col style="width: 50%" />
<tr class="header">
<tr class="odd">

instead of

<p>Lorem ipsum</p>
<col style="width: 50%" />
<col style="width: 50%" />
<tr class="header">
<tr class="odd">

that is produced by the same command (pandoc --from docx --to html) but using mwe-using-english-word.docx as input.

XML of non-English Document

The caption is

    <w:p w14:paraId="1FADD07B" w14:textId="3660CC9A" w:rsidR="00917377" w:rsidRDefault="00917377" w:rsidP="00917377">
        <w:pStyle w:val="Beschriftung"/>
        <w:t xml:space="preserve">Table </w:t>
        <w:fldChar w:fldCharType="begin"/>
        <w:instrText xml:space="preserve"> SEQ Table \* ARABIC </w:instrText>
        <w:fldChar w:fldCharType="separate"/>
        <w:fldChar w:fldCharType="end"/>
        <w:t xml:space="preserve"> </w:t>
      <w:proofErr w:type="spellStart"/>
      <w:proofErr w:type="spellEnd"/>

XML of English Document

    <w:p w14:paraId="5DE3A68F" w14:textId="153D5F3C" w:rsidR="000E6255" w:rsidRDefault="000E6255" w:rsidP="000E6255">
        <w:pStyle w:val="Caption"/>
        <w:t xml:space="preserve">Table </w:t>
      <w:fldSimple w:instr=" SEQ Table \* ARABIC ">
        <w:t xml:space="preserve"> Example</w:t>
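The two excerpts differ in the `w:pStyle` value ("Beschriftung" versus "Caption"). To check which styleId a given file uses for the caption style, one could read `word/styles.xml` straight from the .docx archive. The following is a rough sketch: `caption_style_ids` and `fake_docx` are hypothetical helpers, and the in-memory archive only stands in for the attached files, whose full contents are not reproduced here.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def caption_style_ids(docx):
    """Return the styleIds in word/styles.xml whose <w:name> is 'caption'.

    `docx` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("word/styles.xml"))
    ids = []
    for style in root.findall("w:style", NS):
        name = style.find("w:name", NS)
        if name is not None and name.get(f"{{{W}}}val", "").lower() == "caption":
            ids.append(style.get(f"{{{W}}}styleId"))
    return ids


def fake_docx(style_id):
    """Build a tiny in-memory stand-in for a .docx (illustration only)."""
    styles = (
        f'<w:styles xmlns:w="{W}">'
        f'<w:style w:type="paragraph" w:styleId="{style_id}">'
        f'<w:name w:val="caption"/></w:style></w:styles>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/styles.xml", styles)
    buffer.seek(0)
    return buffer


print(caption_style_ids(fake_docx("Beschriftung")))  # ['Beschriftung']
print(caption_style_ids(fake_docx("Caption")))       # ['Caption']
```

With a real file, the call would be `caption_style_ids("mwe-using-german-word.docx")`, assuming the file is in the current directory.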



