DocBook reader ignores the id attribute of formalpara #8666

Closed
@tombolano

Explain the problem.

DocBook reader ignores the id attribute of formalpara elements. This attribute is needed for cross-references.

I found this problem when converting an AsciiDoc document that references code blocks. Since pandoc cannot read AsciiDoc directly, I used the DocBook backend of asciidoctor to generate a DocBook document. When I then converted that DocBook document to other formats, the references to the code blocks were broken.

For a minimal example, consider this asciidoc code:

= My document

My code is in <<my_code_id>>.

.Code caption
[#my_code_id,bash]
----
echo "hello world"
----

Converting it to DocBook with asciidoctor -b docbook example.adoc produces the following:

<?xml version="1.0" encoding="UTF-8"?>
<?asciidoc-toc?>
<?asciidoc-numbered?>
<article xmlns="http://docbook.org/ns/docbook" xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en">
<info>
<title>My document</title>
<date>2023-03-03</date>
</info>
<simpara>My code is in <xref linkend="my_code_id"/>.</simpara>
<formalpara xml:id="my_code_id">
<title>Code caption</title>
<para>
<programlisting language="bash" linenumbering="unnumbered">echo "hello world"</programlisting>
</para>
</formalpara>
</article>

When pandoc then reads this DocBook with the command pandoc -t native -f docbook, it returns the following AST:

[ Para
    [ Str "My"
    , Space
    , Str "code"
    , Space
    , Str "is"
    , Space
    , Str "in"
    , Space
    , Link
        ( "" , [] , [] )
        [ Str "formalpara_title" ]
        ( "#my_code_id" , "" )
    , Str "."
    ]
, Div
    ( "" , [ "formalpara-title" ] , [] )
    [ Para [ Strong [ Str "Code" , Space , Str "caption" ] ] ]
, CodeBlock ( "" , [ "bash" ] , [] ) "echo \"hello world\""
]

The problem is that the Div in the AST has an empty identifier, so the Link target #my_code_id points at nothing and the cross-reference to the code block is broken. The expected Div is:

Div
    ( "my_code_id" , [ "formalpara-title" ] , [] )
    [ Para [ Strong [ Str "Code" , Space , Str "caption" ] ] ]
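
To make the broken reference concrete, here is a minimal sketch that checks a document for internal link targets that no block identifier satisfies. The types are toy stand-ins written for this illustration, not pandoc's real definitions, and the names blockIds, inlineTargets, blockTargets, danglingTargets and issueDoc are hypothetical helpers. It only uses base.

```haskell
import Data.List (isPrefixOf)

-- Toy stand-ins for pandoc's AST types (hypothetical and heavily simplified).
type Attr = (String, [String], [(String, String)])

data Inline
  = Str String
  | Space
  | Strong [Inline]
  | Link Attr [Inline] (String, String) -- (target, title)

data Block
  = Para [Inline]
  | Div Attr [Block]
  | CodeBlock Attr String

-- Identifiers defined by a block, recursing into Divs. Empty ids are skipped.
blockIds :: Block -> [String]
blockIds (Para _) = []
blockIds (Div (ident, _, _) bs) = [ident | not (null ident)] ++ concatMap blockIds bs
blockIds (CodeBlock (ident, _, _) _) = [ident | not (null ident)]

-- Internal link targets ("#foo" becomes "foo") found in an inline.
inlineTargets :: Inline -> [String]
inlineTargets (Link _ label (url, _)) =
  [drop 1 url | "#" `isPrefixOf` url] ++ concatMap inlineTargets label
inlineTargets (Strong xs) = concatMap inlineTargets xs
inlineTargets _ = []

-- Internal link targets found anywhere inside a block.
blockTargets :: Block -> [String]
blockTargets (Para xs) = concatMap inlineTargets xs
blockTargets (Div _ bs) = concatMap blockTargets bs
blockTargets (CodeBlock _ _) = []

-- Internal targets that no block identifier satisfies.
danglingTargets :: [Block] -> [String]
danglingTargets bs = filter (`notElem` ids) (concatMap blockTargets bs)
  where ids = concatMap blockIds bs

-- The AST from this report, parameterised on the Div identifier.
issueDoc :: String -> [Block]
issueDoc divId =
  [ Para [ Str "My", Space, Str "code", Space, Str "is", Space, Str "in", Space
         , Link ("", [], []) [Str "formalpara_title"] ("#my_code_id", "")
         , Str "." ]
  , Div (divId, ["formalpara-title"], [])
      [Para [Strong [Str "Code", Space, Str "caption"]]]
  , CodeBlock ("", ["bash"], []) "echo \"hello world\""
  ]

main :: IO ()
main = do
  print (danglingTargets (issueDoc ""))           -- prints ["my_code_id"]
  print (danglingTargets (issueDoc "my_code_id")) -- prints []
```

With the empty identifier that the reader currently produces, my_code_id is reported as dangling; with the expected identifier on the Div, nothing is.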

Pandoc version?
Pandoc development version

Possible fix
I have never programmed in Haskell, but I looked around the code a bit and found a working solution. This is the diff:

diff --git a/src/Text/Pandoc/Readers/DocBook.hs b/src/Text/Pandoc/Readers/DocBook.hs
index e11da4253..cf08d04d6 100644
--- a/src/Text/Pandoc/Readers/DocBook.hs
+++ b/src/Text/Pandoc/Readers/DocBook.hs
@@ -858,7 +858,7 @@ parseBlock (Elem e) =
         "para"  -> parseMixed para (elContent e)
         "formalpara" -> do
            tit <- case filterChild (named "title") e of
-                        Just t  -> divWith ("",["formalpara-title"],[]) .
+                        Just t  -> divWith (attrValue "id" e,["formalpara-title"],[]) .
                                    para .  strong <$> getInlines t
                         Nothing -> return mempty
            (tit <>) <$> parseMixed para (elContent e)

This fixes the id attribute. Note that there is also the related issue #3657, in which the role attributes are not preserved.
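
As an illustration of the idea behind the diff, the sketch below models the lookup on toy types. It assumes that the attribute is matched by its local name, so that xml:id is found when asking for "id", and that a missing attribute yields an empty identifier. Both are assumptions about how attrValue behaves, not something verified here. The QName and Element types and the functions attrValueByLocalName and formalparaTitleAttr are hypothetical, not pandoc's own.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical, simplified XML types (not the ones pandoc uses).
data QName = QName
  { qPrefix :: Maybe String -- e.g. Just "xml" for xml:id
  , qLocal  :: String       -- e.g. "id"
  } deriving (Show, Eq)

data Element = Element
  { elName    :: String
  , elAttribs :: [(QName, String)]
  } deriving (Show, Eq)

-- Same shape as a pandoc attribute triple: (identifier, classes, key-value pairs).
type Attr = (String, [String], [(String, String)])

-- Look up an attribute by local name, ignoring any namespace prefix.
-- Returns "" when the attribute is absent, so the result can be used
-- directly as an identifier.
attrValueByLocalName :: String -> Element -> String
attrValueByLocalName name e =
  fromMaybe "" (lookup name [(qLocal q, v) | (q, v) <- elAttribs e])

-- Attributes for the Div that wraps a formalpara title: the element's id
-- (if any) plus the fixed "formalpara-title" class.
formalparaTitleAttr :: Element -> Attr
formalparaTitleAttr e = (attrValueByLocalName "id" e, ["formalpara-title"], [])

main :: IO ()
main = do
  let withId    = Element "formalpara" [(QName (Just "xml") "id", "my_code_id")]
      withoutId = Element "formalpara" []
  print (formalparaTitleAttr withId)    -- prints ("my_code_id",["formalpara-title"],[])
  print (formalparaTitleAttr withoutId) -- prints ("",["formalpara-title"],[])
```

Falling back to an empty string keeps the output for formalpara elements without an id identical to what the reader produces today.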
