Description
Explain the problem.
The DocBook reader ignores the `id` attribute (`xml:id` in DocBook 5) of `formalpara` elements. This attribute is needed for cross-references.
I ran into this while converting an AsciiDoc document that references code blocks. Since pandoc has no AsciiDoc reader, I used the DocBook backend of Asciidoctor to generate a DocBook document. However, when I converted that DocBook document to other formats, the references to the code blocks were broken.
As a minimal example, consider this AsciiDoc source:
```asciidoc
= My document

My code is in <<my_code_id>>.

.Code caption
[#my_code_id,bash]
----
echo "hello world"
----
```
Converting it to DocBook with `asciidoctor -b docbook example.adoc` produces the following:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<?asciidoc-toc?>
<?asciidoc-numbered?>
<article xmlns="http://docbook.org/ns/docbook" xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en">
<info>
<title>My document</title>
<date>2023-03-03</date>
</info>
<simpara>My code is in <xref linkend="my_code_id"/>.</simpara>
<formalpara xml:id="my_code_id">
<title>Code caption</title>
<para>
<programlisting language="bash" linenumbering="unnumbered">echo "hello world"</programlisting>
</para>
</formalpara>
</article>
```
Reading this DocBook with `pandoc -f docbook -t native` returns the following AST:
```
[ Para
    [ Str "My"
    , Space
    , Str "code"
    , Space
    , Str "is"
    , Space
    , Str "in"
    , Space
    , Link
        ( "" , [] , [] )
        [ Str "formalpara_title" ]
        ( "#my_code_id" , "" )
    , Str "."
    ]
, Div
    ( "" , [ "formalpara-title" ] , [] )
    [ Para [ Strong [ Str "Code" , Space , Str "caption" ] ] ]
, CodeBlock ( "" , [ "bash" ] , [] ) "echo \"hello world\""
]
```
The problem is that the `Div` in the AST has no identifier, so the earlier link to `#my_code_id` has no target and the reference to the code block is broken. The expected `Div` is:
```
Div
  ( "my_code_id" , [ "formalpara-title" ] , [] )
  [ Para [ Strong [ Str "Code" , Space , Str "caption" ] ] ]
```
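The effect on the link can be illustrated with a toy model. The sketch below uses simplified, hypothetical types (not the real pandoc-types definitions) and a hypothetical `danglingLinks` helper that lists internal link targets that no block defines. With an empty `Div` identifier the reference dangles; with the expected identifier it resolves.

```haskell
-- Toy model only: these are NOT pandoc's real AST types.
import Data.List (nub)

-- (identifier, classes, key-value pairs), mirroring pandoc's Attr shape
type Attr = (String, [String], [(String, String)])

data Inline = Str String | Link [Inline] String -- link text, target URL
  deriving (Show, Eq)

data Block = Para [Inline] | Div Attr [Block] | CodeBlock Attr String
  deriving (Show, Eq)

-- Non-empty identifiers defined by blocks, recursing into Div contents.
blockIds :: [Block] -> [String]
blockIds = concatMap go
  where
    go (Div (i, _, _) bs) = [i | not (null i)] ++ blockIds bs
    go (CodeBlock (i, _, _) _) = [i | not (null i)]
    go (Para _) = []

-- Internal link targets: a URL "#foo" yields "foo"; other links are skipped.
linkTargets :: [Block] -> [String]
linkTargets = concatMap go
  where
    go (Para ils) = [t | Link _ ('#' : t) <- ils]
    go (Div _ bs) = linkTargets bs
    go (CodeBlock _ _) = []

-- Targets that no block defines, i.e. broken cross-references.
danglingLinks :: [Block] -> [String]
danglingLinks doc = nub [t | t <- linkTargets doc, t `notElem` blockIds doc]

-- The example document, parameterised by the title Div's identifier.
mkDoc :: String -> [Block]
mkDoc divId =
  [ Para [Str "My code is in ", Link [Str "formalpara_title"] "#my_code_id", Str "."]
  , Div (divId, ["formalpara-title"], []) [Para [Str "Code caption"]]
  , CodeBlock ("", ["bash"], []) "echo \"hello world\""
  ]

brokenDoc, fixedDoc :: [Block]
brokenDoc = mkDoc ""           -- what the reader currently produces
fixedDoc = mkDoc "my_code_id"  -- what it is expected to produce

main :: IO ()
main = do
  print (danglingLinks brokenDoc) -- ["my_code_id"]
  print (danglingLinks fixedDoc)  -- []
```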
Pandoc version?
Pandoc development version
Possible fix
I have never programmed in Haskell, but after looking around the code a bit I found a fix that works. Here is the diff:
```diff
diff --git a/src/Text/Pandoc/Readers/DocBook.hs b/src/Text/Pandoc/Readers/DocBook.hs
index e11da4253..cf08d04d6 100644
--- a/src/Text/Pandoc/Readers/DocBook.hs
+++ b/src/Text/Pandoc/Readers/DocBook.hs
@@ -858,7 +858,7 @@ parseBlock (Elem e) =
         "para" -> parseMixed para (elContent e)
         "formalpara" -> do
            tit <- case filterChild (named "title") e of
-                        Just t -> divWith ("",["formalpara-title"],[]) .
+                        Just t -> divWith (attrValue "id" e,["formalpara-title"],[]) .
                                   para . strong <$> getInlines t
                         Nothing -> return mempty
            (tit <>) <$> parseMixed para (elContent e)
```
This fixes the `id` attribute. Note that there is also the related issue #3657, in which `role` attributes are not preserved.
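To see why the one-line change is enough, here is a simplified sketch. It assumes that the reader's `attrValue` compares only an attribute's local name and ignores its prefix (the patch working on `xml:id="my_code_id"` suggests it does). The `QName`, `Attr`, and `Element` types are hypothetical stand-ins for the reader's XML types, and `titleDivAttr` is an illustrative name, not a pandoc function.

```haskell
-- Hypothetical, simplified stand-ins for the XML types the reader uses.
data QName = QName {qName :: String, qPrefix :: Maybe String}
  deriving (Show, Eq)

data Attr = Attr {attrKey :: QName, attrVal :: String}
  deriving (Show, Eq)

data Element = Element {elName :: QName, elAttribs :: [Attr]}
  deriving (Show, Eq)

-- Assumed behaviour: match on the local name only, so "id" finds both
-- id="..." and xml:id="..."; returns "" when the attribute is absent.
attrValue :: String -> Element -> String
attrValue key el =
  case [attrVal a | a <- elAttribs el, qName (attrKey a) == key] of
    (v : _) -> v
    [] -> ""

-- Attr of the title Div built for a <formalpara>. Before the patch the
-- identifier was always ""; with it, the element's id is carried over.
titleDivAttr :: Element -> (String, [String], [(String, String)])
titleDivAttr e = (attrValue "id" e, ["formalpara-title"], [])

-- <formalpara xml:id="my_code_id"> from the example above.
formalpara :: Element
formalpara =
  Element (QName "formalpara" Nothing) [Attr (QName "id" (Just "xml")) "my_code_id"]

main :: IO ()
main = do
  print (titleDivAttr formalpara)
  -- ("my_code_id",["formalpara-title"],[])
  print (titleDivAttr (Element (QName "formalpara" Nothing) []))
  -- ("",["formalpara-title"],[])
```

A `formalpara` without an identifier still yields an empty id, so the patch should not change output for documents that never set one.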