Docx+Citations import fails with multiple sources (Endnote) #8433

Open
frederik opened this issue Nov 10, 2022 · 5 comments
@frederik

frederik commented Nov 10, 2022

Explain the problem.
When importing a docx in which multiple sources are combined in one reference, pandoc -s test.docx -f docx+citations -o test.json fails with:

Invalid XML:
Missing root element

I am attaching a docx for reproduction that contains a combined citation with multiple sources, followed by a single-source citation. As far as I can see at first glance, the combined sources are stored in the fldData (base64-encoded), while the single source is encoded inside the instrText.
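To check this observation on a given file, the fldData payloads can be extracted and base64-decoded. The following is a minimal Python sketch, not pandoc's reader logic. It assumes the main document part is stored at word/document.xml inside the docx archive, and the function names are made up for illustration. What the decoded bytes contain is not shown in this thread, so the sketch only returns them.

```python
import base64
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML main namespace (the "w:" prefix in document.xml).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def field_data_payloads(document_xml):
    """Return the base64-decoded contents of every w:fldData element.

    `document_xml` may be bytes or str. Hypothetical helper for inspection only.
    """
    root = ET.fromstring(document_xml)
    payloads = []
    for node in root.iter(f"{{{W_NS}}}fldData"):
        # The base64 text is wrapped over several lines; drop all whitespace.
        encoded = "".join((node.text or "").split())
        payloads.append(base64.b64decode(encoded))
    return payloads


def field_data_from_docx(path):
    """Read word/document.xml from a docx (assumed location) and decode its fldData."""
    with zipfile.ZipFile(path) as archive:
        return field_data_payloads(archive.read("word/document.xml"))
```

Calling field_data_from_docx("combined.docx") on the attached file should return one bytes object per fldData element, which can then be inspected by hand.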

I have reached out to the publisher to find out the exact version of the EndNote citation plugin that was used to create the document. (Edit: EndNote X7.8 (Bld 11583))

combined.docx

Pandoc version?

pandoc 2.19.2 (installed with brew on MacOS (ARM))
Compiled with pandoc-types 1.22.2.1, texmath 0.12.5.2, skylighting 0.13,
citeproc 0.8.0.1, ipynb 0.2, hslua 2.2.1
Scripting engine: Lua 5.4
frederik added the bug label Nov 10, 2022
@jgm
Owner

jgm commented Nov 10, 2022

It's a strange format here; the instrText and the data aren't even in the same node:

      <w:r w:rsidR="008138BA">
        <w:rPr>
          <w:lang w:val="en-US" />
        </w:rPr>
        <w:fldChar w:fldCharType="begin">
          <w:fldData xml:space="preserve">
...base64data...
</w:fldData>
        </w:fldChar>
      </w:r>
      <w:r w:rsidR="008138BA">
        <w:rPr>
          <w:lang w:val="en-US" />
        </w:rPr>
        <w:instrText xml:space="preserve">
 ADDIN EN.CITE.DATA 
</w:instrText>
      </w:r>

And there are several of these pairs in a row.
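The pairing described above (a "begin" fldChar carrying fldData, followed by a separate run with the instrText) can be made visible with a short script. This is a rough Python sketch under stated assumptions: the function name is hypothetical, only the first instrText after each "begin" is taken (real documents may split an instruction over several runs), and it is not a description of how pandoc handles fields.

```python
import xml.etree.ElementTree as ET

# Clark-notation prefix for the WordprocessingML main namespace.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def pair_fields(document_xml):
    """Return [(instruction, fldData text or None), ...] in document order.

    Hypothetical helper: pairs each "begin" fldChar with the next instrText.
    """
    pairs = []
    pending_data = None
    awaiting_instr = False
    # iter() walks the tree depth-first, i.e. in document order.
    for node in ET.fromstring(document_xml).iter():
        if node.tag == W + "fldChar" and node.get(W + "fldCharType") == "begin":
            data = node.find(W + "fldData")
            has_text = data is not None and data.text
            # Keep the still-encoded base64 text, with line wrapping removed.
            pending_data = "".join(data.text.split()) if has_text else None
            awaiting_instr = True
        elif node.tag == W + "instrText" and awaiting_instr:
            pairs.append(((node.text or "").strip(), pending_data))
            pending_data, awaiting_instr = None, False
    return pairs
```

For the structure quoted above, each pair would come out as ("ADDIN EN.CITE.DATA", "...base64data..."), so the instruction alone carries no citation data.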

@frederik
Author

@jgm could we maybe enable the Zotero and the EndNote reference detection separately? IMHO the EndNote detection is de facto unusable, because most documents will contain combined citations, so the feature has to be turned off for all of them.

Zotero, however, works great, and I think it's one of the most valuable features added to the docx reader in recent years.

@jgm
Owner

jgm commented Apr 22, 2024

Activating separately would only help if the same document contains both zotero and endnote citations. And that's not going to be common, is it?

Otherwise, I'd say: just use +citations for zotero and don't use it for endnote.

@frederik
Author

Activating it separately would allow us to keep using Zotero references while ignoring the citations in EndNote documents (most of which fail with an error). As things stand, we will have to catch the error and then run the conversion again with citations turned off.
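The catch-and-retry workaround could look roughly like the Python sketch below. It assumes pandoc exits with a non-zero status when the import fails, and it uses docx-citations to state explicitly that the extension is off. The function name and the injectable run parameter are illustrative, not part of any existing tool.

```python
import subprocess


def convert_with_fallback(src, dest, run=subprocess.run):
    """Convert with docx+citations; if pandoc fails, retry with citations off.

    Returns the input format that succeeded. `run` is injectable for testing.
    Assumption: a failed import shows up as a non-zero exit status.
    """
    result = None
    for input_format in ("docx+citations", "docx-citations"):
        result = run(
            ["pandoc", "-s", src, "-f", input_format, "-o", dest],
            capture_output=True,
            text=True,
        )
        if result.returncode == 0:
            return input_format
    # Both attempts failed: surface pandoc's own error message.
    raise RuntimeError(f"pandoc failed for {src}: {result.stderr.strip()}")
```

Usage would be convert_with_fallback("test.docx", "test.json"). The return value tells the caller whether citations were kept, so documents that fell back can be logged.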

@jgm
Owner

jgm commented Apr 23, 2024

Another possibility, perhaps, is that we could catch the error in pandoc and ignore such cases.
Or issue a warning.

jgm added a commit that referenced this issue Apr 23, 2024
when we can't parse EndNote citations.  See #8433.