
It couldn't recognize the xml file I downloaded from pubmed #148

Open
wildwhip opened this issue Jul 29, 2024 · 6 comments

@wildwhip

wildwhip commented Jul 29, 2024

I downloaded an XML file from https://ftp.ncbi.nlm.nih.gov/pubmed/baseline/
and then got this error when parsing it:

{
	"name": "XPathEvalError",
	"message": "Error in xpath expression",
	"stack": "---------------------------------------------------------------------------
XPathEvalError                            Traceback (most recent call last)
Cell In[14], line 3
      1 import pubmed_parser as pp
      2 path_xml = pp.list_xml_path(\"...\xml\")
----> 3 pubmed_dict = pp.parse_pubmed_xml(path_xml[0]) # dictionary output
      4 print(pubmed_dict)

File ......\\pubmed_parser\\pubmed_oa_parser.py:182, in parse_pubmed_xml(path, include_path, nxml)
    179     subjects = \"\"
    181 # create affiliation dictionary
--> 182 affil_id = tree.xpath(\".//aff[@id]/@id\")
    183 if len(affil_id) > 0:
    184     affil_id = list(map(str, affil_id))

File src\\\\lxml\\\\etree.pyx:2342, in lxml.etree._ElementTree.xpath()

File src\\\\lxml\\\\xpath.pxi:342, in lxml.etree.XPathDocumentEvaluator.__call__()

File src\\\\lxml\\\\xpath.pxi:210, in lxml.etree._XPathEvaluatorBase._handle_result()

XPathEvalError: Error in xpath expression"
}
@wildwhip wildwhip added the bug label Jul 29, 2024
@wildwhip wildwhip changed the title Itcouldn't recognize the xml file I downloaded from pubmed It couldn't recognize the xml file I downloaded from pubmed Jul 29, 2024
@Michael-E-Rose
Collaborator

Which file specifically do you mean by "the xml file"? Also, which version are you using?

@wildwhip
Author

wildwhip commented Sep 16, 2024

From https://ftp.ncbi.nlm.nih.gov/pubmed/baseline/
The latest version.


@Michael-E-Rose
Collaborator

All of them, or a particular XML file? The website has several hundred XML files.

@titipata
Owner

I think you might need to use pp.parse_medline_xml instead. The PubMed one (pp.parse_pubmed_xml) is for the PubMed Open Access corpus.
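The two parsers expect different input formats: the files under https://ftp.ncbi.nlm.nih.gov/pubmed/baseline/ are MEDLINE/PubMed XML rooted at `<PubmedArticleSet>`, whereas PubMed Central Open Access `.nxml` files are JATS documents rooted at `<article>`. Below is a minimal sketch, assuming that root-element distinction is enough to tell the two apart, that peeks at a file's root element and suggests which function to call. `root_tag` and `suggest_parser` are hypothetical helpers built on the standard library only; they are not part of pubmed_parser.

```python
import gzip
import xml.etree.ElementTree as ET


def root_tag(path: str) -> str:
    """Return the local name of the root element, reading only the start of the file."""
    # Baseline files are usually gzipped (*.xml.gz); open them transparently.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as fh:
        for _event, elem in ET.iterparse(fh, events=("start",)):
            # The first "start" event is the root; drop any "{namespace}" prefix.
            return elem.tag.rsplit("}", 1)[-1]
    raise ValueError(f"no elements found in {path}")


def suggest_parser(path: str) -> str:
    """Hypothetical helper: name the pubmed_parser function that fits this file."""
    tag = root_tag(path)
    if tag == "PubmedArticleSet":  # MEDLINE/PubMed baseline or update file
        return "parse_medline_xml"
    if tag == "article":  # PMC Open Access JATS (.nxml) file
        return "parse_pubmed_xml"
    raise ValueError(f"unrecognized root element <{tag}> in {path}")
```

Stopping at the first start event keeps the check cheap, even on large baseline files.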

@wildwhip
Author

> All of them, or a particular XML file? The website has several hundred XML files.

Yes, all of them.

@wildwhip
Author

> I think you might need to use pp.parse_medline_xml instead. The PubMed one (pp.parse_pubmed_xml) is for the PubMed Open Access corpus.

Where can I find this tool?
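`parse_medline_xml` is part of the same pubmed_parser package as `parse_pubmed_xml`, so no separate tool should be needed. Below is a minimal sketch for running it over a whole local copy of the baseline. It assumes the files were saved with their original `*.xml.gz` names in a single directory. `baseline_files` and `iter_medline_records` are hypothetical helper names, and only the default arguments of `pp.parse_medline_xml` are used.

```python
from pathlib import Path
from typing import Dict, Iterator, List


def baseline_files(directory: str) -> List[Path]:
    """Return the *.xml.gz files in `directory`, sorted by file name."""
    # The zero-padded baseline names sort lexicographically in numeric order.
    return sorted(Path(directory).glob("*.xml.gz"))


def iter_medline_records(directory: str) -> Iterator[Dict]:
    """Yield one dictionary per article from every baseline file in `directory`."""
    # Imported lazily so baseline_files() works without pubmed_parser installed.
    import pubmed_parser as pp

    for path in baseline_files(directory):
        # parse_medline_xml (not parse_pubmed_xml) handles MEDLINE/PubMed XML.
        for record in pp.parse_medline_xml(str(path)):
            yield record
```

For example, `for rec in iter_medline_records("baseline"): print(rec["pmid"], rec["title"])`, where `"baseline"` stands in for wherever the files were downloaded. The glob skips the `.md5` checksum files that sit next to the archives.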
