An MCP (Model Context Protocol) server that enables Large Language Models to search and retrieve academic papers from PubMed, arXiv, bioRxiv, medRxiv, chemRxiv, and Google Scholar.
This MCP server is built on top of the excellent paperscraper library by @jannisborn and contributors. The core paper scraping functionality comes from that implementation.
Note: This is an early version of the MCP integration and the code is currently quite messy. A cleaner implementation is coming soon!
- Installation
- MCP Server Setup
- Available Functions
- Usage Examples
- Database Setup
- Original paperscraper Features
## Installation

```bash
git clone https://github.com/MCPmed/paperscraperMCP
cd paperscraperMCP
pip install -e .
```

## MCP Server Setup

To use paperscraper with an LLM client that supports MCP (such as Claude Desktop), add the following to your MCP configuration:
```json
{
  "mcpServers": {
    "paperscraper-server": {
      "type": "stdio",
      "command": "python",
      "args": ["-m", "paperscraper.mcp_server"],
      "env": {}
    }
  }
}
```

## Available Functions

The MCP server provides the following functions for paper searching and retrieval:
- search_pubmed: Search PubMed database using keyword queries with AND/OR logic
- search_arxiv: Search arXiv database using keyword queries with AND/OR logic
- search_scholar: Search Google Scholar using topic-based queries
- search_preprint_servers: Search bioRxiv, medRxiv, and chemRxiv preprint servers
- get_citations: Get citation count for a paper by title or DOI
- search_journal_impact: Search for journal impact factors
- download_paper_pdf: Download PDF of a paper using its DOI
- update_preprint_dumps: Update local dumps of preprint servers
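The AND/OR keyword logic used by the search functions can be illustrated with a small hypothetical helper (not part of the server's API): terms inside a group are OR-ed together, and the groups themselves are AND-ed.

```python
def build_query(groups):
    """Join synonym groups into a PubMed-style Boolean query string.

    `groups` is a list of lists: terms within an inner list are OR-ed,
    and the inner lists are AND-ed together. This helper is only an
    illustration of the AND/OR semantics, not the server's implementation.
    """
    ors = ["(" + " OR ".join(f'"{term}"' for term in terms) + ")" for terms in groups]
    return " AND ".join(ors)


query = build_query([["COVID-19", "SARS-CoV-2"],
                     ["artificial intelligence", "deep learning"]])
print(query)
# ("COVID-19" OR "SARS-CoV-2") AND ("artificial intelligence" OR "deep learning")
```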
## Usage Examples

Once configured, you can ask your LLM to:
- "Search PubMed for papers on COVID-19 and artificial intelligence"
- "Find recent arXiv papers on transformer models in computer vision"
- "Get the citation count for 'Attention is All You Need'"
- "Download the PDF for DOI 10.1234/example"
- "Search bioRxiv for papers on CRISPR from 2023"
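Behind such a prompt, the MCP client issues a JSON-RPC `tools/call` request to the server over stdio. A hypothetical request for the `search_pubmed` tool might look like the following (the argument names here are assumptions, not the server's documented schema):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "search_pubmed",
    "arguments": {
      "query": "COVID-19 AND artificial intelligence",
      "max_results": 10
    }
  }
}
```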
## Database Setup

To search the preprint servers (bioRxiv, medRxiv, chemRxiv), you first need to download their dumps, which are stored locally in `.jsonl` format:

```python
from paperscraper.get_dumps import biorxiv, medrxiv, chemrxiv

medrxiv()   # Takes ~30 min and should result in a ~35 MB file
biorxiv()   # Takes ~1 h and should result in a ~350 MB file
chemrxiv()  # Takes ~45 min and should result in a ~20 MB file
```

You can also update dumps for specific date ranges:

```python
medrxiv(start_date="2023-04-01", end_date="2023-04-08")
```

For faster arXiv searches, you can create a local dump instead of using the API:
```python
from paperscraper.get_dumps import arxiv

arxiv(start_date='2024-01-01', end_date=None)  # scrapes all metadata from 2024 until today
```

## Original paperscraper Features

For detailed documentation on the underlying paperscraper functionality, including:
- Direct Python API usage
- Advanced search queries with Boolean logic
- Full-text PDF/XML retrieval
- Citation counting
- Journal impact factor lookup
- Data visualization (bar plots, Venn diagrams)
- Publisher API integration (Wiley, Elsevier)
please visit the original paperscraper repository.
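As a quick illustration of the dump format mentioned earlier: each dump is a JSON Lines file with one JSON record per paper, which you can stream line by line with the standard library. The `title`, `doi`, and `date` field names below are assumptions about typical records, not a guaranteed schema.

```python
import json
import os
import tempfile

# A tiny synthetic dump standing in for e.g. medRxiv's .jsonl output.
# Field names (title, doi, date) are assumptions about typical records.
sample = [
    {"title": "CRISPR screening in vivo", "doi": "10.1101/000001", "date": "2023-05-01"},
    {"title": "Transformer models for X", "doi": "10.1101/000002", "date": "2022-11-15"},
]

with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as f:
    for record in sample:
        f.write(json.dumps(record) + "\n")
    path = f.name

# Stream the dump line by line and keep only the 2023 records.
hits = []
with open(path) as f:
    for line in f:
        record = json.loads(line)
        if record["date"].startswith("2023"):
            hits.append(record["title"])

os.remove(path)
print(hits)  # ['CRISPR screening in vivo']
```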
This project uses the paperscraper library. If you use this MCP server in your research, please cite:
```bibtex
@article{born2021trends,
  title     = {Trends in Deep Learning for Property-driven Drug Design},
  author    = {Born, Jannis and Manica, Matteo},
  journal   = {Current Medicinal Chemistry},
  volume    = {28},
  number    = {38},
  pages     = {7862--7886},
  year      = {2021},
  publisher = {Bentham Science Publishers}
}
```