paperscraper MCP Server

An MCP (Model Context Protocol) server that enables Large Language Models to search and retrieve academic papers from PubMed, arXiv, bioRxiv, medRxiv, chemRxiv, and Google Scholar.

Acknowledgments

This MCP server is built on top of the excellent paperscraper library by @jannisborn and contributors. The core paper scraping functionality comes from that implementation.

Note: This is an early version of the MCP integration and the code is currently quite messy. A cleaner implementation is coming soon!

Table of Contents

  1. Installation
  2. MCP Server Setup
  3. Available Functions
  4. Usage Examples
  5. Database Setup
  6. Original paperscraper Features

Installation

git clone https://github.com/MCPmed/paperscraperMCP
cd paperscraperMCP
pip install -e .

MCP Server Setup

To use paperscraper with an LLM client that supports MCP (such as Claude Desktop), add the following to your MCP configuration:

{
  "mcpServers": {
    "paperscraper-server": {
      "type": "stdio",
      "command": "python",
      "args": ["-m", "paperscraper.mcp_server"],
      "env": {}
    }
  }
}
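The "command" and "args" entries launch the server as a Python module, so you can sanity-check the setup by running python -m paperscraper.mcp_server from an environment where the package is installed. For Claude Desktop, the JSON block above typically goes into its claude_desktop_config.json file.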

Available Functions

The MCP server provides the following functions for paper searching and retrieval:

  • search_pubmed: Search the PubMed database using keyword queries with AND/OR logic (see the query sketch after this list)
  • search_arxiv: Search the arXiv database using keyword queries with AND/OR logic
  • search_scholar: Search Google Scholar using topic-based queries
  • search_preprint_servers: Search bioRxiv, medRxiv, and chemRxiv preprint servers
  • get_citations: Get citation count for a paper by title or DOI
  • search_journal_impact: Search for journal impact factors
  • download_paper_pdf: Download PDF of a paper using its DOI
  • update_preprint_dumps: Update local dumps of preprint servers

Usage Examples

Once configured, you can ask your LLM to:

  • "Search PubMed for papers on COVID-19 and artificial intelligence"
  • "Find recent arXiv papers on transformer models in computer vision"
  • "Get the citation count for 'Attention is All You Need'"
  • "Download the PDF for DOI 10.1234/example"
  • "Search bioRxiv for papers on CRISPR from 2023"

Database Setup

Download Preprint Server Dumps

To search preprint servers (bioRxiv, medRxiv, chemRxiv), you need to download their dumps first. These are stored locally in .jsonl format:

from paperscraper.get_dumps import biorxiv, medrxiv, chemrxiv
medrxiv()   # Takes ~30 min and should result in a ~35 MB file
biorxiv()   # Takes ~1 h and should result in a ~350 MB file
chemrxiv()  # Takes ~45 min and should result in a ~20 MB file

You can also update dumps for specific date ranges:

medrxiv(start_date="2023-04-01", end_date="2023-04-08")
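
Once a dump has been downloaded, it can be searched locally. A minimal sketch using the upstream library's documented XRXivQuery helper, with the same AND/OR query convention as above (the dump filename below is a hypothetical example; use the path produced by the download step):

from paperscraper.xrxiv.xrxiv_query import XRXivQuery

# Point the querier at a previously downloaded dump file.
querier = XRXivQuery('server_dumps/medrxiv_2023-04-08.jsonl')

# Keyword groups are ANDed; terms inside a group are ORed.
covid19 = ['COVID-19', 'SARS-CoV-2']
ai = ['Artificial intelligence', 'Deep learning']
querier.search_keywords([covid19, ai], output_filepath='medrxiv_covid_ai.jsonl')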

Local arXiv Dump

For faster arXiv searches, you can create a local dump instead of using the API:

from paperscraper.get_dumps import arxiv
arxiv(start_date='2024-01-01', end_date=None)  # Scrapes all metadata from 2024-01-01 until today

Original paperscraper Features

For detailed documentation on the underlying paperscraper functionality, including:

  • Direct Python API usage
  • Advanced search queries with Boolean logic
  • Full-text PDF/XML retrieval
  • Citation counting
  • Journal impact factor lookup
  • Data visualization (bar plots, Venn diagrams)
  • Publisher API integration (Wiley, Elsevier)

Please visit the original paperscraper repository.
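
As a taste of the direct Python API that the MCP tools sit on top of, citation counting and PDF retrieval look roughly like this (adapted from the upstream paperscraper documentation; treat the exact signatures as illustrative):

from paperscraper.citations import get_citations_from_title
from paperscraper.pdf import save_pdf

# Citation count for a paper, looked up by its title.
n_citations = get_citations_from_title('Attention is All You Need')

# Full-text retrieval by DOI; saves the PDF to the given filepath
# when an openly accessible version can be found.
save_pdf({'doi': '10.48550/arXiv.1706.03762'}, filepath='attention_is_all_you_need.pdf')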

License and Citation

This project uses the paperscraper library. If you use this MCP server in your research, please cite:

@article{born2021trends,
  title={Trends in Deep Learning for Property-driven Drug Design},
  author={Born, Jannis and Manica, Matteo},
  journal={Current Medicinal Chemistry},
  volume={28},
  number={38},
  pages={7862--7886},
  year={2021},
  publisher={Bentham Science Publishers}
}
