Automatically download, organize, and summarize arXiv papers directly into your Zotero library with AI-powered insights.
- Smart Search: Search arXiv by keywords, authors, categories, or date ranges
- Auto-Download: Automatically download paper PDFs and attach them to Zotero entries
- AI Summarization: Generate concise summaries using Google's Gemini AI (optional)
- Metadata Extraction: Preserve complete paper metadata including authors, abstract, and publication details
- Collection Support: Organize papers into specific Zotero collections
- Flexible Filtering: Filter by journal papers, conference proceedings, or preprints
- Batch Processing: Process multiple papers efficiently with progress tracking
- Python 3.7 or higher
- Zotero account with API access
- Internet connection for downloading papers
- Google AI API key (optional, for summarization features)
pip install arxiv-zotero-connector
pip install git+https://github.com/StepanKropachev/arxiv-zotero-connector.git
git clone https://github.com/StepanKropachev/arxiv-zotero-connector.git
cd arxiv-zotero-connector
pip install -e .
- Library ID: Visit Zotero Settings → Your user ID for API calls
- API Key:
  - Go to Zotero Settings → New Private Key
  - Grant all permissions and save the key
- Collection Key (optional):
  - Open your Zotero web library
  - Navigate to the desired collection
  - Copy the key from the URL: .../collections/XXXXXXXX
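The last step above can also be done programmatically when you have many collection URLs to process. The following is a minimal sketch, not part of the tool: the helper name `extract_collection_key` and the example URL are hypothetical, and the pattern assumes the key is the alphanumeric path segment that follows `/collections/`.

```python
import re
from typing import Optional

# Assumption: the key is the alphanumeric path segment right after /collections/
_COLLECTION_RE = re.compile(r"/collections/([A-Za-z0-9]+)")


def extract_collection_key(url: str) -> Optional[str]:
    """Return the collection key found in a Zotero web library URL, or None."""
    match = _COLLECTION_RE.search(url)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Made-up URL, used only for illustration
    print(extract_collection_key("https://www.zotero.org/someuser/collections/ABCD1234"))
```

The returned string is what you would place in `COLLECTION_KEY`.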
Create a .env file in your working directory:
ZOTERO_LIBRARY_ID=your_library_id
ZOTERO_API_KEY=your_api_key
COLLECTION_KEY=your_collection_key # Optional
GOOGLE_API_KEY=your_gemini_api_key # Optional, for AI summaries
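Before running a long batch, it can help to confirm that the required variables are actually set. The sketch below parses `.env`-style text with the standard library only and reports what is missing. The function names `parse_env` and `missing_required` are hypothetical, and treating `ZOTERO_LIBRARY_ID` and `ZOTERO_API_KEY` as required is an assumption based on the other two entries being marked optional; the tool itself may load and validate its configuration differently.

```python
from typing import Dict, List

# Assumption: these two are required because the others are marked "Optional"
REQUIRED = ("ZOTERO_LIBRARY_ID", "ZOTERO_API_KEY")


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as "value # Optional"
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def missing_required(values: Dict[str, str]) -> List[str]:
    """Return the required variable names that are absent or empty."""
    return [name for name in REQUIRED if not values.get(name)]


if __name__ == "__main__":
    sample = "ZOTERO_LIBRARY_ID=your_library_id\nCOLLECTION_KEY=your_collection_key # Optional\n"
    print(missing_required(parse_env(sample)))
```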
Basic search:
arxiv-zotero --keywords "machine learning" --max-results 10
Advanced search with filters:
arxiv-zotero \
--keywords "transformer" "attention" \
--categories cs.AI cs.LG \
--start-date 2023-01-01 \
--max-results 20
Search by author:
arxiv-zotero --author "Yoshua Bengio" --start-date 2023-06-01
Create search_config.yaml:
keywords:
- "reinforcement learning"
- "deep learning"
categories:
- "cs.AI"
- "cs.LG"
max_results: 50
start_date: "2023-01-01"
content_type: "journal" # journal, conference, or preprint
# AI Summarization settings (optional)
summarizer:
  enabled: true
  prompt: "Summarize this paper in 3 key points"
  max_length: 300
Run with config:
arxiv-zotero --config search_config.yaml
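To see how the configuration file relates to the command-line examples, the sketch below translates a config mapping into the equivalent argument list. It assumes the config keys correspond one-to-one to the flags shown in the CLI examples (`--keywords`, `--categories`, `--max-results`, `--start-date`, `--content-type`); that correspondence is inferred from this README rather than confirmed, and `config_to_argv` is a hypothetical helper, not part of the package.

```python
from typing import Any, Dict, List


def config_to_argv(config: Dict[str, Any]) -> List[str]:
    """Translate a search config mapping into equivalent CLI arguments."""
    argv: List[str] = []
    # List-valued options: one flag followed by several values
    for key, flag in (("keywords", "--keywords"), ("categories", "--categories")):
        values = config.get(key) or []
        if values:
            argv.append(flag)
            argv.extend(str(v) for v in values)
    # Scalar options: one flag followed by a single value
    for key, flag in (
        ("max_results", "--max-results"),
        ("start_date", "--start-date"),
        ("content_type", "--content-type"),
    ):
        if config.get(key) is not None:
            argv.extend([flag, str(config[key])])
    return argv


if __name__ == "__main__":
    example = {"keywords": ["deep learning"], "categories": ["cs.LG"], "max_results": 50}
    print(["arxiv-zotero"] + config_to_argv(example))
```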
import asyncio
from datetime import datetime

from arxiv_zotero import ArxivZoteroCollector, ArxivSearchParams


async def main():
    # Initialize the collector with your Zotero credentials
    collector = ArxivZoteroCollector(
        zotero_library_id="your_library_id",
        zotero_api_key="your_api_key",
        collection_key="optional_collection_key",
    )

    # Configure the search
    search_params = ArxivSearchParams(
        keywords=["quantum computing", "quantum algorithms"],
        categories=["quant-ph", "cs.CC"],
        max_results=10,
        start_date=datetime(2023, 1, 1),
    )

    # Run the collection
    successful, failed = await collector.run_collection_async(
        search_params=search_params,
        download_pdfs=True,
    )
    print(f"Processed {successful} papers successfully, {failed} failed")


asyncio.run(main())
arxiv-zotero \
--keywords "neural architecture search" "AutoML" \
--categories cs.LG \
--content-type journal \
--start-date 2022-01-01 \
--max-results 100
arxiv-zotero \
--keywords "ICLR" "NeurIPS" \
--content-type conference \
--start-date 2023-01-01
arxiv-zotero --keywords "quantum" --no-pdf --max-results 50
Enable AI-powered paper summaries by setting your Google AI API key (GOOGLE_API_KEY in .env) and passing the summarizer options:
arxiv-zotero \
--keywords "large language models" \
--summarizer-enabled \
--summarizer-prompt "Explain this paper's contribution in simple terms" \
--summary-length 500
Common categories include:
- cs.AI: Artificial Intelligence
- cs.LG: Machine Learning
- cs.CL: Computation and Language
- cs.CV: Computer Vision
- stat.ML: Machine Learning (Statistics)
- math.OC: Optimization and Control
- quant-ph: Quantum Physics
Full list: arXiv Category Taxonomy
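For orientation, the arXiv API accepts field-prefixed terms such as `cat:cs.AI` and `all:"some phrase"` combined with `AND` and `OR`. The sketch below shows one way keywords and categories could be combined into such a query string. `build_query` is a hypothetical helper, and joining keywords with `OR` is an assumption; this README does not say how the tool combines multiple keywords or categories.

```python
from typing import List, Sequence


def build_query(keywords: Sequence[str], categories: Sequence[str]) -> str:
    """Combine keywords and categories into an arXiv API search_query string."""
    parts: List[str] = []
    if keywords:
        # all: searches every field; quotes keep multi-word phrases together
        parts.append("(" + " OR ".join(f'all:"{k}"' for k in keywords) + ")")
    if categories:
        # cat: restricts results to a subject category
        parts.append("(" + " OR ".join(f"cat:{c}" for c in categories) + ")")
    # A paper must match the keyword group and the category group
    return " AND ".join(parts)


if __name__ == "__main__":
    print(build_query(["transformer", "attention"], ["cs.AI", "cs.LG"]))
```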
The tool preserves:
- Title, authors, abstract
- Publication date and journal references
- arXiv ID and categories
- DOI (when available)
- Comments and version info
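To make the list above concrete, here is a rough sketch of how such metadata could be laid out as a Zotero item. The input keys (`arxiv_id`, `journal_ref`, and so on), the helper name `to_zotero_item`, the choice of the `journalArticle` item type, and the use of the `extra` field for arXiv-specific details are all assumptions made for illustration; the tool's actual mapping may differ.

```python
from typing import Any, Dict


def to_zotero_item(paper: Dict[str, Any]) -> Dict[str, Any]:
    """Map a paper-metadata dict onto Zotero item fields (illustrative only)."""
    # arXiv-specific details without a dedicated field go into "extra"
    extra_lines = [f"arXiv: {paper['arxiv_id']}"]
    if paper.get("categories"):
        extra_lines.append("Categories: " + ", ".join(paper["categories"]))
    if paper.get("comment"):
        extra_lines.append("Comment: " + paper["comment"])
    return {
        "itemType": "journalArticle",
        "title": paper["title"],
        # Single-field creator names avoid guessing first/last name splits
        "creators": [{"creatorType": "author", "name": a} for a in paper["authors"]],
        "abstractNote": paper.get("abstract", ""),
        "date": paper.get("published", ""),
        "publicationTitle": paper.get("journal_ref", ""),
        "DOI": paper.get("doi", ""),
        "url": f"https://arxiv.org/abs/{paper['arxiv_id']}",
        "extra": "\n".join(extra_lines),
    }


if __name__ == "__main__":
    # Placeholder record, not a real paper
    demo = {"arxiv_id": "0000.00000", "title": "Example Title", "authors": ["A. Author"]}
    print(to_zotero_item(demo)["url"])
```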
The tool respects arXiv's rate limits automatically. For large batch operations, consider using:
arxiv-zotero --keywords "your search" --rate-limit 5
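If you drive requests from your own scripts, a simple way to stay polite is to enforce a minimum interval between consecutive calls. The sketch below is one such limiter and makes no claim about how the tool implements `--rate-limit` internally. The class name `MinIntervalLimiter` is hypothetical and the 3-second interval is only an illustrative value.

```python
import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Ensure consecutive calls to wait() are at least `interval` seconds apart."""

    def __init__(
        self,
        interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request, if any

    def wait(self) -> float:
        """Sleep if needed before the next request; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        # Record when this request is allowed to start
        self._last = now + delay
        return delay


if __name__ == "__main__":
    limiter = MinIntervalLimiter(3.0)  # illustrative interval
    limiter.wait()  # the first call never sleeps
```

Calling `limiter.wait()` immediately before each request is enough; the injectable `clock` and `sleep` arguments exist only to make the class easy to test.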
Failed downloads are logged and can be retried:
- Check arxiv_zotero.log for details
- Papers are processed independently
- Partial failures don't stop the entire batch
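The behaviour described above, where each paper is handled on its own and one failure does not abort the rest, can be sketched with `asyncio.gather`. This illustrates the pattern and is not the tool's actual code: `process_all` and the `process_one` callback are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, Tuple


async def process_all(
    paper_ids: Iterable[str],
    process_one: Callable[[str], Awaitable[None]],
) -> Tuple[int, int]:
    """Process every paper independently; return (successful, failed) counts."""
    ids = list(paper_ids)
    # return_exceptions=True returns errors as results instead of raising the first one
    results = await asyncio.gather(
        *(process_one(pid) for pid in ids), return_exceptions=True
    )
    failed = sum(1 for r in results if isinstance(r, Exception))
    return len(ids) - failed, failed


if __name__ == "__main__":
    async def demo(pid: str) -> None:
        # Stand-in for download and upload work; fails for one made-up ID
        if pid == "bad-id":
            raise RuntimeError("download failed")

    print(asyncio.run(process_all(["id-1", "bad-id", "id-3"], demo)))
```

The returned pair has the same shape as the `successful, failed` values unpacked in the Python API example.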
- "Collection not found": Verify your collection key or remove it to use the main library
- "API key invalid": Check your Zotero API key has proper permissions
- Import errors: Ensure all dependencies are installed: pip install -r requirements.txt
- PDF download fails: Check your internet connection and disk space
For detailed logging:
import logging
logging.basicConfig()  # adds a default handler if none is configured yet
logging.getLogger('arxiv_zotero').setLevel(logging.DEBUG)
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- arXiv API for providing access to paper metadata
- Zotero for the excellent reference management platform
- Google Gemini for AI summarization capabilities
- Initial release
- Core functionality for searching and collecting arXiv papers
- Zotero integration with metadata preservation
- AI-powered summarization support
- Command-line interface and Python API
Made with ❤️ by Stepan Kropachev