
BibTeX Metadata Enhancer is a Python-based tool that enriches .bib files with metadata from a variety of academic sources, including CrossRef, arXiv, Semantic Scholar, and DBLP. Metadata can be enhanced via API queries and via DOI-to-BibTeX conversion through doi2bib.org. Designed for easy extension and automation in academic workflows.


robin-ck/bibtex_enhancer


BibTeX Metadata Enhancer

A Python tool to enhance BibTeX entries with metadata from various academic APIs and DOI services. It was mostly written by ChatGPT o3; I just needed to help it a bit with structuring the code and fixing some bugs.

Performance

Tested performance on a sample bibliography:

  • Sync mode time: 78.21 seconds
  • Async mode time: 22.71 seconds (roughly 3.4x faster)
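
The gap between the two modes is what one would expect when most of the time is spent waiting on network responses. Here is a minimal sketch of that effect, using simulated latency rather than real API calls. `fetch_entry` and its 0.05 s delay are hypothetical stand-ins and not part of this repository:

```python
import asyncio
import time


async def fetch_entry(key: str, delay: float = 0.05) -> str:
    """Hypothetical stand-in for one API lookup; sleeps instead of using the network."""
    await asyncio.sleep(delay)
    return f"{key}: enriched"


async def run_sequential(keys: list[str]) -> list[str]:
    # One lookup at a time: total time is roughly len(keys) * delay.
    return [await fetch_entry(k) for k in keys]


async def run_concurrent(keys: list[str]) -> list[str]:
    # All lookups in flight at once: total time is roughly a single delay.
    return list(await asyncio.gather(*(fetch_entry(k) for k in keys)))


def timed(coro_fn, keys: list[str]) -> tuple[list[str], float]:
    """Run one of the coroutines above and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(coro_fn(keys))
    return results, time.perf_counter() - start
```

Calling `timed(run_sequential, keys)` and `timed(run_concurrent, keys)` on the same list should return identical results, with the concurrent run finishing sooner. Real speedups depend on rate limits and on how each provider responds.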

Features

  • Two-stage processing:
    1. Metadata enhancement from multiple APIs
    2. DOI-based BibTeX conversion using doi2bib.org
  • Fetches metadata from multiple sources:
    • CrossRef API
    • arXiv API
    • Semantic Scholar API
    • DBLP API
  • Web scraping support for doi2bib.org
  • Supports both synchronous and asynchronous operations (note: something is currently broken in async mode)
  • Modular provider system for easy extension
  • Handles DOI and title-based searches
  • Comprehensive logging system
  • Organized output file management
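
To illustrate the modular provider idea and the DOI-then-title lookup order, here is a rough sketch. All names in it (`MetadataProvider`, `InMemoryProvider`, `enhance`) are hypothetical; the repository's actual base class lives in `providers/crossref_provider.py` and may look quite different:

```python
from abc import ABC, abstractmethod
from typing import Optional


class MetadataProvider(ABC):
    """Hypothetical provider interface; not the repository's real base class."""

    @abstractmethod
    def lookup_by_doi(self, doi: str) -> Optional[dict]: ...

    @abstractmethod
    def lookup_by_title(self, title: str) -> Optional[dict]: ...

    def lookup(self, entry: dict) -> Optional[dict]:
        # Prefer the DOI when present (exact match), then fall back to the title.
        doi = entry.get("doi", "").strip()
        if doi:
            found = self.lookup_by_doi(doi)
            if found:
                return found
        title = entry.get("title", "").strip()
        return self.lookup_by_title(title) if title else None


class InMemoryProvider(MetadataProvider):
    """Toy provider backed by a list of dicts, standing in for a real API client."""

    def __init__(self, records: list[dict]):
        self._records = records

    def lookup_by_doi(self, doi: str) -> Optional[dict]:
        return next((r for r in self._records
                     if r.get("doi", "").lower() == doi.lower()), None)

    def lookup_by_title(self, title: str) -> Optional[dict]:
        return next((r for r in self._records
                     if r.get("title", "").lower() == title.lower()), None)


def enhance(entry: dict, providers: list[MetadataProvider]) -> dict:
    """Add missing fields to an entry from the first provider that matches."""
    for provider in providers:
        found = provider.lookup(entry)
        if found:
            # Fields already in the entry win; only missing ones are filled in.
            return {**found, **entry}
    return dict(entry)
```

Under this design, adding a new source would mean writing one more subclass and appending it to the provider list, without touching `enhance`.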

Installation

pip install bibtexparser aiohttp requests beautifulsoup4

Project Structure

.
├── bibtex_processor.py    # Combined processing orchestrator
├── metadata_enhancer.py   # API-based metadata enhancement
├── doi_converter.py       # DOI to BibTeX conversion
├── bibtex_output/         # Enhanced and converted BibTeX files
├── logs/                  # Detailed processing logs
├── providers/             # API providers directory
│   ├── __init__.py
│   ├── arxiv_provider.py  # arXiv API integration
│   └── crossref_provider.py  # CrossRef API integration + base class
└── README.md

Usage

Basic usage:

import asyncio

from bibtex_processor import BibTexProcessor

# Initialize processor with optional API keys
processor = BibTexProcessor(crossref_api_key="YOUR_API_KEY")

# Run asynchronously (faster for many entries)
async def main():
    final_bibtex = await processor.process_bibtex_file('your_file.bib', is_async=True)
    print(f"Processed file saved to: {final_bibtex}")

asyncio.run(main())

Output Files

The processor creates two types of output:

  1. Enhanced BibTeX files with metadata from APIs (I mostly use this)
  2. Final BibTeX files with DOI-based conversions

You can also run doi_converter.py on its own to convert a BibTeX file into a new BibTeX file based on its DOIs.

All files are saved in the bibtex_output directory with timestamps.
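
As an illustration of timestamped output naming, here is a small sketch. The function name, the `stage` label, and the `name_stage_YYYYMMDD_HHMMSS.bib` pattern are all assumptions; the actual file names produced by the tool may differ:

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def timestamped_output_path(source: str, stage: str,
                            out_dir: str = "bibtex_output",
                            now: Optional[datetime] = None) -> Path:
    """Build an output path such as bibtex_output/refs_enhanced_20240102_030405.bib.

    Hypothetical naming scheme; it only builds the path and creates nothing on disk.
    """
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return Path(out_dir) / f"{Path(source).stem}_{stage}_{stamp}.bib"
```

Putting the timestamp in the file name keeps earlier runs from being overwritten, which makes it easy to compare the enhanced file against the final one.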

Logging

Three types of logs are generated in the logs directory:

  1. Success/failure logs for each operation
  2. Statistical information about processing
  3. Combined processing log with overall results

API Keys

  • CrossRef: Optional but recommended for better rate limits
  • arXiv: No API key required
  • Semantic Scholar: Optional but recommended for better rate limits
  • DBLP: No API key required
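
One way to keep optional keys out of source code is to read them from environment variables. This is only a sketch: the variable names and the `semantic_scholar_api_key` keyword are assumptions, and only `crossref_api_key` appears in the usage example above:

```python
import os
from typing import Mapping


def load_api_keys(env: Mapping[str, str] = os.environ) -> dict:
    """Collect optional API keys from the environment, skipping unset or empty ones.

    The environment variable names and the second keyword are hypothetical.
    """
    names = {
        "crossref_api_key": "CROSSREF_API_KEY",
        "semantic_scholar_api_key": "SEMANTIC_SCHOLAR_API_KEY",
    }
    return {kwarg: env[var] for kwarg, var in names.items() if env.get(var)}
```

If the constructor accepts these keyword names, the result could be passed as `BibTexProcessor(**load_api_keys())`. With no variables set, the dictionary is empty and the processor runs without keys.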

Contributing

Feel free to submit issues and enhancement requests!
