local-llm-ref-verifier

Privacy-preserving citation verification for unpublished research manuscripts. References are extracted and normalized locally into JSON, verified for existence and canonical metadata against scholarly APIs and public sources, and the literature review is then audited by an air-gapped local LLM for citation accuracy. The manuscript text never leaves your local machine.

How it works

Three-stage pipeline:

(Pipeline diagram: Extract → Verify → Audit)

  1. Extract (local, no internet) -- Parses the PDF reference section using regex. Auto-detects citation style (APA, IEEE, Vancouver, Harvard, Chicago). Outputs structured JSON.
  2. Verify (online, metadata only) -- Checks each reference title/author against CrossRef, Semantic Scholar, and Google Scholar APIs. Only minimal metadata is sent. Computes confidence scores via fuzzy matching. Also fetches paper abstracts and summaries (when available) for correctness checking in Stage 3.
  3. Audit (local, no internet) -- Uses a local LLM (Ollama) to compare the manuscript body against verified references. Flags uncited references, missing citations, year mismatches, misquoted claims, and unsupported claims. Uses fetched abstracts/summaries to verify that the manuscript accurately represents the cited work.
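Stage 2's fuzzy-matched confidence score can be sketched with the standard library (a minimal illustration of the idea, not the project's actual implementation; the `confidence` helper and its normalization rules are assumptions):

```python
from difflib import SequenceMatcher

def confidence(extracted_title: str, canonical_title: str) -> float:
    """Score how closely an extracted title matches a canonical one.

    Normalizes case and whitespace, then returns a similarity ratio
    in [0.0, 1.0]; 1.0 means the normalized titles are identical.
    """
    a = " ".join(extracted_title.lower().split())
    b = " ".join(canonical_title.lower().split())
    return SequenceMatcher(None, a, b).ratio()
```

A title that differs only in capitalization scores 1.0, while an unrelated title scores near 0, which is why a threshold on this ratio can separate `verified` from `ambiguous` results.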

Install

pip install -e ".[dev]"

Requires Ollama for Stage 3 only:

ollama pull llama3.1

Usage

Run the full pipeline:

ref-verifier run paper.pdf -o output/ -m llama3.1

Or run stages independently:

ref-verifier extract paper.pdf -o refs.json
ref-verifier verify refs.json -o verified.json
ref-verifier audit paper.pdf verified.json -o report.json -m llama3.1

Options:

  • -s / --style -- Force citation style (apa, ieee, vancouver, harvard, chicago). Auto-detected if omitted.
  • -m / --model -- Ollama model name (default: llama3.1). Only used by audit and run.
  • --google-scholar -- Enable Google Scholar fallback (slow, rate-limited).
  • -v / --verbose -- Verbose logging.

Output

Each stage produces a JSON file:

  • extracted_references.json -- Parsed references with authors, title, year, journal, volume, pages, DOI.
  • verification_results.json -- Verification status (verified/ambiguous/not_found), confidence scores, canonical metadata, abstracts, and TLDR summaries.
  • audit_report.json -- Citation issues list with severity and a human-readable summary.

Testing

Run tests (excluding slow live-API tests):

pytest -m "not slow"

Run all tests including live API verification:

pytest -m slow

The test suite includes 13 real research papers across all 5 citation styles (IEEE, Vancouver, APA, Harvard, Chicago) with single-column and two-column layouts. Each paper has a companion JSON with 3 injected fake citations for verifier testing.
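The slow/fast split above relies on a pytest marker; registering it keeps pytest from warning about unknown marks. A minimal sketch of the relevant configuration (assumed, not necessarily the repo's actual file):

```ini
# pytest.ini (assumed): register the `slow` marker used for live-API tests
[pytest]
markers =
    slow: tests that hit live scholarly APIs (skipped by `pytest -m "not slow"`)
```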

To do list

  • Support LaTeX and .docx import
  • Benchmark performance across different local LLMs
  • Add an option for paid/premium scholarly APIs
  • Improve the front-end interface
  • Add JSON file cleanup
  • Improve the test suite: download real papers with real citations and inject fake citations
