GitHub - WGLab/PubMind: PubMind is a large language model (LLM)-assisted framework for Publication Mutation and information Discovery, designed to extract variant–disease–pathogenicity relationships directly from biomedical literature.

PubMind is an AI-driven framework that uses large language models (LLMs) to extract genetic variant–disease–pathogenicity associations directly from biomedical literature. It combines fine-tuned BERT models for input filtering with instruction-tuned LLMs for extracting variant, disease, and functional evidence, covering SNVs, CNVs, SVs, and gene fusions. Extracted variants are normalized to genomic and transcript coordinates and stored in PubMind-DB, a web-accessible knowledgebase. Applied to >41M PubMed abstracts and >5M PMC full texts, PubMind-DB contains ~0.7M consolidated unique variants with rich annotations, of which only ~10% overlap with ClinVar—yet >80% of those show concordant pathogenicity labels, including full agreement for four-star expert-reviewed variants. PubMind provides a scalable, generalizable, and open-source framework that transforms unstructured text into structured genomic knowledge, supporting variant interpretation and precision medicine.

Prerequisite

Please refer to requirements.txt for required packages.

Run PubMind

Please refer to run_PubMind.ipynb for a walkthrough of how to use PubMind. All inputs and outputs of that example run are in the example folder.

The PubMind framework includes the following modules:

Filtering Module (fine-tuned BERT model)
- Wangwpi/PubMind_finetuned_BERT (Hugging Face)
Inference Module (instruction-tuned LLM)
- meta-llama/Llama-3.3-70B-Instruct (Hugging Face)
Normalization Module
- Quality filter (gene name, pathogenicity)
- Variant parser (cDNA, protein, RSID)
- Map to transcript
- Map to genome coordinates
- MONDO Disease name
- HPO term
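
To give a feel for what the variant parser step in the Normalization Module deals with, here is a minimal sketch that classifies a variant mention as cDNA, protein, or RSID notation. It is an illustration only and not PubMind's actual parser: the function name `parse_variant`, the regular expressions, and the output fields are all assumptions made for this example, and the patterns cover only simple substitutions rather than the full HGVS syntax.

```python
import re
from typing import Optional

# Simplified, hypothetical patterns for three common notations.
# Real HGVS syntax is much broader (deletions, duplications, intronic
# offsets, frameshifts, one-letter amino acid codes, and so on).
CDNA_SUB = re.compile(r"^c\.(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$")
PROTEIN_SUB = re.compile(
    r"^p\.(?P<ref>[A-Z][a-z]{2})(?P<pos>\d+)(?P<alt>[A-Z][a-z]{2}|\*)$"
)
RSID = re.compile(r"^rs(?P<id>\d+)$")


def parse_variant(mention: str) -> Optional[dict]:
    """Classify a variant mention and return its parsed fields, or None."""
    text = mention.strip()
    for kind, pattern in (("cdna", CDNA_SUB), ("protein", PROTEIN_SUB), ("rsid", RSID)):
        match = pattern.match(text)
        if match:
            # Named groups become the parsed fields (all values are strings).
            return {"type": kind, **match.groupdict()}
    return None


if __name__ == "__main__":
    for mention in ["c.350G>A", "p.Arg117His", "rs113993960", "deltaF508"]:
        print(mention, "->", parse_variant(mention))
```

Mentions that match none of the patterns return `None`, which a downstream step could treat as "needs a richer parser or manual review". Mapping parsed variants to transcripts and genome coordinates requires reference annotation and is outside the scope of this sketch.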

PubMind-DB

PubMind-DB can be accessed at https://pubmind.wglab.org/

License

PubMind is freely available for academic use. For license details, please refer to this page.
