
PubMind is a large language model (LLM)-assisted framework for Publication Mutation and information Discovery, designed to extract variant–disease–pathogenicity relationships directly from biomedical literature.


WGLab/PubMind


PubMind is an AI-driven framework that uses large language models (LLMs) to extract genetic variant–disease–pathogenicity associations directly from biomedical literature. It combines fine-tuned BERT models for input filtering with instruction-tuned LLMs for extracting variant, disease, and functional evidence, covering SNVs, CNVs, SVs, and gene fusions. Extracted variants are normalized to genomic and transcript coordinates and stored in PubMind-DB, a web-accessible knowledgebase. Applied to >41M PubMed abstracts and >5M PMC full texts, PubMind-DB contains ~0.7M consolidated unique variants with rich annotations. Only ~10% of these overlap with ClinVar, yet >80% of the overlapping variants show concordant pathogenicity labels, including full agreement for four-star expert-reviewed variants. PubMind provides a scalable, generalizable, and open-source framework that transforms unstructured text into structured genomic knowledge, supporting variant interpretation and precision medicine.

Prerequisites

Please refer to requirements.txt for required packages.

Run PubMind

Please refer to run_PubMind.ipynb for how to use PubMind. All inputs and outputs from this example run are in the example folder.

The PubMind framework includes the following modules:

  1. Filtering Module (fine-tuned BERT model)
    • Wangwpi/PubMind_finetuned_BERT (Hugging Face)
  2. Inference Module (instruction-tuned LLM)
    • meta-llama/Llama-3.3-70B-Instruct (Hugging Face)
  3. Normalization Module
    • Quality filter (gene name, pathogenicity)
    • Variant parser (cDNA, protein, RSID)
    • Map to transcript
    • Map to genome coordinates
    • MONDO Disease name
    • HPO term
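
To give a feel for the variant parser step in the Normalization Module, here is a minimal sketch that sorts a variant mention into one of the three forms listed above (cDNA, protein, RSID). It is an illustration only, not PubMind's actual parser: the `PATTERNS` table and the `classify_variant` function are hypothetical names, and the regular expressions cover only simple substitutions written in HGVS-like notation.

```python
import re
from typing import Optional

# Hypothetical patterns for illustration; PubMind's own parser may differ.
PATTERNS = {
    # dbSNP identifier, e.g. rs121913529
    "rsid": re.compile(r"^rs\d+$", re.IGNORECASE),
    # Simple cDNA substitution, e.g. c.35G>A
    "cdna": re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$"),
    # Simple protein change with three-letter codes, e.g. p.Gly12Asp
    "protein": re.compile(r"^p\.\(?([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2}|\*|=)\)?$"),
}


def classify_variant(mention: str) -> Optional[dict]:
    """Return the variant type and captured fields, or None if unrecognized."""
    text = mention.strip()
    for kind, pattern in PATTERNS.items():
        match = pattern.match(text)
        if match:
            return {"type": kind, "text": text, "groups": match.groups()}
    return None


if __name__ == "__main__":
    for example in ["rs121913529", "c.35G>A", "p.Gly12Asp", "BRCA1"]:
        print(example, "->", classify_variant(example))
```

A mention that matches none of the patterns returns `None`, which a real pipeline would likely route to a quality filter or discard. Indels, duplications, CNVs, SVs, and gene fusions would each need additional patterns.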

PubMind-DB

PubMind-DB can be accessed at https://pubmind.wglab.org/

License

PubMind is freely available for academic use. For license details, please refer to this page.
