VANER2: Towards more general biomedical named entity recognition using multi-task large language model encoders

Preparation

Create python 3.10 environment with requirements.txt.
Due to possible copyright issues, the training and testing dataset will not be directly available, but is available upon reasonable request. We provided example data in the data folder.
Download the Llama-3.1-8B-Instruct base model from https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct and put them under base_models folder.
Pretrained parameters of VANER2 can be downloaded at https://huggingface.co/droidlyx866/VANER2, and unziped to finetuned_models folder.

Prediction

For VANER2 prediction, run run_VANER2_NER.py --model_names [folder names of trained models in the results folder]

Example: python run_VANER2_NER.py --model_names VANER2

The model predictions should be saved in results/[dataset folder name]/[model name]

Note: 20GB GPU memory is required. The pybind11 module is to speed up prediction, without it the prediction will be slower. To install it, run pip install ./pybind11_module

Evaluation

For Evaluation of the results, run evaluate_results.py, which evaluates all results under results/[dataset folder name] and generate a NER_summary.tsv file containing the results.

More detailed scores are saved in results/[dataset folder name]/[model name]/all_NER_scores.txt

Baseline methods

For prediction of other baseline methods, run run_other_models.py, the results can then be evaluated using evaluate_results.py.

Note that for Hunflair2, you need to download the model from https://huggingface.co/hunflair/hunflair2-ner and save to finetuned_models/hunflair2-ner

For scispacy, you need to install four scispacy models en_ner_craft_md, en_ner_jnlpba_md, en_ner_bc5cdr_md, en_ner_bionlp13cg_md. see https://github.com/allenai/scispacy for guidelines.

AIONER needs a separate python environment, thus are not included in our code, please refer to https://github.com/ncbi/AIONER to reproduce their results. Similarly, for VANER, please refer to https://github.com/Eulring/VANER.

Entity Linking

Additionally, you can run the code run_hunflair2_NEN.py to automatically add Entity Linking results to the NER results of VANER2 using the pretrained models of Hunflair2 [1]. The code will automatically download the EL models from the flair repository. Alternatively, you can download the EL models from the following links manually and modify run_hunflair2_NEN.py to load the downloaded models:

Gene-linker: https://huggingface.co/hunflair/biosyn-sapbert-bc2gn

Chemical-linker: https://huggingface.co/hunflair/biosyn-sapbert-bc5cdr-chemical

Disease-linker: https://huggingface.co/hunflair/biosyn-sapbert-ncbi-disease

Species-linker: https://huggingface.co/hunflair/sapbert-ncbi-taxonomy

After that, run evaluate_results.py --evaluate_NEN True to evaluate NER and NEN simutaneously, if NEN labels are provided in the test dataset.

[1] Sänger M, Garda S, Wang X D, et al. HunFlair2 in a cross-corpus evaluation of biomedical named entity recognition and normalization tools[J]. Bioinformatics, 2024, 40(10): btae564.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

VANER2: Towards more general biomedical named entity recognition using multi-task large language model encoders

Preparation

Prediction

Evaluation

Baseline methods

Entity Linking

About

Uh oh!

Releases

Packages

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 61 Commits
base_models		base_models
data/Converted_New		data/Converted_New
finetuned_models		finetuned_models
model		model
pybind11_module		pybind11_module
results		results
LICENSE		LICENSE
LLAMA3.1_prompt.txt		LLAMA3.1_prompt.txt
README.md		README.md
evaluate_results.py		evaluate_results.py
requirements.txt		requirements.txt
run_VANER2_NER.py		run_VANER2_NER.py
run_hunflair2_NEN.py		run_hunflair2_NEN.py
run_other_models.py		run_other_models.py

License

ZhuLab-Fudan/VANER2

Folders and files

Latest commit

History

Repository files navigation

VANER2: Towards more general biomedical named entity recognition using multi-task large language model encoders

Preparation

Prediction

Evaluation

Baseline methods

Entity Linking

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages