🔬 Exciting breakthrough in BioNLP! 🧬
We're thrilled to introduce BioInstruct, a dataset of 25,000+ instructions tailored to biomedical tasks for instruction-tuning LLMs such as Llama. Our experiments show remarkable gains in question answering (QA), information extraction (IE), and text generation.
🌟 Highlights:
- 17.3% boost in QA accuracy
- 5.7% increase in IE F1 score
- 96% improvement in text generation tasks
Our results also show that marrying instruction tuning with multi-task learning pays off: the performance gain is significantly higher when the LLM is instruction-tuned on closely related tasks.
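If you'd like to try instruction tuning with BioInstruct yourself, a common recipe is to render each instruction record into an Alpaca-style prompt before supervised fine-tuning. The sketch below is a minimal, hedged example: the field names `instruction`, `input`, and `output` and the prompt template are assumptions about the data layout, not a definitive description of our training pipeline.

```python
# Minimal sketch of Alpaca-style prompt formatting for supervised fine-tuning.
# Assumes each record has "instruction", "input", and "output" fields; adjust
# to the actual schema of the BioInstruct release you are using.

def format_example(record: dict) -> str:
    """Render one instruction record into a single training string."""
    if record.get("input"):
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{record['instruction']}\n\n"
            f"### Input:\n{record['input']}\n\n"
            f"### Response:\n{record['output']}"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{record['instruction']}\n\n"
        f"### Response:\n{record['output']}"
    )

# Hypothetical biomedical QA record, for illustration only.
example = {
    "instruction": "Answer the clinical question using the given context.",
    "input": "Context: The patient presents with ... Question: ...",
    "output": "...",
}
print(format_example(example))
```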
For more details, please check out our paper.
The BioInstruct dataset is available on the Hugging Face Hub.
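As a quick start, the dataset can be loaded with the 🤗 `datasets` library. A minimal sketch, assuming the dataset is published under the identifier `bio-nlp-umass/bioinstruct` (check the dataset's Hub page for the exact path and split names):

```python
# Minimal sketch: load BioInstruct from the Hugging Face Hub.
# The repository id "bio-nlp-umass/bioinstruct" is an assumption; replace it
# with the path listed on the dataset's Hub page if it differs.
from datasets import load_dataset

dataset = load_dataset("bio-nlp-umass/bioinstruct", split="train")
print(dataset)     # dataset size and column names
print(dataset[0])  # inspect one instruction record
```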
@article{Tran2024Bioinstruct,
  author  = {Tran, Hieu and Yang, Zhichao and Yao, Zonghai and Yu, Hong},
  title   = {BioInstruct: instruction tuning of large language models for biomedical natural language processing},
  journal = {Journal of the American Medical Informatics Association},
  pages   = {ocae122},
  year    = {2024},
  month   = {06},
  issn    = {1527-974X},
  doi     = {10.1093/jamia/ocae122},
  url     = {https://doi.org/10.1093/jamia/ocae122},
  eprint  = {https://academic.oup.com/jamia/advance-article-pdf/doi/10.1093/jamia/ocae122/58084577/ocae122.pdf},
}
Have a specific task or instruction you'd like an LLM to perform in a clinical setting? Open a new issue here! Your contributions will help refine LLMs to be more effective and relevant in healthcare environments.