Data
Medical NLP Competition, dataset, large models, paper
中文医学NLP公开资源整理:术语集/语料库/词向量/预训练模型/知识图谱/命名实体识别/QA/信息抽取/模型/论文/etc
Chinese medical dialogue data 中文医疗对话数据集
This is the dataset for Chinese community medical question answering.
PubMedQA: A Dataset for Biomedical Research Question Answering
Code and dataset for our Bioinformatics 2022 paper: "A Benchmark for Automatic Medical Consultation System: Frameworks, Tasks and Datasets"
Collection of awesome medical dataset resources.
MNBVC(Massive Never-ending BT Vast Chinese corpus)超大规模中文语料集。对标chatGPT训练的40T数据。MNBVC数据集不但包括主流文化,也包括各个小众文化甚至火星文的数据。MNBVC数据集包括新闻、作文、小说、书籍、杂志、论文、台词、帖子、wiki、古诗、歌词、商品介绍、笑话、糗事、聊天记录等一切形式的纯文本中文数据。