Add BioASQ dataset to the list of supported BEIR datasets #250

MathVast · 2023-09-21T13:36:59Z

Hi @seanmacavaney, I would like to use the BioASQ dataset for an experiment, and I came across this issue on beir-cellar, the GitHub repo of the BEIR paper, where the author links the preprocessed data for the 4 datasets marked as "unavailable". I am aware that you've been working to extend the list of BEIR datasets available in ir_datasets (i.e., this issue), and I was wondering whether, given these resources, BioASQ could be added to the catalog.

Dataset Information:

BioASQ is a dataset featured in the BEIR benchmark. It originates from a challenge on "biomedical semantic indexing and question answering". More information about the challenge and the dataset can be found here: http://bioasq.org/

Links to Resources:

Steps listed on beir-cellar to reproduce the files: https://github.com/beir-cellar/beir/tree/main/examples/dataset#2-bioasq
Google Drive folder, linked in the issue cited above, containing the preprocessed data: https://drive.google.com/drive/folders/1CgDO-KmQQMpGEGeD3R20ZgTTM008xix9
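
A minimal sketch of how the preprocessed files could be read, assuming they follow the usual BEIR layout (`corpus.jsonl`, `queries.jsonl`, and `qrels/<split>.tsv` with a `query-id`, `corpus-id`, `score` header). The layout of the Drive files has not been verified here, and the helper names `read_jsonl` and `load_beir_split` are hypothetical, not part of ir_datasets or BEIR.

```python
import csv
import json
from pathlib import Path


def read_jsonl(path):
    """Yield one JSON object per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def load_beir_split(root, split="test"):
    """Load docs, queries and qrels from a BEIR-style directory.

    Assumed layout (not confirmed for the BioASQ files):
      root/corpus.jsonl       {"_id": ..., "title": ..., "text": ...}
      root/queries.jsonl      {"_id": ..., "text": ...}
      root/qrels/<split>.tsv  tab-separated, header: query-id, corpus-id, score
    """
    root = Path(root)
    # doc_id -> (title, text); the title is treated as optional
    docs = {
        d["_id"]: (d.get("title", ""), d["text"])
        for d in read_jsonl(root / "corpus.jsonl")
    }
    # query_id -> query text
    queries = {q["_id"]: q["text"] for q in read_jsonl(root / "queries.jsonl")}
    # query_id -> {doc_id: relevance}
    qrels = {}
    with open(root / "qrels" / f"{split}.tsv", encoding="utf-8", newline="") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            qrels.setdefault(row["query-id"], {})[row["corpus-id"]] = int(row["score"])
    return docs, queries, qrels
```

Under these assumptions, `load_beir_split(path, "train")` and `load_beir_split(path, "test")` would yield the queries, docs, and qrels that the proposed `train` and `test` dataset IDs are meant to expose.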

Dataset ID(s) & supported entities:

beir/bioasq-2020: queries, docs
beir/bioasq-2020/train: queries, docs, qrels
beir/bioasq-2020/test: queries, docs, qrels

Checklist

Mark each task once completed. All should be checked prior to merging a new dataset.

Dataset definition (in ir_datasets/datasets/[topid].py)
Tests (in tests/integration/[topid].py)
Metadata generated (using ir_datasets generate_metadata command, should appear in ir_datasets/etc/metadata.json)
Documentation (in ir_datasets/etc/[topid].yaml)
- Documentation generated in https://github.com/seanmacavaney/ir-datasets.com/
Downloadable content (in ir_datasets/etc/downloads.json)
- Download verification action (in .github/workflows/verify_downloads.yml). Only one needed per topid.
- Any small public files from NIST (or other potentially troublesome files) mirrored in https://github.com/seanmacavaney/irds-mirror/. Mirrored status properly reflected in downloads.json.

seanmacavaney · 2023-10-07T11:02:36Z

Hey @MathVast! Sorry for the delay -- the start of the semester is a busy time.

Thanks for opening the issue. This seems doable and like a good addition to the package.

MathVast · 2023-10-07T14:27:05Z

No problem. In the meantime, I've made a fork and worked on integrating BioASQ into ir_datasets on my side. I've been playing with the dataset through XPM-IR and it seems to be working, but you might want to check some of the choices I've made. If that's okay with you, @seanmacavaney, I can open a PR.

MathVast added the add-dataset label Sep 21, 2023

MathVast mentioned this issue Jan 17, 2024

BioASQ #253

Open
