You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hi @seanmacavaney I would like to use the BioASQ dataset for an experiment and I have stumbled across this on the GitHub repo of the BEIR paper beir-cellar where the author links the preprocessed data for the 4 datasets marked as "unavailable". I am aware that you've been trying to extend the list of available datasets from the benchmark on ir_datasets (ie. this issue) and I was wondering if, given these resources, BioASQ could be integrated to the catalog?
Dataset Information:
BioASQ is a dataset featuring in the BEIR benchmark and originated from a challenge around "biomedical semantic indexing and question answering". More information about the challenge and the dataset can be found here: http://bioasq.org/
Downloadable content (in ir_datasets/etc/downloads.json)
Download verification action (in .github/workflows/verify_downloads.yml). Only one needed per topid.
Any small public files from NIST (or other potentially troublesome files) mirrored in https://github.com/seanmacavaney/irds-mirror/. Mirrored status properly reflected in downloads.json.
The text was updated successfully, but these errors were encountered:
No problem, in the meantime I've made a fork and worked on the integration in ir_datasets of BioASQ on my side. I've been playing with the dataset through XPM-IR and it seems to be working but you might want to check some of the choices I've made. If it's okay for you @seanmacavaney I can open a PR.
Hi @seanmacavaney I would like to use the BioASQ dataset for an experiment and I have stumbled across this on the GitHub repo of the BEIR paper beir-cellar where the author links the preprocessed data for the 4 datasets marked as "unavailable". I am aware that you've been trying to extend the list of available datasets from the benchmark on ir_datasets (ie. this issue) and I was wondering if, given these resources, BioASQ could be integrated to the catalog?
Dataset Information:
BioASQ is a dataset featuring in the BEIR benchmark and originated from a challenge around "biomedical semantic indexing and question answering". More information about the challenge and the dataset can be found here: http://bioasq.org/
Links to Resources:
Link to the steps listed on beir-cellar in order to reproduce the files: https://github.com/beir-cellar/beir/tree/main/examples/dataset#2-bioasq ;
Link to the Google Drive space linked in the issue cited above where the preprocessed data can be found: https://drive.google.com/drive/folders/1CgDO-KmQQMpGEGeD3R20ZgTTM008xix9
Dataset ID(s) & supported entities:
beir/bioasq-2020
: queries, docsbeir/bioasq-2020/train
: queries, docs, qrelsbeir/bioasq-2020/test
: queries, docs, qrelsChecklist
Mark each task once completed. All should be checked prior to merging a new dataset.
ir_datasets/datasets/[topid].py
)tests/integration/[topid].py
)ir_datasets generate_metadata
command, should appear inir_datasets/etc/metadata.json
)ir_datasets/etc/[topid].yaml
)ir_datasets/etc/downloads.json
).github/workflows/verify_downloads.yml
). Only one needed pertopid
.downloads.json
.The text was updated successfully, but these errors were encountered: