Pre-processing code for missing datasets #86

memray · 2022-03-30T02:44:03Z

Hi there,

Could the preprocessing code for the datasets that are not provided directly be released? For example, I downloaded bioasq and signal1m following the instructions, but it's not clear to me how to convert the raw data to corpus.jsonl, queries.jsonl, and qrels/{train/dev/test}.tsv the same way you did. I think this is critical for reproducing your results and for fair benchmarking.
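
For reference, the target layout named above can be sketched as follows. This is only a minimal sketch of the publicly documented BEIR file format (`_id`/`title`/`text` in corpus.jsonl, `_id`/`text` in queries.jsonl, and a `query-id`, `corpus-id`, `score` header in the qrels TSV). It is not the maintainers' preprocessing code, and it says nothing about how the raw bioasq or signal1m files were cleaned. The function name `write_beir_dataset` and the in-memory dict shapes are assumptions for illustration.

```python
import csv
import json
from pathlib import Path


def write_beir_dataset(out_dir, corpus, queries, qrels, split="test"):
    """Write already-parsed data in the BEIR folder layout (hypothetical helper).

    corpus:  {doc_id: {"title": str, "text": str}}
    queries: {query_id: str}
    qrels:   {query_id: {doc_id: int relevance score}}
    """
    out = Path(out_dir)
    (out / "qrels").mkdir(parents=True, exist_ok=True)

    # corpus.jsonl: one JSON object per line with _id, title, text
    with open(out / "corpus.jsonl", "w", encoding="utf-8") as f:
        for doc_id, doc in corpus.items():
            row = {"_id": str(doc_id), "title": doc.get("title", ""), "text": doc["text"]}
            f.write(json.dumps(row, ensure_ascii=False) + "\n")

    # queries.jsonl: one JSON object per line with _id, text
    with open(out / "queries.jsonl", "w", encoding="utf-8") as f:
        for query_id, text in queries.items():
            row = {"_id": str(query_id), "text": text}
            f.write(json.dumps(row, ensure_ascii=False) + "\n")

    # qrels/<split>.tsv: tab-separated, header row first
    with open(out / "qrels" / f"{split}.tsv", "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(["query-id", "corpus-id", "score"])
        for query_id, judged in qrels.items():
            for doc_id, score in judged.items():
                writer.writerow([str(query_id), str(doc_id), int(score)])
```

The part that is still missing is everything upstream of this function: parsing the raw files into these dicts, which is exactly what I'm asking about.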

Thank you,
Rui

thakur-nandan · 2022-03-30T16:50:49Z

Hi @memray,

You can send me an email at nandant@gmail.com, and I can send you the datasets privately.
Please note that you are responsible for accepting the licenses of all private datasets.

Kind Regards,
Nandan Thakur

memray · 2022-03-30T17:31:18Z

Thanks @NThakur20! I'll reach out to you via email later.

gsgoncalves · 2022-06-09T16:56:34Z

Hi @NThakur20,
I sent an email as well.
It would be great if you could share the details for the html2text initialization for TREC News, and which Anserini tweet indexing options were used for Signal AI.
Thanks!

Cyril-JZ · 2023-05-16T04:08:22Z

Hi @thakur-nandan,

I also sent an email to kindly request access to the TREC-News dataset TGZ files. I assure you that the dataset will be used solely for academic purposes. I greatly appreciate your assistance and support!

Thanks!
im.jzfeng@gmail.com

thakur-nandan · 2023-05-16T04:14:31Z

Hi @gsgoncalves and @Cyril-JZ, you can find the private BEIR datasets here (all preprocessed): https://drive.google.com/drive/folders/1CgDO-KmQQMpGEGeD3R20ZgTTM008xix9?usp=sharing.
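
Once a dataset is downloaded and unpacked, something like the following can be used to sanity-check it. This is a rough sketch that assumes each archive unpacks into the usual BEIR layout (corpus.jsonl, queries.jsonl, and qrels/<split>.tsv with a header row); the helper name `load_beir_dataset` is hypothetical and uses only the standard library, so it is not the loader shipped with BEIR.

```python
import csv
import json
from pathlib import Path


def load_beir_dataset(data_dir, split="test"):
    """Read a BEIR-style folder into plain dicts (hypothetical helper).

    Returns (corpus, queries, qrels) where
      corpus  = {doc_id: {"title": str, "text": str}}
      queries = {query_id: str}
      qrels   = {query_id: {doc_id: int}}
    """
    root = Path(data_dir)
    corpus, queries, qrels = {}, {}, {}

    with open(root / "corpus.jsonl", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                doc = json.loads(line)
                corpus[doc["_id"]] = {
                    "title": doc.get("title", ""),
                    "text": doc.get("text", ""),
                }

    with open(root / "queries.jsonl", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                query = json.loads(line)
                queries[query["_id"]] = query["text"]

    with open(root / "qrels" / f"{split}.tsv", encoding="utf-8", newline="") as f:
        reader = csv.reader(f, delimiter="\t")
        next(reader, None)  # skip the "query-id  corpus-id  score" header
        for row in reader:
            if len(row) < 3:
                continue  # ignore blank or malformed lines
            query_id, doc_id, score = row[0], row[1], int(row[2])
            qrels.setdefault(query_id, {})[doc_id] = score

    return corpus, queries, qrels
```

A quick check is to confirm that every document ID referenced in the qrels also appears in the corpus before running any retrieval experiments.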

Hope it helps!

Thanks!

Cyril-JZ · 2023-05-16T04:23:46Z

Thanks for your prompt reply! That helps a lot!

memray closed this as completed Mar 30, 2022

MathVast mentioned this issue Sep 21, 2023: Add BioASQ dataset to the list of supported BEIR datasets allenai/ir_datasets#250 (Open, 8 tasks)
