How to Measure the Reproducibility of System-oriented IR Experiments

This repository contains the accompanying code, dataset and online appendix of:

Timo Breuer, Nicola Ferro, Norbert Fuhr, Maria Maistro, Tetsuya Sakai, Philipp Schaer, and Ian Soboroff. 2020. How to Measure the Reproducibility of System-oriented IR Experiments. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’20).

Abstract

Replicability and reproducibility of experimental results are primary concerns in all the areas of science and IR is not an exception. Besides the problem of moving the field towards more reproducible experimental practices and protocols, we also face a severe methodological issue: we do not have any means to assess when reproduced is reproduced. Moreover, we lack any reproducibility-oriented dataset, which would allow us to develop such methods.

To address these issues, we compare several measures to objectively quantify to what extent we have replicated or reproduced a system-oriented IR experiment. These measures operate at different levels of granularity, from the fine-grained comparison of ranked lists, to the more general comparison of the obtained effects and significant differences. Moreover, we also develop a reproducibility-oriented dataset, which allows us to validate our measures and which can also be used to develop future measures.

Project overview

  • appendix/: online appendix with additional tables and figures
  • config/: configurations of each run constellation
  • core/: core modules of the reimplementation
  • dataset/: replicated and reproduced results of wcrobust04 and wcrobust0405 with 200 runs in total
  • evaluation/: scripts for the evaluation of the experimental setup
  • replicability/: scripts for producing replicated results
  • reproducibility/: scripts for producing reproduced results

Setup

  1. Install requirements with pip:

    pip install -r requirements.txt
    
  2. Download English stopwords for nltk:

    python -m nltk.downloader stopwords
    
  3. Clone trec_eval and compile it in this directory:

    git clone https://github.com/usnistgov/trec_eval.git && make -C trec_eval
    
  4. Edit config/config.py by adding the paths of the four test collections to the parameters robust04, robust05, core17, and core18 (see the sketch after the command table below).

  5. Select one of the 50 run constellations via config/settings.py by setting the parameter num_con to the number of the desired constellation (see the mapping table in the Workflow section and the sketch below). If the preprocessing has already been done for a previous run, it can be skipped by setting the parameter data_prep to False.

  6. Run the commands below to produce the respective run.

               Replicability                          Reproducibility
WCRobust04     python -m replicability.wcrobust04    python -m reproducibility.wcrobust04
WCRobust0405   python -m replicability.wcrobust0405  python -m reproducibility.wcrobust0405
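
The snippet below sketches what the edits in steps 4 and 5 might look like. It assumes that config/config.py and config/settings.py expose plain module-level variables; the exact structure of these files may differ, and the collection paths are placeholders.

    # config/config.py -- paths to the four test collections (placeholder paths)
    robust04 = '/path/to/robust04'
    robust05 = '/path/to/robust05'
    core17 = '/path/to/core17'
    core18 = '/path/to/core18'

    # config/settings.py -- select one of the 50 run constellations
    num_con = 45        # e.g. 45 corresponds to rpl/rpd_tf_1, see the mapping table below
    data_prep = True    # set to False if preprocessing was already done in a previous run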

Workflow

Mapping run names to constellation numbers

run name       constellation number
rpl/rpd_tf_1   45
rpl/rpd_tf_2   46
rpl/rpd_tf_3   47
rpl/rpd_tf_4   48
rpl/rpd_tf_5   49
rpl/rpd_df_1   14
rpl/rpd_df_2   15
rpl/rpd_df_3   16
rpl/rpd_df_4   17
rpl/rpd_df_5   18
rpl/rpd_tol_1  39
rpl/rpd_tol_2  38
rpl/rpd_tol_3  37
rpl/rpd_tol_4  36
rpl/rpd_tol_5  35
rpl/rpd_C_1    44
rpl/rpd_C_2    43
rpl/rpd_C_3    42
rpl/rpd_C_4    41
rpl/rpd_C_5    40
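
If you prefer to look up constellation numbers programmatically, the table above can be restated as a small Python dictionary. This helper is not part of the repository; the keys simply drop the rpl_/rpd_ prefix shown in the table.

    # Hypothetical lookup that restates the mapping table above.
    CONSTELLATIONS = {
        'tf_1': 45, 'tf_2': 46, 'tf_3': 47, 'tf_4': 48, 'tf_5': 49,
        'df_1': 14, 'df_2': 15, 'df_3': 16, 'df_4': 17, 'df_5': 18,
        'tol_1': 39, 'tol_2': 38, 'tol_3': 37, 'tol_4': 36, 'tol_5': 35,
        'C_1': 44, 'C_2': 43, 'C_3': 42, 'C_4': 41, 'C_5': 40,
    }

    num_con = CONSTELLATIONS['tf_1']  # 45, the constellation for rpl/rpd_tf_1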

Evaluation

For the evaluation scripts you need MATLAB and MATTERS.

Alternatively, some evaluation measures are already pre-computed and stored as CSV files in evaluation/matlab/results.
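
If you only want to inspect the pre-computed values without MATLAB, a minimal sketch with pandas (assuming the repository has been cloned and the CSV files sit directly in evaluation/matlab/results):

    from pathlib import Path
    import pandas as pd

    results_dir = Path('evaluation/matlab/results')

    # List the pre-computed CSV files and load one of them as an example.
    csv_files = sorted(results_dir.glob('*.csv'))
    print([f.name for f in csv_files])

    df = pd.read_csv(csv_files[0])
    print(df.head())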