Add fusion regression #2611
Merged
Changes from 27 commits (34 commits in total):

- 1cb5ea1 add fusion (cadurosar)
- fe7e529 Merge branch 'castorini:master' into master (cadurosar)
- c626a7c added cadurosar's code via a copy + paste and made changes to match p… (DanielKohn1208)
- 4d702d0 merged cadurosar's code (DanielKohn1208)
- 7db24a9 moved FuseRuns (DanielKohn1208)
- e7efd1b merged cadurosar's code with modifications (DanielKohn1208)
- ebd8ed4 added run fusion to match pyserini implementation (DanielKohn1208)
- 1b398af added fusion feature (Stefan824)
- 27b44df modified arguments; added test cases (Stefan824)
- 72e6e06 modified TrecRun class code style (Stefan824)
- 5f7ec35 added comment (Stefan824)
- 509049c deleted test file from previous version (Stefan824)
- 39f62a9 Added dependency for junit test (Stefan824)
- 37e89fa resolved formatting; merged trectools module to fusion (Stefan824)
- 54c74b4 remove unused test cases (Stefan824)
- 32e13c2 removed unused test files (Stefan824)
- 6c648f7 Merge remote-tracking branch 'origin/master' into add-fusion (Stefan824)
- a9d7804 added fusion regression script paired with two yaml test files (Stefan824)
- e049e48 added md for test (Stefan824)
- bd0ce76 add cmd on test instruction (Stefan824)
- 17ceb49 removed abundant dependency (Stefan824)
- 6f550b1 revert unecessary change (Stefan824)
- f4644e1 resolved a minor decoding issue (Stefan824)
- 0ea8369 added a yaml that is based on regression test run results (Stefan824)
- ec57e96 added doc for test2 (Stefan824)
- 042b678 typo (Stefan824)
- f2b6f4c changed name for test yamls (Stefan824)
- d94c0f9 second attempt to revert src/main/resources/regression/beir-v1.0.0-ro… (Stefan824)
- f5871b9 fixed precision and added run_origins for fusion yaml (Stefan824)
- b7961f3 removed two yamls that use runs not from current regression experiments (Stefan824)
- ab33853 modified test instructions according to last commit (Stefan824)
- db12c79 add yaml file (Stefan824)
- 9a419aa removed old yaml (Stefan824)
- d9cff54 changed output naming (Stefan824)
@@ -0,0 +1,38 @@

# Fusion Regression Test Setup

This document explains how to obtain the run files needed for the fusion regression tests and how to run those tests.
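## Prerequisites

You will need the following:
- A working installation of `wget`.
- Python 3 with the `PyYAML` package, which `run_fusion_regression.py` imports.
- A built Anserini checkout, since the script invokes `bin/run.sh` and `bin/trec_eval`.
- Enough disk space to store the downloaded run files.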
## Automatic Download Using Script for fusion-regression-bge-flat-robust04-3

To download the required files automatically, you can use the following shell script. It downloads the three run files used by the `fusion-regression-bge-flat-robust04-3` test into the `runs/runs.beir` folder, under the filenames that the test's YAML configuration expects.
```bash
#!/bin/bash

# Create the target directory if it doesn't exist
mkdir -p runs/runs.beir

# Download the run files from Google Drive using their file IDs
wget --no-check-certificate 'https://drive.google.com/uc?export=download&id=1XVlVCDYQe3YjRzxplaeGbmW_0EFQCgm8' -O runs/runs.beir/run.inverted.beir-v1.0.0-robust04.multifield.test.bm25
wget --no-check-certificate 'https://drive.google.com/uc?export=download&id=1Z4rWlNgmXebMf1ardfiDg_4KIZImjqxt' -O runs/runs.beir/run.inverted.beir-v1.0.0-robust04.splade-pp-ed.test.splade-pp-ed-cached
wget --no-check-certificate 'https://drive.google.com/uc?export=download&id=1fExxJHkPPNCdtptKqWTbcsH0Ql0PnPqS' -O runs/runs.beir/run.inverted.beir-v1.0.0-robust04.flat.test.bm25
```

## Perform two regression runs for test fusion-regression-bge-flat-int8-robust04-2

The two runs needed by the `fusion-regression-bge-flat-int8-robust04-2` test can be generated by following these regression guides. The fusion step reads them from the paths listed under `runs` in that test's YAML file.
- https://github.com/castorini/anserini/blob/master/docs/regressions/regressions-beir-v1.0.0-robust04.bge-base-en-v1.5.flat-int8.cached.md
- https://github.com/castorini/anserini/blob/master/docs/regressions/regressions-beir-v1.0.0-robust04.bge-base-en-v1.5.flat.cached.md

## Run the fusion regression script with the two YAML tests

```bash
python src/main/python/run_fusion_regression.py --regression fusion-regression-bge-flat-robust04-3

python src/main/python/run_fusion_regression.py --regression fusion-regression-bge-flat-int8-robust04-2
```

Add `--dry-run` to either command to print the fusion and evaluation commands without executing them.
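Before running these commands, it can help to confirm that each downloaded file really is a TREC run, since a failed download may leave an HTML page or an empty file under the expected name. The sketch below assumes the standard six-column TREC run format (`qid Q0 docid rank score tag`); `check_trec_run` and `check_run_file` are hypothetical helpers, not part of Anserini.

```python
from pathlib import Path


def check_trec_run(lines):
    """Hypothetical helper: return the number of distinct topics in a TREC run.

    Assumes the six-column TREC run format: qid Q0 docid rank score tag.
    Raises ValueError at the first malformed line.
    """
    topics = set()
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        fields = line.split()
        if len(fields) != 6:
            raise ValueError(f"line {lineno}: expected 6 columns, got {len(fields)}")
        qid, _q0, _docid, rank, score, _tag = fields
        try:
            int(rank)     # rank must be an integer
            float(score)  # score must be numeric
        except ValueError:
            raise ValueError(f"line {lineno}: non-numeric rank or score") from None
        topics.add(qid)
    return len(topics)


def check_run_file(path):
    """Hypothetical helper: validate one run file on disk."""
    with Path(path).open(encoding="utf-8") as f:
        return check_trec_run(f)
```

For example, `check_run_file('runs/runs.beir/run.inverted.beir-v1.0.0-robust04.flat.test.bm25')` returns the number of distinct topics in that run, or raises `ValueError` at the first line that does not look like a TREC run entry.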
@@ -0,0 +1,181 @@

```python
#
# Anserini: A Lucene toolkit for reproducible information retrieval research
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

import os
import argparse
import logging
import sys
import time
import yaml
from subprocess import call, Popen, PIPE

# Constants
FUSE_COMMAND = 'bin/run.sh io.anserini.fusion.FuseTrecRuns'

# Set up logging
logger = logging.getLogger('fusion_regression_test')
logger.setLevel(logging.INFO)
ch = logging.StreamHandler()
ch.setLevel(logging.INFO)
formatter = logging.Formatter('%(asctime)s %(levelname)s [python] %(message)s')
ch.setFormatter(formatter)
logger.addHandler(ch)


def is_close(a: float, b: float, rel_tol: float = 1e-9, abs_tol: float = 0.0) -> bool:
    """Check if two numbers are close within a given tolerance."""
    return abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)


def check_output(command: str) -> bytes:
    """Run a shell command and return its stdout. Raise an error if the command fails."""
    # Capture stderr too; otherwise `err` is always None in the error message.
    process = Popen(command, shell=True, stdout=PIPE, stderr=PIPE)
    output, err = process.communicate()
    if process.returncode == 0:
        return output
    raise RuntimeError(f"Command {command} failed with error: {err.decode('utf-8', errors='replace')}")


def construct_fusion_commands(yaml_data: dict) -> list:
    """
    Constructs the fusion commands from the YAML configuration.

    Args:
        yaml_data (dict): The loaded YAML configuration.

    Returns:
        list: One command (a list of tokens) per entry under `methods`.
    """
    return [
        [
            FUSE_COMMAND,
            '-runs', ' '.join(yaml_data['runs']),
            '-output', method.get('output'),
            '-method', method.get('name', 'average'),
            '-k', str(method.get('k', 1000)),
            '-depth', str(method.get('depth', 1000)),
            '-rrf_k', str(method.get('rrf_k', 60)),
            '-alpha', str(method.get('alpha', 0.5))
        ]
        for method in yaml_data['methods']
    ]


def run_fusion_commands(cmds: list):
    """
    Run the fusion commands and log the results.

    Args:
        cmds (list): List of fusion commands to run.
    """
    for cmd_list in cmds:
        cmd = ' '.join(cmd_list)
        logger.info(f'Running command: {cmd}')
        try:
            return_code = call(cmd, shell=True)
            if return_code != 0:
                logger.error(f"Command failed with return code {return_code}: {cmd}")
        except Exception as e:
            logger.error(f"Error executing command {cmd}: {str(e)}")


def evaluate_and_verify(yaml_data: dict, dry_run: bool):
    """
    Runs the evaluation and verification of the fusion results.

    Args:
        yaml_data (dict): The loaded YAML configuration.
        dry_run (bool): If True, output commands without executing them.
    """
    fail_str = '\033[91m[FAIL]\033[0m '
    ok_str = ' [OK] '
    failures = False

    logger.info('=' * 10 + ' Verifying Fusion Results ' + '=' * 10)

    for method in yaml_data['methods']:
        for i, topic_set in enumerate(yaml_data['topics']):
            for metric in yaml_data['metrics']:
                output_runfile = str(method.get('output'))

                # Build evaluation command: <evaluator> <params> <qrels> <fused run>
                eval_cmd = [
                    metric['command'],
                    metric.get('params') or '',
                    os.path.join('tools/topics-and-qrels', topic_set['qrel']) if topic_set.get('qrel') else '',
                    output_runfile
                ]

                if dry_run:
                    logger.info(' '.join(eval_cmd))
                    continue

                try:
                    # Keep the last non-empty line of the evaluator's output.
                    out = [line for line in
                           check_output(' '.join(eval_cmd)).decode('utf-8').split('\n') if line.strip()][-1]
                except Exception as e:
                    logger.error(f"Failed to execute evaluation command: {str(e)}")
                    continue

                eval_out = out.strip().split(metric['separator'])[metric['parse_index']]
                expected = round(method['results'][metric['metric']][i], metric['metric_precision'])
                actual = round(float(eval_out), metric['metric_precision'])
                result_str = (
                    f'expected: {expected:.4f} actual: {actual:.4f} (delta={abs(expected-actual):.4f}) - '
                    f'metric: {metric["metric"]:<8} method: {method["name"]} topics: {topic_set["id"]}'
                )

                # Pass if the score matches the expected value or improves on it.
                if is_close(expected, actual) or actual > expected:
                    logger.info(ok_str + result_str)
                else:
                    logger.error(fail_str + result_str)
                    failures = True

    if failures:
        logger.error(f'{fail_str}Some tests failed.')
    else:
        logger.info('All tests passed successfully!')


if __name__ == '__main__':
    start_time = time.time()

    # Command-line argument parsing
    parser = argparse.ArgumentParser(description='Run Fusion regression tests.')
    parser.add_argument('--regression', required=True, help='Name of the regression test configuration.')
    parser.add_argument('--dry-run', dest='dry_run', action='store_true',
                        help='Output commands without actual execution.')
    args = parser.parse_args()

    # Load YAML configuration
    try:
        with open(f'src/main/resources/fuse_regression/{args.regression}.yaml') as f:
            yaml_data = yaml.safe_load(f)
    except FileNotFoundError as e:
        logger.error(f"Failed to load configuration file: {e}")
        sys.exit(1)

    # Construct the fusion commands
    fusion_commands = construct_fusion_commands(yaml_data)

    # Run the fusion process, or log one line per command in a dry run
    if args.dry_run:
        for cmd_list in fusion_commands:
            logger.info(' '.join(cmd_list))
    else:
        run_fusion_commands(fusion_commands)

    # Evaluate and verify results
    evaluate_and_verify(yaml_data, args.dry_run)

    logger.info(f"Total execution time: {time.time() - start_time:.2f} seconds")
```
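The script delegates the actual fusion to `io.anserini.fusion.FuseTrecRuns`. For `-method rrf`, a minimal sketch of standard reciprocal rank fusion follows: each document's fused score is the sum over runs of `1 / (rrf_k + rank)`. It assumes, as in Pyserini's fusion functions (which the commit history says this implementation was matched against), that `-depth` limits how many results per topic are read from each input run and `-k` limits the length of the fused list. `rrf_fuse` is a hypothetical name, and tie-breaking by docid is an assumption of this sketch.

```python
from collections import defaultdict


def rrf_fuse(runs, rrf_k=60, depth=1000, k=1000):
    """Hypothetical sketch of reciprocal rank fusion.

    runs: list of {qid: [docid, ...]} rankings, best document first.
    Each document scores sum(1 / (rrf_k + rank)) over the runs that retrieve it,
    with 1-based ranks and only the top `depth` documents of each run counted.
    Returns {qid: [(docid, score), ...]} truncated to the top `k` per topic.
    """
    fused = defaultdict(lambda: defaultdict(float))
    for run in runs:
        for qid, ranking in run.items():
            for rank, docid in enumerate(ranking[:depth], start=1):
                fused[qid][docid] += 1.0 / (rrf_k + rank)
    result = {}
    for qid, scores in fused.items():
        # Highest fused score first; ties broken by docid (an assumption here).
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        result[qid] = ranked[:k]
    return result
```

Because only ranks enter the formula, RRF needs no score normalization across runs, which is why it is a common default when fusing sparse and dense runs whose scores live on different scales.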
73 changes: 73 additions & 0 deletions
src/main/resources/fuse_regression/fusion-regression-bge-flat-int8-robust04-2.yaml
@@ -0,0 +1,73 @@

```yaml
---
corpus: beir-v1.0.0-robust04.bge-base-en-v1.5
corpus_path: collections/beir-v1.0.0/bge-base-en-v1.5/robust04

metrics:
  - metric: nDCG@10
    command: bin/trec_eval
    params: -c -m ndcg_cut.10
    separator: "\t"
    parse_index: 2
    metric_precision: 4
    can_combine: false
  - metric: R@100
    command: bin/trec_eval
    params: -c -m recall.100
    separator: "\t"
    parse_index: 2
    metric_precision: 4
    can_combine: false
  - metric: R@1000
    command: bin/trec_eval
    params: -c -m recall.1000
    separator: "\t"
    parse_index: 2
    metric_precision: 4
    can_combine: false

topic_reader: JsonStringVector
topics:
  - name: "BEIR (v1.0.0): Robust04"
    id: test
    path: topics.beir-v1.0.0-robust04.test.bge-base-en-v1.5.jsonl.gz
    qrel: qrels.beir-v1.0.0-robust04.test.txt

# Fusion Regression Test Configuration
runs:
  - runs/run.flat-int8.beir-v1.0.0-robust04.bge-base-en-v1.5.test.bge-flat-int8-cached
  - runs/run.flat.beir-v1.0.0-robust04.bge-base-en-v1.5.test.bge-flat-cached
methods:
  - name: rrf
    k: 1000
    depth: 1000
    rrf_k: 60
    output: runs/fuse/run.flat-int8.beir-v1.0.0-robust04.bge-base-en-v1.5.test.bge-flat-int8-cached.bge-flat-cached.fusion.rrf
    results:
      nDCG@10:
        - 0.3
      R@100:
        - 0.3
      R@1000:
        - 0.5
  - name: average
    output: runs/fuse/run.flat-int8.beir-v1.0.0-robust04.bge-base-en-v1.5.test.bge-flat-int8-cached.bge-flat-cached.fusion.average
    results:
      nDCG@10:
        - 0.3
      R@100:
        - 0.3
      R@1000:
        - 0.5
  - name: interpolation
    alpha: 0.5
    output: runs/fuse/run.flat-int8.beir-v1.0.0-robust04.bge-base-en-v1.5.test.bge-flat-int8-cached.bge-flat-cached.fusion.interpolation
    results:
      nDCG@10:
        - 0.3
      R@100:
        - 0.3
      R@1000:
        - 0.5
```

Review comment at the start of the file: let's remove this and keep only the above?

Review comment on the `rrf` method's `nDCG@10` value: record to 4 digits?
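This configuration also exercises the `average` and `interpolation` methods (the latter with `alpha: 0.5`). The sketch below shows the plain score combinations those names suggest: an arithmetic mean over runs, and `alpha * score_a + (1 - alpha) * score_b` for two runs. Whether `FuseTrecRuns` normalizes scores first, and how it treats a document missing from one run, is not shown in this PR; this sketch uses raw scores and lets a missing document contribute 0. All function names are hypothetical.

```python
from collections import defaultdict


def _rank(scores, k):
    """Sort {docid: score} by descending score (ties by docid) and keep the top k."""
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


def average_fuse(runs, k=1000):
    """Hypothetical: arithmetic mean of each document's score across runs.

    runs: list of {qid: {docid: score}}. A document absent from a run
    implicitly contributes 0 for that run (an assumption of this sketch).
    """
    fused = defaultdict(lambda: defaultdict(float))
    for run in runs:
        for qid, scores in run.items():
            for docid, score in scores.items():
                fused[qid][docid] += score / len(runs)
    return {qid: _rank(scores, k) for qid, scores in fused.items()}


def interpolate_fuse(run_a, run_b, alpha=0.5, k=1000):
    """Hypothetical: alpha * score_a + (1 - alpha) * score_b, missing scores count as 0."""
    fused = {}
    for qid in set(run_a) | set(run_b):
        a, b = run_a.get(qid, {}), run_b.get(qid, {})
        scores = {docid: alpha * a.get(docid, 0.0) + (1 - alpha) * b.get(docid, 0.0)
                  for docid in set(a) | set(b)}
        fused[qid] = _rank(scores, k)
    return fused
```

Unlike RRF, both combinations depend on the raw score scales of the input runs; with `alpha: 0.5` and two runs, interpolation and averaging produce the same ranking under this sketch.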
65 changes: 65 additions & 0 deletions
src/main/resources/fuse_regression/fusion-regression-bge-flat-robust04-3.yaml
@@ -0,0 +1,65 @@

```yaml
---
corpus: beir-v1.0.0-robust04.flat
corpus_path: collections/beir-v1.0.0/corpus/robust04/

metrics:
  - metric: nDCG@10
    command: bin/trec_eval
    params: -c -m ndcg_cut.10
    separator: "\t"
    parse_index: 2
    metric_precision: 4
    can_combine: false
  - metric: R@100
    command: bin/trec_eval
    params: -c -m recall.100
    separator: "\t"
    parse_index: 2
    metric_precision: 4
    can_combine: false
  - metric: R@1000
    command: bin/trec_eval
    params: -c -m recall.1000
    separator: "\t"
    parse_index: 2
    metric_precision: 4
    can_combine: false

topic_reader: TsvString
topics:
  - name: "BEIR (v1.0.0): Robust04"
    id: test
    path: topics.beir-v1.0.0-robust04.test.tsv.gz
    qrel: qrels.beir-v1.0.0-robust04.test.txt

# Fusion Regression Test Configuration
runs:
  - runs/runs.beir/run.inverted.beir-v1.0.0-robust04.flat.test.bm25
  - runs/runs.beir/run.inverted.beir-v1.0.0-robust04.multifield.test.bm25
  - runs/runs.beir/run.inverted.beir-v1.0.0-robust04.splade-pp-ed.test.splade-pp-ed-cached

methods:
  - name: rrf
    k: 1000
    depth: 1000
    rrf_k: 60
    output: runs/fuse/run.inverted.beir-v1.0.0-robust04.flat.test.bm25.multifield.test.bm25.splade-pp-ed.test.splade-pp-ed-cached.fusion.rrf
    results:
      nDCG@10:
        - 0.4636
      R@100:
        - 0.4243
      R@1000:
        - 0.7349
  - name: average
    output: runs/fuse/run.inverted.beir-v1.0.0-robust04.flat.test.bm25.multifield.test.bm25.splade-pp-ed.test.splade-pp-ed-cached.fusion.average
    results:
      nDCG@10:
        - 0.4
      R@100:
        - 0.38
      R@1000:
        - 0.62
```
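Each number under `results` is checked by `evaluate_and_verify` in the script: it takes the last non-empty line of the evaluator's output, splits it on `separator`, reads the field at `parse_index`, rounds both values to `metric_precision` digits, and passes if they are equal within a relative tolerance of 1e-9 or the actual score is higher. The standalone sketch below mirrors that rule using `math.isclose` with the same tolerances as the script's `is_close`; the sample lines only assume trec_eval's tab-separated `measure`, `all`, `value` layout, and `parse_metric` and `passes` are hypothetical names.

```python
import math


def parse_metric(line: str, separator: str = "\t", parse_index: int = 2) -> float:
    """Hypothetical: pull the metric value out of one line of evaluator output."""
    return float(line.strip().split(separator)[parse_index])


def passes(expected: float, line: str, precision: int = 4,
           separator: str = "\t", parse_index: int = 2) -> bool:
    """Hypothetical mirror of the script's check: rounded scores match, or actual is higher."""
    exp = round(expected, precision)
    act = round(parse_metric(line, separator, parse_index), precision)
    return math.isclose(exp, act, rel_tol=1e-9, abs_tol=0.0) or act > exp
```

Because a higher score also passes, a loose expected value such as `0.4` for the `average` method's `nDCG@10` accepts any fused run that scores at least that well, which is one reason to record expected values to four digits.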
Review comment: depend on the runs generated by the regression yaml config?