diff --git a/docs/regressions/regressions-dl19-passage-cohere-embed-english-v3-hnsw-int8.md b/docs/regressions/regressions-dl19-passage-cohere-embed-english-v3-hnsw-int8.md
index 717d9d22d9..e6d8073b14 100644
--- a/docs/regressions/regressions-dl19-passage-cohere-embed-english-v3-hnsw-int8.md
+++ b/docs/regressions/regressions-dl19-passage-cohere-embed-english-v3-hnsw-int8.md
@@ -94,13 +94,13 @@ With the above commands, you should be able to reproduce the following results:
 
 | **AP@1000** | **cohere-embed-english-v3**|
 |:-------------------------------------------------------------------------------------------------------------|-----------|
-| [DL19 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.488 |
+| [DL19 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.487 |
 | **nDCG@10** | **cohere-embed-english-v3**|
-| [DL19 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.699 |
+| [DL19 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.690 |
 | **R@100** | **cohere-embed-english-v3**|
-| [DL19 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.646 |
+| [DL19 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.647 |
 | **R@1000** | **cohere-embed-english-v3**|
-| [DL19 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.857 |
+| [DL19 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.850 |
 
 Note that due to the non-deterministic nature of HNSW indexing, results may differ slightly between each experimental run. Nevertheless, scores are generally within 0.005 of the reference values recorded in [our YAML configuration file](../../src/main/resources/regression/dl19-passage-cohere-embed-english-v3-hnsw-int8.yaml).
 
diff --git a/docs/regressions/regressions-dl19-passage-cohere-embed-english-v3-hnsw.md b/docs/regressions/regressions-dl19-passage-cohere-embed-english-v3-hnsw.md
index 4bc1ea56c1..d634402abd 100644
--- a/docs/regressions/regressions-dl19-passage-cohere-embed-english-v3-hnsw.md
+++ b/docs/regressions/regressions-dl19-passage-cohere-embed-english-v3-hnsw.md
@@ -94,13 +94,13 @@ With the above commands, you should be able to reproduce the following results:
 
 | **AP@1000** | **cohere-embed-english-v3**|
 |:-------------------------------------------------------------------------------------------------------------|-----------|
-| [DL19 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.487 |
+| [DL19 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.486 |
 | **nDCG@10** | **cohere-embed-english-v3**|
-| [DL19 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.691 |
+| [DL19 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.690 |
 | **R@100** | **cohere-embed-english-v3**|
 | [DL19 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.645 |
 | **R@1000** | **cohere-embed-english-v3**|
-| [DL19 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.844 |
+| [DL19 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.851 |
 
 Note that due to the non-deterministic nature of HNSW indexing, results may differ slightly between each experimental run. Nevertheless, scores are generally within 0.005 of the reference values recorded in [our YAML configuration file](../../src/main/resources/regression/dl19-passage-cohere-embed-english-v3-hnsw.yaml).
 
diff --git a/docs/regressions/regressions-dl20-passage-cohere-embed-english-v3-hnsw-int8.md b/docs/regressions/regressions-dl20-passage-cohere-embed-english-v3-hnsw-int8.md
index d3d9d44490..0fbf5992ec 100644
--- a/docs/regressions/regressions-dl20-passage-cohere-embed-english-v3-hnsw-int8.md
+++ b/docs/regressions/regressions-dl20-passage-cohere-embed-english-v3-hnsw-int8.md
@@ -94,13 +94,13 @@ With the above commands, you should be able to reproduce the following results:
 
 | **AP@1000** | **cohere-embed-english-v3**|
 |:-------------------------------------------------------------------------------------------------------------|-----------|
-| [DL20 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.506 |
+| [DL20 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.505 |
 | **nDCG@10** | **cohere-embed-english-v3**|
-| [DL20 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.721 |
-| **R@100** | **cohere-embed-english-v3**|
 | [DL20 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.722 |
+| **R@100** | **cohere-embed-english-v3**|
+| [DL20 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.720 |
 | **R@1000** | **cohere-embed-english-v3**|
-| [DL20 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.861 |
+| [DL20 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.858 |
 
 Note that due to the non-deterministic nature of HNSW indexing, results may differ slightly between each experimental run. Nevertheless, scores are generally within 0.005 of the reference values recorded in [our YAML configuration file](../../src/main/resources/regression/dl20-passage-cohere-embed-english-v3-hnsw-int8.yaml).
 
diff --git a/docs/regressions/regressions-dl20-passage-cohere-embed-english-v3-hnsw.md b/docs/regressions/regressions-dl20-passage-cohere-embed-english-v3-hnsw.md
index 7e2ae24603..6aac6597c1 100644
--- a/docs/regressions/regressions-dl20-passage-cohere-embed-english-v3-hnsw.md
+++ b/docs/regressions/regressions-dl20-passage-cohere-embed-english-v3-hnsw.md
@@ -96,11 +96,11 @@ With the above commands, you should be able to reproduce the following results:
 |:-------------------------------------------------------------------------------------------------------------|-----------|
 | [DL20 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.505 |
 | **nDCG@10** | **cohere-embed-english-v3**|
-| [DL20 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.726 |
-| **R@100** | **cohere-embed-english-v3**|
 | [DL20 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.725 |
+| **R@100** | **cohere-embed-english-v3**|
+| [DL20 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.724 |
 | **R@1000** | **cohere-embed-english-v3**|
-| [DL20 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.865 |
+| [DL20 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.864 |
 
 Note that due to the non-deterministic nature of HNSW indexing, results may differ slightly between each experimental run. Nevertheless, scores are generally within 0.005 of the reference values recorded in [our YAML configuration file](../../src/main/resources/regression/dl20-passage-cohere-embed-english-v3-hnsw.yaml).
 
diff --git a/docs/regressions/regressions-msmarco-passage-cohere-embed-english-v3-hnsw-int8.md b/docs/regressions/regressions-msmarco-passage-cohere-embed-english-v3-hnsw-int8.md
index 826d1fab41..7b34ce4d7c 100644
--- a/docs/regressions/regressions-msmarco-passage-cohere-embed-english-v3-hnsw-int8.md
+++ b/docs/regressions/regressions-msmarco-passage-cohere-embed-english-v3-hnsw-int8.md
@@ -95,7 +95,7 @@ With the above commands, you should be able to reproduce the following results:
 
 | **nDCG@10** | **cohere-embed-english-v3**|
 |:-------------------------------------------------------------------------------------------------------------|-----------|
-| [MS MARCO Passage: Dev](https://github.com/microsoft/MSMARCO-Passage-Ranking) | 0.428 |
+| [MS MARCO Passage: Dev](https://github.com/microsoft/MSMARCO-Passage-Ranking) | 0.427 |
 | **AP@1000** | **cohere-embed-english-v3**|
 | [MS MARCO Passage: Dev](https://github.com/microsoft/MSMARCO-Passage-Ranking) | 0.371 |
 | **RR@10** | **cohere-embed-english-v3**|
diff --git a/src/main/python/msmarco/extract_avg_hnsw_regression_scores_from_log.py b/src/main/python/msmarco/extract_avg_hnsw_regression_scores_from_log.py
new file mode 100644
index 0000000000..fdecdeb3cb
--- /dev/null
+++ b/src/main/python/msmarco/extract_avg_hnsw_regression_scores_from_log.py
@@ -0,0 +1,44 @@
+#
+# Anserini: A Lucene toolkit for reproducible information retrieval research
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import re
+import subprocess
+
+
+beir_keys = [
+    'dl19-passage-cohere-embed-english-v3-hnsw',
+    'dl19-passage-cohere-embed-english-v3-hnsw-int8',
+    'dl20-passage-cohere-embed-english-v3-hnsw',
+    'dl20-passage-cohere-embed-english-v3-hnsw-int8',
+    'msmarco-passage-cohere-embed-english-v3-hnsw',
+    'msmarco-passage-cohere-embed-english-v3-hnsw-int8'
+]
+
+for key in sorted(beir_keys):
+    print(key)
+    for metric in ['AP@1000', 'nDCG@10', 'RR@10', 'R@100', 'R@1000']:
+        command = f'tail -n 5 logs/log.{key}_* | grep "{metric}\s"'
+        p = subprocess.run(command, shell=True, text=True, capture_output=True)
+        output = p.stdout
+        scores = []
+        for line in output.rstrip().split('\n'):
+            pattern = r'actual: (\d.\d\d\d)'
+            match = re.search(pattern, line)
+            if match:
+                scores.append(float(match.group(1)))
+        if len(scores) > 0:
+            avg = round(sum(scores)/len(scores) * 10 ** 3) / (10 ** 3)
+            print(f' {metric} (avg over {len(scores)}):\n - {avg:.3f}')
diff --git a/src/main/resources/regression/dl19-passage-cohere-embed-english-v3-hnsw-int8.yaml b/src/main/resources/regression/dl19-passage-cohere-embed-english-v3-hnsw-int8.yaml
index 0c2e3319b1..4a37c002fa 100644
--- a/src/main/resources/regression/dl19-passage-cohere-embed-english-v3-hnsw-int8.yaml
+++ b/src/main/resources/regression/dl19-passage-cohere-embed-english-v3-hnsw-int8.yaml
@@ -56,10 +56,10 @@ models:
     params: -generator VectorQueryGenerator -topicField vector -threads 16 -hits 1000 -efSearch 1000
     results:
       AP@1000:
-        - 0.4881
+        - 0.487
       nDCG@10:
-        - 0.6993
+        - 0.690
       R@100:
-        - 0.6458
+        - 0.647
       R@1000:
-        - 0.8569
\ No newline at end of file
+        - 0.850
\ No newline at end of file
diff --git a/src/main/resources/regression/dl19-passage-cohere-embed-english-v3-hnsw.yaml b/src/main/resources/regression/dl19-passage-cohere-embed-english-v3-hnsw.yaml
index 6605ec97b9..47cd4d817b 100644
--- a/src/main/resources/regression/dl19-passage-cohere-embed-english-v3-hnsw.yaml
+++ b/src/main/resources/regression/dl19-passage-cohere-embed-english-v3-hnsw.yaml
@@ -56,10 +56,10 @@ models:
     params: -generator VectorQueryGenerator -topicField vector -threads 16 -hits 1000 -efSearch 1000
     results:
       AP@1000:
-        - 0.4870
+        - 0.486
       nDCG@10:
-        - 0.6913
+        - 0.690
       R@100:
-        - 0.6447
+        - 0.645
       R@1000:
-        - 0.8436
\ No newline at end of file
+        - 0.851
\ No newline at end of file
diff --git a/src/main/resources/regression/dl20-passage-cohere-embed-english-v3-hnsw-int8.yaml b/src/main/resources/regression/dl20-passage-cohere-embed-english-v3-hnsw-int8.yaml
index 0497f1bf23..6b036473af 100644
--- a/src/main/resources/regression/dl20-passage-cohere-embed-english-v3-hnsw-int8.yaml
+++ b/src/main/resources/regression/dl20-passage-cohere-embed-english-v3-hnsw-int8.yaml
@@ -56,10 +56,10 @@ models:
     params: -generator VectorQueryGenerator -topicField vector -threads 16 -hits 1000 -efSearch 1000
     results:
       AP@1000:
-        - 0.5063
+        - 0.505
       nDCG@10:
-        - 0.7215
+        - 0.722
       R@100:
-        - 0.7224
+        - 0.720
       R@1000:
-        - 0.8611
\ No newline at end of file
+        - 0.858
\ No newline at end of file
diff --git a/src/main/resources/regression/dl20-passage-cohere-embed-english-v3-hnsw.yaml b/src/main/resources/regression/dl20-passage-cohere-embed-english-v3-hnsw.yaml
index 0bdeeae3a7..3f246ee596 100644
--- a/src/main/resources/regression/dl20-passage-cohere-embed-english-v3-hnsw.yaml
+++ b/src/main/resources/regression/dl20-passage-cohere-embed-english-v3-hnsw.yaml
@@ -56,10 +56,10 @@ models:
     params: -generator VectorQueryGenerator -topicField vector -threads 16 -hits 1000 -efSearch 1000
     results:
       AP@1000:
-        - 0.5054
+        - 0.505
       nDCG@10:
-        - 0.7258
+        - 0.725
       R@100:
-        - 0.7248
+        - 0.724
       R@1000:
-        - 0.8647
\ No newline at end of file
+        - 0.864
\ No newline at end of file
diff --git a/src/main/resources/regression/msmarco-passage-cohere-embed-english-v3-hnsw-int8.yaml b/src/main/resources/regression/msmarco-passage-cohere-embed-english-v3-hnsw-int8.yaml
index e4d65aef10..983b342d3d 100644
--- a/src/main/resources/regression/msmarco-passage-cohere-embed-english-v3-hnsw-int8.yaml
+++ b/src/main/resources/regression/msmarco-passage-cohere-embed-english-v3-hnsw-int8.yaml
@@ -56,10 +56,10 @@ models:
     params: -generator VectorQueryGenerator -topicField vector -threads 16 -hits 1000 -efSearch 1000
     results:
       nDCG@10:
-        - 0.4275
+        - 0.427
       AP@1000:
-        - 0.3706
+        - 0.371
       RR@10:
-        - 0.3648
+        - 0.365
       R@1000:
-        - 0.9735
+        - 0.974
diff --git a/src/main/resources/regression/msmarco-passage-cohere-embed-english-v3-hnsw.yaml b/src/main/resources/regression/msmarco-passage-cohere-embed-english-v3-hnsw.yaml
index a74f9a3bde..088517ef41 100644
--- a/src/main/resources/regression/msmarco-passage-cohere-embed-english-v3-hnsw.yaml
+++ b/src/main/resources/regression/msmarco-passage-cohere-embed-english-v3-hnsw.yaml
@@ -56,10 +56,10 @@ models:
     params: -generator VectorQueryGenerator -topicField vector -threads 16 -hits 1000 -efSearch 1000
     results:
       nDCG@10:
-        - 0.4275
+        - 0.428
       AP@1000:
-        - 0.3706
+        - 0.371
       RR@10:
-        - 0.3648
+        - 0.365
       R@1000:
-        - 0.9735
+        - 0.974
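
Reviewer note: the averaging step in `extract_avg_hnsw_regression_scores_from_log.py` can be exercised without shelling out to `tail` and `grep`. The following is a minimal sketch under stated assumptions, not the script added in this diff. The function name `average_scores` and the sample log lines are hypothetical, and the real log format is assumed only to contain the metric name followed by whitespace and an `actual: d.ddd` fragment. It also escapes the dot in the score pattern, a small tightening relative to the script.

```python
import re

# Matches the "actual: 0.487" fragment. Escaping the dot is a small
# tightening relative to the pattern used in the script above.
ACTUAL_SCORE = re.compile(r'actual: (\d\.\d{3})')


def average_scores(log_lines, metric):
    """Average the 'actual' scores found on lines that mention `metric`.

    Hypothetical helper: returns (average rounded to 3 decimals, count),
    or (None, 0) when no matching line carries a score.
    """
    # Same idea as grep "{metric}\s" in the script: the metric name must be
    # followed by whitespace, so 'R@100' does not also match 'R@1000'.
    metric_pattern = re.compile(re.escape(metric) + r'\s')
    scores = []
    for line in log_lines:
        if not metric_pattern.search(line):
            continue
        match = ACTUAL_SCORE.search(line)
        if match:
            scores.append(float(match.group(1)))
    if not scores:
        return None, 0
    # Round to three decimal places, using the same arithmetic as the script.
    avg = round(sum(scores) / len(scores) * 10 ** 3) / (10 ** 3)
    return avg, len(scores)


# Hypothetical log lines; the exact layout of real regression logs is assumed.
sample = [
    'AP@1000   expected: 0.487 actual: 0.486',
    'AP@1000   expected: 0.487 actual: 0.488',
    'R@1000    expected: 0.850 actual: 0.851',
]
avg, n = average_scores(sample, 'AP@1000')
print(f'AP@1000 (avg over {n}): {avg:.3f}')
```

Taking the log lines as an argument keeps the parsing and rounding testable on their own; a caller could still feed it the output of the `tail`/`grep` pipeline.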