
Commit

Tweak scores for Cohere regressions (#2386)
Ran four trials on different machines and averaged the scores, to better reflect the variance across multiple trials.
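The averaging and rounding described in the commit message amounts to the following; this is a minimal sketch with made-up trial scores, not the committed script:

```python
# Hypothetical AP@1000 scores from four regression trials on different
# machines (values invented for illustration).
trials = [0.4881, 0.4868, 0.4875, 0.4860]

# Average across trials and round to the three decimal places used in the
# regression YAML files.
avg = round(sum(trials) / len(trials), 3)
print(avg)  # 0.487
```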
lintool authored Feb 21, 2024
1 parent fea04ca commit 21b3ddd
Showing 12 changed files with 83 additions and 39 deletions.
@@ -94,13 +94,13 @@ With the above commands, you should be able to reproduce the following results:

 | **AP@1000** | **cohere-embed-english-v3**|
 |:-------------------------------------------------------------------------------------------------------------|-----------|
-| [DL19 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.488 |
+| [DL19 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.487 |
 | **nDCG@10** | **cohere-embed-english-v3**|
-| [DL19 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.699 |
+| [DL19 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.690 |
 | **R@100** | **cohere-embed-english-v3**|
-| [DL19 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.646 |
+| [DL19 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.647 |
 | **R@1000** | **cohere-embed-english-v3**|
-| [DL19 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.857 |
+| [DL19 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.850 |

Note that due to the non-deterministic nature of HNSW indexing, results may differ slightly between each experimental run.
Nevertheless, scores are generally within 0.005 of the reference values recorded in [our YAML configuration file](../../src/main/resources/regression/dl19-passage-cohere-embed-english-v3-hnsw-int8.yaml).
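The tolerance in this note can be checked mechanically; a minimal sketch, with a hypothetical observed score (0.4893 is invented for illustration) against the reference value above:

```python
# Reference value from the regression YAML, and a hypothetical score
# observed in one experimental run.
reference = 0.487
observed = 0.4893

# The run is acceptable if it lands within 0.005 of the reference.
within_tolerance = abs(observed - reference) <= 0.005
print(within_tolerance)  # True: |0.4893 - 0.487| = 0.0023
```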
@@ -94,13 +94,13 @@ With the above commands, you should be able to reproduce the following results:

 | **AP@1000** | **cohere-embed-english-v3**|
 |:-------------------------------------------------------------------------------------------------------------|-----------|
-| [DL19 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.487 |
+| [DL19 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.486 |
 | **nDCG@10** | **cohere-embed-english-v3**|
-| [DL19 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.691 |
+| [DL19 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.690 |
 | **R@100** | **cohere-embed-english-v3**|
 | [DL19 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.645 |
 | **R@1000** | **cohere-embed-english-v3**|
-| [DL19 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.844 |
+| [DL19 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.851 |

Note that due to the non-deterministic nature of HNSW indexing, results may differ slightly between each experimental run.
Nevertheless, scores are generally within 0.005 of the reference values recorded in [our YAML configuration file](../../src/main/resources/regression/dl19-passage-cohere-embed-english-v3-hnsw.yaml).
@@ -94,13 +94,13 @@ With the above commands, you should be able to reproduce the following results:

 | **AP@1000** | **cohere-embed-english-v3**|
 |:-------------------------------------------------------------------------------------------------------------|-----------|
-| [DL20 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.506 |
+| [DL20 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.505 |
 | **nDCG@10** | **cohere-embed-english-v3**|
-| [DL20 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.721 |
+| [DL20 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.722 |
 | **R@100** | **cohere-embed-english-v3**|
-| [DL20 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.722 |
+| [DL20 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.720 |
 | **R@1000** | **cohere-embed-english-v3**|
-| [DL20 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.861 |
+| [DL20 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.858 |

Note that due to the non-deterministic nature of HNSW indexing, results may differ slightly between each experimental run.
Nevertheless, scores are generally within 0.005 of the reference values recorded in [our YAML configuration file](../../src/main/resources/regression/dl20-passage-cohere-embed-english-v3-hnsw-int8.yaml).
@@ -96,11 +96,11 @@ With the above commands, you should be able to reproduce the following results:
 |:-------------------------------------------------------------------------------------------------------------|-----------|
 | [DL20 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.505 |
 | **nDCG@10** | **cohere-embed-english-v3**|
-| [DL20 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.726 |
+| [DL20 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.725 |
 | **R@100** | **cohere-embed-english-v3**|
-| [DL20 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.725 |
+| [DL20 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.724 |
 | **R@1000** | **cohere-embed-english-v3**|
-| [DL20 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.865 |
+| [DL20 (Passage)](https://trec.nist.gov/data/deep2020.html) | 0.864 |

Note that due to the non-deterministic nature of HNSW indexing, results may differ slightly between each experimental run.
Nevertheless, scores are generally within 0.005 of the reference values recorded in [our YAML configuration file](../../src/main/resources/regression/dl20-passage-cohere-embed-english-v3-hnsw.yaml).
@@ -95,7 +95,7 @@ With the above commands, you should be able to reproduce the following results:

 | **nDCG@10** | **cohere-embed-english-v3**|
 |:-------------------------------------------------------------------------------------------------------------|-----------|
-| [MS MARCO Passage: Dev](https://github.com/microsoft/MSMARCO-Passage-Ranking) | 0.428 |
+| [MS MARCO Passage: Dev](https://github.com/microsoft/MSMARCO-Passage-Ranking) | 0.427 |
 | **AP@1000** | **cohere-embed-english-v3**|
 | [MS MARCO Passage: Dev](https://github.com/microsoft/MSMARCO-Passage-Ranking) | 0.371 |
 | **RR@10** | **cohere-embed-english-v3**|
@@ -0,0 +1,44 @@
#
# Anserini: A Lucene toolkit for reproducible information retrieval research
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

import re
import subprocess

# Regression runs whose log scores we aggregate.
run_keys = [
    'dl19-passage-cohere-embed-english-v3-hnsw',
    'dl19-passage-cohere-embed-english-v3-hnsw-int8',
    'dl20-passage-cohere-embed-english-v3-hnsw',
    'dl20-passage-cohere-embed-english-v3-hnsw-int8',
    'msmarco-passage-cohere-embed-english-v3-hnsw',
    'msmarco-passage-cohere-embed-english-v3-hnsw-int8'
]

for key in sorted(run_keys):
    print(key)
    for metric in ['AP@1000', 'nDCG@10', 'RR@10', 'R@100', 'R@1000']:
        # Grab the metric lines from the tail of every trial's log for this run.
        command = rf'tail -n 5 logs/log.{key}_* | grep "{metric}\s"'
        p = subprocess.run(command, shell=True, text=True, capture_output=True)
        scores = []
        for line in p.stdout.rstrip().split('\n'):
            # Log lines report scores as, e.g., "actual: 0.4881".
            match = re.search(r'actual: (\d\.\d+)', line)
            if match:
                scores.append(float(match.group(1)))
        if scores:
            # Average across trials, rounded to three decimal places.
            avg = round(sum(scores) / len(scores), 3)
            print(f' {metric} (avg over {len(scores)}):\n - {avg:.3f}')
@@ -56,10 +56,10 @@ models:
     params: -generator VectorQueryGenerator -topicField vector -threads 16 -hits 1000 -efSearch 1000
     results:
       AP@1000:
-        - 0.4881
+        - 0.487
       nDCG@10:
-        - 0.6993
+        - 0.690
       R@100:
-        - 0.6458
+        - 0.647
       R@1000:
-        - 0.8569
+        - 0.850
@@ -56,10 +56,10 @@ models:
     params: -generator VectorQueryGenerator -topicField vector -threads 16 -hits 1000 -efSearch 1000
     results:
       AP@1000:
-        - 0.4870
+        - 0.486
       nDCG@10:
-        - 0.6913
+        - 0.690
       R@100:
-        - 0.6447
+        - 0.645
       R@1000:
-        - 0.8436
+        - 0.851
@@ -56,10 +56,10 @@ models:
     params: -generator VectorQueryGenerator -topicField vector -threads 16 -hits 1000 -efSearch 1000
     results:
       AP@1000:
-        - 0.5063
+        - 0.505
       nDCG@10:
-        - 0.7215
+        - 0.722
       R@100:
-        - 0.7224
+        - 0.720
       R@1000:
-        - 0.8611
+        - 0.858
@@ -56,10 +56,10 @@ models:
     params: -generator VectorQueryGenerator -topicField vector -threads 16 -hits 1000 -efSearch 1000
     results:
       AP@1000:
-        - 0.5054
+        - 0.505
       nDCG@10:
-        - 0.7258
+        - 0.725
       R@100:
-        - 0.7248
+        - 0.724
       R@1000:
-        - 0.8647
+        - 0.864
@@ -56,10 +56,10 @@ models:
     params: -generator VectorQueryGenerator -topicField vector -threads 16 -hits 1000 -efSearch 1000
     results:
       nDCG@10:
-        - 0.4275
+        - 0.427
       AP@1000:
-        - 0.3706
+        - 0.371
       RR@10:
-        - 0.3648
+        - 0.365
       R@1000:
-        - 0.9735
+        - 0.974
@@ -56,10 +56,10 @@ models:
     params: -generator VectorQueryGenerator -topicField vector -threads 16 -hits 1000 -efSearch 1000
     results:
       nDCG@10:
-        - 0.4275
+        - 0.428
       AP@1000:
-        - 0.3706
+        - 0.371
       RR@10:
-        - 0.3648
+        - 0.365
       R@1000:
-        - 0.9735
+        - 0.974
