@gaotianyu1350 Hi, thank you for the great work and for publishing such clean code!
I have some questions about reproducing the STS results for the pre-trained BERT models. When I run the following command in my environment, I get higher STS scores than the ones reported in your paper. Do you have any idea what is causing the discrepancy?
Code executed
```bash
python evaluation.py \
    --model_name_or_path bert-base-uncased \
    --pooler avg_first_last \
    --task_set sts \
    --mode test
```
Results
| STS12 | STS13 | STS14 | STS15 | STS16 | STSBenchmark | SICKRelatedness | Avg.  |
|-------|-------|-------|-------|-------|--------------|-----------------|-------|
| 45.09 | 64.30 | 54.56 | 70.52 | 67.87 | 59.05        | 63.75           | 60.73 |
Expected results (scores shown in your paper)
| STS12 | STS13 | STS14 | STS15 | STS16 | STSBenchmark | SICKRelatedness | Avg.  |
|-------|-------|-------|-------|-------|--------------|-----------------|-------|
| 39.70 | 59.38 | 49.67 | 66.03 | 66.19 | 53.87        | 62.06           | 56.70 |
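For context, here is a minimal sketch of what I understand `avg_first_last` pooling to mean: average the first (embedding-output) and last layer hidden states per token, then take an attention-masked mean over tokens. This is my own illustration, not the actual code in evaluation.py:

```python
# Sketch of avg_first_last pooling as I understand it
# (my own illustration, not the repo's exact code path).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

batch = tokenizer(["A man is playing guitar."], return_tensors="pt")
with torch.no_grad():
    out = model(**batch, output_hidden_states=True)

# hidden_states is a tuple: embedding output plus one entry per layer.
first, last = out.hidden_states[0], out.hidden_states[-1]
mask = batch["attention_mask"].unsqueeze(-1).float()  # (batch, seq_len, 1)

# Masked mean over real (non-padding) tokens of the first/last average.
emb = ((first + last) / 2.0 * mask).sum(dim=1) / mask.sum(dim=1)
print(emb.shape)  # torch.Size([1, 768])
```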
Strangely, I can fully reproduce the scores for the SimCSE models with the following command:
```bash
python evaluation.py \
    --model_name_or_path princeton-nlp/sup-simcse-bert-base-uncased \
    --pooler cls \
    --task_set sts \
    --mode test
```
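For comparison, my understanding is that `--pooler cls` essentially takes the last layer's [CLS] token embedding. Again, just a sketch under that assumption (the supervised checkpoint may apply an extra MLP head during training, which I am not modeling here):

```python
# Sketch of cls pooling: the [CLS] (first) token of the last layer.
# My own illustration; not the exact evaluation.py code path.
import torch
from transformers import AutoModel, AutoTokenizer

name = "princeton-nlp/sup-simcse-bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)
model.eval()

batch = tokenizer(["A man is playing guitar."], return_tensors="pt")
with torch.no_grad():
    out = model(**batch)

cls_emb = out.last_hidden_state[:, 0]  # (batch, hidden_size)
print(cls_emb.shape)  # torch.Size([1, 768])
```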
Below is the result of pip freeze, and I am using a single NVIDIA RTX 6000 Ada GPU.
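In case it helps narrow things down, here is a generic snippet (nothing repo-specific) for dumping the runtime versions that seem most likely to matter:

```python
# Generic environment dump; nothing here is specific to SimCSE.
import torch
import transformers

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("cuda (built against):", torch.version.cuda)
if torch.cuda.is_available():
    print("gpu:", torch.cuda.get_device_name(0))
```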
Thank you very much for your help!
pip freeze result
```
aiofiles==23.2.1
aiohappyeyeballs==2.4.3
aiohttp==3.10.10
aiosignal==1.3.1
annotated-types==0.7.0
anyio==4.5.0
async-timeout==4.0.3
attrs==24.2.0
certifi==2024.8.30
charset-normalizer==3.4.0
click==8.1.7
contourpy==1.1.1
cycler==0.12.1
datasets==3.0.1
dill==0.3.8
exceptiongroup==1.2.2
fastapi==0.115.2
ffmpy==0.4.0
filelock==3.16.1
fonttools==4.54.1
frozenlist==1.4.1
fsspec==2024.6.1
gradio==4.44.1
gradio-client==1.3.0
h11==0.14.0
httpcore==1.0.6
httpx==0.27.2
huggingface-hub==0.25.2
idna==3.10
importlib-resources==6.4.5
jinja2==3.1.4
joblib==1.4.2
kiwisolver==1.4.7
markdown-it-py==3.0.0
MarkupSafe==2.1.5
matplotlib==3.7.5
mdurl==0.1.2
multidict==6.1.0
multiprocess==0.70.17
numpy==1.24.4
orjson==3.10.7
packaging==24.1
pandas==2.0.3
pillow==10.4.0
prettytable==3.11.0
propcache==0.2.0
pyarrow==17.0.0
pydantic==2.9.2
pydantic-core==2.23.4
pydub==0.25.1
pygments==2.18.0
pyparsing==3.1.4
python-dateutil==2.9.0.post0
python-multipart==0.0.12
pytz==2024.2
PyYAML==6.0.2
regex==2024.9.11
requests==2.32.3
rich==13.9.2
ruff==0.6.9
sacremoses==0.1.1
safetensors==0.4.5
scikit-learn==1.3.2
scipy==1.10.1
semantic-version==2.10.0
shellingham==1.5.4
six==1.16.0
sniffio==1.3.1
starlette==0.39.2
threadpoolctl==3.5.0
tokenizers==0.9.4
tomlkit==0.12.0
torch==1.7.1+cu110
torchtyping==0.1.5
tqdm==4.66.5
transformers==4.2.1
typeguard==2.13.3
typer==0.12.5
typing-extensions==4.12.2
tzdata==2024.2
urllib3==2.2.3
uvicorn==0.31.1
wcwidth==0.2.13
websockets==12.0
xxhash==3.5.0
yarl==1.15.1
zipp==3.20.2
```