Bugfix AbsTaskRetrieval for DREPS evaluation #80

ashokrajab · 2023-08-20T14:28:14Z

Issues fixed:

When debugging evaluation_model.py in the multi_gpu case, a debugger port collision occurs between the parent and child processes. To avoid this, I wrapped the body of evaluation_model.py in an if __name__ == '__main__': block.
The sentence_transformer_encode_multi_process_worker() function was expected to encode the corpus, evaluate the score against all the queries, and store the metric, but it did not handle this properly. I therefore switched to the default SentenceTransformer._encode_multi_process_worker().
Appended the instruction to the sentences list in encode_corpus_parallel(), following the existing encode_corpus() function.
Removed USE_BEIR_DEVELOPMENT from evaluation/MTEB/mteb/abstasks/BeIRTask.py. This boolean only skipped downloading the cqadupstack dataset, and I could not find a valid reason for that, so I removed it.
Pinned pyarrow==8.0.0 as an explicit requirement. Without the pin, pip's dependency resolver installs the latest pyarrow, which does not work well with the evaluate package.
Updated train.py so that a validation dataset can be fed in during training.
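Why the main-module guard mentioned above helps: with the "spawn" start method (which, as far as I know, SentenceTransformer's multi-process pool uses), every child process is a fresh interpreter that re-imports the main script, so any unguarded top-level code, such as starting a debugger listener on a fixed port, runs again in each child. The minimal sketch below shows the pattern; encode_chunk, run_parallel and main are hypothetical stand-ins, not code from evaluation_model.py.

```python
import multiprocessing as mp


def encode_chunk(chunk):
    # Stand-in for per-process work (e.g. encoding one chunk on one GPU).
    return [len(s) for s in chunk]


def run_parallel(chunks):
    # "spawn" starts fresh interpreters that re-import the main module,
    # so unguarded top-level code would execute again in every child.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=2) as pool:
        return pool.map(encode_chunk, chunks)


def main():
    # Hypothetical: one-off side effects (e.g. attaching a debugger on a
    # fixed port) belong here so they run only once, in the parent.
    print(run_parallel([["a", "bb"], ["ccc"]]))


if __name__ == "__main__":
    # Children import this file as "__mp_main__", so this block is skipped there.
    main()
```

Moving the side effects under the guard, rather than trying to pick a different port per child, keeps the children free of work they never needed.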
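To illustrate the division of labour implied by switching to the default SentenceTransformer._encode_multi_process_worker(), here is a minimal sketch assuming the worker only encodes chunks pulled from an input queue and the parent reassembles them in order before any scoring against the queries. The names encode_worker and encode_in_chunks and the toy encoder are hypothetical; this is not the sentence-transformers implementation, and a single inline worker replaces the real per-device processes.

```python
import queue
from typing import Callable, List, Sequence

Encoder = Callable[[Sequence[str]], List[List[float]]]


def encode_worker(encode: Encoder, input_queue: queue.Queue, results_queue: queue.Queue) -> None:
    """Encode-only worker: pull (chunk_id, sentences), push (chunk_id, embeddings)."""
    while True:
        item = input_queue.get()
        if item is None:  # sentinel: no more work
            break
        chunk_id, sentences = item
        results_queue.put((chunk_id, encode(sentences)))


def encode_in_chunks(sentences: List[str], encode: Encoder, chunk_size: int = 2) -> List[List[float]]:
    input_queue: queue.Queue = queue.Queue()
    results_queue: queue.Queue = queue.Queue()
    chunks = [sentences[i:i + chunk_size] for i in range(0, len(sentences), chunk_size)]
    for chunk_id, chunk in enumerate(chunks):
        input_queue.put((chunk_id, chunk))
    input_queue.put(None)
    # One inline worker here; a real pool runs one worker per device.
    encode_worker(encode, input_queue, results_queue)
    # Sorting by chunk_id restores corpus order when several workers finish out of order.
    results = sorted(results_queue.get() for _ in range(len(chunks)))
    return [emb for _, embs in results for emb in embs]
```

Scoring then happens in the parent on the reassembled embeddings, so the worker itself never needs to know about queries or metrics.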
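For the instruction change in encode_corpus_parallel(), a minimal sketch of the input format involved: INSTRUCTOR models take each input as an [instruction, text] pair. The helper below assumes BEIR-style corpus entries with an optional "title" and a "text" field; build_instructed_inputs is a hypothetical name, and the repository's encode_corpus() may join fields differently.

```python
from typing import Dict, List


def build_instructed_inputs(corpus: List[Dict[str, str]], instruction: str) -> List[List[str]]:
    """Pair every document with the retrieval instruction (INSTRUCTOR-style input)."""
    # Assumption: title and text are joined with a space, as BEIR models commonly do.
    sentences = [(doc.get("title", "") + " " + doc["text"]).strip() for doc in corpus]
    # Each model input becomes [instruction, text].
    return [[instruction, s] for s in sentences]
```

Keeping the parallel path and encode_corpus() on the same input format matters because otherwise the two paths embed different strings and their scores are not comparable.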
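The pyarrow pin itself is a single requirements line (pyarrow==8.0.0). If you want to confirm at runtime that the resolver actually honoured it, a small check with the standard-library importlib.metadata is one option; pin_satisfied is a hypothetical helper, and it compares version strings exactly, so "8.0" would not match "8.0.0".

```python
from importlib import metadata


def pin_satisfied(package: str, pinned_version: str) -> bool:
    """Return True only if the installed distribution exactly matches the pinned version."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return False  # not installed at all
    return installed == pinned_version


if __name__ == "__main__":
    print("pyarrow==8.0.0 installed:", pin_satisfied("pyarrow", "8.0.0"))
```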

ashokrajab added 3 commits August 20, 2023 19:42

fixed AbsTaskRetrieval for DRPES

d5ca30f

Fixed local MTEB installation

637329a

Added provision for including val dataset in training

b2cad88

hongjin-su merged commit 554e944 into xlang-ai:main Aug 25, 2023

ashokrajab deleted the bugfix_abs_retrieval_multi_gpu branch September 24, 2023 08:05
