Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Merge main into v2 #2116

Open
wants to merge 11 commits into
base: v2.0.0
Choose a base branch
from
Open

WIP: Merge main into v2 #2116

wants to merge 11 commits into from

Conversation

KennethEnevoldsen
Copy link
Contributor

@KennethEnevoldsen KennethEnevoldsen commented Feb 20, 2025

  • There were some changes to taskmetadata's descriptive_stat_path, I am not entirely sure if the merge is correct here.
  • why the "InstructionReranking"? (probably missed the reason somewhere?)
  • @sam-hey they couldn't get the iso library to work. Seems it could be python-iso639? (seems like the test passed because try except loop).
  • @Samoed I tried to calculate desc stats, in v2, but got this error, which I believe you might have fixed (my merge could have undone it)
from mteb.tasks.Reranking.eng import BuiltBenchReranking

task = BuiltBenchReranking()
task.load_data() # fails (but only on v2) as "default" is not available?

for now I just calculated the desc. stats over on main and moved it over.

Note: we are likely missing some task imports in v2 (it is impossible to know with the star imports) - any good way to check? I used the now outdated category to check.

It might be nice to start doing these merges more often?

Code Quality

  • Code Formatted: Format the code using make lint to maintain consistent style.

Documentation

  • Updated Documentation: Add or update documentation to reflect the changes introduced in this PR.

Testing

  • New Tests Added: Write tests to cover new functionality. Validate with make test-with-coverage.
  • Tests Passed: Run tests locally using make test or make test-with-coverage to ensure no existing functionality is broken.

@Samoed
Copy link
Collaborator

Samoed commented Feb 20, 2025

Since there no more duplicate class names you can run scripts/generate_imports.py

@KennethEnevoldsen KennethEnevoldsen marked this pull request as ready for review February 21, 2025 15:26
@Samoed
Copy link
Collaborator

Samoed commented Feb 21, 2025

why the "InstructionReranking"? (probably missed the reason somewhere?)

Instruction reranking were added in #1359 after unifing retrieval and reranking

There were some changes to taskmetadata's descriptive_stat_path, I am not entirely sure if the merge is correct here.

Yes, that's correct

I tried to calculate desc stats, in v2, but get this quite error, which I believe you might have fixed (a merge could have undone it)

I think you've fixed it for now.

Copy link
Collaborator

@Samoed Samoed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

@sam-hey
Copy link
Contributor

sam-hey commented Feb 21, 2025

Just had a look at the original pr - back then it was iso639, I think python-iso639 is also fine to use. Not sure when the dependency was removed.

@Samoed
Copy link
Collaborator

Samoed commented Feb 21, 2025

To load BuiltBenchReranking correctly you need to put it in

OLD_FORMAT_RERANKING_TASKS = [
"MindSmallReranking",
"SciDocsRR",
"StackOverflowDupQuestions",
"WebLINXCandidatesReranking",
"AlloprofReranking",
"SyntecReranking",
"VoyageMMarcoReranking",
"ESCIReranking",
"MIRACLReranking",
"WikipediaRerankingMultilingual",
"RuBQReranking",
"T2Reranking",
"MMarcoReranking",
"CMedQAv1-reranking",
"CMedQAv2-reranking",
"NamaaMrTydiReranking",
]

I'm removing this in #2097

@KennethEnevoldsen
Copy link
Contributor Author

Failures seems to be HF failures. Will try to rerun to see if it resolves it

@Samoed
Copy link
Collaborator

Samoed commented Feb 23, 2025

I think we should modify test_dataset_availability to test datasets independently. Currently, if one test fails, all datasets are rechecked due to pytest.mark.flaky, which results in around 4,000 requests after retries. This could lead to Hugging Face blocking our requests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants