Closed
Labels
- enhancement: New feature or request
- postponed: This issue/PR is postponed until there is a very good reason (e.g. $$$) to implement it.
Description
In the v0.5.0 eval run, GPT-4 ranks above Gemini 1.5 Flash. Gemini produces more executable code, but GPT-4 has a higher coverage score, which is why it ranks higher overall. However, it makes more sense to order first by executable code and then by coverage. We need to balance:
- Executable code should be weighted much higher
- Coverage is still very important
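
A minimal sketch of two ways this balance could be expressed, assuming each model's result boils down to a count of executable responses and a coverage score. The names `ModelResult`, `rank_lexicographic`, `rank_weighted`, the weight of 10, and the sample numbers are all hypothetical illustrations, not the evaluation's actual scoring code or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelResult:
    """Hypothetical per-model summary; field names are assumptions."""
    name: str
    executable: int  # number of responses whose code executed
    coverage: int    # coverage points collected


def rank_lexicographic(results):
    """Order strictly by executable code; coverage only breaks ties."""
    return sorted(results, key=lambda r: (r.executable, r.coverage), reverse=True)


def weighted_score(result, executable_weight=10, coverage_weight=1):
    """Single score in which executable code counts much more than coverage."""
    return executable_weight * result.executable + coverage_weight * result.coverage


def rank_weighted(results, executable_weight=10, coverage_weight=1):
    """Order by the weighted score, highest first."""
    return sorted(
        results,
        key=lambda r: weighted_score(r, executable_weight, coverage_weight),
        reverse=True,
    )


# Made-up numbers: model-a has higher coverage, model-b has more executable code.
results = [
    ModelResult("model-a", executable=8, coverage=90),
    ModelResult("model-b", executable=10, coverage=75),
]

if __name__ == "__main__":
    print([r.name for r in rank_lexicographic(results)])
    print([r.name for r in rank_weighted(results)])
    # With equal weights, the coverage-heavy model wins again.
    print([r.name for r in rank_weighted(results, executable_weight=1)])
```

The lexicographic variant makes coverage matter only between models with the same amount of executable code, while the weighted variant keeps coverage influential and lets the weight decide how strongly executable code dominates.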