Commit

elaborate
slobentanzer committed Feb 17, 2024
1 parent ab558b5 commit 56009ee
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion content/30.discussion.md
@@ -19,7 +19,7 @@ As such, a framework is a necessary step towards the objective and reproducible
We prevent data leakage from the benchmark datasets into the training data of new models by encryption, which is essential for the sustainability of the benchmark as new models are released.
The living benchmark will be updated with new questions and tasks as they arise in the community.

-The benchmark's results provide a starting point for understanding why some models perform differently than expected.
+The benchmark's results provide convenient selection criteria and a starting point for understanding why some models perform differently than expected.
For instance, the benchmark allowed immediate flagging of the drop in performance from the older (0613) to the newer (0125) version of gpt-4.
It also identified a range of pre-trained open-source models suitable for our uses, most notably, the openhermes-2.5 model in 4- or 5-bit quantisation.
This model is a fine-tuned (on GPT-4-generated data) variant of Mistral 7B v0.1, whose vanilla variants perform considerably worse in our benchmarks.