Update 10 Apr 2024: #231 (comment)
It would be great to start doing this kind of quantitative analysis of ggml-based inference:
https://bellard.org/ts_server/
It looks like Fabrice evaluates the models using something called LM Evaluation Harness:
https://github.com/EleutherAI/lm-evaluation-harness
I have no idea what this is yet, but it would be nice to study it and try to integrate it here and in other ggml-based projects.
This will be a very important step towards estimating the quality of the generated output and seeing if we are on the right track.