Better testing for eval recipe: add test for generation and log likelihood tasks #1873
Labels
better engineering
Tasks which help improve eng productivity e.g. building tools, cleaning up code, writing docs
Moving this to its own issue after it was mentioned in #1763 (comment). It wasn't trivial to get the eval harness to run tasks in a specific order (we would want a generation task followed by a non-generation task). Will follow up in another PR.