
eyurtsev (Collaborator) commented Nov 28, 2023

  • Added benchmarks for typewriter 1, multiverse math, and relational data
  • Made the evaluator more configurable: it now grades multiverse math correctly and can skip grading the output for typewriter tasks
  • Fixed examples in a dataset (the public dataset has already been updated)
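The configurable evaluator in the second bullet can be pictured as a grader with a switch for output grading. This is a minimal sketch under stated assumptions, not the PR's evaluator: `grade_run`, `Grade`, and `grade_output` are hypothetical names, the exact-match output check is a simplification (it does not model how multiverse math answers are graded), and the `expected_steps` list follows the `["log", "multiply"]` format from the dataset example discussed in the review.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Grade:
    """Result for one agent run; output_correct is None when output grading is skipped."""

    steps_match: bool
    output_correct: Optional[bool]


def grade_run(
    actual_steps: Sequence[str],
    expected_steps: Sequence[str],
    actual_output: str,
    expected_output: Optional[str] = None,
    *,
    grade_output: bool = True,
) -> Grade:
    """Hypothetical grader: compare the tool-call trajectory, optionally the final output."""
    # Trajectory check: the agent must call exactly the expected tools, in order.
    steps_match = list(actual_steps) == list(expected_steps)
    # Output grading can be switched off, e.g. for tasks where only the tool calls matter.
    if not grade_output or expected_output is None:
        return Grade(steps_match=steps_match, output_correct=None)
    # Exact string match after trimming whitespace; a real grader may be more lenient.
    return Grade(
        steps_match=steps_match,
        output_correct=actual_output.strip() == expected_output.strip(),
    )


# Usage: skip output grading and check only the trajectory.
print(grade_run(["log", "multiply"], ["log", "multiply"], "some answer", grade_output=False))
# prints: Grade(steps_match=True, output_correct=None)
```

Keeping the switch as a keyword-only argument makes it explicit at each call site whether a task's output is being graded.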

eyurtsev changed the title from "Added simple comparison code" to "Added benchmarks to typerwriter 1, multiverse, relational data, update evaluators" Nov 29, 2023
eyurtsev requested a review from hinthornw November 29, 2023 22:20
"expected_steps": ["log", "multiply"],
},
{
"question": "calculate sqrt of 101 to 4 digits of precision",
Collaborator

Are we fine keeping multiple sources of truth?

return RunEvalConfig(custom_evaluators=[AgentTrajectoryEvaluator()])
def get_eval_config(
    *,
    eval_llm: Union[BaseLanguageModel, BaseChatModel, None] = None,
Collaborator

Do we need both chat and base language model?
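On the question of needing both types: in `langchain_core`, `BaseChatModel` is itself a subclass of `BaseLanguageModel`, so an `Optional[BaseLanguageModel]` annotation already admits chat models. The sketch below is illustrative only. It uses local stand-in classes rather than the real LangChain ones, and the `get_eval_config` body is hypothetical; only the narrowed `eval_llm` annotation is the point.

```python
from typing import Optional


# Local stand-ins mirroring the assumed hierarchy: a chat model is a language model.
class BaseLanguageModel:
    pass


class BaseChatModel(BaseLanguageModel):
    pass


def get_eval_config(*, eval_llm: Optional[BaseLanguageModel] = None) -> dict:
    """Hypothetical body; one base-class annotation covers both kinds of model."""
    # A chat model passes this check because it subclasses BaseLanguageModel.
    if eval_llm is not None and not isinstance(eval_llm, BaseLanguageModel):
        raise TypeError("eval_llm must be a BaseLanguageModel or None")
    return {"eval_llm": eval_llm, "uses_llm_grader": eval_llm is not None}


config = get_eval_config(eval_llm=BaseChatModel())
print(config["uses_llm_grader"])  # True
```

Listing the subclass alongside its base in a `Union` adds no accepted values; it only documents intent.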

eyurtsev merged commit 4a52867 into main Nov 29, 2023
eyurtsev deleted the eugene/adding_comparison_code branch November 29, 2023 22:29

3 participants