Added benchmarks to typewriter 1, multiverse, relational data, update evaluators #92

eyurtsev · 2023-11-28T19:25:04Z

Added benchmarks for typewriter 1, multiverse, and relational data.
Made the evaluator more configurable: it now grades multiverse math correctly and allows skipping output grading for typewriter tasks.
Fixed examples in a dataset (already updated in the public dataset).
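
For readers skimming the thread, here is a minimal sketch of what "configurable" grading with optional output grading could look like. It is not the code from this PR: `grade_run`, `numeric_grader`, the `grade_output` flag, and the score keys are all hypothetical names chosen for illustration, and the tolerance-based numeric comparison is only an assumption about how math answers might be compared.

```python
import math
from typing import Callable, Dict, Optional, Sequence


def numeric_grader(want: str, got: str, rel_tol: float = 1e-4) -> float:
    """Hypothetical grader: compare two answers as floats within a relative tolerance."""
    try:
        return float(math.isclose(float(want), float(got), rel_tol=rel_tol))
    except ValueError:
        # Either answer was not parseable as a number.
        return 0.0


def grade_run(
    expected_steps: Sequence[str],
    actual_steps: Sequence[str],
    expected_output: str,
    actual_output: str,
    *,
    grade_output: bool = True,
    output_grader: Optional[Callable[[str, str], float]] = None,
) -> Dict[str, float]:
    """Hypothetical evaluator: always score the tool trajectory, optionally score the output."""
    # Trajectory check: did the agent call the expected tools in the expected order?
    scores = {"trajectory_match": float(list(expected_steps) == list(actual_steps))}
    if grade_output:
        # Fall back to exact string match; a real setup might plug in an LLM grader here.
        grader = output_grader or (lambda want, got: float(want.strip() == got.strip()))
        scores["output_correct"] = grader(expected_output, actual_output)
    return scores
```

With `grade_output=False` only the trajectory score is returned, which is one way a typewriter-style task could skip output grading; passing `output_grader=numeric_grader` swaps in the tolerance-based comparison for math-style answers.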

hinthornw · 2023-11-29T22:26:21Z

langchain_benchmarks/tool_usage/tasks/multiverse_math.py

        "expected_steps": ["log", "multiply"],
    },
    {
-        "question": "calculate sqrt of 101 to 4 digits of precision",


Are we fine keeping multiple sources of truth?

hinthornw · 2023-11-29T22:27:21Z

langchain_benchmarks/tool_usage/evaluators.py

-    return RunEvalConfig(custom_evaluators=[AgentTrajectoryEvaluator()])
+def get_eval_config(
+    *,
+    eval_llm: Union[BaseLanguageModel, BaseChatModel, None] = None,


Do we need both chat and base language model?
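
As context for the question above: in LangChain, `BaseChatModel` is a subclass of `BaseLanguageModel`, so an annotation of just the base class already accepts chat models. The sketch below uses stand-in classes rather than the real imports so it runs on its own, and the body of `get_eval_config` here is a hypothetical placeholder, not the function from this PR.

```python
from typing import Optional


class BaseLanguageModel:
    """Stand-in for the LangChain base class (assumption: illustration only)."""


class BaseChatModel(BaseLanguageModel):
    """Stand-in mirroring the real inheritance: chat models are language models."""


def get_eval_config(*, eval_llm: Optional[BaseLanguageModel] = None) -> str:
    # Hypothetical body: report which kind of model the evaluator would use.
    if eval_llm is None:
        return "default"
    return type(eval_llm).__name__


# A chat model satisfies the narrower annotation without needing a Union.
chat_result = get_eval_config(eval_llm=BaseChatModel())
```

Listing both types in the `Union` is harmless, but it does not widen the set of accepted values.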

eyurtsev added 13 commits November 28, 2023 14:24

Added simple comparison code

c90e1c0

fix bug in agents

a993020

fix bug in multiverse math

5577f0d

update relational data

7ece20a

x

dd55bd5

x

ee2f579

x

41094fa

x

422ed56

x

c7cdff5

x

6b44f30

x

6e21720

x

f537a0d

x

299fd72

eyurtsev changed the title ~~Added simple comparison code~~ Added benchmarks to typewriter 1, multiverse, relational data, update evaluators Nov 29, 2023

eyurtsev requested a review from hinthornw November 29, 2023 22:20

Merge branch 'main' into eugene/adding_comparison_code

c1be8db

hinthornw reviewed Nov 29, 2023


hinthornw approved these changes Nov 29, 2023


eyurtsev merged commit 4a52867 into main Nov 29, 2023

eyurtsev deleted the eugene/adding_comparison_code branch November 29, 2023 22:29
