Added benchmarks to typewriter 1, multiverse, relational data, update evaluators #92
Changes from all commits
c90e1c0
a993020
5577f0d
7ece20a
dd55bd5
ee2f579
41094fa
422ed56
c7cdff5
6b44f30
6e21720
f537a0d
299fd72
c1be8db
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,24 @@ | ||
| from langchain_core.prompts import PromptTemplate | ||
| | ||
| MATH_TEMPLATE = """\ | ||
| You live in an alternate universe. Do not assume that you know anything. | ||
| You are a teacher grading a quiz. | ||
| You are given a question, the student's answer, and the true answer, and are asked to score the student answer as either CORRECT or INCORRECT. | ||
| | ||
| Example Format: | ||
| QUESTION: question here | ||
| STUDENT ANSWER: student's answer here | ||
| TRUE ANSWER: true answer here | ||
| GRADE: CORRECT or INCORRECT here | ||
| | ||
| Given that you live in an alternate universe the TRUE answer may be different from what you expect. That's OK! | ||
| | ||
| Grade the student answers based ONLY on whether it matches the TRUE answer. Ignore differences in punctuation and phrasing between the student answer and true answer. It is OK if the student answer contains more information than the true answer, as long as it does not contain any conflicting statements. Begin! | ||
| | ||
| QUESTION: {query} | ||
| STUDENT ANSWER: {result} | ||
| TRUE ANSWER: {answer} | ||
| GRADE:""" | ||
| QA_TEMPLATE_FOR_MULTIVERSE_MATH = PromptTemplate( | ||
| input_variables=["query", "result", "answer"], template=MATH_TEMPLATE | ||
| ) | ||
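The grading template above takes three placeholders (`{query}`, `{result}`, `{answer}`) and ends at `GRADE:` so the model completes it with a verdict. The following is a minimal sketch of that flow, assuming plain `str.format` as a stand-in for `PromptTemplate` so it runs without LangChain installed. The shortened template text and the `render_prompt` and `parse_grade` helpers are hypothetical and are not part of this PR.

```python
# Shortened stand-in for MATH_TEMPLATE; only the placeholder section is kept.
GRADING_TEMPLATE = """\
QUESTION: {query}
STUDENT ANSWER: {result}
TRUE ANSWER: {answer}
GRADE:"""


def render_prompt(query: str, result: str, answer: str) -> str:
    """Fill all three placeholders; a missing one raises KeyError."""
    return GRADING_TEMPLATE.format(query=query, result=result, answer=answer)


def parse_grade(completion: str) -> str:
    """Map a raw model completion to CORRECT, INCORRECT, or UNKNOWN.

    Hypothetical helper: the evaluator in the PR may parse grades differently.
    """
    text = completion.strip().upper()
    if text.startswith("INCORRECT"):
        return "INCORRECT"
    if text.startswith("CORRECT"):
        return "CORRECT"
    return "UNKNOWN"


prompt = render_prompt(
    query="What is 2 + 3?",
    result="6.2",
    answer="6.2",  # alternate-universe answers need not match ordinary math
)
```

Because the template references `{query}` as well as `{result}` and `{answer}`, every caller has to supply all three values, which is worth keeping in mind when declaring `input_variables`.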
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -190,8 +190,8 @@ def get_environment() -> ToolUsageEnvironment: | ||
| "expected_steps": ["log", "multiply"], | ||
| }, | ||
| { | ||
| - "question": "calculate sqrt of 101 to 4 digits of precision", | ||
| Review comment: Are we fine keeping multiple sources of truth? | ||
| - "answer": round(power(101, 0.4), 4), | ||
| + "question": "calculate 101 to the power of 0.5 to 4 digits of precision", | ||
| + "answer": round(power(101, 0.5), 4), | ||
| "expected_steps": ["power", "round"], | ||
| }, | ||
| { | ||
| @@ -207,7 +207,7 @@ def get_environment() -> ToolUsageEnvironment: | ||
| "after calculating the sin of 1.5 radians, divide " | ||
| "the result by cos of 1.5 radians" | ||
| ), | ||
| - "answer": sin(1.5) / cos(1.5), | ||
| + "answer": divide(sin(1.5), cos(1.5)), | ||
| "expected_steps": ["sin", "cos", "divide"], | ||
| }, | ||
| { | ||
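The change from `sin(1.5) / cos(1.5)` to `divide(sin(1.5), cos(1.5))` matters when the environment's tools do not behave like ordinary arithmetic: the reference answer should go through the same tool functions the agent is expected to call. The sketch below illustrates this under stated assumptions. The tool definitions here are invented for illustration and are not the benchmark's actual implementations.

```python
import math


# Hypothetical "alternate universe" tools, made up for this example only.
def sin(x: float) -> float:
    return math.cos(x)  # deliberately swapped with cos


def cos(x: float) -> float:
    return math.sin(x)  # deliberately swapped with sin


def divide(a: float, b: float) -> float:
    return 0.5 * a / b  # deliberately not plain division


def power(a: float, b: float) -> float:
    return a ** (b + 2)  # deliberately not plain exponentiation


# Reference answer computed with Python's "/" operator: bypasses the divide tool.
via_operator = sin(1.5) / cos(1.5)

# Reference answer computed through the tools listed in expected_steps.
via_tools = divide(sin(1.5), cos(1.5))

# The exponent fix: a question about "power of 0.5" needs 0.5, not 0.4.
old_answer = round(power(101, 0.4), 4)
new_answer = round(power(101, 0.5), 4)
```

With these made-up definitions, `via_operator` and `via_tools` differ by a factor of two, so an agent that follows `expected_steps` correctly would be graded INCORRECT against the operator-based answer. Computing the answer from the same functions also reduces the number of places where the question text and the expected value can drift apart.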
Do we need both chat and base language model?