Harrison/llm math #1808

Merged
merged 2 commits into from
Mar 20, 2023

hwchase17
Contributor

No description provided.

vbarda and others added 2 commits March 19, 2023 20:44
The existing `LLMMathChain` is somewhat unpredictable in its behavior. I
suspect this is because the prompt allows the model to generate the
answer directly, bypassing Python code altogether
([here](https://github.com/hwchase17/langchain/blob/df8702fead4a26263f8a3f0a6fc36db1a8c37c7b/langchain/chains/llm_math/prompt.py#L19)),
so it can occasionally hallucinate an answer.

This PR introduces a simplified version of the prompt. The new version
improves the behavior in the following ways:
- more robust to variations in the input format (literal values /
spelled-out values / whitespace, etc.)
- smaller prompt size -> cheaper queries

**CAUTION**⚠️: both the new and the old prompts are vulnerable to injection
attacks and are quite dangerous, since both execute Python code
blindly. In the future this chain would benefit from a much more
constrained executor.
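
To illustrate what a more constrained executor could look like, here is a minimal sketch that evaluates only arithmetic expressions by walking the Python AST instead of executing code. It is not part of this PR or of the library; `safe_eval` and the operator whitelist are hypothetical names chosen for the example.

```python
import ast
import operator

# Whitelisted operators (an assumption for this sketch); anything else is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval(expression: str):
    """Evaluate a purely arithmetic expression without eval() or exec()."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # Only plain int/float literals are allowed (bools and strings are not).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attribute access, etc. all end up here.
        raise ValueError(f"disallowed expression element: {type(node).__name__}")

    return _eval(ast.parse(expression.strip(), mode="eval"))


print(safe_eval("5 + 3 ** 2 - 1"))
```

Function calls, names, and attribute access are rejected with a `ValueError`, so an injected `__import__('os')` never runs. The sketch does not bound the cost of evaluation (a very large exponent can still be expensive), so a real executor would likely need additional limits.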

See examples below that compare old vs new behavior:

```python
# `math_chain` (old prompt) and `new_math_chain` (new prompt) are assumed
# to be already-constructed chains; their setup is not shown here.
test_cases = [
    # input, expected output
    ("5", 5),  # same value
    ("5 + 3", 8),  # trivial math
    ("2^3.171", 2 ** 3.171),  # insensitivity to the spaces around operator
    ("  2 ^3.171 ", 2 ** 3.171),  # insensitivity to whitespace overall
    ("two to the power of three point one hundred seventy one", 2 ** 3.171),  # spelled-out values and operations
    ("five + three squared minus 1", 5 + 3 ** 2 - 1),  # mixed literal and spelled-out values / operations
    ("2097 times 27.31", 2097 * 27.31),  # mixed types
    ("two thousand ninety seven times twenty seven point thirty one", 2097 * 27.31),  # spelled-out mixed types
    ("209758 / 2714", 209758 / 2714),  # division of two ints (true division)
    ("209758.857 divided by 2714.31", 209758.857 / 2714.31),  # float division
]

for input_text, expected_output in test_cases:
    print("input: ", input_text)
    print("expected output :", expected_output)
    print("old chain: ", math_chain.run(input_text))
    print("new chain: ", new_math_chain.run(input_text), "\n")
```

```
input:  5
expected output : 5
old chain:  Answer:  8
new chain:  Answer: 5
 

input:  5 + 3
expected output : 8
old chain:  Answer:  8
new chain:  Answer: 8
 

input:  2^3.171
expected output : 9.006708689094099
old chain:  Answer: 10.945
new chain:  Answer: 9.006708689094099
 

input:    2 ^3.171 
expected output : 9.006708689094099
old chain:  Answer: 10.945
new chain:  Answer: 9.006708689094099
 

input:  two to the power of three point one hundred seventy one
expected output : 9.006708689094099
old chain:  Answer: 9.9078598877
new chain:  Answer: 9.006708689094099
 

input:  five + three squared minus 1
expected output : 13
old chain:  Answer: 19
new chain:  Answer: 13
 

input:  2097 times 27.31
expected output : 57269.07
old chain:  Answer: 57239.07
new chain:  Answer: 57269.07
 

input:  two thousand ninety seven times twenty seven point thirty one
expected output : 57269.07
old chain:  Answer: 56753.97
new chain:  Answer: 57269.07
 

input:  209758 / 2714
expected output : 77.28739867354459
old chain:  Answer: 77.2
new chain:  Answer: 77.28739867354459
 

input:  209758.857 divided by 2714.31
expected output : 77.27888745205964
old chain:  Answer: 77.27888745205964
new chain:  Answer: 77.27888745205964
```
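
The comparison above is done by eye. As a rough sketch, the same check could be automated by parsing the `Answer: ...` strings and comparing them to the expected values with a tolerance. The helpers `parse_answer` and `matches` are hypothetical and assume the chain output always has the `Answer: <number>` shape seen in the log above.

```python
import math


def parse_answer(raw: str) -> float:
    """Extract the numeric value from a string such as 'Answer: 8'."""
    prefix = "Answer:"
    text = raw.strip()
    if not text.startswith(prefix):
        raise ValueError(f"unexpected chain output: {raw!r}")
    return float(text[len(prefix):].strip())


def matches(raw: str, expected: float, rel_tol: float = 1e-9) -> bool:
    """Return True if the chain output agrees with the expected value."""
    return math.isclose(parse_answer(raw), expected, rel_tol=rel_tol)


# Two rows taken from the output above.
print(matches("Answer: 57269.07", 2097 * 27.31))  # new chain
print(matches("Answer: 57239.07", 2097 * 27.31))  # old chain
```

A relative tolerance is used rather than exact equality so that small floating-point differences do not count as failures; the default of `1e-9` is an arbitrary choice for this sketch.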
@hwchase17 hwchase17 merged commit d5b4393 into master Mar 20, 2023
@hwchase17 hwchase17 deleted the harrison/llm-math branch March 20, 2023 14:53