Description
Hello there,
First off, thank you for the amazing work on "Teaching Arithmetic to Small Transformers" and for sharing the code. I'm a researcher who's been exploring your methods, and I find them quite enlightening.
However, while running your baseline tests and doing some failure-case analysis, I believe I've come across a potential bug that might affect the results presented in your paper.
The Main Issue:
My primary observation is that the lower accuracy of the plain baseline appears to be caused by the prompt formatting used during testing. After a small adjustment, the accuracy jumps from 87.27% to 95.58% without retraining the model. I suspect that if the model is retrained and the best-performing checkpoint on the validation set is selected, the accuracy could reach around 97%, which is comparable to the 'plain2' model's results.
Explanation:
The change is quite simple: just prepend a newline character (`\n`) to the existing prompt during testing. I discovered this after noticing that even on the training dataset, the plain method only achieved around 90% accuracy, which seemed odd to me.
Upon further analysis, I found that the issue mostly occurs with arithmetic tasks of the form `A2A1+C3C2C1=` or `A1+C3C2C1=`, where GPT, being a next-token predictor, can match the input to an incorrect but similar-looking arithmetic equation from the training data. For example, if the test prompt is `1+234=` (correct answer 235) and the training dataset contains `\n21+234=255`, the model may incorrectly produce `1+234=255`.
Adding a newline character at the beginning, as in `\n1+234=`, prevents this issue: the model can no longer match `\n1+234=` against `\n21+234=255`, which substantially improves accuracy.
I hope this observation is useful and I would love to know your thoughts on it.
Best regards,
Su