Follow-Up Issue: Question on the Efficacy of the `reverse` Method and Its Underlying Rationale

Hello again,

I recently submitted an issue titled "Potential Bug: Improvement in 3-Digit Addition Baseline by Adjusting Prompt Formatting" and would like to follow up with another query related to your "Teaching Arithmetic to Small Transformers" work.

## Concerns:

I have some questions about the principal argument in your paper, which suggests that producing the most significant digit first requires global information about both operands, making the task significantly harder to learn.

When examining a 3-digit addition task like \(A3A2A1 + B3B2B1 = C3C2C1\), the paper claims that \(C3\) would require comprehensive, global information. However, in most instances, \(C3\) can be computed using only \(A3\), \(B3\), and possibly the carry from \(A2 + B2\). Information from all digits is needed only when \(A2 + B2 = 9\), because only then does the carry from \(A1 + B1\) decide whether the second digit produces a carry.
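To make the claim above concrete, here is a minimal sketch that enumerates every digit pair and checks when the carry into the most significant position actually depends on the lowest digits. It assumes zero-padded decimal operands and ignores any overflow digit; the names `hundreds_carry`, `needs_units_digits`, and `global_pairs` are hypothetical and do not come from the paper or its code.

```python
from itertools import product


def hundreds_carry(a2: int, b2: int, a1: int, b1: int) -> int:
    """Carry passed into the A3 + B3 column (hypothetical helper)."""
    c1 = (a1 + b1) // 10          # carry out of the lowest column
    return (a2 + b2 + c1) // 10   # carry out of the middle column


def needs_units_digits(a2: int, b2: int) -> bool:
    """True if the carry into the top column changes with A1 and B1."""
    carries = {
        hundreds_carry(a2, b2, a1, b1)
        for a1, b1 in product(range(10), repeat=2)
    }
    return len(carries) > 1


# Middle-digit pairs for which C3 cannot be decided without A1 and B1.
global_pairs = [
    (a2, b2)
    for a2, b2 in product(range(10), repeat=2)
    if needs_units_digits(a2, b2)
]
fraction_global = len(global_pairs) / 100

print(global_pairs)
print(fraction_global)
```

Under these assumptions, the only pairs flagged are those with `a2 + b2 == 9`, which is 10 of the 100 possible middle-digit pairs.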

For the `reverse` method \(A3A2A1 + B3B2B1 = C1C2C3\), the computation of \(C3\) seems similarly dependent on the carry out of the second digit, which involves \(A2\), \(B2\), and \(C2\). Therefore, it's unclear to me why there would be a substantial difference in complexity between the `plain` and `reverse` methods for calculating \(C3\).
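The dependency on \(A2\), \(B2\), and \(C2\) in the `reverse` format can be checked with a short sketch. It assumes zero-padded decimal operands and that \(C2\) has already been emitted before \(C3\); the function names are hypothetical and this is only an illustration of the digit arithmetic, not of how a trained model computes it.

```python
from itertools import product


def carry_from_emitted_digit(a2: int, b2: int, c2: int) -> int:
    """Infer the carry into the top column from A2, B2, and the emitted C2.

    If the middle column overflowed, C2 wrapped around and is smaller
    than A2 + B2; otherwise C2 is at least A2 + B2.
    """
    return 1 if c2 < a2 + b2 else 0


def check_all_cases() -> bool:
    """Compare the inferred carry with the true carry for every digit combination."""
    for a2, b2, a1, b1 in product(range(10), repeat=4):
        c1 = (a1 + b1) // 10      # carry out of the lowest column
        total = a2 + b2 + c1      # middle column sum including that carry
        c2 = total % 10           # digit written at the middle position
        if carry_from_emitted_digit(a2, b2, c2) != total // 10:
            return False
    return True


all_cases_agree = check_all_cases()
print(all_cases_agree)
```

For example, with `a2 = 4` and `b2 = 5`, an emitted `c2` of `0` implies a carry, while an emitted `c2` of `9` implies none.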

## Additional Evidence:

In my earlier investigation, I observed that the failure cases from the `plain2` method rarely included situations where \(A2 + B2 = 9\). This leads me to wonder whether the primary reason the `reverse` method performs better might differ from the one discussed in the paper.

I'm eager to hear your insights on this matter.

Best regards,
Su Wang
