
Follow-Up Issue: Question on the Efficacy of the reverse Method and Its Underlying Rationale #2

Open
@james016

Hello again,

I recently submitted an issue titled "Potential Bug: Improvement in 3-Digit Addition Baseline by Adjusting Prompt Formatting" and would like to follow up with another query related to your "Teaching Arithmetic to Small Transformers" work.

Concerns:

I have some questions about the principal argument in your paper, which suggests that producing the sum most significant digit first requires global information about the operands, making the task significantly harder to learn.

When examining a 3-digit addition task like (A3A2A1 + B3B2B1 = C3C2C1), the paper claims that (C3) would require comprehensive, global information. However, in most instances, (C3) can be computed using only (A3), (B3), and the carry from (A2 + B2). Information from all digits is required only when (A2 + B2 = 9), because only then does the carry out of the second digit depend on whether (A1 + B1) produces a carry.
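
As a quick sanity check of this claim, here is a minimal sketch that enumerates all digit combinations. It assumes every digit is drawn from 0 to 9 (leading zeros allowed) and treats (C3) as the hundreds digit of the sum, ignoring any overflow digit. The function names are my own and do not come from the paper's code.

```python
from itertools import product

def hundreds_digit(a3, a2, a1, b3, b2, b1):
    """C3: the hundreds digit of A3A2A1 + B3B2B1 (any overflow digit is ignored)."""
    total = (100 * a3 + 10 * a2 + a1) + (100 * b3 + 10 * b2 + b1)
    return (total // 100) % 10

def c3_needs_units(a3, a2, b3, b2):
    """True if C3 changes for at least one choice of the units digits (A1, B1)."""
    outcomes = {
        hundreds_digit(a3, a2, a1, b3, b2, b1)
        for a1, b1 in product(range(10), repeat=2)
    }
    return len(outcomes) > 1

def fraction_needing_units():
    """Fraction of (A3, A2, B3, B2) prefixes for which C3 depends on A1 and B1."""
    prefixes = list(product(range(10), repeat=4))
    needing = [p for p in prefixes if c3_needs_units(*p)]
    # Every prefix that needs the units digits should satisfy A2 + B2 == 9.
    assert all(a2 + b2 == 9 for _, a2, _, b2 in needing)
    return len(needing) / len(prefixes)

if __name__ == "__main__":
    print(fraction_needing_units())
```

Under these assumptions the fraction comes out to 0.1, so with uniformly sampled digits only about one prefix in ten needs the units digits to determine (C3).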

For the reverse method (A3A2A1 + B3B2B1 = C1C2C3), the computation of (C3) seems similarly dependent on a carry, which in this case can be inferred from (A2), (B2), and the already generated (C2). Therefore, it's unclear to me why there would be a substantial difference in complexity between the plain and reverse methods for calculating (C3).
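
To make the reverse case concrete, the following sketch shows one way the carry into the third digit could be recovered from (A2), (B2), and (C2) alone. It is only an arithmetic illustration with hypothetical function names, not a claim about what the trained model actually computes.

```python
def carry_into_hundreds(a2, b2, c2):
    """Recover the carry out of the tens position from A2, B2, and C2 only."""
    s = a2 + b2
    # C2 = (A2 + B2 + carry_in) mod 10, so the incoming carry is the gap mod 10.
    carry_in = (c2 - s) % 10
    return (s + carry_in) // 10

def check_all_cases():
    """Compare against direct addition for every (A2, B2, carry_in) combination."""
    for a2 in range(10):
        for b2 in range(10):
            for carry_in in (0, 1):
                total = a2 + b2 + carry_in
                c2 = total % 10
                if carry_into_hundreds(a2, b2, c2) != total // 10:
                    return False
    return True

if __name__ == "__main__":
    print(check_all_cases())
```

The exhaustive check covers all 200 combinations, including the (A2 + B2 = 9) case, where (C2 = 0) signals an incoming carry and (C2 = 9) signals none.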

Additional Evidence:

In my earlier investigation, I observed that bad cases from the plain2 method rarely included situations where (A2 + B2 = 9). This leads me to wonder if the primary reason the reverse method performs better might differ from what is discussed in the paper.

I'm eager to hear your insights on this matter.

Best regards,
Su Wang
