questions about NIAH #35

@Cooperx521

Description

Congratulations on the insightful paper!

[Figure: NIAH evaluation results from the appendix]

I noticed a few points in the appendix figure that I find a bit confusing, and I have two questions:

  1. Since the 'Training Long Language Model' step uses a context length of only 224k, why does the model still show high accuracy when the evaluation context length reaches 512k?

  2. When the number of distractors is set to 5, the distribution of the NIAH results looks unusual: the 224k context length appears to perform better than the 64k context length, which differs from what is typically seen in NIAH results for other models.
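For readers unfamiliar with the setup behind these questions, a NIAH test inserts one "needle" fact at a chosen depth in a long filler "haystack" and, in the distractor variant, also scatters similar-but-wrong facts; accuracy is then reported per (context length, depth) cell. Below is a minimal sketch of such a test-case builder. All names and parameters here are hypothetical illustrations, not the paper's actual evaluation code.

```python
import random

def build_niah_prompt(haystack_sentences, needle, distractors, depth, seed=0):
    """Insert `needle` at fractional `depth` (0..1) of the haystack and
    scatter `distractors` (similar but incorrect facts) at random spots."""
    rng = random.Random(seed)
    sentences = list(haystack_sentences)
    # Place the needle at the requested relative depth of the context.
    pos = int(depth * len(sentences))
    sentences.insert(pos, needle)
    # Scatter the distractor sentences at random positions.
    for d in distractors:
        sentences.insert(rng.randrange(len(sentences) + 1), d)
    return " ".join(sentences)

# Hypothetical example: 5 distractors, needle at mid-depth.
haystack = [f"Filler sentence number {i}." for i in range(100)]
needle = "The secret number is 42."
distractors = [f"The decoy number is {n}." for n in (7, 13, 99, 21, 5)]

prompt = build_niah_prompt(haystack, needle, distractors, depth=0.5)
# A model is then asked for the secret number; the per-cell accuracy in
# the figure is the fraction of such prompts answered correctly.
```

With more distractors the retrieval task gets harder at every context length, which is why the per-cell accuracies at distractor = 5 are the ones worth scrutinizing.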

Looking forward to your insights on these points.
Best regards
