Congratulations on the insightful paper!
I noticed a few points in the appendix figure that I find a bit confusing, and I have two questions:
- Since the 'Training Long Language Model' step uses a context length of only 224k, why does the model still show high accuracy even when the context length reaches 512k?
- When the number of distractors is set to 5, the distribution of the NIAH results appears unusual: the 224k context length seems to perform better than the 64k context length, which differs from what is typically seen in NIAH results for other models.
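To make the second question concrete, here is a minimal Python sketch of how a NIAH-style evaluation with distractors is commonly set up: one needle fact is inserted into filler text at a chosen depth, distractor facts are scattered elsewhere, and accuracy is the fraction of answers containing the gold value. The function names and parameters are hypothetical illustrations, not taken from the paper's actual evaluation harness. Under this setup one would normally expect accuracy to decrease (or at best stay flat) as context length grows, which is why the 224k-vs-64k ordering looks surprising.

```python
import random

def build_niah_prompt(context_len_tokens, needle, distractors, filler_sentence,
                      needle_depth=0.5, tokens_per_sentence=10):
    """Build a needle-in-a-haystack prompt: filler text with one needle
    placed at a relative depth and distractor facts scattered elsewhere.
    (Hypothetical helper, for illustration only.)"""
    n_sentences = max(1, context_len_tokens // tokens_per_sentence)
    haystack = [filler_sentence] * n_sentences
    # Place the needle at the requested relative depth in the context.
    needle_pos = int(needle_depth * (n_sentences - 1))
    haystack[needle_pos] = needle
    # Insert distractors at random positions, never overwriting the needle.
    for d in distractors:
        pos = random.randrange(n_sentences)
        if haystack[pos] == filler_sentence:
            haystack[pos] = d
    return " ".join(haystack)

def niah_accuracy(answers, gold):
    """Fraction of model answers that contain the gold needle value."""
    return sum(gold in a for a in answers) / len(answers)
```

Sweeping `context_len_tokens` over {64k, 224k, 512k} with a fixed distractor count, then plotting `niah_accuracy` per length, would reproduce the kind of curve shown in the appendix figure.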
Looking forward to your insights on these points.
Best regards
