@bhimrazy (Collaborator) commented Oct 30, 2025

What does this PR do?

This PR updates the TinyStories optimize call to pass item_loader=TokensLoader(), following the LitData rule for token data, i.e.:

outputs = optimize(
    ...,
    # This is important to inform LitData that we are encoding a contiguous 1D array (tokens). 
    # LitData skips storing metadata for each sample, e.g., all the tokens are concatenated to form one large tensor.
    item_loader=TokensLoader(),
)
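To make the comment above concrete, here is a minimal conceptual sketch of what "all the tokens are concatenated to form one large tensor" means. It uses only the Python standard library and is not LitData's implementation or storage format. The helper names `pack_tokens` and `iter_blocks` are hypothetical, and reading the stream back in fixed-size blocks is an assumption about how such a token stream is typically consumed.

```python
from array import array


def pack_tokens(samples):
    """Concatenate tokenized samples into one contiguous 1D buffer.

    No per-sample metadata (lengths, offsets) is kept, so the original
    sample boundaries cannot be recovered from the buffer alone.
    """
    flat = array("I")  # unsigned ints, standing in for token ids
    for tokens in samples:
        flat.extend(tokens)
    return flat


def iter_blocks(flat, block_size):
    """Yield consecutive fixed-size windows; a trailing partial block is dropped."""
    n_blocks = len(flat) // block_size
    for i in range(n_blocks):
        yield flat[i * block_size:(i + 1) * block_size]


# Three "documents" of different lengths, already tokenized (made-up ids).
samples = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
flat = pack_tokens(samples)
blocks = [list(block) for block in iter_blocks(flat, block_size=4)]
```

Because the buffer carries no sample boundaries, a reader that expects per-sample metadata cannot interpret it, which is one plausible reason the writer and the reader need to agree on the same item loader.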

This was addressed in #2048 by @andyland, but that change was limited to the test and appears to have been missed in the source.

Fixes #2144

Additional Info

Screenshots

Before (failing with an error):
[image]

After:
[image]
[image]

@bhimrazy changed the title from "fix: update TinyStories optimize to use item_loader=TokensLoader()," to "fix: update TinyStories optimize to use item_loader=TokensLoader()" on Oct 30, 2025
@lianakoleva lianakoleva merged commit 062fff2 into Lightning-AI:main Nov 4, 2025
21 checks passed
@bhimrazy bhimrazy deleted the fix/tinystories-optimize branch November 5, 2025 00:48
Development

Successfully merging this pull request may close these issues.

pretraining demo not working
