Student version of Assignment 1 for Stanford CS336 - Language Modeling From Scratch

stanford-cs336/assignment1-basics

CS336 Spring 2025 Assignment 1: Basics

For a full description of the assignment, see the assignment handout, cs336_spring2025_assignment1_basics.pdf.

If you see any issues with the assignment handout or code, please feel free to raise a GitHub issue or open a pull request with a fix.

Setup

Environment

We manage our environments with uv to ensure reproducibility, portability, and ease of use. Install uv by following its official installation instructions (recommended), or run pip install uv or brew install uv. We also recommend reading a bit of the uv documentation on managing projects (you will not regret it!).

You can now run any code in the repo using

uv run <python_file_path>

and the environment will be automatically resolved and activated when necessary.
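As an optional sanity check that uv run is using the project environment, you could run a small script like the one below. The filename check_env.py and its helper functions are hypothetical, not part of the repo. The check relies on the standard-library behavior that sys.prefix differs from sys.base_prefix inside a virtual environment, which is how a uv-managed project environment (a .venv directory by default) appears to Python.

```python
"""check_env.py (hypothetical name): report which Python environment is active."""
import sys


def in_virtualenv() -> bool:
    # Inside a virtual environment (such as the one uv creates for the project),
    # sys.prefix points at the environment while sys.base_prefix points at the
    # base interpreter it was created from.
    return sys.prefix != sys.base_prefix


def describe_environment() -> dict:
    # Collect a few facts about the running interpreter.
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "in_virtualenv": in_virtualenv(),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Run it with uv run check_env.py; when uv has activated the project environment, in_virtualenv should report True and executable should point inside the project's .venv directory.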

Run unit tests

uv run pytest

Initially, all tests should fail with NotImplementedError. To connect your implementation to the tests, complete the functions in ./tests/adapters.py.
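The adapter idea can be sketched roughly as follows. This is an illustration only: run_softmax, run_softmax_stub, and my_softmax are hypothetical names and do not reflect the actual functions or signatures in ./tests/adapters.py, and plain Python lists stand in for whatever types the real tests use. Each adapter starts as a stub that raises NotImplementedError (so its test fails), and once completed it simply forwards its arguments to your own implementation.

```python
import math


# --- Your implementation (hypothetical; normally lives in your own package) ---
def my_softmax(values: list[float]) -> list[float]:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# --- An adapter as initially provided: the test that calls it fails ---
def run_softmax_stub(values: list[float]) -> list[float]:
    raise NotImplementedError


# --- The same adapter after you complete it: it only delegates to your code ---
def run_softmax(values: list[float]) -> list[float]:
    return my_softmax(values)
```

Keeping adapters as thin wrappers means the tests stay independent of how you organize your own modules.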

Download data

Download the TinyStories data and a subsample of OpenWebText:

mkdir -p data
cd data

wget https://huggingface.co/datasets/roneneldan/TinyStories/resolve/main/TinyStoriesV2-GPT4-train.txt
wget https://huggingface.co/datasets/roneneldan/TinyStories/resolve/main/TinyStoriesV2-GPT4-valid.txt

wget https://huggingface.co/datasets/stanford-cs336/owt-sample/resolve/main/owt_train.txt.gz
gunzip owt_train.txt.gz
wget https://huggingface.co/datasets/stanford-cs336/owt-sample/resolve/main/owt_valid.txt.gz
gunzip owt_valid.txt.gz

cd ..
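If gunzip is unavailable, or you want to confirm that the download step succeeded, a short Python sketch like the following may help. It assumes the file names produced by the commands above; the helper names gunzip_file and missing_files are hypothetical. gunzip_file mimics gunzip's default behavior (decompress, then delete the .gz archive) using the standard gzip and shutil modules.

```python
import gzip
import shutil
from pathlib import Path

# File names produced by the download commands above (after gunzip).
EXPECTED_FILES = [
    "TinyStoriesV2-GPT4-train.txt",
    "TinyStoriesV2-GPT4-valid.txt",
    "owt_train.txt",
    "owt_valid.txt",
]


def gunzip_file(gz_path: Path) -> Path:
    """Decompress name.txt.gz to name.txt and delete the archive, like gunzip."""
    out_path = gz_path.with_suffix("")  # strips only the trailing ".gz"
    # Stream the decompressed bytes so large files are not loaded into memory.
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    gz_path.unlink()
    return out_path


def missing_files(data_dir: Path = Path("data")) -> list[str]:
    """Return the expected data files that are not present in data_dir."""
    return [name for name in EXPECTED_FILES if not (data_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    print("All data files present." if not missing else f"Missing: {missing}")
```

Run it from the repository root (for example with uv run) so that the default data directory resolves to ./data.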
