For a full description of the assignment, see the assignment handout at cs336_spring2025_assignment1_basics.pdf
If you see any issues with the assignment handout or code, please feel free to raise a GitHub issue or open a pull request with a fix.
We manage our environments with uv to ensure reproducibility, portability, and ease of use.
Install uv by following the installation instructions in the uv documentation (recommended), or run pip install uv or brew install uv.
We recommend reading a bit about managing projects in the uv documentation (you will not regret it!).
You can now run any code in the repo using
uv run <python_file_path>
and the environment will be automatically solved and activated when necessary.
To run the unit tests:
uv run pytest
Initially, all tests should fail with NotImplementedError. To connect your implementation to the tests, complete the functions in ./tests/adapters.py.
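As a rough illustration of the adapter pattern, the sketch below contrasts a stub that raises NotImplementedError (which is why the tests fail at first) with a completed adapter that simply forwards to code in your own module. The names run_softmax_stub, run_softmax, and my_softmax are hypothetical and use plain Python lists; check ./tests/adapters.py for the real function names and signatures.

```python
import math


# Hypothetical stub: before you implement anything, the adapter raises
# NotImplementedError, so any test that calls it fails.
def run_softmax_stub(values: list[float]) -> list[float]:
    raise NotImplementedError


# Hypothetical implementation that would live in your own module, not in tests/.
def my_softmax(values: list[float]) -> list[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Completed adapter: it only forwards the call to your implementation,
# so the tests do not need to know how your code is organized.
def run_softmax(values: list[float]) -> list[float]:
    return my_softmax(values)


if __name__ == "__main__":
    try:
        run_softmax_stub([1.0, 2.0])
    except NotImplementedError:
        print("stub raised NotImplementedError")
    print(run_softmax([1.0, 2.0, 3.0]))
```

Keeping the adapters as thin wrappers means you are free to structure your own package however you like; only the glue in adapters.py has to match what the tests expect.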
Download the TinyStories data and a subsample of OpenWebText:
mkdir -p data
cd data
wget https://huggingface.co/datasets/roneneldan/TinyStories/resolve/main/TinyStoriesV2-GPT4-train.txt
wget https://huggingface.co/datasets/roneneldan/TinyStories/resolve/main/TinyStoriesV2-GPT4-valid.txt
wget https://huggingface.co/datasets/stanford-cs336/owt-sample/resolve/main/owt_train.txt.gz
gunzip owt_train.txt.gz
wget https://huggingface.co/datasets/stanford-cs336/owt-sample/resolve/main/owt_valid.txt.gz
gunzip owt_valid.txt.gz
cd ..
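If wget or gunzip is unavailable, the same download steps could be sketched with the Python standard library instead. This is an illustrative alternative, not part of the assignment's tooling: the helper names gunzip and fetch_all are hypothetical, and the URLs are copied from the commands above. It writes the same files into data/, decompresses the .gz archives (deleting them afterwards, as gunzip does), and skips files that already exist.

```python
import gzip
import shutil
import urllib.request
from pathlib import Path

# URLs copied from the wget commands above.
TINYSTORIES = "https://huggingface.co/datasets/roneneldan/TinyStories/resolve/main/"
OWT = "https://huggingface.co/datasets/stanford-cs336/owt-sample/resolve/main/"
FILES = [
    TINYSTORIES + "TinyStoriesV2-GPT4-train.txt",
    TINYSTORIES + "TinyStoriesV2-GPT4-valid.txt",
    OWT + "owt_train.txt.gz",
    OWT + "owt_valid.txt.gz",
]


def gunzip(src: Path) -> Path:
    """Decompress a .gz file next to itself and delete the archive, like `gunzip`."""
    dst = src.with_suffix("")  # e.g. owt_train.txt.gz -> owt_train.txt
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # stream, so large files are not held in memory
    src.unlink()
    return dst


def fetch_all(data_dir: Path = Path("data")) -> list[Path]:
    """Download every file in FILES into data_dir and return the final paths."""
    data_dir.mkdir(parents=True, exist_ok=True)  # like `mkdir -p data`
    out = []
    for url in FILES:
        dest = data_dir / url.rsplit("/", 1)[-1]
        final = dest.with_suffix("") if dest.suffix == ".gz" else dest
        if final.exists():
            out.append(final)  # already downloaded: skip
            continue
        urllib.request.urlretrieve(url, dest)
        out.append(gunzip(dest) if dest.suffix == ".gz" else dest)
    return out


if __name__ == "__main__":
    for path in fetch_all():
        print(path)
```

Saved as a script (for example, a hypothetical download_data.py in the repository root), it could be run with uv run download_data.py. Note that the TinyStories training file is large, so the download can take a while.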