Spring 2025 CS336 lectures

This repo contains the lecture materials for "Stanford CS336: Language modeling from scratch".

Work in progress

I'm gradually working through the lectures and completing all the assignments in this repo.

Update: Assignment 1 finished. May 20 ☕

The decoder-only model with RoPE, SwiGLU, and a BPE tokenizer is in assignment/assianment1-basics/cs336_basics. I ran only one experiment, on my Mac, because I don't have access to an H100. The training result is acceptable, with a validation loss of 1.71, but unfortunately the .pt checkpoint is too large (over 300 MB) to push to this repo.
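For readers new to BPE: the core of tokenizer training is a loop that counts adjacent token pairs and merges the most frequent one. The toy sketch below illustrates that loop. It is not the code in cs336_basics. It skips the regex pre-tokenization, special-token handling, and incremental pair-count updates a real trainer needs, and `train_bpe` is a hypothetical name. Ties are broken by preferring the lexicographically greater pair, which is the rule I recall from the assignment handout.

```python
from collections import Counter


def train_bpe(words: list[str], num_merges: int) -> list[tuple[bytes, bytes]]:
    """Toy byte-level BPE trainer: repeatedly merge the most frequent adjacent pair."""
    # Each word becomes a tuple of single-byte tokens, counted by how often it occurs.
    vocab = Counter(tuple(bytes([b]) for b in w.encode("utf-8")) for w in words)
    merges: list[tuple[bytes, bytes]] = []
    for _ in range(num_merges):
        # Count every adjacent pair, weighted by word frequency.
        pair_counts: Counter = Counter()
        for tokens, freq in vocab.items():
            for a, b in zip(tokens, tokens[1:]):
                pair_counts[(a, b)] += freq
        if not pair_counts:
            break
        # Most frequent pair; ties go to the lexicographically greater pair.
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)
        # Rewrite every word with the chosen pair fused into one token.
        new_vocab: Counter = Counter()
        for tokens, freq in vocab.items():
            merged, i = [], 0
            while i < len(tokens):
                if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == best:
                    merged.append(tokens[i] + tokens[i + 1])
                    i += 2
                else:
                    merged.append(tokens[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges
```

For example, `train_bpe(["low", "low", "lower", "newest", "newest"], 2)` returns `[(b"w", b"e"), (b"l", b"o")]`: in the first round "lo", "ow", and "we" all occur 3 times and the tie-break picks "we"; after that merge, "lo" is the only pair left with 3 occurrences.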

Training Summary

Metric                 Value
Total Training Time    21209.29 seconds (~5.9 hours)
Final Train Loss       1.7131
Best Val Loss          1.7122
Model Parameters       28.92M
Total Iterations       10000
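The parameter count roughly explains the checkpoint size. A back-of-the-envelope estimate follows. It assumes every value is stored as a 32-bit float and that the .pt file holds AdamW's first- and second-moment buffers alongside the weights; this is an assumption about what the checkpoint contains, and `checkpoint_size_mb` is a hypothetical helper.

```python
def checkpoint_size_mb(num_params: float, bytes_per_value: int = 4, copies: int = 3) -> float:
    """Rough checkpoint size in MB (10**6 bytes).

    copies=3 assumes the file stores the fp32 weights plus AdamW's two moment
    buffers, each the same shape as the weights (an assumption, not verified).
    """
    return num_params * bytes_per_value * copies / 1e6


weights_only = checkpoint_size_mb(28.92e6, copies=1)  # about 115.7 MB
with_adamw = checkpoint_size_mb(28.92e6)              # about 347.0 MB
print(f"weights only: {weights_only:.1f} MB, weights + AdamW state: {with_adamw:.1f} MB")
```

Weights plus optimizer state come to about 347 MB, consistent with the "over 300 MB" file (pickling overhead and the iteration counter add a little more). Even the weights alone, about 116 MB (roughly 110 MiB), would exceed GitHub's 100 MiB per-file limit. Saving only the model state would therefore still not be enough to push the file without Git LFS.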

Generated sample

 and the dog were playing with it. But then, Tim saw something strange. It was a toy outside! Tim picked up the toy and showed it to his mom. His mom was a little 
scared, but Tim had an idea.Tim showed her the toy and asked if it could play too. His mom agreed, and they played with the toy together. Tim was very happy and not 
fearful anymore. They all had a fun day at the park.<|endoftext|>Once
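The sample keeps going after <|endoftext|> (the trailing "Once"), which suggests decoding was not stopped at the end-of-text token. Below is a minimal NumPy sketch of one decoding step with temperature scaling plus top-p (nucleus) sampling, one common scheme. `sample_top_p` is a hypothetical helper, not the decoder in this repo. A generation loop would call it once per step and break as soon as the returned id equals the tokenizer's <|endoftext|> id.

```python
from typing import Optional

import numpy as np


def sample_top_p(logits: np.ndarray, temperature: float = 1.0, top_p: float = 0.9,
                 rng: Optional[np.random.Generator] = None) -> int:
    """Draw one token id from 1-D logits using temperature scaling and nucleus (top-p) sampling."""
    if rng is None:
        rng = np.random.default_rng()
    # Temperature scaling followed by a numerically stable softmax.
    scaled = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-8)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    # Sort ids by probability (descending) and keep the smallest prefix whose mass reaches top_p.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1
    keep = order[:cutoff]
    # Renormalize over the kept ids and sample one of them.
    return int(rng.choice(keep, p=probs[keep] / probs[keep].sum()))
```

Lower temperatures sharpen the distribution before the cutoff, and smaller `top_p` values drop more of the low-probability tail; with one dominant logit the function behaves like greedy decoding.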

There is still an issue in my RoPE module. I modified it so training could run with different batch sizes, but then it no longer passed Stanford's pytest suite. With the version that passes the tests, the training script only runs with batch_size = 8 or batch_size = 1.
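A frequent reason a RoPE module only works at certain batch sizes is building the cos/sin tables, or reshaping the input, with a fixed batch dimension. Whether that is the cause here is not confirmed. The minimal NumPy sketch below uses NumPy as a stand-in for PyTorch and has not been checked against the course's tests; `rope` is a hypothetical name. It derives everything from the trailing sequence and feature dimensions, so any number of leading batch or head dimensions broadcasts. It uses the adjacent-pair convention, rotating (x[2k], x[2k+1]); some implementations rotate the two halves of the vector instead, which gives different numbers.

```python
import numpy as np


def rope(x: np.ndarray, positions: np.ndarray, theta: float = 10000.0) -> np.ndarray:
    """Apply rotary position embeddings to x of shape (..., seq_len, d_k).

    positions has shape (seq_len,) or any shape that broadcasts against x.shape[:-1].
    The cos/sin tables are built from positions at call time, so no batch size is fixed.
    """
    d_k = x.shape[-1]
    assert d_k % 2 == 0, "d_k must be even"
    # One frequency per pair of dimensions: theta^(-2k/d_k), k = 0 .. d_k/2 - 1.
    inv_freq = theta ** (-np.arange(0, d_k, 2) / d_k)   # (d_k/2,)
    angles = positions[..., None] * inv_freq            # (..., seq_len, d_k/2)
    cos, sin = np.cos(angles), np.sin(angles)
    # Rotate each adjacent pair (x[2k], x[2k+1]) by its position-dependent angle.
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x_even * cos - x_odd * sin
    out[..., 1::2] = x_even * sin + x_odd * cos
    return out
```

Because position 0 has angle zero and each pair is rotated rather than scaled, `rope` leaves the first position unchanged and preserves vector norms. Together with "one batch element gives the same result as the full batch", these are cheap properties to assert in a test.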

Update: Assignment 2 partly finished. May 31 ☕

I am not going to run all the experiments either, since I don't have any GPUs myself, but I will again try to implement every problem so that it passes the pytest suite. (May 24, 2025) I imported my TransformerLM into assignment/assianment2-systems/cs336_basics. The tasks, including the FlashAttention-2 Triton kernels, are implemented and pass the course pytest suite. Because I have no GPU, I haven't tested my Triton kernels on one. (May 31, 2025) I'm not sure whether I will review and run them on GPUs in the future. All the code for this assignment is in assignment/assianment2-systems/cs336_systems.
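FlashAttention-2 avoids materializing the full attention matrix. It walks over key/value tiles and keeps a running max and running softmax denominator for each query row. The NumPy sketch below shows only that online-softmax idea for a non-causal forward pass. It is meant as a CPU reference for checking a kernel's outputs, not as the Triton kernel in cs336_systems, and `tiled_attention` and its block sizes are hypothetical. It also returns the per-row logsumexp, which the FlashAttention-2 forward pass saves for the backward pass.

```python
import numpy as np


def tiled_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray,
                    block_q: int = 16, block_k: int = 16):
    """Non-causal tiled attention with an online softmax. Returns (output, logsumexp).

    q: (n_q, d), k: (n_k, d), v: (n_k, d_v).
    """
    n_q, d = q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.empty((n_q, v.shape[1]))
    lse = np.empty(n_q)
    for i in range(0, n_q, block_q):
        qi = q[i:i + block_q]
        m = np.full(qi.shape[0], -np.inf)            # running row max
        l = np.zeros(qi.shape[0])                    # running softmax denominator
        acc = np.zeros((qi.shape[0], v.shape[1]))    # unnormalized output accumulator
        for j in range(0, k.shape[0], block_k):
            s = qi @ k[j:j + block_k].T * scale      # scores for this (query, key) tile
            m_new = np.maximum(m, s.max(axis=1))
            p = np.exp(s - m_new[:, None])
            correction = np.exp(m - m_new)           # rescale old statistics to the new max
            l = correction * l + p.sum(axis=1)
            acc = correction[:, None] * acc + p @ v[j:j + block_k]
            m = m_new
        out[i:i + block_q] = acc / l[:, None]        # normalize once per query tile
        lse[i:i + block_q] = m + np.log(l)
    return out, lse
```

Comparing its output with a naive softmax(QK^T / sqrt(d)) V on small random inputs, including sizes that are not multiples of the block sizes, is a quick sanity check that does not need a GPU.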

Non-executable (ppt/pdf) lectures

Located in nonexecutable/ as PDFs.

Executable lectures

Located as lecture_*.py in the root directory

You can compile a lecture by running:

    python execute.py -m lecture_01

which generates a var/traces/lecture_01.json and caches any images as appropriate.
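To compile every lecture rather than one at a time, a small driver can shell out to the same documented command for each lecture module. This is a sketch assuming it is run from the repo root; `select_lectures`, `build_commands`, and `compile_all` are hypothetical helpers, not part of the repo. The regex matters because helper files such as lecture_06_utils.py and lecture_06_mlp.py also match the lecture_*.py pattern.

```python
import re
import subprocess
import sys
from pathlib import Path
from typing import Iterable

# Matches lecture modules such as lecture_01.py, but not helpers such as lecture_06_utils.py.
LECTURE_RE = re.compile(r"lecture_\d+\.py")


def select_lectures(filenames: Iterable[str]) -> list[str]:
    """Return sorted module names (e.g. 'lecture_01') for files that look like lectures."""
    return sorted(Path(n).stem for n in filenames if LECTURE_RE.fullmatch(Path(n).name))


def build_commands(modules: Iterable[str]) -> list[list[str]]:
    """One `python execute.py -m <module>` command per lecture module."""
    return [[sys.executable, "execute.py", "-m", m] for m in modules]


def compile_all(root: Path = Path("."), dry_run: bool = False) -> list[list[str]]:
    """Compile every lecture under root; with dry_run=True only print the commands."""
    commands = build_commands(select_lectures(p.name for p in root.glob("lecture_*.py")))
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first lecture that fails to compile.
            subprocess.run(cmd, cwd=root, check=True)
    return commands
```

Calling `compile_all(dry_run=True)` first lists what would run; dropping `dry_run` then writes one var/traces/<lecture>.json per lecture, as the single-lecture command does.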

However, if you want to run it on the cluster, you can do:

    ./remote_execute.sh lecture_01

which copies the files to our slurm cluster, runs it there, and copies the results back. You have to set up the appropriate environment and tweak some configs to make this work (these instructions are not complete).

Frontend

If you need to tweak the JavaScript:

Install (one-time):

    npm create vite@latest trace-viewer -- --template react
    cd trace-viewer
    npm install

Load a local server to view at http://localhost:5173?trace=var/traces/sample.json:

    npm run dev

Deploy to the main website:

    cd trace-viewer
    npm run build
    git add dist/assets
    # then commit to the repo and it should show up on the website

Repository: CatManJr/spring2025-notes-and-assignments
