Experimental GPT-2-scale (~124M-parameter) LLM trained from scratch on 22B tokens of the Cosmopedia dataset. Includes the full training pipeline, SFT fine-tuning, and log-analysis tools with backend, frontend, and deployment.
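For context on the "~124M" figure, here is a rough sketch of the parameter arithmetic. It assumes the standard GPT-2 small configuration (vocabulary 50257, context 1024, width 768, 12 layers, tied input/output embeddings); the repository's actual architecture may differ, and `gpt2_param_count` is a hypothetical helper, not part of the project.

```python
def gpt2_param_count(vocab_size=50257, n_ctx=1024, n_embd=768, n_layer=12):
    """Count parameters of a GPT-2-style decoder with tied embeddings.

    Defaults are the published GPT-2 small hyperparameters, assumed here
    for illustration only.
    """
    token_emb = vocab_size * n_embd          # token embedding (shared with output head)
    pos_emb = n_ctx * n_embd                 # learned positional embedding
    layer_norm = 2 * n_embd                  # scale + bias

    attn_qkv = n_embd * 3 * n_embd + 3 * n_embd   # fused Q, K, V projection
    attn_out = n_embd * n_embd + n_embd           # attention output projection
    mlp_in = n_embd * 4 * n_embd + 4 * n_embd     # MLP expansion (4x width)
    mlp_out = 4 * n_embd * n_embd + n_embd        # MLP contraction

    per_layer = 2 * layer_norm + attn_qkv + attn_out + mlp_in + mlp_out
    return token_emb + pos_emb + n_layer * per_layer + layer_norm  # + final layer norm


total = gpt2_param_count()
tokens_per_param = 22e9 / total  # 22B training tokens, as stated in the description

print(total)  # 124439808
print(round(tokens_per_param))
```

Under these assumptions, 22B tokens works out to roughly 177 training tokens per parameter.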
nlp tokenizer pytorch transformer llama language-model nlp-machine-learning sft gpt2 train-from-scratch llm bitsandbytes openhermes flashattention cosmopedia
Updated May 15, 2026 - Python