This repository contains my work through the book Build a Large Language Model (From Scratch) by Sebastian Raschka, published by Manning Publications.
Start by cloning the repository:
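
```sh
git clone https://github.com/stefanalfbo/build-a-large-language-model-from-scratch.git
```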
Then, install the required packages with the repository's `make sync` target:
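
```sh
# change into the cloned repository (assuming the default directory name),
# then run the Makefile's sync target to install the required packages
cd build-a-large-language-model-from-scratch
make sync
```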
Further reading:

- Attention Is All You Need - The paper that introduced the transformer architecture.
- Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research - An open, three-trillion-token corpus for language model pretraining research.
- Improving Language Understanding by Generative Pre-Training - The paper that introduced the original GPT model.
- Training language models to follow instructions with human feedback - The InstructGPT paper, which introduces a method for fine-tuning language models with human feedback.