This SLM (Small Language Model) was inspired by Karpathy's video on GPT-2, but with a small difference: the model has been made more production-ready and closer to trending models such as Alibaba's Qwen 3. So everything is taken from Karpathy's content, with Qwen's attention and embedding mechanisms added on top, and the result is a fully open-source pretrained model.
This project was started by Muhammadreza Haghiri (active on X as @haghiri_ai), the founder of Mann-E, which was the first generative AI platform with pretrained/fine-tuned models in Iran. This model is an effort by Mann-E to make AI more accessible and democratized for everyone.
You can also take a look at the Hugging Face organization to access and download checkpoints.
Since adding all the notes to this README.md would make it unnecessarily long, we've added a CHANGES.md file; all changelogs and release notes will be added there.
We have a contribution guide that you can study. If you want to contribute to the project, you must read that file first.
For training:

- A good high-end NVIDIA GPU with CUDA support (tested on Google Colab's T4 as the bare minimum and on B200s for faster training)
- Linux operating system
- Python

For inference:

- A user-level NVIDIA GPU with CUDA support (like an RTX 2050)
- Python
- Linux is recommended. If you're a Windows user, you can run the code on WSL.
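If you're not sure whether your GPU is visible to CUDA, a quick check like the sketch below can help. It assumes the NVIDIA driver is installed and that PyTorch is available in your environment (PyTorch is assumed to be among the project's requirements).

```bash
# Quick sanity check that the NVIDIA driver and CUDA are visible
# (assumes the driver is installed and PyTorch is available).
nvidia-smi
python3 -c "import torch; print('CUDA available:', torch.cuda.is_available())"
```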
First, create a Python virtual environment like this:
python3 -m venv .venv
Then activate your environment:
source .venv/bin/activate
After activation, install the required libraries by running the following command:
pip install -r requirements.txt
After the libraries are installed, you may change the hyperparameters in the model_params.json file. You can use the params_calculator.py script to find out how big the resulting model will be. Then you only need to run the training script:
python3 train.py
After training is done, you will find a few .pt files in the output path; these are your model files, ready for inference.
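As a quick sanity check, you can inspect one of the produced checkpoints. The snippet below is only a sketch: the file name model.pt is a placeholder, and it assumes the checkpoints are written with torch.save (the actual layout may differ).

```bash
# List the checkpoints produced by train.py.
ls -lh *.pt

# Load one checkpoint on CPU and print its top-level keys (placeholder file name;
# assumes the file was written with torch.save and contains a dict-like object).
python3 -c "import torch; ckpt = torch.load('model.pt', map_location='cpu'); print(list(ckpt.keys()) if isinstance(ckpt, dict) else type(ckpt))"
```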
NOTE: The current model is made to support the English language only, and things may change in the future to add multilinguality to the model. This means the tokenizer and other parts borrowed from other models may change as well.
In order to run inference on the model you have created, you need to use inference.py; this script comes with a few flags and options, described below. Alternatively, you can download the model from Hugging Face.
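For example, a checkpoint can be fetched with the Hugging Face CLI. This is only a sketch: replace <org>/<model> with the actual repository from the project's Hugging Face organization, and note that huggingface_hub must be installed.

```bash
# Download a released checkpoint to a local folder
# (replace <org>/<model> with the repository you want).
pip install -U "huggingface_hub[cli]"
huggingface-cli download <org>/<model> --local-dir ./checkpoint
```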
- --model-path: The path to the model file.
- --prompt: The text to be completed.
- --tokenizer: Your desired tokenizer. Since the training script currently uses the SmolLM 135M tokenizer, the same tokenizer is used for inference. This may change in the future.
- --max_new_tokens: The maximum number of tokens to generate. Since the current training uses 512, the maximum is 512; if you change it during training, this can be tweaked accordingly.
- --temperature: Controls the creativity of the model. Setting it to 0 makes the output more deterministic and closer to the training data.
- --top_p: Nucleus sampling; only the smallest set of tokens whose cumulative probability reaches this value is considered.
- --top_k: Only the k most likely tokens are considered at each step.
- --seed: Controls the randomness of the model (useful for reproducibility).
- --max_seq_length: The maximum number of tokens that can be taken as input.
NOTE: All of the flags except --prompt have default values. You may change them in order to get the best results.
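Putting the flags together, an invocation could look like the sketch below. The values are only illustrative, the model path is a placeholder, and the tokenizer identifier assumes the flag accepts a Hugging Face repo id such as HuggingFaceTB/SmolLM-135M.

```bash
# Example inference run; values are illustrative and the model path is a placeholder.
python3 inference.py \
  --model-path ./model.pt \
  --prompt "The history of artificial intelligence" \
  --tokenizer HuggingFaceTB/SmolLM-135M \
  --max_new_tokens 256 \
  --temperature 0.7 \
  --top_p 0.9 \
  --top_k 50 \
  --seed 42
```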
- d_model: Embedding size, or in other words, the dimensionality of each token.
- n_heads: Number of attention heads per layer.
- n_layers: Number of transformer layers in the model.
- d_ff: Dimensionality of the feed-forward layer.
- vocab: Vocabulary size (determined by the tokenization process).
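For illustration, a model_params.json using the keys above might look like the sketch below. The values are placeholders (keep vocab consistent with the tokenizer's vocabulary size), and the real file may contain additional fields.

```bash
# Write an illustrative model_params.json (placeholder values; adjust to your needs
# and keep "vocab" consistent with the tokenizer's vocabulary size).
cat > model_params.json <<'EOF'
{
  "d_model": 512,
  "n_heads": 8,
  "n_layers": 8,
  "d_ff": 2048,
  "vocab": 49152
}
EOF
```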
- Add a requirements.txt file.
- Add a license to this repository.
- Upload the model to Hugging Face.
- Make the model more accurate (in English).
- Change the tokenizer to a better one.
- Provide a fine-tuning script for instruction following.
- Port the weights to safetensors.
- Make the models transformers-compatible so they can be used in Hugging Face transformers pipelines.
- Make the training script work on multiple GPUs (this will make training bigger models possible).
You can support this project with donations. Donations are currently accepted in the form of crypto; these are the wallets:
- Solana: GNJWgRmgRd7S9VrhCcJnVNTtAiQGTWrya9gcGb985x2m
- Ethereum: 0xa2dd3D50DE0Fc12fAd946606cd853B2a972d8de8
- Sui: 0x943c1190bae9a052879c1861833621e20545bc33a8c990d48cc3bb8e7b1ac00b
- Polygon: 0xa2dd3D50DE0Fc12fAd946606cd853B2a972d8de8
- Base: 0xa2dd3D50DE0Fc12fAd946606cd853B2a972d8de8
- Bitcoin (Taproot): bc1pgtgd3uymvdxycelu06zz3sgrt47rccw2zk9u550e4de6tzgngz2s738gsn
- Bitcoin (Native Segwit): bc1q85drn275ugetvleha6egp7a8u0ramyf39zg4wj
