
MiniLLM: An implementation of a Qwen3-like small language model

This SLM (Small Language Model) was inspired by Karpathy's video on GPT-2, with a twist: the model has been made more production-ready and closer to trending models such as Alibaba's Qwen 3. In short, it starts from Karpathy's material, adds Qwen-style attention and embedding mechanisms on top, and is now a fully open-sourced pretrained model.

This project was started by Muhammadreza Haghiri (active on X under the handle @haghiri_ai), the founder of Mann-E, which was the first generative AI platform with pretrained/fine-tuned models in Iran. This model is an effort by Mann-E to make AI more accessible and democratized for everyone.

You can also take a look at the Hugging Face organization to access and download checkpoints.

Download checkpoints

Release notes

Since adding all the notes to this README.md would make it unnecessarily long, we've added a CHANGES.md file; all changelogs and release notes will go there.

How to contribute

We have a contribution guide that you can study. If you want to contribute to the project, please read that file first.

How to run

Prerequisites (For training)

  • A good high-end NVIDIA GPU with CUDA support (tested on Google Colab's T4 as the bare minimum and on B200s for faster training)
  • Linux operating system
  • Python

Prerequisites (For inference)

  • A consumer-level NVIDIA GPU with CUDA support (such as an RTX 2050)
  • Python
  • Linux is recommended. If you're a Windows user, you can run the code on WSL

Run training scripts

First, create a Python virtual environment like this:

python3 -m venv .venv

Then activate your environment:

source .venv/bin/activate

After activating it, install the required libraries by running the following command:

pip install -r requirements.txt

Once the libraries are installed, you may adjust the hyperparameters in the model_params.json file; you can use the params_calculator.py script to find out how big the resulting model will be (an illustrative configuration is shown after the training command below). Then you only need to run the training script:

python3 train.py
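
For reference, the hyperparameters in model_params.json correspond to the fields described in the parameter calculator guide below. The exact keys and default values are defined by the repository, so the snippet here is purely illustrative:

{
  "d_model": 768,
  "n_heads": 12,
  "n_layers": 12,
  "d_ff": 2048,
  "vocab": 49152
}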

After training is done, you will find a few .pt files in the working directory; these are your model files, ready for inference.

NOTE: The current model is built to support English only, and things may change in the future to add multilinguality to the model. This means the tokenizer and other parts borrowed from other models may change as well.

Run inference scripts

To run inference on the model you have created, use inference.py; the script comes with a few flags and options. Alternatively, you can download a model from Hugging Face.

  • --model-path : Path to the model (.pt) file.
  • --prompt : The text to be completed.
  • --tokenizer : Your desired tokenizer. Since the training script currently uses the SmolLM 135M tokenizer, the same default applies to inference. This may change in the future.
  • --max_new_tokens : The maximum number of new tokens to generate. Since the current training uses a limit of 512, the maximum is 512; if you change it during training, this flag can be tweaked accordingly.
  • --temperature : Controls how creative (random) the sampling is. Setting it to 0 makes decoding greedy and more likely to reproduce the training data. (A generic sampling sketch follows this list.)
  • --top_p : Nucleus sampling; only the smallest set of tokens whose cumulative probability reaches this threshold is considered.
  • --top_k : Restricts sampling to the k most probable next tokens.
  • --seed : Random seed controlling the sampling, for reproducible outputs.
  • --max_seq_length : The maximum number of tokens the model accepts as input (context length).
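
The sampling flags interact roughly as follows: temperature rescales the logits, top_k keeps only the k most probable tokens, and top_p keeps the smallest set of tokens whose cumulative probability reaches the threshold. The snippet below is a generic PyTorch sketch of this scheme, not the exact code in inference.py, and the function name is illustrative:

import torch

def sample_next_token(logits, temperature=0.8, top_k=50, top_p=0.9):
    # Generic temperature / top-k / top-p sampling over a 1-D logits tensor.
    if temperature == 0:
        return int(torch.argmax(logits))           # greedy decoding
    logits = logits / temperature                  # sharpen or flatten the distribution
    if top_k is not None and top_k > 0:
        kth_value = torch.topk(logits, min(top_k, logits.numel())).values[-1]
        logits = logits.masked_fill(logits < kth_value, float("-inf"))
    probs = torch.softmax(logits, dim=-1)
    if top_p is not None and top_p < 1.0:
        sorted_probs, sorted_idx = torch.sort(probs, descending=True)
        cumulative = torch.cumsum(sorted_probs, dim=-1)
        # Drop tokens outside the nucleus, keeping the token that crosses top_p.
        sorted_probs[cumulative - sorted_probs > top_p] = 0.0
        probs = torch.zeros_like(probs).scatter_(-1, sorted_idx, sorted_probs)
        probs = probs / probs.sum()
    return int(torch.multinomial(probs, num_samples=1))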

NOTE: All of the flags except --prompt have default values. You may change them to get the best results. A full example invocation is shown below.
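
Putting it together, a typical invocation might look like the following (the checkpoint filename and the sampling values are illustrative, not the script's defaults):

python3 inference.py --model-path model.pt --prompt "Once upon a time" --max_new_tokens 256 --temperature 0.8 --top_p 0.9 --top_k 50 --seed 42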

Parameter calculator guide

  • d_model : Embedding size, i.e., the dimensionality of each token representation.
  • n_heads : Number of attention heads per layer.
  • n_layers : Number of transformer layers in the model.
  • d_ff : Dimensionality of the feed-forward layer.
  • vocab : Vocabulary size (determined by the tokenization process).
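
For a rough sense of how these values translate into a parameter count, the sketch below estimates the size of a standard decoder-only transformer with a SwiGLU feed-forward block, ignoring biases, normalization weights and any grouped-query attention. It is a back-of-the-envelope approximation, not the formula used by params_calculator.py, and the example values are illustrative:

def estimate_params(d_model, n_heads, n_layers, d_ff, vocab):
    # n_heads does not change the total as long as head_dim = d_model // n_heads.
    embedding = vocab * d_model            # token embedding table
    attention = 4 * d_model * d_model      # q, k, v and output projections per layer
    feed_forward = 3 * d_model * d_ff      # gate, up and down projections (SwiGLU) per layer
    lm_head = vocab * d_model              # output projection (0 if tied with the embeddings)
    return embedding + n_layers * (attention + feed_forward) + lm_head

# Illustrative hyperparameters, not the repository's defaults.
print(f"{estimate_params(d_model=768, n_heads=12, n_layers=12, d_ff=2048, vocab=49152):,}")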

TODO List

  • requirements.txt file.
  • Add a license to this repository.
  • Upload the model to Hugging Face.
  • Make a more accurate model (on English).
  • Change the tokenizer to a better one.
  • Provide a fine-tuning script for instruction following.
  • Port the weights to safetensors.
  • Make the models compatible with Hugging Face transformers pipelines.
  • Make the training script work on multiple GPUs (this will make training bigger models possible).

Support The Project

You can support this project with donations. Donations are currently accepted in cryptocurrency at the following wallets:

  • Solana: GNJWgRmgRd7S9VrhCcJnVNTtAiQGTWrya9gcGb985x2m
  • Ethereum: 0xa2dd3D50DE0Fc12fAd946606cd853B2a972d8de8
  • Sui: 0x943c1190bae9a052879c1861833621e20545bc33a8c990d48cc3bb8e7b1ac00b
  • Polygon: 0xa2dd3D50DE0Fc12fAd946606cd853B2a972d8de8
  • Base: 0xa2dd3D50DE0Fc12fAd946606cd853B2a972d8de8
  • Bitcoin (Taproot): bc1pgtgd3uymvdxycelu06zz3sgrt47rccw2zk9u550e4de6tzgngz2s738gsn
  • Bitcoin (Native Segwit): bc1q85drn275ugetvleha6egp7a8u0ramyf39zg4wj
