LLM Tools is an open-source project that provides essential tools for developing and running large language models (LLMs).
- Memory Requirements Calculator: Estimates the memory needed to run or train LLMs based on factors like model size, precision, batch size, and sequence length.
Formerly Known As: LLM System Requirements Calculator
For an in-depth guide, see: Memory Requirements for LLM Training and Inference.
The Memory Requirements Calculator helps estimate the memory needed to run or train large language models (LLMs). Calculating exact memory requirements is challenging because of the wide variety of frameworks, models, and optimization techniques involved. The calculator provides the essential formulas and considerations needed to produce accurate, practical estimates for a range of use cases.
If you are lazy and don't want to use the calculator, use this as a general rule of thumb (a quick code sketch follows the list):
- Inference: Number of parameters × Precision (usually 2 or 4 Bytes)
- Training: 4–6 times the inference resources
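As a minimal sketch of this rule of thumb (a simplification that uses 10^9 bytes per GB and treats the 4–6× multiplier as a rough range, not a guarantee):

```python
def rule_of_thumb_memory_gb(num_parameters: float, precision_bytes: float = 2) -> dict:
    """Rough memory estimate following the rule of thumb above.

    num_parameters: total parameter count (e.g., 7e9 for a 7B model).
    precision_bytes: bytes per parameter (4 = FP32, 2 = FP16/bfloat16, 1 = int8, 0.5 = int4).
    """
    inference_gb = num_parameters * precision_bytes / 1e9
    # Training typically needs roughly 4-6x the inference memory.
    return {
        "inference_gb": inference_gb,
        "training_gb_low": 4 * inference_gb,
        "training_gb_high": 6 * inference_gb,
    }

# Example: a 7B-parameter model in FP16
print(rule_of_thumb_memory_gb(7e9, precision_bytes=2))
# {'inference_gb': 14.0, 'training_gb_low': 56.0, 'training_gb_high': 84.0}
```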
Performing inference requires memory for loading the model weights, storing the KV cache, and holding the intermediate activations.
Formula
The memory required for loading the model depends on the number of parameters and the precision (a worked example follows the precision values below).
Formula [1]
Precision Values
- 4 Bytes: FP32 / Full-precision / float32 / 32-bit
- 2 Bytes: FP16 / bfloat16 / 16-bit
- 1 Byte: int8 / 8-bit
- 0.5 Bytes: int4 / 4-bit
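For example, applying Model Memory = number of parameters × precision to a hypothetical 7B-parameter model (the model size is only an illustration):

```python
# Bytes per parameter for each precision listed above
PRECISION_BYTES = {"FP32": 4, "FP16/bfloat16": 2, "int8": 1, "int4": 0.5}

num_parameters = 7e9  # illustrative 7B-parameter model

for precision, bytes_per_param in PRECISION_BYTES.items():
    model_memory_gb = num_parameters * bytes_per_param / 1e9
    print(f"{precision}: {model_memory_gb:.1f} GB")
# FP32: 28.0 GB, FP16/bfloat16: 14.0 GB, int8: 7.0 GB, int4: 3.5 GB
```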
The decoding phase generates one token per time step, and each step depends on the key and value tensors of all previous tokens. To avoid recomputing these tensors at every step, they are cached in GPU memory.
Formula [3]
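A sketch of the common approximation for KV cache size, following the formulation in [3] (the model shapes below are only an example):

```python
def kv_cache_bytes(batch_size, sequence_length, num_layers, hidden_size, precision_bytes=2):
    """Approximate KV cache size, following the formulation in [3].

    The factor of 2 accounts for storing both the key and the value tensors
    for every layer, token, and sequence in the batch.
    """
    return 2 * batch_size * sequence_length * num_layers * hidden_size * precision_bytes

# Example: 32 layers, hidden size 4096 (Llama-2-7B-like), FP16, batch 1, 4096 tokens
print(kv_cache_bytes(1, 4096, 32, 4096, 2) / 1e9, "GB")  # ~2.1 GB
```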
Intermediate activation values must be stored during the forward pass of the model. These activations represent the outputs of each layer in the neural network as data propagates forward through the model. They must be kept in FP32 to avoid numerical instability and ensure convergence.
Formula [4]
Training and fine-tuning require more resources than inference due to the optimizer and gradient states. For fine-tuning, Parameter Efficient Fine-Tuning (PEFT) techniques, such as Low-rank Adaptation (LoRA) and Quantized Low-rank Adaptation (QLoRA), are often employed to reduce the number of trainable parameters.
Formula
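A rough sketch of how the components described below combine, using the per-parameter byte counts given in the following subsections and ignoring activations for brevity (note that mixed-precision setups may also keep an FP32 master copy of the weights, which this sketch omits):

```python
def full_finetuning_memory_gb(num_parameters,
                              weight_bytes=2,     # FP16/bfloat16 weights
                              optimizer_bytes=8,  # AdamW: 8 bytes per parameter
                              gradient_bytes=4):  # FP32 gradients: 4 bytes per parameter
    """Rough full fine-tuning estimate: weights + optimizer states + gradients.

    Activation memory is excluded; it depends heavily on batch size, sequence
    length, and techniques such as gradient checkpointing.
    """
    return num_parameters * (weight_bytes + optimizer_bytes + gradient_bytes) / 1e9

print(full_finetuning_memory_gb(7e9))  # ~98 GB before activations
```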
Optimization algorithms require memory to store the parameters and auxiliary variables. These variables include the momentum and variance used by algorithms such as Adam (2 states) or SGD (1 state). The precision and type of optimizer affect memory usage (see the per-parameter costs and the example below).
Formula [1]
- AdamW (2 states): 8 Bytes per parameter
- AdamW (bitsandbytes Quantized): 2 Bytes per parameter
- SGD (1 state): 4 Bytes per parameter
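Applying these per-parameter costs to the same illustrative 7B-parameter model:

```python
# Optimizer-state bytes per parameter, as listed above
OPTIMIZER_BYTES = {"AdamW": 8, "AdamW (bitsandbytes quantized)": 2, "SGD": 4}

num_parameters = 7e9  # illustrative 7B-parameter model
for optimizer, bytes_per_param in OPTIMIZER_BYTES.items():
    print(f"{optimizer}: {num_parameters * bytes_per_param / 1e9:.0f} GB")
# AdamW: 56 GB, AdamW (bitsandbytes quantized): 14 GB, SGD: 28 GB
```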
Gradient values are computed during the backward pass of the model. They represent the rate of change of the loss function with respect to each model parameter and are crucial for updating the parameters during optimization. As with activations, they must be stored in FP32 for numerical stability.
Formula [1]
4 Bytes per parameter
[1] https://huggingface.co/docs/transformers/model_memory_anatomy
[2] https://huggingface.co/docs/transformers/perf_train_gpu_one
[3] https://developer.nvidia.com/blog/mastering-llm-techniques-inference-optimization/
[5] https://www.oreilly.com/library/view/generative-ai-on/9781098159214/ch04.html
The Memory Requirements Calculator only provides estimates; the only reliable way to determine exact memory usage is to test your specific setup. The tool is provided as-is, without any warranties or guarantees.
- Memory Requirements Calculator
- Cost Estimation Calculator
See the open issues for a full list of proposed features (and known issues).
To set up and run the project locally, follow these steps:
- Clone the repository: `git clone https://github.com/manuelescobar-dev/LLM-Tools.git`
- Navigate to the project directory: `cd LLM-Tools`
- Install the dependencies using Poetry: `poetry install`
- Start the Streamlit app: `streamlit run llm_tools/Home.py`
- Open the provided URL in your browser to view the application.
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.
Github: @manuelescobar-dev
Medium: @manuelescobar-dev
Email: manuelescobar.dev@gmail.com