diff --git a/docs/source/guides/cache_system.mdx b/docs/source/guides/cache_system.mdx
index 3f7251083..23ee751e2 100644
--- a/docs/source/guides/cache_system.mdx
+++ b/docs/source/guides/cache_system.mdx
@@ -1,19 +1,81 @@
-
-# Trainium model cache
+# Trainium Model Cache
 
-TODO:
+The Trainium Model Cache is a remote cache for compiled Trainium models in the `neff` format.
+It is integrated into the [`TrainiumTrainer`] class to enable loading pretrained models from the cache instead of compiling them locally.
+This can speed up the training process by about 3x.
+
+The Trainium Model Cache is hosted on the [Hugging Face Hub](https://huggingface.co/aws-neuron/optimum-neuron-cache) and includes compiled files for all popular and supported pre-trained models in `optimum-neuron`.
+
+When training a Transformers or Diffusion model with vanilla [`torch-neuronx`](https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx), the model first needs to be compiled. The compiled version is stored in a local directory, usually `/var/tmp/neuron-compile-cache`.
+This means that every time you train a model in a new environment, you need to recompile it, which takes a lot of time.
+
+We created the Trainium Model Cache to solve this limitation: it provides a public cache of precompiled models, along with the ability to create your own private, secured, remote model cache.
+
+The Trainium Model Cache plugs into the local cache directory of the Hugging Face Hub. During training, the [`TrainiumTrainer`] checks whether compilation files are available on the Hub and downloads them when they are found, allowing you to save both time and cost by skipping the compilation phase.
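The check-then-download-or-compile lookup described above can be sketched as follows. This is a minimal, illustrative sketch only: the function `fetch_compiled_model` and the dict-based caches standing in for the local directory and the Hub repository are assumptions for the example, not the actual `optimum-neuron` API.

```python
# Illustrative sketch of the cache lookup performed during training.
# `fetch_compiled_model` and the dict-based caches are hypothetical:
# the real implementation works on local files and a Hub repository.

def fetch_compiled_model(cache_key, local_cache, hub_cache):
    """Return (artifact, source) for a given compilation cache key."""
    if cache_key in local_cache:
        # Already compiled on this machine: nothing to do.
        return local_cache[cache_key], "local"
    if cache_key in hub_cache:
        # Precompiled files found on the Hub: download instead of compiling.
        local_cache[cache_key] = hub_cache[cache_key]
        return local_cache[cache_key], "hub"
    # Cache miss: compile locally (the slow path the cache helps you skip).
    local_cache[cache_key] = f"neff-for-{cache_key}"
    return local_cache[cache_key], "compiled"


local, hub = {}, {"abc123": "neff-blob"}
print(fetch_compiled_model("abc123", local, hub))  # found on the Hub, downloaded
print(fetch_compiled_model("abc123", local, hub))  # now served from the local cache
```

The second call never touches the remote cache: once an artifact has been downloaded, it behaves exactly like a locally compiled one.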
+
+## How the caching system works
+
+### Hash computation
+
+Many factors can trigger a compilation, among them:
+
+- The model weights
+- The input shapes
+- The precision of the model, full-precision or bf16
+- The version of the Neuron X compiler
+- The number of Neuron cores used
+
+These parameters are used to compute a hash. The hashes computed locally for the training session are then compared against the hashes stored on the Hugging Face Hub, and the cache acts accordingly (download or push).
+
+### How to use the Trainium model cache
+
+The public model cache is used automatically whenever your training script uses the [`TrainiumTrainer`]; no additional changes are needed.
+
+### How to use a private Trainium model cache
+
+The repository for the public cache is `aws-neuron/optimum-neuron-cache`. It includes precompiled files for commonly used models, and it is publicly available and free to use for everyone. But there are two limitations:
+
+1. You will not be able to push your own compiled files to this repo
+2. It is public, and you might want to use a private repo for private models
+
+To alleviate these limitations, [you can create your own private cache repository](https://huggingface.co/new) and set the environment variable `CUSTOM_CACHE_REPO`. For example, if your cache repo is called `michaelbenayoun/my_custom_cache_repo`, you just need to do:
+
+```bash
+CUSTOM_CACHE_REPO="michaelbenayoun/my_custom_cache_repo" torchrun ...
+```
+
+or:
+
+```bash
+export CUSTOM_CACHE_REPO="michaelbenayoun/my_custom_cache_repo"
+torchrun ...
+```
+
+You have to be [logged into the Hugging Face Hub](https://huggingface.co/docs/huggingface_hub/quick-start#login) to be able to push and pull files from your private cache repository.
+
+### Cache system flow
+
+*Cache system flow*
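The flow shown above starts from the hash described in the "Hash computation" section. As a minimal sketch, here is one possible way such a cache key could be derived from the listed parameters; `compilation_cache_key` is a hypothetical helper, and the actual hashing scheme used by `optimum-neuron` may use different fields and a different digest.

```python
import hashlib
import json

def compilation_cache_key(model_checksum, input_shapes, precision,
                          compiler_version, num_cores):
    # Hypothetical scheme: serialize every compilation-relevant parameter
    # deterministically, then hash the result. The real optimum-neuron
    # hashing may differ.
    payload = json.dumps({
        "model": model_checksum,
        "input_shapes": [list(shape) for shape in input_shapes],
        "precision": precision,
        "compiler_version": compiler_version,
        "num_cores": num_cores,
    }, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

# Changing any parameter yields a different key, hence a cache miss
# and a recompilation.
key_bf16 = compilation_cache_key("ckpt-sha", [(8, 512)], "bf16", "2.9", 2)
key_fp32 = compilation_cache_key("ckpt-sha", [(8, 512)], "fp32", "2.9", 2)
assert key_bf16 != key_fp32
```

Serializing with `sort_keys=True` keeps the key stable across runs, which is what allows hashes computed in different environments to be compared against those stored on the Hub.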