Move file

huggingface · philschmid · Apr 25, 2023 · Apr 24, 2023 · Apr 24, 2023 · Apr 24, 2023
commit bb97a7b772590afa73d8c8b91cf2cd97f0d73bbe
diff --git a/docs/source/cache_system.mdx b/docs/source/cache_system.mdx
diff --git a/docs/source/guides/cache_system.mdx b/docs/source/guides/cache_system.mdx
@@ -1,19 +1,72 @@
-<!---
-Copyright 2023 The HuggingFace Team. All rights reserved.
+<!--Copyright 2023 The HuggingFace Team. All rights reserved.
 
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
 
-    http://www.apache.org/licenses/LICENSE-2.0
+http://www.apache.org/licenses/LICENSE-2.0
 
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
 -->
 
-# Trainium model cache
+# Optimum Neuron Compilation Cache System
 
-TODO:
+As you might have noticed while training your models on Trainium instances, Neuron cores work on models compiled to the `neff` format. 
+It means that training requires one or multiple compilation step(s) that can take a long time. 
+
+With [`torch-neuronx`](https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx), a Neuron-compatible  version of PyTorch, the compiled files are stored in a local directory once you performed compilation, usually `/var/tmp/neuron-compile-cache`. This way you do not need to perform compilation again when you start another training with similar parameters. 
+
+We go one step further with Optimum Neuron by plugging this local cache directory to the Hugging Face Hub. During training, the [`TrainiumTrainer`] will check if compilation files are available on the Hub and download them if they are found, allowing you to save both time and cost by skipping the compilation phase.
+
+## How the caching system works
+
+### Hash computation
+
+Many factors can trigger compilation among which: 
+
+- The model
+- The input shapes
+- The precision of the model, full-precision or bf16
+- The version of the compiler
+- The number of Neuron cores used
+
+Using all these parameters, we can compute a hash, and we can then compare hashes we have locally for our training session against hashes stored on the Hugging Face Hub, and act accordingly (download or push).
+
+
+### Using a custom cache repo on the Hugging Face Hub
+
+The official repo on the Hub storing compilation files is `aws-neuron/optimum-neuron-cache`. In this repo, we (the Optimum Neuron team) push compilation files for commonly used models so that it is publicly available and free to use for everyone. But there are two limitations:
+
+1. You will not be able to push your own compiled files on this repo 
+2. It is public and you might want to use a private repo for private models
+
+To alleviate that [you can create your own model repo on the Hub](https://huggingface.co/new) and set the environment variable `CUSTOM_CACHE_REPO`. For example, if you cache repo is called `michaelbenayoun/my_custom_cache_repo`, you just need to do:
+
+```bash
+CUSTOM_CACHE_REPO="michaelbenayoun/my_custom_cache_repo" torchrun ...
+```
+
+or:
+
+```bash
+export CUSTOM_CACHE_REPO="michaelbenayoun/my_custom_cache_repo"
+torchrun ...
+```
+
+If you are [logged in the Hugging Face Hub](https://huggingface.co/docs/huggingface_hub/quick-start#login) on the Trainium instance, you will then be able to fetch files both from your custom cache repo, which can be public or private, and from the official cache repo, as well as pushing your compiled files to your custom cache repo. 
+
+### Cache system flow
+
+<p align="center">
+  <img alt="Cache system flow" src="https://huggingface.co/datasets/optimum/documentation-images/resolve/main/neuron/cache_system_flow.jpg">
+  <br>
+  <em style="color: grey">Cache system flow</em>
+</p>
+
+
+At each the beginning of each training step, the  [`TrainiumTrainer`] computes a `NeuronHash` and checks the cache repo(s) (official and custom) on the Hugging Face Hub to see if there are compiled files associated to this hash. 
+If that is the case, the files are downloaded directly to the local cache directory and no compilation is needed. Otherwise compilation is performed.
+
+
+Just as for downloading compiled files, the [`TrainiumTrainer`] will keep track of the newly created compilation files at each training step, and upload them to the Hugging Face Hub at save time or when training ends. This assumes that you have writing access to the cache repo, otherwise nothing will be pushed.