Commit

Merge branch 'main' into support-mixtral-kvcache-reuse

regisss authored May 4, 2024
2 parents 7ca1876 + f32fc57 commit 07572a7
Showing 31 changed files with 282 additions and 109 deletions.
32 changes: 16 additions & 16 deletions README.md
@@ -17,18 +17,17 @@ limitations under the License.
![](https://github.com/huggingface/optimum-habana/blob/main/readme_logo.png)


# Optimum Habana
# Optimum for Intel® Gaudi® Accelerators

🤗 Optimum Habana is the interface between the 🤗 Transformers and Diffusers libraries and [Habana's Gaudi processor (HPU)](https://docs.habana.ai/en/latest/index.html).
Optimum for Intel Gaudi - a.k.a. `optimum-habana` - is the interface between the Transformers and Diffusers libraries and [Intel Gaudi AI Accelerators (HPU)](https://docs.habana.ai/en/latest/index.html).
It provides a set of tools enabling easy model loading, training and inference on single- and multi-HPU settings for different downstream tasks.
The list of officially validated models and tasks is available [here](https://github.com/huggingface/optimum-habana#validated-models). Users can try other models and tasks with only few changes.
The list of officially validated models and tasks is available [here](https://github.com/huggingface/optimum-habana#validated-models). Users can try any of the thousands of other Hugging Face models and tasks on Intel Gaudi accelerators with only a few changes.


## What is a Habana Processing Unit (HPU)?
## What are Intel Gaudi AI Accelerators (HPUs)?

HPUs offer fast model training and inference as well as a great price-performance ratio.
Check out [this blog post about BERT pre-training](https://huggingface.co/blog/pretraining-bert) and [this article benchmarking Habana Gaudi2 versus Nvidia A100 GPUs](https://huggingface.co/blog/habana-gaudi-2-benchmark) for concrete examples.
If you are not familiar with HPUs and would like to know more about them, we recommend you take a look at [our conceptual guide](https://huggingface.co/docs/optimum/habana/concept_guides/hpu).
Check out [this blog post about BLOOM inference](https://huggingface.co/blog/habana-gaudi-2-bloom) and [this post benchmarking Intel Gaudi 2 and NVIDIA A100 GPUs for BridgeTower training](https://huggingface.co/blog/bridgetower) for concrete examples.


## Install the library and get example scripts
@@ -51,7 +50,7 @@ To use the example associated with the latest stable release, run:
### Option 2: Use the latest main branch under development
Optimum Habana is a fast-moving project, and you may want to install it from source and get the latest scripts :
Optimum for Intel Gaudi is a fast-moving project, and you may want to install it from source to get the latest scripts:
```bash
pip install git+https://github.com/huggingface/optimum-habana.git
```
@@ -76,15 +75,15 @@ To install the requirements for every example:
### Quick Start
🤗 Optimum Habana was designed with one goal in mind: **to make training and inference straightforward for any 🤗 Transformers and 🤗 Diffusers user while leveraging the complete power of Gaudi processors**.
Optimum for Intel Gaudi was designed with one goal in mind: **to make training and inference straightforward for Transformers and Diffusers users, while fully leveraging the power of Intel Gaudi AI Accelerators**.
#### Transformers Interface
There are two main classes one needs to know:
- [GaudiTrainer](https://huggingface.co/docs/optimum/habana/package_reference/trainer): the trainer class that takes care of compiling and distributing the model to run on HPUs, and performing training and evaluation.
- [GaudiConfig](https://huggingface.co/docs/optimum/habana/package_reference/gaudi_config): the class that lets you configure Habana Mixed Precision and decide whether optimized operators and optimizers should be used.
The [GaudiTrainer](https://huggingface.co/docs/optimum/habana/package_reference/trainer) is very similar to the [🤗 Transformers Trainer](https://huggingface.co/docs/transformers/main_classes/trainer), and adapting a script using the Trainer to make it work with Gaudi will mostly consist in simply swapping the `Trainer` class for the `GaudiTrainer` one.
The [GaudiTrainer](https://huggingface.co/docs/optimum/habana/package_reference/trainer) is very similar to the [Transformers Trainer](https://huggingface.co/docs/transformers/main_classes/trainer), and adapting a script that uses the Trainer to make it work with Intel Gaudi accelerators mostly consists of swapping the `Trainer` class for the `GaudiTrainer` one.
That's how most of the [example scripts](https://github.com/huggingface/optimum-habana/tree/main/examples) were adapted from their [original counterparts](https://github.com/huggingface/transformers/tree/main/examples/pytorch).
Here is an example:
@@ -115,12 +114,12 @@ Here is an example:
)
```
where `gaudi_config_name` is the name of a model from the [Hub](https://huggingface.co/Habana) (Gaudi configurations are stored in model repositories) or a path to a local Gaudi configuration file (you can see [here](https://huggingface.co/docs/optimum/habana/package_reference/gaudi_config) how to write your own).
where `gaudi_config_name` is the name of a model from the [Hub](https://huggingface.co/Habana) (Intel Gaudi configurations are stored in model repositories) or a path to a local Intel Gaudi configuration file (you can see [here](https://huggingface.co/docs/optimum/habana/package_reference/gaudi_config) how to write your own).
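For instance, a minimal end-to-end training setup might look like the sketch below; the checkpoint, dataset and Gaudi configuration names are just examples, and the preprocessing is deliberately simplified:

```python
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

from optimum.habana import GaudiTrainer, GaudiTrainingArguments

model_name = "bert-base-uncased"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Small example dataset, tokenized for sequence classification
dataset = load_dataset("imdb", split="train[:1%]")
dataset = dataset.map(
    lambda x: tokenizer(x["text"], truncation=True, padding="max_length"),
    batched=True,
)

training_args = GaudiTrainingArguments(
    output_dir="./results",
    use_habana=True,                               # run on HPUs
    use_lazy_mode=True,                            # lazy-mode graph execution
    gaudi_config_name="Habana/bert-base-uncased",  # Gaudi configuration from the Hub
)

trainer = GaudiTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```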
#### Diffusers Interface
You can generate images from prompts using Stable Diffusion on Gaudi using the [`GaudiStableDiffusionPipeline`](https://huggingface.co/docs/optimum/habana/package_reference/stable_diffusion_pipeline) class and the [`GaudiDDIMScheduler`] which have been both optimized for HPUs. Here is how to use them and the differences with the 🤗 Diffusers library:
You can generate images from prompts using Stable Diffusion on Intel Gaudi with the [`GaudiStableDiffusionPipeline`](https://huggingface.co/docs/optimum/habana/package_reference/stable_diffusion_pipeline) class and the [`GaudiDDIMScheduler`], which have both been optimized for HPUs. Here is how to use them and how they differ from the Diffusers library:
```diff
- from diffusers import DDIMScheduler, StableDiffusionPipeline
```
@@ -151,12 +150,12 @@ outputs = generator(
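A standalone usage sketch might look as follows; the checkpoint name and pipeline arguments are illustrative, and `use_habana`, `use_hpu_graphs` and `gaudi_config` are assumed to be the relevant HPU switches:

```python
from optimum.habana.diffusers import GaudiDDIMScheduler, GaudiStableDiffusionPipeline

model_name = "runwayml/stable-diffusion-v1-5"  # example checkpoint

# Both the scheduler and the pipeline come from optimum.habana.diffusers
scheduler = GaudiDDIMScheduler.from_pretrained(model_name, subfolder="scheduler")
generator = GaudiStableDiffusionPipeline.from_pretrained(
    model_name,
    scheduler=scheduler,
    use_habana=True,                         # run on HPU
    use_hpu_graphs=True,                     # capture HPU graphs for faster inference
    gaudi_config="Habana/stable-diffusion",  # Gaudi configuration from the Hub
)

outputs = generator(
    prompt=["An astronaut riding a horse on Mars"],
    num_images_per_prompt=4,
    batch_size=2,
)
images = outputs.images
```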
### Documentation
Check out [the documentation of Optimum Habana](https://huggingface.co/docs/optimum/habana/index) for more advanced usage.
Check out [the documentation of Optimum for Intel Gaudi](https://huggingface.co/docs/optimum/habana/index) for more advanced usage.
## Validated Models
The following model architectures, tasks and device distributions have been validated for 🤗 Optimum Habana:
The following model architectures, tasks and device distributions have been validated for Optimum for Intel Gaudi:
> In the tables below, :heavy_check_mark: means single-card, multi-card and DeepSpeed have all been validated.
@@ -225,16 +224,17 @@ The following model architectures, tasks and device distributions have been vali
</div>
Other models and tasks supported by the 🤗 Transformers and 🤗 Diffusers library may also work. You can refer to this [section](https://github.com/huggingface/optimum-habana#how-to-use-it) for using them with 🤗 Optimum Habana. Besides, [this page](https://github.com/huggingface/optimum-habana/tree/main/examples) explains how to modify any [example](https://github.com/huggingface/transformers/tree/main/examples/pytorch) from the 🤗 Transformers library to make it work with 🤗 Optimum Habana.
Other models and tasks supported by the Transformers and Diffusers libraries may also work. You can refer to this [section](https://github.com/huggingface/optimum-habana#how-to-use-it) for using them with Optimum for Intel Gaudi. In addition, [this page](https://github.com/huggingface/optimum-habana/tree/main/examples) explains how to modify any [example](https://github.com/huggingface/transformers/tree/main/examples/pytorch) from the Transformers library to make it work with Optimum for Intel Gaudi.
If you find any issues while using those, please open an issue or a pull request.
After training your model, feel free to submit it to the Intel [leaderboard](https://huggingface.co/spaces/Intel/powered_by_intel_llm_leaderboard), which is designed to evaluate, score, and rank open-source LLMs that have been pre-trained or fine-tuned on Intel hardware. Models submitted to the leaderboard will be evaluated on the Intel Developer Cloud. The evaluation platform consists of Gaudi Accelerators and Xeon CPUs running benchmarks from the Eleuther AI Language Model Evaluation Harness.
## Gaudi Setup
Please refer to Habana Gaudi's official [installation guide](https://docs.habana.ai/en/latest/Installation_Guide/index.html).
Please refer to the official Intel Gaudi AI Accelerator [installation guide](https://docs.habana.ai/en/latest/Installation_Guide/index.html).
> Tests should be run in a Docker container based on Habana Docker images.
> Tests should be run in a Docker container based on Intel Gaudi Docker images.
>
> The current version has been validated for SynapseAI 1.15.
6 changes: 3 additions & 3 deletions docs/source/concept_guides/hpu.mdx
@@ -14,9 +14,9 @@ See the License for the specific language governing permissions and
limitations under the License.
-->

# What are Habana's first-generation Gaudi, Gaudi2 and HPUs?
# What are Intel® Gaudi® 1, Intel® Gaudi® 2 and HPUs?

[Gaudi](https://habana.ai/training/gaudi/) and [Gaudi2](https://habana.ai/training/gaudi2/) are the first- and second-generation AI hardware accelerators designed by Habana Labs.
[Intel Gaudi 1](https://habana.ai/training/gaudi/) and [Intel Gaudi 2](https://habana.ai/training/gaudi2/) are the first- and second-generation AI hardware accelerators designed by Habana Labs and Intel.
A single server contains 8 devices called Habana Processing Units (HPUs) with 96GB of memory each on Gaudi2 and 32GB on first-gen Gaudi.
Check out [here](https://docs.habana.ai/en/latest/Gaudi_Overview/Gaudi_Architecture.html) for more information about the underlying hardware architecture.

@@ -31,7 +31,7 @@ Two execution modes are supported on HPUs for PyTorch, which is the main deep le
- *Eager mode* execution, where the framework executes one operation at a time as defined in [standard PyTorch eager mode](https://pytorch.org/tutorials/beginner/hybrid_frontend/learning_hybrid_frontend_through_example_tutorial.html).
- *Lazy mode* execution, where operations are internally accumulated in a graph. The execution of the operations in the accumulated graph is triggered in a lazy manner, only when a tensor value is required by the user or when it is explicitly required in the script. The [SynapseAI graph compiler](https://docs.habana.ai/en/latest/Gaudi_Overview/SynapseAI_Software_Suite.html#graph-compiler-and-runtime) will optimize the execution of the operations accumulated in the graph (e.g. operator fusion, data layout management, parallelization, pipelining and memory management, graph-level optimizations).

See [here](../usage_guides/accelerate_training#lazy-mode) how to use these execution modes in 🤗 Optimum Habana.
See [here](../usage_guides/accelerate_training#lazy-mode) how to use these execution modes in Optimum for Intel Gaudi.
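As a rough illustration, lazy mode accumulates operations until a synchronization point is reached; with plain PyTorch on HPU this is typically done by marking graph boundaries explicitly. A minimal sketch, assuming the Intel Gaudi PyTorch bridge is installed (the `GaudiTrainer` handles these steps for you):

```python
import torch
import habana_frameworks.torch.core as htcore  # Intel Gaudi PyTorch bridge

device = torch.device("hpu")
model = torch.nn.Linear(16, 4).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 16).to(device)
y = torch.randn(8, 4).to(device)

for _ in range(3):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    htcore.mark_step()  # trigger execution of the accumulated graph
    optimizer.step()
    htcore.mark_step()
```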


## Distributed training
8 changes: 4 additions & 4 deletions docs/source/index.mdx
@@ -15,17 +15,17 @@ limitations under the License.
-->


# 🤗 Optimum Habana
# Optimum for Intel Gaudi

🤗 Optimum Habana is the interface between the 🤗 Transformers and 🤗 Diffusers libraries and [Habana's Gaudi processor (HPU)](https://docs.habana.ai/en/latest/index.html).
Optimum for Intel Gaudi is the interface between the Transformers and Diffusers libraries and [Intel® Gaudi® AI Accelerators (HPUs)](https://docs.habana.ai/en/latest/index.html).
It provides a set of tools that enable easy model loading, training and inference on single- and multi-HPU settings for various downstream tasks as shown in the table below.

HPUs offer fast model training and inference as well as a great price-performance ratio.
Check out [this blog post about BERT pre-training](https://huggingface.co/blog/pretraining-bert) and [this article benchmarking Habana Gaudi2 versus Nvidia A100 GPUs](https://huggingface.co/blog/habana-gaudi-2-benchmark) for concrete examples.
Check out [this blog post about BERT pre-training](https://huggingface.co/blog/pretraining-bert) and [this post benchmarking Intel Gaudi 2 with NVIDIA A100 GPUs](https://huggingface.co/blog/habana-gaudi-2-benchmark) for concrete examples.
If you are not familiar with HPUs, we recommend you take a look at [our conceptual guide](./concept_guides/hpu).


The following model architectures, tasks and device distributions have been validated for 🤗 Optimum Habana:
The following model architectures, tasks and device distributions have been validated for Optimum for Intel Gaudi:

<Tip>

4 changes: 2 additions & 2 deletions docs/source/installation.mdx
@@ -12,8 +12,8 @@ specific language governing permissions and limitations under the License.

# Installation

To install 🤗 Optimum Habana, you first need to install SynapseAI and the Gaudi drivers by following the official [installation guide](https://docs.habana.ai/en/latest/Installation_Guide/index.html).
Then, 🤗 Optimum Habana can be installed using `pip` as follows:
To install Optimum for Intel Gaudi, you first need to install SynapseAI and the Intel® Gaudi® drivers by following the official [installation guide](https://docs.habana.ai/en/latest/Installation_Guide/index.html).
Then, Optimum for Intel Gaudi can be installed using `pip` as follows:

```bash
python -m pip install --upgrade-strategy eager optimum[habana]
```
6 changes: 3 additions & 3 deletions docs/source/package_reference/gaudi_config.mdx
@@ -14,11 +14,11 @@ See the License for the specific language governing permissions and
limitations under the License.
-->

# Gaudi Configuration
# GaudiConfig

Here is a description of each configuration parameter:
- `use_fused_adam` enables to decide whether to use the [custom fused implementation of the ADAM optimizer provided by Habana](https://docs.habana.ai/en/latest/PyTorch/Model_Optimization_PyTorch/Custom_Ops_PyTorch.html#custom-optimizers).
- `use_fused_clip_norm` enables to decide whether to use the [custom fused implementation of gradient norm clipping provided by Habana](https://docs.habana.ai/en/latest/PyTorch/Model_Optimization_PyTorch/Custom_Ops_PyTorch.html#other-custom-ops).
- `use_fused_adam` determines whether to use the [custom fused implementation of the ADAM optimizer provided by Intel® Gaudi® AI Accelerator](https://docs.habana.ai/en/latest/PyTorch/Model_Optimization_PyTorch/Custom_Ops_PyTorch.html#custom-optimizers).
- `use_fused_clip_norm` determines whether to use the [custom fused implementation of gradient norm clipping provided by Intel® Gaudi® AI Accelerator](https://docs.habana.ai/en/latest/PyTorch/Model_Optimization_PyTorch/Custom_Ops_PyTorch.html#other-custom-ops).
- `use_torch_autocast` enables PyTorch autocast; used to define a good pre-defined config; users should favor the `--bf16` training argument
- `autocast_bf16_ops`: list of operations that should be run in bf16 precision under the autocast context; the LOWER_LIST environment variable is the preferred way to override the operator autocast list
- `autocast_fp32_ops`: list of operations that should be run in fp32 precision under the autocast context; the FP32_LIST environment variable is the preferred way to override the operator autocast list
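For illustration, a Gaudi configuration is typically stored as a small JSON file in a model repository. A minimal sketch of creating and saving one with the fields described above, assuming `GaudiConfig` accepts them as constructor arguments and supports `save_pretrained` (values chosen arbitrarily):

```python
from optimum.habana import GaudiConfig

# Arbitrary example values; pick the settings that match your training setup.
gaudi_config = GaudiConfig(
    use_fused_adam=True,
    use_fused_clip_norm=True,
    use_torch_autocast=True,
)

# Save the configuration locally, then pass its path (or a Hub repo name)
# as `gaudi_config_name` to GaudiTrainingArguments.
gaudi_config.save_pretrained("./my_gaudi_config")
```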
2 changes: 1 addition & 1 deletion docs/source/usage_guides/accelerate_training.mdx
@@ -181,4 +181,4 @@ In distributed mode, memory stats are communicated only by the main process.

</Tip>

You can take a look at [Habana Gaudi's official documentation](https://docs.habana.ai/en/latest/PyTorch/PyTorch_User_Guide/Python_Packages.html#memory-stats-apis) for more information about the memory stats API.
You can take a look at [Intel® Gaudi® AI Accelerator's official documentation](https://docs.habana.ai/en/latest/PyTorch/PyTorch_User_Guide/Python_Packages.html#memory-stats-apis) for more information about the memory stats API.
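As a quick sketch of what those APIs look like in practice (module and function names taken from the linked documentation; availability may vary with your SynapseAI version):

```python
import habana_frameworks.torch as htorch  # Intel Gaudi PyTorch bridge

# Query HPU memory statistics directly from a script
print(htorch.hpu.memory_allocated())      # bytes currently allocated on the device
print(htorch.hpu.max_memory_allocated())  # peak bytes allocated since start/reset
print(htorch.hpu.memory_stats())          # dict with detailed allocator statistics
```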
2 changes: 1 addition & 1 deletion docs/source/usage_guides/overview.mdx
@@ -16,7 +16,7 @@ limitations under the License.

# Overview

Welcome to the 🤗 Optimum Habana how-to guides!
Welcome to the Optimum for Intel Gaudi how-to guides!
These guides tackle more advanced topics and will show you how to easily get the best from HPUs:
- [Pretraining models](./pretraining)
- [Accelerating training](./accelerate_training)
2 changes: 2 additions & 0 deletions examples/audio-classification/README.md
@@ -95,6 +95,8 @@ On 8 HPUs, this script should run in ~12 minutes and yield an accuracy of **80.4

> If your model classification head dimensions do not fit the number of labels in the dataset, you can specify `--ignore_mismatched_sizes` to adapt it.
> If you get an error reporting unused parameters in the model, you can specify `--ddp_find_unused_parameters True`. Using this parameter might affect the training speed.

## DeepSpeed

4 changes: 2 additions & 2 deletions examples/audio-classification/run_audio_classification.py
@@ -390,8 +390,8 @@ def compute_metrics(eval_pred):
ignore_mismatched_sizes=model_args.ignore_mismatched_sizes,
)

# freeze the convolutional waveform encoder
if model_args.freeze_feature_encoder:
# freeze the convolutional waveform encoder if supported by model
if hasattr(model, "freeze_feature_encoder") and model_args.freeze_feature_encoder:
model.freeze_feature_encoder()

if training_args.do_train:
12 changes: 5 additions & 7 deletions examples/language-modeling/run_lora_clm.py
@@ -134,15 +134,15 @@ class ModelArguments:
default=False,
metadata={
"help": (
"Whether to run attention softmax layer in bf16 precision for fine-tuning. The current support is limited to Llama only.",
"Whether to run attention softmax layer in bf16 precision for fine-tuning. The current support is limited to Llama only."
)
},
)
use_flash_attention: bool = field(
default=False,
metadata={
"help": (
"Whether to use Habana flash attention for fine-tuning. The current support is limited to Llama only.",
"Whether to use Habana flash attention for fine-tuning. The current support is limited to Llama only."
)
},
)
@@ -151,7 +151,7 @@ class ModelArguments:
metadata={
"help": (
"Whether to enable recompute in Habana flash attention for fine-tuning."
" It is applicable only when use_flash_attention is True.",
" It is applicable only when use_flash_attention is True."
)
},
)
@@ -160,16 +160,14 @@ class ModelArguments:
metadata={
"help": (
"Whether to enable causal mask in Habana flash attention for fine-tuning."
" It is applicable only when use_flash_attention is True.",
" It is applicable only when use_flash_attention is True."
)
},
)
use_fused_rope: bool = field(
default=True,
metadata={
"help": (
"Whether to use Habana fused-rope for fine-tuning. The current support is limited to Llama only.",
)
"help": ("Whether to use Habana fused-rope for fine-tuning. The current support is limited to Llama only.")
},
)
load_meta_device: bool = field(
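The run_lora_clm.py changes above all remove a trailing comma inside parenthesized `help` strings. The distinction matters in Python: a parenthesized string followed by a comma is a one-element tuple, not a string, as this minimal check illustrates:

```python
# A parenthesized string with a trailing comma is a tuple, not a str,
# so help text built this way ends up as ('...',) instead of a plain string.
help_as_tuple = (
    "Whether to use Habana flash attention for fine-tuning."
    " It is applicable only when use_flash_attention is True.",
)
help_as_str = (
    "Whether to use Habana flash attention for fine-tuning."
    " It is applicable only when use_flash_attention is True."
)

assert isinstance(help_as_tuple, tuple)
assert isinstance(help_as_str, str)
```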
2 changes: 1 addition & 1 deletion examples/speech-recognition/README.md
@@ -80,7 +80,7 @@ python run_speech_recognition_ctc.py \
--throughput_warmup_steps="3" \
--bf16 \
--use_hpu_graphs_for_training \
--use_hpu_grpahs_for_inference
--use_hpu_graphs_for_inference
```

On a single HPU, this script should run in *ca.* 6 hours and yield a CTC loss of **0.059** and a word error rate of **0.0423**.
2 changes: 1 addition & 1 deletion examples/stable-diffusion/README.md
@@ -16,7 +16,7 @@ limitations under the License.

# Stable Diffusion Examples

This directory contains a script that showcases how to perform text-to-image generation using Stable Diffusion on Habana Gaudi.
This directory contains a script that showcases how to perform text-to-image generation using Stable Diffusion on Intel® Gaudi® AI Accelerators.

Stable Diffusion was proposed in [Stable Diffusion Announcement](https://stability.ai/blog/stable-diffusion-announcement) by Patrick Esser and Robin Rombach and the Stability AI team.
