Skip to content

Commit

Permalink
[docs] Llama3 (#31662)
Browse files Browse the repository at this point in the history
quick usage to top
  • Loading branch information
stevhliu authored Jun 27, 2024
1 parent e44b878 commit 464aa74
Showing 1 changed file with 10 additions and 14 deletions.
24 changes: 10 additions & 14 deletions docs/source/en/model_doc/llama3.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,15 @@ rendered properly in your Markdown viewer.

# Llama3

```py3
import transformers
import torch

model_id = "meta-llama/Meta-Llama-3-8B"

pipeline = transformers.pipeline("text-generation", model=model_id, model_kwargs={"torch_dtype": torch.bfloat16}, device_map="auto")
pipeline("Hey how are you doing today?")
```

## Overview

Expand Down Expand Up @@ -66,20 +75,7 @@ model = AutoModelForCausalLM.from_pretrained("/output/path")
Note that executing the script requires enough CPU RAM to host the whole model in float16 precision (even if the biggest versions
come in several checkpoints they each contain a part of each weight of the model, so we need to load them all in RAM). For the 75B model, it's thus 145GB of RAM needed.


- When using Flash Attention 2 via `attn_implementation="flash_attention_2"`, don't pass `torch_dtype` to the `from_pretrained` class method and use Automatic Mixed-Precision training. When using `Trainer`, it is simply specifying either `fp16` or `bf16` to `True`. Otherwise, make sure you are using `torch.autocast`. This is required because the Flash Attention only support `fp16` and `bf16` data type.

## Quick usage

```py3
import transformers
import torch

model_id = "meta-llama/Meta-Llama-3-8B"

pipeline = transformers.pipeline("text-generation", model=model_id, model_kwargs={"torch_dtype": torch.bfloat16}, device_map="auto")
pipeline("Hey how are you doing today?")
```

## Resources
A ton of cool resources are already available on the documentation page of [~llama2], inviting contributors to add new resources curated for Llama3 here! 🤗
A ton of cool resources are already available on the documentation page of [Llama2](./llama2), inviting contributors to add new resources curated for Llama3 here! 🤗

0 comments on commit 464aa74

Please sign in to comment.