Mistral7b takes 4 times its size in VRAM on A100 #1863

# Environment Setup
Runtime environment:

Target: x86_64-unknown-linux-gnu
Cargo version: 1.75.0
Commit sha: https://github.com/huggingface/text-generation-inference/commit/c38a7d7ddd9c612e368adec1ef94583be602fc7e
Docker label: sha-6c4496a
Kubernetes Cluster deployment

1 A100 GPU with 80GB VRAM

12 CPUs with 32 GB RAM

TGI version: 2.0.0

TGI Parameters:
    MAX_INPUT_LENGTH: "8000"
    MAX_TOTAL_TOKENS: "8512"
    MAX_CONCURRENT_REQUESTS: "128"
    LOG_LEVEL: "INFO"
    MAX_BATCH_TOTAL_TOKENS: "4294967295"
    WAITING_SERVED_RATIO: "0.3"
    MAX_WAITING_TOKENS: "0"
    MAX_BATCH_PREFILL_TOKENS: "32768"
    
# Question

I am trying to run Mistral7b on an A100 80GB, but I see it taking up 67815MiB (about 66GiB) of VRAM.

```
Mon May  6 10:02:50 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100 80GB PCIe          On  |   00000000:17:00.0 Off |                    0 |
| N/A   47C    P0             87W /  300W |   67815MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
```

Why is that? Mistral7b is a 7B-parameter model, so at 16-bit precision (2 bytes per parameter) the weights should need about 14GB of VRAM, right?
**Where is the remaining VRAM (roughly 53GiB) going?**
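
For reference, here is a minimal sketch of the back-of-the-envelope arithmetic behind that estimate. It assumes a nominal 7e9 parameters (the real count is slightly higher) and 2 bytes per parameter, and it takes the 67815MiB reading from the `nvidia-smi` output above. The function and variable names are only illustrative.

```python
# Rough check of the numbers quoted in this issue.
MIB = 1024 ** 2
GIB = 1024 ** 3


def weight_bytes(n_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed just to hold the weights (no KV cache, no activations)."""
    return n_params * bytes_per_param


n_params = 7e9                     # assumption: nominal "7B" parameter count
weights = weight_bytes(n_params)   # 16-bit precision -> 2 bytes per parameter
used = 67815 * MIB                 # reading from nvidia-smi above
unexplained = used - weights

print(f"weights:     {weights / 1e9:.1f} GB ({weights / GIB:.1f} GiB)")
print(f"used:        {used / GIB:.1f} GiB")
print(f"unexplained: {unexplained / GIB:.1f} GiB")
```

Note that `nvidia-smi` reports MiB (1024-based), so subtracting a decimal "14GB" directly from 67815 mixes units slightly; the sketch converts everything to bytes first.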


What seems to be the issue? Am I doing something wrong?
