Mistral7b takes 4 times its size in VRAM on A100 #1863

Closed
@martinigoyanes

Description


Environment Setup

Runtime environment:

Target: x86_64-unknown-linux-gnu
Cargo version: 1.75.0
Commit sha: c38a7d7
Docker label: sha-6c4496a
Kubernetes Cluster deployment

1 A100 GPU with 80 GB VRAM

12 CPUs with 32 GB RAM

TGI version: 2.0.0

TGI Parameters:
MAX_INPUT_LENGTH: "8000"
MAX_TOTAL_TOKENS: "8512"
MAX_CONCURRENT_REQUESTS: "128"
LOG_LEVEL: "INFO"
MAX_BATCH_TOTAL_TOKENS: "4294967295"
WAITING_SERVED_RATIO: "0.3"
MAX_WAITING_TOKENS: "0"
MAX_BATCH_PREFILL_TOKENS: "32768"
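
As a sanity check on the token budget above (illustrative arithmetic only; the variable names mirror the env vars, this is not TGI code): MAX_TOTAL_TOKENS bounds prompt plus generated tokens per request, so this configuration leaves room for up to 512 generated tokens.

```python
# Token-budget arithmetic for the configuration above (illustrative
# sketch; names mirror the TGI env vars, not a TGI API).
MAX_INPUT_LENGTH = 8000   # max prompt tokens per request
MAX_TOTAL_TOKENS = 8512   # max prompt + generated tokens per request

max_new_tokens = MAX_TOTAL_TOKENS - MAX_INPUT_LENGTH
assert max_new_tokens > 0, "total budget must exceed the input budget"
print(f"Up to {max_new_tokens} generated tokens per request")
```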

Question

I am trying to run Mistral7b on an 80 GB A100, but I see it taking up 67 GB of VRAM.

Mon May  6 10:02:50 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100 80GB PCIe          On  |   00000000:17:00.0 Off |                    0 |
| N/A   47C    P0             87W /  300W |   67815MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+

Why is that? Mistral7b is a 7B-parameter model, so at 16-bit precision the weights alone should take about 14 GB of VRAM, right?
Where are the extra 53,815 MiB of VRAM coming from?

What seems to be the issue? Am I doing something wrong?
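
For scale, the gap is roughly what a pre-allocated paged KV cache would occupy if the server claims most of the free VRAM at warmup. A rough sketch of that arithmetic, assuming Mistral-7B's published architecture (32 layers, 8 KV heads via grouped-query attention, head dimension 128, fp16); the constants here are assumptions from the model card, not values reported by TGI:

```python
# Rough KV-cache accounting for Mistral-7B in fp16 (assumed
# architecture: 32 layers, 8 KV heads via GQA, head_dim 128).
BYTES_FP16 = 2
N_LAYERS = 32
N_KV_HEADS = 8
HEAD_DIM = 128

# K and V tensors per token, summed across all layers.
kv_bytes_per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_FP16
print(f"{kv_bytes_per_token} bytes (= {kv_bytes_per_token // 1024} KiB) per token")

# Weights at 16-bit precision, then the VRAM left over from the
# 67,815 MiB reported by nvidia-smi above.
weights_mib = 7e9 * BYTES_FP16 / 2**20
leftover_mib = 67815 - weights_mib
tokens = leftover_mib * 2**20 / kv_bytes_per_token
print(f"~{tokens:,.0f} tokens of KV cache fit in the remaining VRAM")
```

So on this arithmetic, the extra ~53 GiB corresponds to a KV cache for roughly 400k+ tokens, i.e. capacity for many concurrent 8k-token sequences rather than memory attributable to the weights.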
