Description
Environment Setup
Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.75.0
Commit sha: c38a7d7
Docker label: sha-6c4496a
Kubernetes Cluster deployment
1 A100 GPU with 80GB RAM
12 CPU with 32 GB RAM
TGI version: 2.0.0
TGI Parameters:
MAX_INPUT_LENGTH: "8000"
MAX_TOTAL_TOKENS: "8512"
MAX_CONCURRENT_REQUESTS: "128"
LOG_LEVEL: "INFO"
MAX_BATCH_TOTAL_TOKENS: "4294967295"
WAITING_SERVED_RATIO: "0.3"
MAX_WAITING_TOKENS: "0"
MAX_BATCH_PREFILL_TOKENS: "32768"
Question
I am trying to run Mistral7b on A100GB but I see it taking up 67GB of VRAM.
Mon May 6 10:02:50 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A100 80GB PCIe On | 00000000:17:00.0 Off | 0 |
| N/A 47C P0 87W / 300W | 67815MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
Why is that? Mistral7b is a 7b param model, and with 16-bit precision that should lead to 14GB of VRAM to store the model, right?
What are the extra 53815MB in VRAM coming from?
What seems to be the issue? Am I doing something wrong?