feat(server): auto max_batch_total_tokens for flash att models #630

OlivierDehaene · 2023-07-18T09:34:12Z

No description provided.

flozi00 · 2023-07-19T16:51:53Z

Since this PR was merged, I have been running out of memory (OOM) all the time, even when the CLI arg is set. I'm using the Docker container on a 48 GB GPU, and the smallest model I tested is 7B.

OlivierDehaene · 2023-07-19T17:34:35Z

Can you open a separate issue with the issue template?

flozi00 · 2023-07-19T18:07:31Z

Found more details and opened an issue.

OlivierDehaene added 8 commits July 18, 2023 16:38

b165f8b · feat(server): auto max_batch_total_tokens for flash att models
4201a8b · fix default value
a6b128b · fix default value
086d0c2 · update logs
d2e3843 · pad to block size
79616a8 · add block size parameter
de892fb · revert back to normal allocator
160a50a · cleanup

OlivierDehaene force-pushed the feat/automatic_max branch from d311508 to 160a50a on July 18, 2023 14:38
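The "pad to block size" and "add block size parameter" commits suggest that token counts are rounded up to whole KV-cache blocks. A minimal sketch of that rounding, assuming a fixed integer block size; the function names below are hypothetical and not taken from the PR:

```python
def blocks_needed(num_tokens: int, block_size: int) -> int:
    """Return how many fixed-size blocks are required to hold num_tokens."""
    if block_size <= 0:
        raise ValueError("block_size must be a positive integer")
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    # Ceiling division using integers only.
    return (num_tokens + block_size - 1) // block_size


def pad_to_block_size(num_tokens: int, block_size: int) -> int:
    """Round num_tokens up to the next multiple of block_size."""
    return blocks_needed(num_tokens, block_size) * block_size


if __name__ == "__main__":
    # A 100-token sequence with 16-token blocks occupies 7 blocks (112 slots).
    print(blocks_needed(100, 16), pad_to_block_size(100, 16))  # prints: 7 112
```

Padding to whole blocks keeps any budget derived this way a multiple of the block size, at the cost of up to block_size - 1 unused slots per sequence.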

OlivierDehaene added 4 commits July 18, 2023 17:03

1686a7c · add syncs
36a9bdd · use max_memory_reserved
45d24be · sleep to connect to the CI runner
99568ee · add tmate

Narsil mentioned this pull request Jul 18, 2023: How to config vllm gpu_memory_utilization? #636 (Closed)

OlivierDehaene added 7 commits July 19, 2023 00:17

05d2a77 · reset peak memory
0111869 · use less memory
8793ae5 · add clear cache when batch is finished
7f399cd · revert
0a02801 · try 0.99
406b094 · 0.985
2934543 · 0.98

OlivierDehaene merged commit fe80f53 into main Jul 19, 2023
5 checks passed

OlivierDehaene deleted the feat/automatic_max branch July 19, 2023 07:31

flozi00 mentioned this pull request Jul 19, 2023: auto max_batch_total_tokens OOM #651 (Closed)

OlivierDehaene mentioned this pull request Jul 20, 2023: Rules of thumb for setting max-batch-total-tokens and max-batch-prefill-tokens #629 (Closed)

Yard1 mentioned this pull request Jul 23, 2023: fix(server): 2-stage warmup for sharded models #678 (Closed)

maziyarpanahi mentioned this pull request Aug 22, 2023: max batch total tokens problem #832 (Closed)
