fix: prefer inplace softmax to avoid copy #2661

drbh · 2024-10-17T03:05:30Z

This PR modifies log_softmax to operate in place, eliminating the need to copy large tensors. This optimization reduces memory consumption during warmup.
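To illustrate the idea, here is a minimal sketch of an in-place log-softmax in PyTorch. It is not necessarily the exact line changed in this PR: the helper name `log_softmax_in_place` and the toy tensor shape are assumptions made for illustration. It relies on `torch.log_softmax` accepting an `out=` tensor, which here is the input itself.

```python
import torch


def log_softmax_in_place(logits: torch.Tensor) -> torch.Tensor:
    """Overwrite `logits` with its log-softmax over the last dimension.

    Hypothetical helper for illustration, not a function from the PR.
    """
    # Passing the input as `out=` asks PyTorch to write the result into the
    # existing storage rather than allocating a second tensor of equal size.
    torch.log_softmax(logits, -1, out=logits)
    return logits


# Toy shape standing in for (prefill tokens, vocab size); real sizes are far larger.
logits = torch.randn(4, 8)

# Out-of-place call: allocates a new tensor, kept here only as a reference.
reference = torch.log_softmax(logits, -1)
storage_before = logits.data_ptr()

result = log_softmax_in_place(logits)

# The result lives in the same memory as the input and matches the reference.
same_storage = result.data_ptr() == storage_before
matches_reference = torch.allclose(result, reference, atol=1e-6)
```

The trade-off is that the raw logits are gone after the call, so this only fits call sites that no longer need them.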

For instance, when using meta-llama/Meta-Llama-3.1-8B-Instruct on an L4, this change allows running the model with --max-batch-prefill-tokens increased from 7192 to 9874 without exceeding memory limits.
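A rough back-of-the-envelope estimate shows why a copy of the logits matters at these sizes. The sketch below assumes a vocabulary of 128,256 entries (the published Llama 3.1 vocabulary size) and 2 bytes per element (float16 or bfloat16); both are assumptions, not values stated in this PR, and the function name `logits_bytes` is made up for illustration. It does not claim to account for the exact 7192 to 9874 gain.

```python
VOCAB_SIZE = 128_256     # assumption: Llama 3.1 vocabulary size
BYTES_PER_ELEMENT = 2    # assumption: float16/bfloat16 logits


def logits_bytes(
    num_tokens: int,
    vocab_size: int = VOCAB_SIZE,
    bytes_per_element: int = BYTES_PER_ELEMENT,
) -> int:
    """Size in bytes of one (num_tokens, vocab_size) logits tensor."""
    return num_tokens * vocab_size * bytes_per_element


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return num_bytes / 2**30


for tokens in (7192, 9874):
    one_copy = logits_bytes(tokens)
    # An out-of-place log_softmax briefly holds the input and the output,
    # i.e. two tensors of this size; the in-place version holds one.
    print(
        f"{tokens} tokens: one copy {to_gib(one_copy):.2f} GiB, "
        f"two copies {to_gib(2 * one_copy):.2f} GiB"
    )
```

Under these assumptions a single logits tensor is already in the gigabyte range, so avoiding the second copy frees a meaningful share of a 24 GB card.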

Narsil · 2024-10-17T09:51:10Z

> For instance, when using meta-llama/Meta-Llama-3.1-8B-Instruct on an L4, this change allows running the model with --max-batch-prefill-tokens increased from 7192 to 9874 without exceeding memory limits.

With chunking in place, we don't want to increase max-batch-prefill-tokens anymore, but instead choose an optimal value :).

That being said, nice find. There are probably a lot of opportunities for such updates (let's focus on the large tensors though).

server/text_generation_server/models/flash_causal_lm.py

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

fix: prefer inplace softmax to avoid copy

8d7448d

Narsil previously approved these changes Oct 17, 2024

Narsil reviewed Oct 17, 2024

server/text_generation_server/models/flash_causal_lm.py (outdated)

drbh dismissed Narsil’s stale review via 3e0a82d October 17, 2024 12:48

Update server/text_generation_server/models/flash_causal_lm.py

3e0a82d

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

drbh merged commit 5f32dea into main Oct 17, 2024
8 checks passed

drbh deleted the prefer-inplace-softmax-for-prefill-logprobs branch October 17, 2024 12:49
