add kv_cache to LLM #244
Conversation
@vchiley Cached kv values shift the positions as well. Maybe you want to shift the position embeddings in the following?
Compare with this in HF: https://github.com/huggingface/transformers/blob/60d51ef5123d949fd8c59cd4d3254e711541d278/src/transformers/models/gpt2/modeling_gpt2.py#L801

In our fork of mosaic models, we have the kv cache and the relevant part looks like the following:

```python
if past_key_values is None:
    past_key_values = [None] * self.cfg.n_layers
    past_position = 0
else:
    assert len(past_key_values) == self.cfg.n_layers
    # get the key tensor whose spec should be (batch, seq, n_head, head_dim), and
    # collect the `seq`, so that we shift the position embedding later.
    past_position = past_key_values[0][0].size(1)

tok_emb = self.transformer.wte(input_ids)  # type: ignore
if self.alibi:
    x = tok_emb
else:
    if S + past_position > self.cfg.max_seq_len:
        raise ValueError(
            f'Cannot forward input with past sequence length {past_position} and current sequence length '
            f'{S + 1}, this model only supports total sequence length <= {self.cfg.max_seq_len}.'
        )
    pos = torch.arange(past_position, S + past_position, dtype=torch.long,
                       device=input_ids.device).unsqueeze(0)
    pos_emb = self.transformer.wpe(pos)  # type: ignore
    x = tok_emb + pos_emb
```
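For concreteness, here is a small, self-contained sketch of the position shift the snippet above performs. The shapes and the dummy cache are illustrative assumptions for this example only, not values from the model config:

```python
import torch

# Toy shapes (made up for this example): batch=2, 5 cached tokens, 4 heads, head_dim=8.
batch, cached_len, n_head, head_dim = 2, 5, 4, 8
S = 3  # number of new (uncached) tokens in this forward pass

# One fake per-layer cache entry of (key, value), each (batch, seq, n_head, head_dim),
# matching the layout assumed in the snippet above.
past_key_values = [(torch.zeros(batch, cached_len, n_head, head_dim),
                    torch.zeros(batch, cached_len, n_head, head_dim))]

# Position ids for the new tokens must start after the cached tokens, not at 0.
past_position = past_key_values[0][0].size(1)
pos = torch.arange(past_position, S + past_position, dtype=torch.long).unsqueeze(0)
print(pos)  # tensor([[5, 6, 7]]) -- shifted by the cache length
```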
LGTM, can you train a model and make sure nothing is broken?
Could you explain the reason for separating out query_padding_mask?
Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>
@dskhudia this formulates a generic attn fn
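For context on the question above: with a kv cache the query covers only the new tokens while the keys span cached plus new tokens, so the two padding masks have different lengths and different jobs. The sketch below is a hypothetical generic attention fn in that spirit, not the PR's actual implementation:

```python
import torch

def generic_attn(query, key, value, query_padding_mask=None, key_padding_mask=None):
    """Hypothetical generic attention fn with separate query/key padding masks.

    query:              (batch, n_head, q_len, head_dim) -- only the new tokens when a kv cache is used
    key, value:         (batch, n_head, k_len, head_dim) -- cached + new tokens, so k_len >= q_len
    query_padding_mask: (batch, q_len) bool, True where the query token is real
    key_padding_mask:   (batch, k_len) bool, True where the key token is real
    """
    scores = query @ key.transpose(-2, -1) / query.size(-1) ** 0.5  # (batch, n_head, q_len, k_len)
    if key_padding_mask is not None:
        # Padded keys must never receive attention weight.
        scores = scores.masked_fill(~key_padding_mask[:, None, None, :], float('-inf'))
    out = torch.softmax(scores, dim=-1) @ value  # (batch, n_head, q_len, head_dim)
    if query_padding_mask is not None:
        # Padded query positions produce meaningless rows; zero them explicitly.
        out = out.masked_fill(~query_padding_mask[:, None, :, None], 0.0)
    return out
```

Masked keys are excluded before the softmax, while masked query rows are only zeroed after the fact, which is one reason to keep the two masks separate rather than assume they share a length.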
Note: we should have a conversation about if all
This PR includes past_key_values (i.e. the kv_cache) in the LLM so that inference can be accelerated. We also become explicit about how we apply padding_mask for queries and keys.

Shoutout: @dakinggg for working through some of this with me.

cc @dskhudia @alextrott16 @samhavens for after training / inference
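As a rough usage sketch of the speedup the cache enables (not this PR's API; it uses the HF GPT-2 model referenced earlier in the thread so the example is self-contained), a greedy decoding loop only needs to feed the newest token once a cache exists:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2').eval()

input_ids = tok('The kv cache lets us', return_tensors='pt').input_ids
past_key_values = None
for _ in range(10):
    with torch.no_grad():
        # Once a cache exists, only the newest token is passed; the cache supplies the prefix.
        out = model(input_ids if past_key_values is None else input_ids[:, -1:],
                    past_key_values=past_key_values, use_cache=True)
    past_key_values = out.past_key_values
    next_id = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)
    input_ids = torch.cat([input_ids, next_id], dim=-1)
print(tok.decode(input_ids[0]))
```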