Add support for BLOOM #331
Closes #61

This PR adds the BLOOM model and modifies the paged attention kernel to support ALiBi bias.
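For context on the ALiBi bias referenced throughout this PR: each attention head is assigned a fixed slope, and the attention logits receive a penalty proportional to the query-key distance scaled by that slope. Below is a minimal Python sketch of the standard slope schedule from the ALiBi paper (which BLOOM follows); it is illustrative only and not the code added in this PR.

```python
import math
from typing import List


def get_alibi_slopes(num_heads: int) -> List[float]:
    """Per-head ALiBi slopes: a geometric sequence starting at 2^(-8/n).

    For a power-of-two number of heads n, the slopes are 2^(-8*i/n) for
    i = 1..n; otherwise the remaining heads are filled in by interpolating
    with the schedule of the next power of two, as in the ALiBi paper.
    """
    closest_power_of_2 = 2 ** math.floor(math.log2(num_heads))
    base = 2 ** (-(2 ** -(math.log2(closest_power_of_2) - 3)))
    slopes = [base ** i for i in range(1, closest_power_of_2 + 1)]
    if closest_power_of_2 != num_heads:
        extra_base = 2 ** (-(2 ** -(math.log2(2 * closest_power_of_2) - 3)))
        num_extra = num_heads - closest_power_of_2
        slopes += [extra_base ** i for i in range(1, 2 * num_extra, 2)]
    return slopes


# Example: with 8 heads the slopes are 1/2, 1/4, ..., 1/256.
print(get_alibi_slopes(8))
```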
Conversation
@zhuohan123 I've fixed the PR to follow our new formatter. It should be ready for review now. Please take a look!
LGTM! Thank you for your hard work! I left some questions about the design choices. In addition, what's the speed difference between similar-sized LLaMA and BLOOM?
    float qk = scale * Qk_dot<scalar_t, THREAD_GROUP_SIZE>::dot(q_vecs, k_vecs);
    // Add the ALiBi bias if slopes are given.
    qk += (alibi_slope != 0) ? alibi_slope * (token_idx - context_len) : 0;
This condition seems unnecessary. If `alibi_slope == 0`, then `alibi_slope * (token_idx - context_len)` will be 0 as well.
Yes. It's to avoid the redundant computation of `0 * (token_idx - context_len)`.
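For readers following along, here is a small PyTorch sketch (not the kernel itself) of the bias term that line adds for a single decoding token; `decode_alibi_bias` is a hypothetical helper, and `alibi_slopes` is assumed to be a `[num_heads]` tensor of per-head slopes.

```python
import torch


def decode_alibi_bias(alibi_slopes: torch.Tensor, context_len: int) -> torch.Tensor:
    """Bias added to the attention logits of the current (last) token.

    alibi_slopes: [num_heads] per-head ALiBi slopes.
    Returns a [num_heads, context_len] tensor whose entry (h, t) is
    alibi_slopes[h] * (t - context_len), mirroring the kernel expression above.
    """
    token_idx = torch.arange(context_len, dtype=alibi_slopes.dtype)
    return alibi_slopes[:, None] * (token_idx - context_len)[None, :]
```

Keys closer to the current token receive a smaller penalty, which is the core idea of ALiBi.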
@@ -53,13 +55,21 @@ def __init__(self, num_heads: int, head_size: int, scale: float) -> None:
            raise ValueError(f"head_size ({self.head_size}) is not supported. "
                             f"Supported head sizes: {_SUPPORTED_HEAD_SIZES}.")

    def set_attn_bias(self, input_metadata: InputMetadata) -> None:
        if input_metadata.attn_bias:
            # Already set by a previous layer.
Why do you choose this design, instead of explicitly initializing `attn_bias` in advance, say at the beginning of the `forward` function of BLOOM?
Good question. It's because `alibi_slopes` is stored in the attention layer. If we want to create the attention bias in `BloomForCausalLM`, then we have to store `alibi_slopes` in both places (because `alibi_slopes` is also used for the decoding attention). I somewhat agree that this design is not ideal, but I couldn't find a better way to do it.
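To make the trade-off concrete, here is a rough sketch of the lazy-initialization pattern described above, where the attention layer that owns `alibi_slopes` fills in the shared bias on first use. This is only illustrative; `InputMetadata` below is a simplified stand-in rather than the real vLLM class.

```python
from dataclasses import dataclass, field
from typing import List

import torch


@dataclass
class InputMetadata:
    # Simplified stand-in for vLLM's InputMetadata.
    prompt_lens: List[int]
    attn_bias: List[torch.Tensor] = field(default_factory=list)


class PagedAttentionWithALiBi:
    def __init__(self, alibi_slopes: torch.Tensor) -> None:
        # The attention layer is the only place that keeps the slopes.
        self.alibi_slopes = alibi_slopes  # [num_heads]

    def set_attn_bias(self, input_metadata: InputMetadata) -> None:
        if input_metadata.attn_bias:
            # Already set by a previous layer; all layers share the same bias.
            return
        for prompt_len in input_metadata.prompt_lens:
            pos = torch.arange(prompt_len)
            # bias[h, q, k] = slope[h] * (k - q); entries with k > q are assumed
            # to be removed later by the causal mask.
            rel = pos[None, :] - pos[:, None]                          # [L, L]
            bias = self.alibi_slopes[:, None, None] * rel[None, :, :]  # [num_heads, L, L]
            input_metadata.attn_bias.append(bias)
```

With this layout, only the first attention layer pays the cost of building the bias, and `BloomForCausalLM` never needs to see `alibi_slopes`.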
@WoosukKwon do you have some benchmarks about speed and memory with BLOOM?