[Core][5/N] Fully working chunked prefill e2e #3884
@@ -210,8 +210,8 @@ def forward(
         decode_meta = attn_metadata.decode_metadata
         assert decode_meta is not None
         # Decoding run.
-        output[num_prefill_tokens:] = PagedAttention.forward_decode(
-            query,
+        out = PagedAttention.forward_decode(
+            decode_query,
             key_cache,
             value_cache,
             decode_meta.block_tables,
@@ -223,6 +223,8 @@ def forward(
             self.alibi_slopes,
             kv_scale,
         )
+        assert out.shape == output[num_prefill_tokens:].shape
+        output[num_prefill_tokens:] = out
should be

oops, that's correct. I don't know how the test passes...

oh, it is handled in 8afca50

We didn't enable many tests for the CPU path; we will try to add them step by step later. Thanks for confirming! :)
         # Reshape the output tensor.
         return output.view(-1, self.num_heads * self.head_size)
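The pattern in the diff above, running decode attention on the tail slice of a mixed batch and writing the result back into the shared output buffer, can be sketched with NumPy. Here `fake_forward_decode` is a stand-in for the real `PagedAttention.forward_decode` kernel, and the shapes are illustrative, not vLLM's actual layout:

```python
import numpy as np

def fake_forward_decode(decode_query: np.ndarray) -> np.ndarray:
    # Stand-in for PagedAttention.forward_decode: produces one output
    # row per decode token (a trivial transform, for illustration only).
    return decode_query * 2.0

num_prefill_tokens = 3
num_decode_tokens = 2
hidden_size = 4

# With chunked prefill, one batch mixes both stages in a single buffer:
# rows [0, num_prefill_tokens) hold prefill output, the rest hold decode.
output = np.zeros((num_prefill_tokens + num_decode_tokens, hidden_size))
decode_query = np.ones((num_decode_tokens, hidden_size))

out = fake_forward_decode(decode_query)
# The fix discussed in the review thread: verify the shapes line up,
# then write the decode results into the tail of the shared buffer.
assert out.shape == output[num_prefill_tokens:].shape
output[num_prefill_tokens:] = out
```

Computing into a temporary `out` and copying it back (rather than writing the kernel output directly into the slice) keeps the kernel call independent of how the caller lays out the combined buffer.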
@WoosukKwon @zhuohan123 would love your feedback on this interface change, where the new AttentionMetadata can contain both prefill-stage and decode-stage metadata. I think this is a necessary change, but I would like to hear your thoughts on the interface design.
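A minimal sketch of what such a combined metadata object might look like. The class and field names here are illustrative assumptions for discussion, not the actual vLLM definitions:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PrefillMetadata:
    # Illustrative field: lengths of the prefill sequences in the batch.
    seq_lens: list

@dataclass
class DecodeMetadata:
    # Illustrative field: per-sequence KV-cache block tables.
    block_tables: list

@dataclass
class AttentionMetadata:
    # With chunked prefill, a single batch can mix prefill and decode
    # tokens, so the metadata carries an optional section for each stage.
    num_prefill_tokens: int
    num_decode_tokens: int
    prefill_metadata: Optional[PrefillMetadata] = None
    decode_metadata: Optional[DecodeMetadata] = None

# A mixed batch: 3 prefill tokens from one sequence, 2 decode tokens.
meta = AttentionMetadata(
    num_prefill_tokens=3,
    num_decode_tokens=2,
    prefill_metadata=PrefillMetadata(seq_lens=[3]),
    decode_metadata=DecodeMetadata(block_tables=[[0], [1]]),
)
```

The attention backend can then run the prefill path when `prefill_metadata` is present and the decode path when `decode_metadata` is present, writing each stage's results into its slice of the shared output buffer, as in the diff above.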