Lots of improvements (Still 2 allocators) #2449

Merged 49 commits on Aug 29, 2024
Commits
60719ba
Making prefix/flashinfer the default and testing the full release tests.
Narsil Aug 16, 2024
9d4c5d3
Include flashinfer in the docker.
Narsil Aug 16, 2024
f2bdc65
Using prebuilt.
Narsil Aug 16, 2024
f55278d
Allowing window_left_size (dummy version).
Narsil Aug 17, 2024
cba59ac
Disabling flashinfer/prefix caching on odd head_dim
Narsil Aug 19, 2024
a6cd5fe
Disable prefix caching for lora.
Narsil Aug 20, 2024
f0b35f9
More specific codes.
Narsil Aug 20, 2024
ffb6841
Update lock
Narsil Aug 20, 2024
ba1ce20
Updating integration tests with new values with FI/FD.
Narsil Aug 20, 2024
17c8a5e
Update cargo lock ?
Narsil Aug 20, 2024
344fee0
Upgrade to 1.80 because of bitstream...
Narsil Aug 20, 2024
860b550
Everywhere 1.80
Narsil Aug 20, 2024
8d0220a
Forgot last default place.
Narsil Aug 20, 2024
b80593b
Apply suggestions from code review
Narsil Aug 21, 2024
0bf4eb9
Updated flake lock
Narsil Aug 21, 2024
5eb6ea0
Tmp
Narsil Aug 22, 2024
32f6416
Upgrade resolution system for less errors in resolution.
Narsil Aug 23, 2024
c53968d
Remove lambda for cleaner function.
Narsil Aug 23, 2024
682db34
Handling debugger.
Narsil Aug 26, 2024
1568e82
Override the env in server tests.
Narsil Aug 26, 2024
f5182c1
Is this enough to make it work ?
Narsil Aug 26, 2024
26e5037
This seems to be working.
Narsil Aug 26, 2024
27b566b
Downgrade some logs.
Narsil Aug 26, 2024
e30fb25
Fixing the default for vlm.
Narsil Aug 26, 2024
f1c0735
Don't enable prefix caching on VLM just yet.
Narsil Aug 27, 2024
7f1816a
Change `add_special_tokens` in order to have the correct tokens for chat
Narsil Aug 27, 2024
65b94a6
Fixing prefix caching for flashdecoding.
Narsil Aug 27, 2024
bb9769e
Update all models.
Narsil Aug 27, 2024
55d984d
Fixed flashinfer version.
Narsil Aug 27, 2024
9dacac3
add_special_tokens is internal only
Narsil Aug 27, 2024
e0069a3
Fixing seqlen with the new vlms.
Narsil Aug 27, 2024
2cf1f5c
Fixing the issue with `add_special_tokens` not being passed around.
Narsil Aug 27, 2024
ccaf1d0
Fixing the test.
Narsil Aug 27, 2024
8ac1ffa
Removing encoder_decoder (seq2seq).
Narsil Aug 27, 2024
c6f1a61
Update the chat test.
Narsil Aug 27, 2024
0a60973
Fixing the batching tokenization in flash causal lm.
Narsil Aug 28, 2024
e6ee67f
Truncating left for radix purposes.
Narsil Aug 28, 2024
f886747
Oops this doesn't belong here.
Narsil Aug 28, 2024
1232556
Put back default pure shell.
Narsil Aug 28, 2024
8d01848
Update server tests
Narsil Aug 28, 2024
8a4df6e
Only n_heads / process_group.size() are necessary.
Narsil Aug 28, 2024
e7e0363
Revert the integration tests change (seem linked to head_size
Narsil Aug 28, 2024
9c839ca
Adding error message when assert is violated.
Narsil Aug 28, 2024
bef2f6b
Fixing the free algorithm to handle times where the common prefix is
Narsil Aug 29, 2024
4b37500
Apply suggestions from code review
Narsil Aug 29, 2024
d77f5f2
Update server/text_generation_server/layers/attention/common.py
Narsil Aug 29, 2024
9bfdac2
Fix disabling prefix caching - Fix windowing checks.
Narsil Aug 29, 2024
0c00b94
Revert the Cohere tokenizer change (for now using a revision instead).
Narsil Aug 29, 2024
b412679
Fmt.
Narsil Aug 29, 2024
Update lock
Narsil committed Aug 27, 2024
commit ffb6841121e53081be2007d5cbd19a69924b09c4