
Prefix caching #2402

Merged

13 commits merged from radix2 into main on Aug 20, 2024

Conversation

@Narsil Narsil (Collaborator) commented Aug 12, 2024

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@Narsil Narsil changed the title [WIP] Prefix caching Prefix caching Aug 13, 2024
@Narsil Narsil force-pushed the radix2 branch 3 times, most recently from 0ea0a3d to f5c4ada Compare August 15, 2024 08:25
@Narsil Narsil merged commit b70ae09 into main Aug 20, 2024
11 checks passed
@Narsil Narsil deleted the radix2 branch August 20, 2024 09:15
yuanwu2017 pushed a commit to yuanwu2017/tgi-gaudi that referenced this pull request Sep 26, 2024
* Prefix caching WIP

* Fixing prefix attention.

* Fixing flashinfer import.

* Fixing black.

* Fixing medusa (still wrong outputs, but functional).

* Just medusa values now.

* Fixing medusa without prefix caching.

* Fixing prefix caching.

* Medusa requires reshaping.

* Removing the logs.

* Remove router.nix

* Fixup:

- Remove logs
- Disable VLMs (they do not work)
- Disable prefix caching when user wants prefill logprobs.

* Update flake.lock

---------

Co-authored-by: Daniël de Kok <me@danieldk.eu>
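For readers following along, the idea behind the feature (suggested by the radix2 branch name) can be sketched roughly as follows. This is a toy Python model for illustration only, not TGI's actual implementation, which lives in the Rust router and the flashinfer attention kernels; all class and method names here are hypothetical:

```python
# Toy sketch of prefix caching: requests that share a token prefix reuse the
# KV-cache blocks already computed for that prefix instead of re-running
# prefill. A radix/trie over token ids makes longest-prefix lookup cheap.

class PrefixCacheNode:
    def __init__(self):
        self.children = {}   # token id -> child node
        self.kv_slot = None  # id of the cached KV block for this position

class PrefixCache:
    def __init__(self):
        self.root = PrefixCacheNode()
        self.next_slot = 0

    def match_prefix(self, tokens):
        """Return (matched_length, kv_slots) for the longest cached prefix."""
        node, slots = self.root, []
        for i, tok in enumerate(tokens):
            child = node.children.get(tok)
            if child is None or child.kv_slot is None:
                return i, slots
            slots.append(child.kv_slot)
            node = child
        return len(tokens), slots

    def insert(self, tokens):
        """Cache every prefix of `tokens`, allocating KV slots as needed."""
        node = self.root
        for tok in tokens:
            child = node.children.setdefault(tok, PrefixCacheNode())
            if child.kv_slot is None:
                child.kv_slot = self.next_slot
                self.next_slot += 1
            node = child

cache = PrefixCache()
cache.insert([1, 2, 3, 4])          # first request populates the cache
matched, _ = cache.match_prefix([1, 2, 3, 9])
print(matched)  # 3: only the fourth token still needs prefill compute
```

This also hints at why the PR disables prefix caching when the user requests prefill logprobs: logprobs require actually running the forward pass over the prefix, which the cache is designed to skip.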
@ZTianle ZTianle commented Oct 17, 2024

@Narsil Hi there,
I'm curious about the progress on resolving the issue with medusa decoding when prefix caching is enabled. I've been experimenting with a medusa model on TGI 2.3 using prefix caching, but I'm still encountering incorrect outputs.
Has there been any advancement in addressing this compatibility problem between medusa models and prefix caching? Any insights or updates on this matter would be greatly appreciated.
