
Prefix caching #2402

Merged

13 commits merged from radix2 into main on Aug 20, 2024

Conversation

@Narsil Narsil (Collaborator) commented Aug 12, 2024

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@Narsil Narsil changed the title [WIP] Prefix caching Prefix caching Aug 13, 2024
@Narsil Narsil force-pushed the radix2 branch 3 times, most recently from 0ea0a3d to f5c4ada Compare August 15, 2024 08:25
@Narsil Narsil merged commit b70ae09 into main Aug 20, 2024
11 checks passed
@Narsil Narsil deleted the radix2 branch August 20, 2024 09:15
yuanwu2017 pushed a commit to yuanwu2017/tgi-gaudi that referenced this pull request Sep 26, 2024
* Prefix caching WIP

* Fixing prefix attention.

* Fixing flashinfer import.

* Fixing black.

* Fixing medusa (still wrong outputs, but functional).

* Just medusa values now.

* Fixing medusa without prefix caching.

* Fixing prefix caching.

* Medusa requires reshaping.

* Removing the logs.

* Remove router.nix

* Fixup:

- Remove logs
- Disable VLMs (they do not work)
- Disable prefix caching when user wants prefill logprobs.

* Update flake.lock

---------

Co-authored-by: Daniël de Kok <me@danieldk.eu>
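For readers following along, the idea behind the feature (suggested by the radix2 branch name) can be sketched roughly as follows. This is a toy Python model for illustration only, not TGI's actual implementation, which lives in the Rust router and the flashinfer attention kernels; all class and method names here are hypothetical:

```python
# Toy sketch of prefix caching: requests that share a token prefix reuse the
# KV-cache blocks already computed for that prefix instead of re-running
# prefill. A radix/trie over token ids makes longest-prefix lookup cheap.

class PrefixCacheNode:
    def __init__(self):
        self.children = {}   # token id -> child node
        self.kv_slot = None  # id of the cached KV block for this position

class PrefixCache:
    def __init__(self):
        self.root = PrefixCacheNode()
        self.next_slot = 0

    def match_prefix(self, tokens):
        """Return (matched_length, kv_slots) for the longest cached prefix."""
        node, slots = self.root, []
        for i, tok in enumerate(tokens):
            child = node.children.get(tok)
            if child is None or child.kv_slot is None:
                return i, slots
            slots.append(child.kv_slot)
            node = child
        return len(tokens), slots

    def insert(self, tokens):
        """Cache every prefix of `tokens`, allocating KV slots as needed."""
        node = self.root
        for tok in tokens:
            child = node.children.setdefault(tok, PrefixCacheNode())
            if child.kv_slot is None:
                child.kv_slot = self.next_slot
                self.next_slot += 1
            node = child

cache = PrefixCache()
cache.insert([1, 2, 3, 4])          # first request populates the cache
matched, _ = cache.match_prefix([1, 2, 3, 9])
print(matched)  # 3: only the fourth token still needs prefill compute
```

This also hints at why the PR disables prefix caching when the user requests prefill logprobs: logprobs require actually running the forward pass over the prefix, which the cache is designed to skip.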
@ZTianle ZTianle commented Oct 17, 2024

@Narsil Hi there,
I'm curious about the progress on resolving the issue with medusa decoding when prefix caching is enabled. I've been experimenting with a medusa model on TGI 2.3 using prefix caching, but I'm still encountering incorrect outputs.
Has there been any advancement in addressing this compatibility problem between medusa models and prefix caching? Any insights or updates on this matter would be greatly appreciated.
