This repository lets you train an EAGLE draft model that is fully compatible with SGLang and achieves the scores reported in the paper for end-to-end latency speedup and generation throughput. I will keep working on this project to make it as minimalistic as possible while keeping it scalable, so that you can train a SOTA EAGLE draft model in under 1 hour on a single node of enterprise GPUs (though it is not limited to that setup). Check out the pages to get started.
A clean, simple-to-use implementation of EAGLE (Extrapolation Algorithm for Greater Language-model Efficiency), a speculative decoding algorithm 🦅
vladislavkruglikov/eagle