Releases · DefTruth/Awesome-LLM-Inference
v2.6.11
What's Changed
- Add MiniMax-01 in Trending LLM/VLM Topics and Long Context Attention by @shaoyuyoung in #112
- [feat] add deepseek-r1 by @shaoyuyoung in #113
- 🔥🔥[DistServe] DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving by @DefTruth in #114
- 🔥🔥[KVDirect] KVDirect: Distributed Disaggregated LLM Inference by @DefTruth in #115
- 🔥🔥[DeServe] DESERVE: TOWARDS AFFORDABLE OFFLINE LLM INFERENCE VIA DECENTRALIZATION by @DefTruth in #116
- 🔥🔥[Mooncake] Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving by @DefTruth in #117
New Contributors
- @shaoyuyoung made their first contribution in #112
Full Changelog: v2.6.10...v2.6.11
v2.6.10
What's Changed
- 🔥🔥🔥[DeepSeek-V3] DeepSeek-V3 Technical Report by @DefTruth in #109
- 🔥🔥[SP: TokenRing] TokenRing: An Efficient Parallelism Framework for Infinite-Context LLMs via Bidirectional Communication by @DefTruth in #110
- 🔥🔥[FFPA] FFPA: Yet another Faster Flash Prefill Attention with O(1) SRAM complexity for headdim > 256, ~1.5x faster than SDPA EA(@DefTruth) by @DefTruth in #111
Full Changelog: v2.6.9...v2.6.10
v2.6.9
What's Changed
- 🔥🔥[TurboAttention] TURBOATTENTION: EFFICIENT ATTENTION APPROXIMATION FOR HIGH THROUGHPUTS LLMS by @DefTruth in #105
- 🔥🔥[NITRO] NITRO: LLM INFERENCE ON INTEL® LAPTOP NPUS by @DefTruth in #106
- 🔥[DynamicKV] DynamicKV: Task-Aware Adaptive KV Cache Compression for Long Context LLMs by @DefTruth in #107
- 🔥🔥[HADACORE] HADACORE: TENSOR CORE ACCELERATED HADAMARD TRANSFORM KERNEL by @DefTruth in #108
Full Changelog: v2.6.8...v2.6.9
v2.6.8
What's Changed
- 🔥[ClusterKV] ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression by @DefTruth in #103
- 🔥[BatchLLM] BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching by @DefTruth in #104
Full Changelog: v2.6.7...v2.6.8
v2.6.7
v2.6.6
What's Changed
- Add code link to BPT by @DefTruth in #95
- add vAttention code link by @KevinZeng08 in #96
- 🔥[SageAttention] SAGEATTENTION: ACCURATE 8-BIT ATTENTION FOR PLUG-AND-PLAY INFERENCE ACCELERATION(@thu-ml) by @DefTruth in #97
- 🔥[SageAttention-2] SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration(@thu-ml) by @DefTruth in #98
- 🔥[Squeezed Attention] SQUEEZED ATTENTION: Accelerating Long Context Length LLM Inference(@UC Berkeley) by @DefTruth in #99
- 🔥[SparseInfer] SparseInfer: Training-free Prediction of Activation Sparsity for Fast LLM Inference by @DefTruth in #100
New Contributors
- @KevinZeng08 made their first contribution in #96
Full Changelog: v2.6.5...v2.6.6
v2.6.5
v2.6.4
v2.6.3
v2.6.2
What's Changed
- Early exit of LLM inference by @boyi-liu in #85
- Add paper AdaKV by @FFY0 in #86
- Efficient Hybrid Inference for LLMs: Reward-Based Token Modelling with Selective Cloud Assistance by @aharshms in #87
- 🔥[FastAttention] FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs for Efficient Inference by @DefTruth in #88
New Contributors
- @boyi-liu made their first contribution in #85
- @FFY0 made their first contribution in #86
- @aharshms made their first contribution in #87
Full Changelog: v2.6.1...v2.6.2