
Update README.md
DefTruth authored Nov 29, 2023
1 parent 05c51e1 commit 087ef07
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -21,9 +21,9 @@ Awesome-LLM-Inference: A small collection for Awesome LLM Inference **[Papers|Bl
|2018.03|[Tensor Core] NVIDIA Tensor Core Programmability, Performance & Precision |[[arxiv][pdf]](https://arxiv.org/pdf/1803.04014.pdf)|⚠️|⭐️⭐️⭐️ |
|2018.05|[Online Softmax] Online normalizer calculation for softmax |[[arxiv][pdf]](https://arxiv.org/pdf/1805.02867.pdf)|⚠️|⭐️⭐️⭐️ |
|2020.05|🔥🔥[**Megatron-LM**] Training Multi-Billion Parameter Language Models Using Model Parallelism|[[arxiv][pdf]](https://arxiv.org/pdf/1909.08053.pdf)|[[GitHub][Megatron-LM]](https://github.com/NVIDIA/Megatron-LM) ![](https://img.shields.io/github/stars/NVIDIA/Megatron-LM.svg?style=social)|⭐️⭐️⭐️⭐️⭐️ |
- |2021.04|[RoPE] ROFORMER: ENHANCED TRANSFORMER WITH ROTARY POSITION EMBEDDING |[[arxiv][pdf]](https://arxiv.org/pdf/2104.09864.pdf)|[[GitHub][transformers]](https://huggingface.co/docs/transformers/model_doc/roformer) ![](https://img.shields.io/github/stars/huggingface/transformers.svg?style=social)|⭐️⭐️⭐️⭐️⭐️ |
+ |2021.04|[RoPE] ROFORMER: ENHANCED TRANSFORMER WITH ROTARY POSITION EMBEDDING |[[arxiv][pdf]](https://arxiv.org/pdf/2104.09864.pdf)|[[GitHub][transformers]](https://huggingface.co/docs/transformers/model_doc/roformer) ![](https://img.shields.io/github/stars/huggingface/transformers.svg?style=social)|⭐️⭐️⭐️ |
|2022.05|🔥🔥[**FlashAttention**] Fast and Memory-Efficient Exact Attention with IO-Awareness |[[arxiv][pdf]](https://arxiv.org/pdf/2205.14135.pdf)|[[GitHub][flash-attention]](https://github.com/Dao-AILab/flash-attention) ![](https://img.shields.io/github/stars/Dao-AILab/flash-attention.svg?style=social)|⭐️⭐️⭐️⭐️⭐️ |
- |2022.06|🔥🔥[**ZeroQuant**] Efficient and Affordable Post-Training Quantization for Large-Scale Transformers |[[arxiv][pdf]](https://arxiv.org/pdf/2206.01861.pdf)|[[GitHub][DeepSpeed]](https://github.com/microsoft/DeepSpeed) ![](https://img.shields.io/github/stars/microsoft/DeepSpeed.svg?style=social)|⭐️⭐️⭐️ |
+ |2022.06|🔥🔥[**ZeroQuant**] Efficient and Affordable Post-Training Quantization for Large-Scale Transformers |[[arxiv][pdf]](https://arxiv.org/pdf/2206.01861.pdf)|[[GitHub][DeepSpeed]](https://github.com/microsoft/DeepSpeed) ![](https://img.shields.io/github/stars/microsoft/DeepSpeed.svg?style=social)|⭐️⭐️⭐️⭐️⭐️ |
|2022.07|🔥🔥[**Continuous Batching**] Orca: A Distributed Serving System for Transformer-Based Generative Models |[[osdi22-yu][pdf]](https://www.usenix.org/system/files/osdi22-yu.pdf)|⚠️|⭐️⭐️⭐️⭐️⭐️ |
|2022.08|[FP8-Quantization] FP8 Quantization: The Power of the Exponent | [[arxiv][pdf]](https://arxiv.org/pdf/2208.09225.pdf) | ⚠️ |⭐️⭐️⭐️ |
|2022.08|[LLM.int8()] 8-bit Matrix Multiplication for Transformers at Scale |[[arxiv][pdf]](https://arxiv.org/pdf/2208.07339.pdf)|[[GitHub][bitsandbytes]](https://github.com/timdettmers/bitsandbytes) ![](https://img.shields.io/github/stars/timdettmers/bitsandbytes.svg?style=social)|⭐️⭐️⭐️ |
