Prefix-Aware Attention for LLM Decoding
Production inference for encoder models (ColBERT, GLiNER, ColPali, embeddings, etc.) as vLLM plugins for online and in-process deployment
A curated list of plugins built on top of vLLM
FlashHead: Efficient Drop-In Replacement for the Classification Head in Language Model Inference
A manager for loading vLLM plugins without rebuilding the image for each new plugin
vLLM plugins for additional features such as decoding strategies, monitoring, and models
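For context on how plugins like those listed above hook into vLLM: vLLM discovers plugins through Python entry points and invokes each registered entry point at startup. The sketch below shows the general shape; the package name `my_vllm_plugin` and the model names are hypothetical placeholders, and the `vllm.general_plugins` group and `ModelRegistry.register_model` call follow vLLM's documented out-of-tree registration pattern (consult the vLLM docs for your version before relying on them).

```python
# Minimal sketch of a vLLM plugin package (names are hypothetical).
#
# In the plugin's pyproject.toml, an entry point is declared so that
# vLLM's plugin loader can find it:
#
#   [project.entry-points."vllm.general_plugins"]
#   my_plugin = "my_vllm_plugin:register"
#
# vLLM calls register() at startup, before model resolution.

def register() -> None:
    """Entry point invoked by vLLM; registers an out-of-tree model.

    The import is deferred so this module can be inspected without
    vLLM installed; the string form of the model class path lets vLLM
    import the model lazily.
    """
    from vllm import ModelRegistry  # requires vllm to be installed

    ModelRegistry.register_model(
        "MyModelForCausalLM",                       # architecture name (hypothetical)
        "my_vllm_plugin.model:MyModelForCausalLM",  # lazy import path (hypothetical)
    )
```

This lazy-registration pattern is why a plugin manager (as in the repos above) can load plugins into a prebuilt image: installing the plugin package is enough for vLLM to pick it up on the next start.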