`llama-cpp-python` provides `LlamaEmbedding`, a specialized class for high-performance, memory-efficient text embedding generation and reranking score calculation (a usage sketch follows the feature list below).
### Key Features:
* **Streaming Batch Processing:** Process massive datasets (e.g., hundreds of documents) without running out of memory (OOM).
* **Native Reranking:** Built-in support for cross-encoder models (outputting relevance scores instead of vectors).
* **Optimized Performance:** Uses a unified KV cache for parallel encoding of multiple documents.
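
This excerpt does not show the exact constructor or method signatures, so the following is only a minimal usage sketch: the import path, the `embed` and `rank` method names, and the model path are assumptions for illustration, not the confirmed `LlamaEmbedding` API.

```python
from llama_cpp import LlamaEmbedding  # assumed import path

# Model path is a placeholder; use any GGUF embedding or reranker model.
embedder = LlamaEmbedding(model_path="./models/embedding-model.gguf")

documents = [
    "llama.cpp runs LLM inference on CPUs and GPUs.",
    "Cross-encoders score a query-document pair directly.",
    "Streaming batches keep memory usage bounded.",
]

# Streaming batch embedding: documents are encoded batch by batch,
# so hundreds of documents never need to be resident in memory at once.
vectors = embedder.embed(documents)  # assumed: one vector per document

# Native reranking with a cross-encoder model: relevance scores, not vectors.
# The `rank` method name and its parameters are assumptions.
scores = embedder.rank(
    query="How does llama.cpp handle batching?",
    documents=documents,
)
```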
The standard `Llama` class still supports basic embedding generation, but it lacks the memory optimizations and reranking capabilities of `LlamaEmbedding`.
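
For comparison, basic embedding generation with the standard `Llama` class looks roughly like this (model path is a placeholder); the whole batch is processed in memory and no reranking is available:

```python
from llama_cpp import Llama

# Enable embedding mode on the standard class (model path is a placeholder).
llm = Llama(model_path="./models/embedding-model.gguf", embedding=True)

# Single input -> one embedding vector.
vector = llm.embed("llama.cpp is a C/C++ inference library")

# OpenAI-style response for a small batch of inputs.
response = llm.create_embedding(["first document", "second document"])
embeddings = [item["embedding"] for item in response["data"]]
```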