
Commit b6e0de0: Update README.md
1 parent fcdf9ac commit b6e0de0


README.md

Lines changed: 13 additions & 5 deletions
@@ -736,14 +736,22 @@ print(res["choices"][0]["message"]["content"])
`llama-cpp-python` provides `LlamaEmbedding`, a specialized high-performance, memory-efficient class for generating text embeddings and computing reranking scores.

- **Key Features:**
+ ### Key Features:

* **Streaming Batch Processing:** Process massive datasets (e.g., hundreds of documents) without running out of memory (OOM).
* **Native Reranking:** Built-in support for Cross-Encoder models (outputting relevance scores instead of vectors).
* **Optimized Performance:** Uses a unified KV cache to encode multiple documents in parallel.

+ ### Supported Embedding & Rerank Models:
+
+ | Model                | Type      | HF Link                                        | Status     |
+ |----------------------|-----------|------------------------------------------------|------------|
+ | `bge-m3`             | Embedding | https://huggingface.co/BAAI/bge-m3             | Working ✅ |
+ | `bge-reranker-v2-m3` | Rerank    | https://huggingface.co/BAAI/bge-reranker-v2-m3 | Working ✅ |

### TODO(JamePeng): Needs more extensive testing with various embedding and rerank models. :)
- #### 1. Text Embeddings (Vector Search)
+ ### 1. Text Embeddings (Vector Search)

To generate embeddings, use the `LlamaEmbedding` class. It automatically configures the model for vector generation.
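For intuition, vector search compares two embeddings by cosine similarity (the quantity reported in the `cosineSimilarity` field of the example below). A minimal, dependency-free sketch of that math; this is illustrative only, not the library's implementation:

```python
import math

def cosine_similarity(a, b):
    # cos(a, b) = (a . b) / (||a|| * ||b||)
    # 1.0 means same direction, 0.0 means orthogonal (unrelated).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Parallel vectors score ~1.0 regardless of magnitude.
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
```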

@@ -781,7 +789,7 @@ response = llm.create_embedding(
print(response["cosineSimilarity"])
```

- #### 2. Reranking (Cross-Encoder Scoring)
+ ### 2. Reranking (Cross-Encoder Scoring)

Reranking models (like `bge-reranker`) take a **Query** and a list of **Documents** as input and output a relevance score (scalar) for each document.
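Because each document receives an independent scalar score, re-ordering a candidate list is just a sort. A small sketch of consuming such scores (the score values mirror the example output below; variable names are illustrative):

```python
# Pair each candidate document with its cross-encoder score and sort by
# relevance, highest score first.
documents = ["doc 1", "doc 2", "doc 3"]
scores = [-0.15, -8.23, 5.67]

ranked = sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)
print(ranked[0][0])  # doc 3 is the most relevant
```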

@@ -814,7 +822,7 @@ print(scores)
# e.g., [-0.15, -8.23, 5.67] -> The 3rd doc is the best match
```

- #### 3. Normalization
+ ### 3. Normalization

The `embed` method supports various mathematical normalization strategies via the `normalize` parameter.

@@ -844,7 +852,7 @@ vec_int16 = llm.embed("text", normalize=NORM_MODE_MAX_INT16, n_gpu_layers=-1)
embeddings_raw = llm.embed(["search query", "document text"], normalize=NORM_MODE_NONE, n_gpu_layers=-1)
```
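For intuition, the math behind two of the modes above can be sketched in plain Python. Note the exact semantics of `NORM_MODE_MAX_INT16` here are an assumption (scale so the largest absolute component maps to 32767), not the library's definition:

```python
import math

def l2_normalize(vec):
    # Unit-length (L2) normalization: afterwards ||v|| == 1, so dot products
    # between normalized vectors are cosine similarities.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def max_int16_scale(vec):
    # Assumed behavior of an int16-style mode: scale so the largest
    # absolute component becomes 32767, then round to integers.
    scale = 32767 / max(abs(x) for x in vec)
    return [round(x * scale) for x in vec]

v = [3.0, 4.0]
print(l2_normalize(v))     # [0.6, 0.8], a unit vector
print(max_int16_scale(v))  # [24575, 32767]
```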

- #### Legacy Usage (Deprecated)
+ ### Legacy Usage (Deprecated)

The standard `Llama` class still supports basic embedding generation, but it lacks the memory optimizations and reranking capabilities of `LlamaEmbedding`.
