An opinionated buyer's guide for text embeddings in production — RAG, search, classification.
Text embeddings convert text into dense vectors for semantic search, retrieval, clustering, and classification. This list helps you choose the right embedding model for your use case.
Last reviewed: January 2025 · Suggest an update
Just want a recommendation? Start here:
| Use Case | Model | Why |
|---|---|---|
| Best overall (API) | text-embedding-3-large | Highest quality, 8k context, adjustable dims |
| Best overall (open) | NV-Embed-v2 ⚠️ | MTEB #1, 32k context |
| Best budget | text-embedding-3-small | $0.02/1M tokens, still good quality |
| Best local/private | nomic-embed-text-v2-moe | MoE architecture, multilingual, GGUF available |
| Best multilingual | multilingual-e5-large | 100+ languages, MIT license |
| Best for code | voyage-code-2 | Purpose-built, 16k context |
⚠️ = Non-commercial license. Check before using in production.
- Quick Picks
- How to Choose
- Common Gotchas
- General Purpose
- Specialized
- Rerankers
- Horizon
- Benchmarks & Leaderboards
- Tools & Evaluation
- Resources
- Related Lists
| Question | Recommendation |
|---|---|
| Need best quality, don't mind API costs? | OpenAI text-embedding-3-large or Cohere embed-v3 |
| Want open source, good quality? | gte-large-en-v1.5 or bge-large-en-v1.5 |
| Need multilingual? | multilingual-e5-large or Cohere embed-multilingual-v3 |
| Working with code? | voyage-code-2 |
| Have very long documents? | jina-embeddings-v2-base-en (8k) or NV-Embed-v2 (32k) |
| Running locally/edge? | nomic-embed-text-v2-moe or v1.5 (GGUF available) |
| Need on-prem / data privacy? | Open source models only — see Open Source section |
Key tradeoffs:
- Dimensions: Higher = more expressive but more storage/compute. 768-1024 is the sweet spot for most use cases.
- Context length: Many models cap input at 512 tokens; some go to 8k+. Longer context = fewer chunks per document.
- Open vs API: Open = privacy, cost control, on-prem; API = simplicity, no infrastructure.
- Quality vs speed: Larger models score higher on benchmarks but have higher latency.
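To make the dimension and context-length tradeoffs concrete, here is a minimal back-of-the-envelope sketch. It assumes a flat index of float32 vectors with no index overhead and non-overlapping chunks; the function names are illustrative, not from any library.

```python
def index_size_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw size of a flat vector index (float32 by default), ignoring index overhead."""
    return num_vectors * dims * bytes_per_value


def chunks_needed(doc_tokens: int, max_tokens: int) -> int:
    """Number of non-overlapping chunks needed to cover a document."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return -(-doc_tokens // max_tokens)  # ceiling division


if __name__ == "__main__":
    # Storage grows linearly with dimensions.
    for dims in (256, 768, 1024, 3072):
        gib = index_size_bytes(1_000_000, dims) / 2**30
        print(f"{dims:>5} dims: {gib:.2f} GiB per 1M float32 vectors")

    # A longer context window means fewer vectors per document.
    for window in (512, 8192):
        print(f"6000-token doc at {window} max tokens: {chunks_needed(6000, window)} chunk(s)")
```

Real vector databases add overhead for graph or quantization structures, so treat these numbers as a lower bound.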
Things that bite engineers in production:
| Issue | What to watch for |
|---|---|
| Query/passage prefixes | E5 models require `query:` and `passage:` prefixes. Without them, quality drops significantly. Check model cards. |
| Normalization | Some models output unit-length vectors, where cosine similarity and dot product give identical rankings; others don't, and the two metrics diverge. Use the metric named on the model card and never mix normalized and unnormalized vectors in one index. |
| Matryoshka dimensions | Models like OpenAI's and Nomic's support truncating dimensions (e.g., 3072→256). You must re-normalize after truncation. |
| License traps | CC-BY-NC (NV-Embed-v2, SFR-Embedding) = no commercial use. Check before deploying. |
| Context overflow | Tokens beyond max length are silently truncated. For long docs, chunk first or use long-context models. |
| Embedding drift | API providers may update models silently. Pin versions or re-embed periodically if using managed APIs. |
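A few of these gotchas are easy to show in code. The sketch below is pure Python for clarity (in practice you would likely use NumPy); the helper names are illustrative and not part of any model's SDK. It shows E5-style prefixing, and truncating a Matryoshka embedding followed by re-normalization.

```python
import math
from typing import List, Sequence


def with_e5_prefix(text: str, is_query: bool) -> str:
    """Prepend the prefix E5-family models expect (assumption: check the model card)."""
    return ("query: " if is_query else "passage: ") + text


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def truncate_and_renormalize(vec: Sequence[float], dims: int) -> List[float]:
    """Keep the first `dims` values of a Matryoshka embedding, then re-normalize."""
    if not 0 < dims <= len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    return l2_normalize(vec[:dims])


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    print(with_e5_prefix("how do rerankers work?", is_query=True))

    full = l2_normalize([3.0, 4.0, 12.0])
    naive = full[:2]                           # truncated only: no longer unit length
    fixed = truncate_and_renormalize(full, 2)  # truncated and re-normalized
    print("norm after naive truncation:", math.sqrt(dot(naive, naive)))
    print("norm after re-normalization:", math.sqrt(dot(fixed, fixed)))
```

Skipping the re-normalization step leaves vectors with different lengths, so dot-product scores stop being comparable across documents.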
| Model | Provider | Dims | Max Tokens | MTEB Avg | License | Notes |
|---|---|---|---|---|---|---|
| NV-Embed-v2 | NVIDIA | 4096 | 32768 | 72.3 | CC-BY-NC-4.0 | Current MTEB #1, very long context |
| Llama-Embed-Nemotron-8B | NVIDIA | 4096 | 8192 | 69.6 | Llama 3.1 | Open weights, MMTEB leader, multilingual |
| stella-en-1.5B-v5 | NovaSearch | 1024 | 512 | 66.9 | MIT | Strong quality, moderate size |
| gte-large-en-v1.5 | Alibaba | 1024 | 8192 | 65.4 | Apache 2.0 | Long context, top tier |
| mxbai-embed-large-v1 | Mixedbread | 1024 | 512 | 64.7 | Apache 2.0 | Strong MTEB performer |
| snowflake-arctic-embed-l | Snowflake | 1024 | 512 | 64.5 | Apache 2.0 | Strong retrieval |
| bge-large-en-v1.5 | BAAI | 1024 | 512 | 64.2 | MIT | Widely adopted, battle-tested |
| gte-base-en-v1.5 | Alibaba | 768 | 8192 | 64.1 | Apache 2.0 | Smaller + long context |
| SFR-Embedding-2_R | Salesforce | 4096 | 8192 | 67.5 | CC-BY-NC-4.0 | Strong retrieval, long context |
| bge-base-en-v1.5 | BAAI | 768 | 512 | 63.5 | MIT | Good speed/quality balance |
| nomic-embed-text-v2-moe | Nomic | 768 | 8192 | 65.8 | Apache 2.0 | MoE, multilingual, Matryoshka dims |
| nomic-embed-text-v1.5 | Nomic | 768 | 8192 | 62.3 | Apache 2.0 | Lighter option, GGUF for local |
| e5-large-v2 | Microsoft | 1024 | 512 | 62.2 | MIT | Requires `query:` and `passage:` prefixes |
| e5-base-v2 | Microsoft | 768 | 512 | 61.5 | MIT | Smaller variant |
| Model | Provider | Dims | Max Tokens | Pricing (per 1M tokens) | Notes |
|---|---|---|---|---|---|
| text-embedding-3-large | OpenAI | 3072 | 8191 | $0.13 | Best quality, adjustable dims (Matryoshka) |
| gemini-embedding-001 | Google | 3072 | 8192 | $0.00 (free tier) | MTEB leader, task-type parameter |
| voyage-large-2 | Voyage AI | 1536 | 16000 | $0.12 | Longest context |
| embed-english-v3.0 | Cohere | 1024 | 512 | $0.10 | Strong retrieval |
| embed-large-v1 | Mixedbread | 1024 | 512 | $0.05 | Good quality/price |
| embedding-001 | Google | 768 | 2048 | $0.025 | Vertex AI |
| text-embedding-3-small | OpenAI | 1536 | 8191 | $0.02 | Best budget option |
| jina-embeddings-v2-base-en | Jina AI | 768 | 8192 | $0.02 | Open weights also available |
| Model | Provider | Dims | Languages | Max Tokens | Notes |
|---|---|---|---|---|---|
| bge-m3 | BAAI | 1024 | 100+ | 8192 | Hybrid dense+sparse, long context |
| multilingual-e5-large | Microsoft | 1024 | 100+ | 512 | Best open multilingual |
| EmbeddingGemma-300M | Google | 768 | 100+ | 2048 | Top multilingual under 500M params, Matryoshka dims |
| multilingual-e5-base | Microsoft | 768 | 100+ | 512 | Smaller variant |
| embed-multilingual-v3.0 | Cohere | 1024 | 100+ | 512 | API, strong quality |
| paraphrase-multilingual-mpnet-base-v2 | SBERT | 768 | 50+ | 512 | Sentence-transformers |
| Model | Provider | Dims | Languages | Notes |
|---|---|---|---|---|
| voyage-code-2 | Voyage AI | 1536 | 20+ | Best code retrieval, 16k context |
| StarEncoder | BigCode | 768 | 80+ | StarCoder-based, open source |
| codebert-base | Microsoft | 768 | 6 | Open source, smaller |
| code-search-ada-002 | OpenAI | 1536 | Multiple | Legacy but still used |
Models supporting 4k+ tokens — useful for embedding full documents without chunking.
| Model | Provider | Dims | Max Tokens | Notes |
|---|---|---|---|---|
| NV-Embed-v2 | NVIDIA | 4096 | 32768 | Longest context (open), MTEB #1 |
| voyage-large-2 | Voyage AI | 1536 | 16000 | Longest context (API) |
| gte-large-en-v1.5 | Alibaba | 1024 | 8192 | Top quality (open) |
| jina-embeddings-v2-base-en | Jina AI | 768 | 8192 | Open + API available |
| nomic-embed-text-v2-moe | Nomic | 768 | 8192 | MoE, multilingual, GGUF available |
| text-embedding-3-large | OpenAI | 3072 | 8191 | Adjustable dimensions |
| bge-m3 | BAAI | 1024 | 8192 | Also multilingual |
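When a document exceeds the model's limit, it has to be split before embedding, since overflow tokens are silently dropped. Below is a minimal sliding-window sketch. It assumes you already have a token list from the model's own tokenizer; the whitespace split in the demo is only a stand-in, and `chunk_tokens` is an illustrative name.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def chunk_tokens(tokens: Sequence[T], max_tokens: int, overlap: int = 0) -> List[List[T]]:
    """Split tokens into windows of at most max_tokens, sharing `overlap` tokens between neighbors."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    step = max_tokens - overlap
    chunks: List[List[T]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


if __name__ == "__main__":
    # Whitespace split is a placeholder; real token counts come from the model tokenizer.
    words = "one two three four five six seven eight nine ten".split()
    for chunk in chunk_tokens(words, max_tokens=4, overlap=1):
        print(chunk)
```

A small overlap reduces the chance that a sentence relevant to a query is cut in half at a chunk boundary, at the cost of a few extra vectors.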
| Model | Provider | Domain | Dims | Notes |
|---|---|---|---|---|
| legal-bert-base-uncased | NLP@AUEB | Legal | 768 | Trained on legal corpora |
| PubMedBERT | Microsoft | Biomedical | 768 | PubMed abstracts |
| SciBERT | Allen AI | Scientific | 768 | Scientific papers |
| finbert | FinBERT | Finance | 768 | Financial sentiment |
Rerankers improve retrieval quality by rescoring initial results. Use after embedding-based retrieval.
| Model | Provider | Type | Notes |
|---|---|---|---|
| rerank-english-v3.0 | Cohere | API | Production-ready, easy to integrate |
| rerank-multilingual-v3.0 | Cohere | API | 100+ languages |
| bge-reranker-v2-m3 | BAAI | Open | Multilingual, pairs with BGE embeddings |
| bge-reranker-large | BAAI | Open | English-focused, strong quality |
| ms-marco-MiniLM-L-12-v2 | SBERT | Open | Lightweight, fast |
| jina-reranker-v2-base-multilingual | Jina AI | Open | 100+ languages, 1k context |
| mxbai-rerank-large-v1 | Mixedbread | Open | Strong quality |
When to use a reranker:
- You have more than ~20 candidates from initial retrieval
- Quality matters more than latency
- Your embedding model's ranking isn't precise enough
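The two-stage pattern can be sketched as follows: a cheap embedding search fetches a wide candidate set, then a slower pairwise scorer reorders it. This is a minimal sketch; `score_pair` is a placeholder for whatever reranker you call (API or local cross-encoder), and the word-overlap scorer in the demo is a toy stand-in, not a real reranker.

```python
from typing import Callable, List, Sequence


def top_k_by_dot(query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Stage 1: return indices of the k documents with the highest dot-product score."""
    scored = [(i, sum(q * d for q, d in zip(query_vec, vec))) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [i for i, _ in scored[:k]]


def retrieve_then_rerank(
    query: str,
    query_vec: Sequence[float],
    docs: Sequence[str],
    doc_vecs: Sequence[Sequence[float]],
    score_pair: Callable[[str, str], float],
    candidates: int = 50,
    final_k: int = 5,
) -> List[int]:
    """Stage 2: rescore the stage-1 candidates with score_pair and keep the best final_k."""
    candidate_ids = top_k_by_dot(query_vec, doc_vecs, candidates)
    reranked = sorted(candidate_ids, key=lambda i: score_pair(query, docs[i]), reverse=True)
    return reranked[:final_k]


def toy_overlap_score(query: str, doc: str) -> float:
    """Toy stand-in for a reranker: counts shared lowercase words."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


if __name__ == "__main__":
    docs = ["cats purr", "dogs bark loudly", "cats and dogs"]
    doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    print(retrieve_then_rerank("dogs bark", [0.6, 0.8], docs, doc_vecs, toy_overlap_score, 3, 2))
```

Because the scorer runs once per candidate, the `candidates` parameter is the main latency knob.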
🔭 Emerging approaches worth watching. These represent paradigm shifts or new capabilities that may reshape best practices.
| Model | What's New | Link |
|---|---|---|
| GritLM | Single model does both text generation AND embeddings. No need for separate models. 7B params, competitive on MTEB while also being a capable LLM. | Paper ・ HuggingFace |
Traditional approach: chunk documents → embed each chunk independently.
Late chunking: embed the full document first (using long-context model), then extract chunk representations that retain document context. Reduces information loss at chunk boundaries.
| Resource | Description | Link |
|---|---|---|
| Jina Late Chunking | Original technique explanation + implementation | Blog |
| Contextual Retrieval | Anthropic's related approach using LLMs to add context | Blog |
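The pooling step of late chunking can be sketched in a few lines. This sketch assumes you already have per-token embeddings from one forward pass of a long-context model over the whole document, plus token-index spans for each chunk; the toy vectors and function names are illustrative only.

```python
from typing import List, Sequence, Tuple


def mean_pool(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average a non-empty list of equal-length vectors, dimension by dimension."""
    if not vectors:
        raise ValueError("cannot pool an empty span")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def late_chunk(
    token_embeddings: Sequence[Sequence[float]],
    spans: Sequence[Tuple[int, int]],
) -> List[List[float]]:
    """Pool token embeddings per (start, end) span; end is exclusive.

    The token embeddings are computed over the full document, so each chunk
    vector still reflects context from outside its own span.
    """
    return [mean_pool(token_embeddings[start:end]) for start, end in spans]


if __name__ == "__main__":
    # Toy 2-dimensional "token embeddings" for a 4-token document.
    token_embeddings = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
    print(late_chunk(token_embeddings, [(0, 2), (2, 4)]))
```

The contrast with the traditional approach is only where chunking happens: here the split is applied after the model has seen the whole document, not before.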
Using decoder-only LLMs as embedding models—often by pooling hidden states or clever prompting.
| Approach | What's New | Link |
|---|---|---|
| Echo Embeddings | Repeat input text to simulate bidirectional attention in autoregressive LLMs. Simple trick, strong results. | Paper (ICLR 2025) |
| LLM2Vec | Convert any decoder LLM into an embedding model via bidirectional attention + masked next token prediction. | Paper ・ GitHub |
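Two common ways to pool hidden states into a single vector are a masked mean and taking the last real token. The sketch below assumes right-padded inputs and plain Python lists standing in for one sequence's final-layer hidden states; it illustrates pooling in general, not the specific recipe of either paper above.

```python
from typing import List, Sequence


def masked_mean_pool(hidden_states: Sequence[Sequence[float]], attention_mask: Sequence[int]) -> List[float]:
    """Average hidden states over positions where the mask is 1 (padding is ignored)."""
    kept = [h for h, m in zip(hidden_states, attention_mask) if m]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    return [sum(column) / len(kept) for column in zip(*kept)]


def last_token_pool(hidden_states: Sequence[Sequence[float]], attention_mask: Sequence[int]) -> List[float]:
    """Return the hidden state of the last non-padding token (assumes right padding)."""
    for h, m in zip(reversed(hidden_states), reversed(attention_mask)):
        if m:
            return list(h)
    raise ValueError("attention_mask selects no tokens")


if __name__ == "__main__":
    hidden_states = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]  # third position is padding
    attention_mask = [1, 1, 0]
    print(masked_mean_pool(hidden_states, attention_mask))
    print(last_token_pool(hidden_states, attention_mask))
```

With causal attention only the last token has seen the whole input, which is part of why decoder-based approaches either pool from the end or modify attention.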
Embedding models that handle both text and images together—useful for document retrieval with figures, screenshots, slides.
| Model | What's New | Link |
|---|---|---|
| Voyage Multimodal-3 | Interleaved text + images. Strong on PDFs, slides, screenshots. | Docs |
| Jina CLIP v2 | Open source text-image embeddings, 8k text context | HuggingFace |
| Benchmark | What it measures | Best for | Link |
|---|---|---|---|
| MTEB | 8 task types (retrieval, classification, clustering, etc.) across 58 datasets, 112 languages | Overall embedding quality comparison | Leaderboard |
| BEIR | Zero-shot retrieval across 18 diverse datasets | Retrieval-focused evaluation | GitHub |
| MIRACL | Multilingual retrieval across 18 languages | Non-English retrieval | GitHub |
| C-MTEB | Chinese-specific embedding tasks | Chinese language models | Leaderboard |
Note: MTEB scores are useful for comparison but don't always predict real-world performance. Test on your own data with tools like ragtune.
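Independent of any particular tool, testing on your own data boils down to a labeled set of queries with known relevant documents and a couple of ranking metrics. Here is a minimal sketch of recall@k and mean reciprocal rank; the function names and toy IDs are illustrative.

```python
from typing import Iterable, List, Sequence, Set


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & relevant_ids)
    return hits / len(relevant_ids)


def reciprocal_rank(ranked_ids: Sequence[str], relevant_ids: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean(values: Iterable[float]) -> float:
    items: List[float] = list(values)
    return sum(items) / len(items) if items else 0.0


if __name__ == "__main__":
    # Each entry: (ranking returned by the model under test, labeled relevant IDs).
    runs = [
        (["a", "b", "c"], {"b", "d"}),
        (["x", "y", "z"], {"x"}),
    ]
    print("recall@2:", mean(recall_at_k(r, rel, 2) for r, rel in runs))
    print("MRR:", mean(reciprocal_rank(r, rel) for r, rel in runs))
```

Running the same labeled queries through each candidate model gives a like-for-like comparison that a public leaderboard score cannot.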
| Tool | Description | Link |
|---|---|---|
| ragtune | CLI for benchmarking RAG retrieval quality. Compare embedding models on your queries and documents. | GitHub |
| RAGatouille | Easy-to-use ColBERT retrieval. Late interaction for better precision than dense embeddings. | GitHub |
| MTEB | Official benchmark toolkit for evaluating embeddings on standard tasks | GitHub |
| sentence-transformers | Framework for using, comparing, and training embeddings | GitHub |
| Embeddings Projector | Visualize high-dimensional embeddings in 2D/3D | TensorFlow |
| Tool | Description | Link |
|---|---|---|
| sentence-transformers | Training custom embedding models with contrastive learning | Docs |
| FlagEmbedding | BAAI's toolkit for fine-tuning BGE models | GitHub |
| uniem | Unified embedding model training framework | GitHub |
| Tool | Description | Link |
|---|---|---|
| FastEmbed | Fast, lightweight embedding inference by Qdrant | GitHub |
| Infinity | High-throughput embedding server, OpenAI-compatible API | GitHub |
| Model2Vec | Distill sentence transformers to static embeddings — 500x faster, 50x smaller | GitHub |
| Ollama | Run embedding models locally (GGUF format) | Ollama |
| llama.cpp | C++ inference for quantized models | GitHub |
| TEI | Hugging Face's Text Embeddings Inference server | GitHub |
Foundational:
- Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (2019) — Started modern sentence embeddings
- MTEB: Massive Text Embedding Benchmark (2022) — The standard benchmark
- Text and Code Embeddings by Contrastive Pre-Training (2022) — OpenAI's approach
Recent advances:
- Improving Text Embeddings with Large Language Models (2024) — LLM-based embedding training (E5-mistral)
- BGE M3-Embedding (2024) — Multi-lingual, multi-functionality, multi-granularity
- Matryoshka Representation Learning (2022) — Flexible dimension embeddings
- NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models (2024) — NVIDIA's approach
- Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings (2023) — Long-context embeddings
Understanding embeddings:
- Text Embeddings Reveal (Almost) As Much As Text (2023) — Privacy implications
- BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation (2021) — Retrieval benchmark
- Sentence-Transformers Documentation — Comprehensive embedding guide
- Hugging Face NLP Course — Includes embedding fundamentals
- Choosing an Embedding Model — Pinecone's practical guide
- Cohere Embed Guide — Good API-focused tutorial
For adjacent topics, see these curated lists:
- awesome-vector-databases — Vector storage and retrieval
- awesome-rag — Retrieval-augmented generation
- awesome-semantic-search — Semantic search resources
- awesome-local-ai — Local AI inference
See CONTRIBUTING.md for guidelines on adding new models or tools.
To the extent possible under law, the authors have waived all copyright and related rights to this work.
