This repository provides a framework to benchmark the performance and efficiency of various Large Language Model (LLM) embedding models. Using an Apache Spark Analysis Report as a sample technical dataset, this project evaluates both local and API-based models on their retrieval accuracy (MRR, Recall@3) and computational latency.
- Memory-Safe Benchmarking: Implements sequential loading and `float16` precision for local models to prevent `OutOfMemory` errors on low-RAM hardware (see the loading sketch below this list).
- RAG Evaluation Metrics: Uses Mean Reciprocal Rank (MRR) and Recall@K to measure retrieval quality (see the metrics sketch below this list).
- Hybrid Support: Evaluates both on-premise (Nomic, Qwen, BGE) and cloud-based (Google GenAI, Cohere) embedding providers.
- Synthetic Q&A Generation: Includes logic to generate ground-truth testing pairs directly from technical PDF content (see the generation sketch below this list).
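
The memory-safe loop can be sketched as loading one local model at a time in half precision and releasing it before the next one is loaded. This is a minimal sketch assuming `sentence-transformers` checkpoints; the model IDs, batch size, and function name are illustrative stand-ins, not the repository's actual code.

```python
import gc

import torch
from sentence_transformers import SentenceTransformer

LOCAL_MODELS = [
    "nomic-ai/nomic-embed-text-v1",  # assumed checkpoint id
    "BAAI/bge-base-en-v1.5",         # assumed checkpoint id
]

def benchmark_sequentially(texts):
    results = {}
    for name in LOCAL_MODELS:
        # Load one model at a time so peak RAM stays near a single model's footprint.
        model = SentenceTransformer(name, trust_remote_code=True)
        model = model.half()  # cast weights to float16 to roughly halve memory use
        results[name] = model.encode(texts, batch_size=16, convert_to_numpy=True)
        # Explicitly free the model before loading the next one to avoid OOM.
        del model
        gc.collect()
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
    return results
```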
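
MRR and Recall@K follow directly from how each question's ground-truth chunk ranks under cosine similarity. The sketch below shows the computation; the function name, array shapes, and cosine scoring are assumptions about the pipeline rather than its actual API.

```python
import numpy as np

def mrr_and_recall_at_k(question_embs, chunk_embs, gold_idx, k=3):
    """question_embs: (Q, d), chunk_embs: (N, d), gold_idx: length-Q array of correct chunk ids."""
    q = question_embs / np.linalg.norm(question_embs, axis=1, keepdims=True)
    c = chunk_embs / np.linalg.norm(chunk_embs, axis=1, keepdims=True)
    sims = q @ c.T                        # cosine similarity matrix, shape (Q, N)
    ranking = np.argsort(-sims, axis=1)   # chunk indices, best match first
    # 1-based rank of the ground-truth chunk for each question
    ranks = np.array([np.where(ranking[i] == gold_idx[i])[0][0] + 1
                      for i in range(len(gold_idx))])
    mrr = float(np.mean(1.0 / ranks))            # Mean Reciprocal Rank
    recall_at_k = float(np.mean(ranks <= k))     # fraction with gold chunk in the top k
    return mrr, recall_at_k
```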
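
The Q&A generation step amounts to chunking the extracted PDF text and prompting an LLM for one question per chunk, keeping the chunk index as the ground-truth label. In the sketch below, `ask_llm` is a placeholder for whichever chat/completion client the repository actually uses, and the prompt wording and chunk size are illustrative.

```python
from pypdf import PdfReader

def build_qa_pairs(pdf_path, ask_llm, chunk_chars=1200):
    # Extract and concatenate the PDF text, then split into fixed-size chunks.
    reader = PdfReader(pdf_path)
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    chunks = [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
    pairs = []
    for idx, chunk in enumerate(chunks):
        # Ask the LLM for a question answerable only from this chunk (placeholder prompt).
        question = ask_llm(
            "Write one specific question that can only be answered from this passage:\n\n" + chunk
        )
        pairs.append({"question": question, "gold_chunk_id": idx})
    return pairs, chunks
```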
The benchmark reveals that Nomic-V1 is the current "sweet spot" for technical document retrieval on standard hardware. While Qwen-0.6B offers comparable precision, its 12x higher latency on CPU makes it less suitable for real-time applications without GPU acceleration. Surprisingly, the local models outperformed the general-purpose APIs on Recall@3, suggesting that technical domains benefit significantly from specialized local embedding architectures.