LLM Embedding Benchmark: Technical Evaluation

This repository provides a framework to benchmark the performance and efficiency of various Large Language Model (LLM) embedding models. Using an Apache Spark Analysis Report as a sample technical dataset, this project evaluates both local and API-based models on their retrieval accuracy (MRR, Recall@3) and computational latency.

Benchmark Results

(Benchmark results chart)

Key Technical Features

  • Memory-Safe Benchmarking: Loads local models sequentially and in float16 precision to prevent out-of-memory errors on low-RAM hardware (see the first sketch below this list).
  • RAG Evaluation Metrics: Uses Mean Reciprocal Rank (MRR) and Recall@K to measure retrieval quality (see the second sketch below this list).
  • Hybrid Support: Evaluates both on-premise (Nomic, Qwen, BGE) and cloud-based (Google GenAI, Cohere) embedding providers.
  • Synthetic Q&A Generation: Includes logic to generate ground-truth test pairs directly from technical PDF content.
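
The sequential-loading pattern looks roughly like the following. This is a minimal sketch, assuming sentence-transformers as the local backend; the model IDs and batch size are illustrative, not necessarily the exact ones used in the benchmark:

```python
import gc
import torch
from sentence_transformers import SentenceTransformer

MODEL_NAMES = [                         # illustrative IDs
    "nomic-ai/nomic-embed-text-v1",
    "BAAI/bge-base-en-v1.5",
]

def embed_sequentially(texts, model_names=MODEL_NAMES):
    """Embed `texts` with each model in turn, holding one model in memory at a time."""
    results = {}
    for name in model_names:
        model = SentenceTransformer(name, trust_remote_code=True)
        model.half()                    # float16 halves the weight footprint; CPU support
                                        # for half-precision ops depends on the torch version
        results[name] = model.encode(texts, batch_size=16)
        del model                       # drop the only reference ...
        gc.collect()                    # ... and reclaim memory before the next load
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
    return results
```

The del/gc.collect() pair is what keeps only one model resident at a time; without it, the next load can push peak memory past what low-RAM machines can hold.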

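The two metrics themselves reduce to a few lines. A sketch assuming each synthetic question maps to exactly one ground-truth chunk, and that `ranked_ids_per_query` holds chunk IDs sorted by descending similarity to each query:

```python
import numpy as np

def mrr(ranked_ids_per_query, gold_ids):
    # Mean Reciprocal Rank: average of 1/rank of the first correct hit (0 if absent).
    reciprocal_ranks = []
    for ranked, gold in zip(ranked_ids_per_query, gold_ids):
        rank = next((i + 1 for i, doc_id in enumerate(ranked) if doc_id == gold), None)
        reciprocal_ranks.append(1.0 / rank if rank else 0.0)
    return float(np.mean(reciprocal_ranks))

def recall_at_k(ranked_ids_per_query, gold_ids, k=3):
    # Recall@K: fraction of queries whose ground-truth chunk appears in the top K.
    hits = [gold in ranked[:k] for ranked, gold in zip(ranked_ids_per_query, gold_ids)]
    return float(np.mean(hits))
```
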
Analysis & Findings

The benchmark reveals that Nomic-V1 is the current "sweet spot" for technical document retrieval on standard hardware. While Qwen-0.6B offers comparable precision, its roughly 12x higher latency on CPU makes it less suitable for real-time applications without GPU acceleration. Surprisingly, the local models outperformed the general-purpose APIs on Recall@3, suggesting that technical domains benefit significantly from specialized local embedding architectures.
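
Latency comparisons like the 12x figure above are typically wall-clock measurements of single-query embedding calls. A sketch of such a timing harness, where `encode_fn` is a hypothetical wrapper around one model's (or one API client's) embed call:

```python
import time
import numpy as np

def encode_latency(encode_fn, queries, warmup=2):
    # Warm-up calls keep one-time initialization cost out of the measurements.
    for q in queries[:warmup]:
        encode_fn(q)
    timings = []
    for q in queries:
        start = time.perf_counter()
        encode_fn(q)
        timings.append(time.perf_counter() - start)
    # Report mean and 95th-percentile latency per query, in seconds.
    return float(np.mean(timings)), float(np.percentile(timings, 95))
```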
