A comprehensive comparative study evaluating the performance of Vector-based RAG versus Graph-based RAG systems for Enterprise Knowledge Retrieval.
This project implements and benchmarks four RAG pipelines, comparing their quality and efficiency on corporate documents (Collective Bargaining Agreements, Company Regulations, and Ethical Codes):
- Simple Standard RAG - Basic chunking and standard retrieval with file routing
- Custom Standard RAG - Intelligent title-based chunking with optimized, routed retrieval
- GraphRAG Custom (Strict Mode) - Hand-crafted graph schema with hybrid Vector/Cypher retrieval
- GraphRAG Open (Automatic Mode) - Fully automated graph extraction and hybrid retrieval
After evaluating 15 questions across all four systems using LLM-as-a-judge and RAGAS metrics:
LLM-as-a-judge wins (15 questions):

| Pipeline | Wins | Win Rate |
|---|---|---|
| Custom Standard RAG | 8 | 53.3% |
| GraphRAG Strict Mode | 5 | 33.3% |
| Simple Standard RAG | 2 | 13.3% |
| GraphRAG Open Mode | 0 | 0.0% |
| Tie | 0 | 0.0% |

Average token usage per query:

| Pipeline | Tokens/query | Relative cost |
|---|---|---|
| Custom Standard RAG | 794.1 | 1.0x |
| Simple Standard RAG | 1494.4 | 1.9x |
| GraphRAG Open Mode | 7208.5 | 9.1x |
| GraphRAG Strict Mode | 7848.7 | 9.9x |

RAGAS metrics:

| Pipeline | Faithfulness | Answer Relevance | Context Relevance |
|---|---|---|---|
| Simple Standard RAG | 0.585 | 0.670 | 0.500 |
| Custom Standard RAG | 0.843 | 0.699 | 0.827 |
| GraphRAG Open Mode | 0.718 | 0.625 | 0.731 |
| GraphRAG Strict Mode | 0.924 | 0.701 | 0.865 |
Metric Definitions (Range: [0-1], Higher is Better):
- Faithfulness: Measures if the answer is derived only from the retrieved context (hallucination check).
- High: Factually accurate to source. Low: Hallucinated content.
- Answer Relevance: Measures how pertinent the answer is to the user's question.
- High: Directly addresses the query. Low: Vague or off-topic.
- Context Relevance: Measures if the retrieved context contains only the necessary information (signal-to-noise ratio).
- High: Precise retrieval. Low: Too much noise or missing info.
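For intuition, these scores roughly correspond to the following ratios (paraphrased from the RAGAS documentation; the library's exact claim-splitting and relevance prompts are internal details and may differ between versions):

```math
\text{Faithfulness} = \frac{\left|\{\text{answer claims supported by the retrieved context}\}\right|}{\left|\{\text{answer claims}\}\right|}
\qquad
\text{Context Relevance} = \frac{\left|\{\text{context sentences needed to answer the question}\}\right|}{\left|\{\text{context sentences}\}\right|}
```

Answer Relevance is computed indirectly: RAGAS generates N candidate questions from the answer and averages their embedding similarity to the original question q:

```math
\text{Answer Relevance} \approx \frac{1}{N}\sum_{i=1}^{N}\cos\!\big(E(q),\, E(q_i)\big)
```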
The comparison between Simple and Custom Standard RAG reveals the massive impact of optimization:
- Custom Standard RAG is the most efficient (794 tokens) and achieved the highest win rate (53.3%). It uses optimized retrieval (routed hybrid search) and precise chunking based on extracted titles.
- Simple Standard RAG performed significantly worse (13.3% wins) and used ~2x more tokens (1494) than the Custom version. Its low Context Relevance (0.500) suggests the retrieved contexts are often too broad or irrelevant, which confuses the LLM.
- GraphRAG Strict achieves the absolute highest quality scores (Faithfulness > 0.92) but at a 10x higher token cost.
- Custom Standard RAG remains the best balanced choice for production, offering excellent quality (Faithfulness 0.84) with minimal resource usage.
For enterprise knowledge retrieval:
- Best Quality: GraphRAG Strict Mode (custom schema + hybrid retrieval). Use it when accuracy is key and cost/latency are secondary.
- Best Efficiency: Custom Standard RAG (vector-based, excellent quality-to-cost ratio). Production default, best balance of cost, speed, and accuracy.
- Best for Prototyping: GraphRAG Open Mode (quick setup, acceptable quality). Use it for rapid prototyping or time-constrained proof-of-concepts.
```
compare_rag/
├── std_rag/                     # Standard Vector RAG implementation
│   ├── rag.py                   # Main RAG pipeline
│   ├── retrieve.py              # Intelligent retrieval with file routing
│   ├── paragraph_injection.py   # Vector DB injection with metadata
│   └── README.md                # Detailed implementation docs
│
├── graph_rag/                   # Graph-based RAG implementation
│   ├── ingest.py                # Entry point for graph ingestion
│   ├── main.py                  # Entry point for querying
│   ├── src/
│   │   ├── ingestion/           # PDF loading, chunking, extraction
│   │   ├── retrieval/           # Similarity + Cypher retrievers
│   │   ├── graph/               # Neo4j client wrapper
│   │   └── config/              # Settings and credentials
│   ├── prompts/                 # Extraction and retrieval prompts
│   └── README.md                # Detailed implementation docs
│
└── test_rag/                    # Evaluation framework
    ├── compare.py               # Query all RAG systems
    ├── judge.py                 # LLM-as-a-judge evaluation
    ├── prepare_ragas_data.py    # Convert to RAGAS format
    ├── run_ragas_eval.py        # RAGAS metrics evaluation
    ├── view_results.py          # Results visualization
    ├── questions.json           # Test questions
    ├── QA.json                  # Responses from all systems
    ├── evaluation_results.json  # Judge scores + reasoning
    └── ragas_results.json       # RAGAS metrics
```

A baseline vector-based system representing a "vanilla" RAG implementation.
Key Features:
- Naive Chunking: Fixed-size token chunking with overlap
- Simple Retrieval: Standard cosine similarity search with file routing
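To make "naive chunking" concrete, here is a minimal sketch of fixed-size token chunking with overlap. The whitespace tokenizer and the size/overlap values are placeholders for this example, not the exact parameters used in the pipeline.

```python
def chunk_tokens(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into fixed-size token windows with overlap (whitespace tokens as a stand-in)."""
    tokens = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, max(len(tokens), 1), step):
        window = tokens[start:start + chunk_size]
        if window:
            chunks.append(" ".join(window))
        if start + chunk_size >= len(tokens):
            break
    return chunks
```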
Performance:
- 1494 tokens/query
- 2 wins in quality evaluation
- Faithfulness: 0.585 (lowest of the four pipelines)
A sophisticated vector-based retrieval system using Milvus DB and LangChain.
Key Features:
- Intelligent File Routing: LLM-powered document detection to route queries to relevant files
- Paragraph-Level Understanding: Extracts and matches paragraph titles for precise retrieval
- Multi-Level Search:
- Standard semantic search across all documents
- Complete search with file routing + title matching
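A hedged sketch of how the routed search can be wired up: an LLM call picks the relevant source file, then the vector search is filtered to chunks from that file via a metadata expression. Package names (`langchain_milvus`, `langchain_huggingface`), the local Milvus path, and the `source_file` metadata field are assumptions for this example, not the exact names used in `retrieve.py`.

```python
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_milvus import Milvus

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/paraphrase-multilingual-mpnet-base-v2"
)
store = Milvus(
    embedding_function=embeddings,
    connection_args={"uri": "./milvus_demo.db"},  # Milvus Lite local file (illustrative path)
)

def route_query(llm, question: str, files: list[str]) -> str:
    """Ask the LLM (e.g. an AzureChatOpenAI instance) which document the question concerns."""
    prompt = (
        f"Question: {question}\nAvailable documents: {', '.join(files)}\n"
        "Reply with exactly one file name."
    )
    return llm.invoke(prompt).content.strip()

def routed_search(llm, question: str, files: list[str], k: int = 4):
    """Route first, then restrict the similarity search to chunks of the routed file."""
    target = route_query(llm, question, files)
    return store.similarity_search(question, k=k, expr=f'source_file == "{target}"')
```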
Technology Stack:
- Vector DB: Milvus Lite (local)
- Embeddings: paraphrase-multilingual-mpnet-base-v2
- LLM: Azure OpenAI
- Chunking: Token-based with overlap
Performance:
- 794 tokens/query (most efficient)
- 8 wins in quality evaluation
- Faithfulness: 0.843
See std_rag/README.md for detailed implementation.
A custom-designed knowledge graph with hand-crafted schema and hybrid retrieval strategy.
Architecture:
- Predefined Schema:
  - Nodes: Articolo (Article), Diritto (Right), Dovere (Duty), Argomento (Topic)
  - Relationships: MENZIONA_ARTICOLO, DEFINISCE_DIRITTO, DEFINISCE_DOVERE, HA_ARGOMENTO
- Hybrid Retrieval:
- Vector Similarity: Find relevant chunks via embeddings
- Graph Traversal: Expand context by following relationships (sequential chunks, related topics)
- Text-to-Cypher: Convert questions to Cypher queries for structural questions (based on few-shot examples)
- Parallel Processing: Similarity and Cypher retrievers run concurrently
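The parallel-processing step can be sketched as below; `similarity_retriever` and `cypher_retriever` are placeholders standing in for the project's actual components in `src/retrieval/`, each assumed to return a list of context strings.

```python
from concurrent.futures import ThreadPoolExecutor

def hybrid_retrieve(question: str, similarity_retriever, cypher_retriever) -> list[str]:
    """Run both retrievers concurrently and merge their contexts."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        sim_future = pool.submit(similarity_retriever, question)   # embeddings + graph expansion
        cypher_future = pool.submit(cypher_retriever, question)    # LLM-generated Cypher query
        contexts = sim_future.result() + cypher_future.result()
    # De-duplicate while preserving order before building the final prompt.
    return list(dict.fromkeys(contexts))
```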
Technology Stack:
- Graph DB: Neo4j (local or Aura)
- Embeddings: paraphrase-multilingual-mpnet-base-v2
- LLM: Azure OpenAI
- Entity Extraction: LLMGraphTransformer with custom prompts
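The strict ingestion can be expressed roughly as below, constraining LangChain's `LLMGraphTransformer` to the predefined schema before writing into Neo4j. This is a sketch of the approach, not the project's actual code (its prompts and parameters live in `graph_rag/prompts/` and `src/ingestion/`); depending on your LangChain version, `Neo4jGraph` may instead come from `langchain_neo4j`.

```python
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_community.graphs import Neo4jGraph

ALLOWED_NODES = ["Articolo", "Diritto", "Dovere", "Argomento"]
ALLOWED_RELS = ["MENZIONA_ARTICOLO", "DEFINISCE_DIRITTO", "DEFINISCE_DOVERE", "HA_ARGOMENTO"]

def ingest_chunks(llm, chunks, neo4j_url, username, password):
    """Extract schema-constrained entities/relations from chunks and store them in Neo4j."""
    transformer = LLMGraphTransformer(
        llm=llm,
        allowed_nodes=ALLOWED_NODES,
        allowed_relationships=ALLOWED_RELS,
    )
    graph_docs = transformer.convert_to_graph_documents(chunks)  # chunks: list of Document objects
    graph = Neo4jGraph(url=neo4j_url, username=username, password=password)
    graph.add_graph_documents(graph_docs, include_source=True)   # keep chunk provenance
```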
Performance:
- 7848 tokens/query (~10x more than Custom Standard RAG)
- 5 wins in quality evaluation
- Highest RAGAS scores: Faithfulness 0.924, Context Relevance 0.865
See graph_rag/README.md for detailed implementation.
A fully automatic graph construction approach that requires no schema design.
How It Works:
- Automatic Entity Extraction: LLM extracts any entities from the text
- Generic Relationships: All entities connected via a generic HAS_ENTITY relationship
- No Schema Constraints: Adapts to any document type
- Same Retrieval Strategy: Uses vector similarity + graph traversal and Text-to-Cypher (without few-shot examples)
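In contrast with the strict ingestion sketched earlier, open mode simply drops the schema constraints. Reusing the hypothetical `llm`, `chunks`, and `graph` objects from that sketch, the core difference is roughly:

```python
# No allowed_nodes / allowed_relationships: the LLM may extract any entity type it finds.
open_transformer = LLMGraphTransformer(llm=llm)
graph_docs = open_transformer.convert_to_graph_documents(chunks)
graph.add_graph_documents(graph_docs, include_source=True)
```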
Advantages:
- ✅ Implementation speed: Can be set up in hours
- ✅ No domain expertise required: No need to design a schema
- ✅ Domain agnostic: Works on any document type
Disadvantages:
- ❌ Lower quality: 0 wins in the judge evaluation; RAGAS scores well below GraphRAG Strict
- ❌ High token cost: 7208 tokens/query with no corresponding quality gain
- ❌ Generic structure: Misses domain-specific relationships
Performance:
- 7208 tokens/query
- 0 wins in quality evaluation
- Faithfulness: 0.718
A comprehensive testing pipeline combining LLM-as-a-judge and RAGAS metrics.
```bash
# Step 1: Query all RAG systems
python test_rag/compare.py              # Generates: QA.json (questions + answers from all systems)

# Step 2: LLM-as-a-judge evaluation
python test_rag/judge.py                # Generates: evaluation_results.json (winner + reasoning for each question)

# Step 3: Prepare RAGAS dataset
python test_rag/prepare_ragas_data.py   # Generates: ragas_dataset.hf (HuggingFace dataset format)

# Step 4: Run RAGAS evaluation
python test_rag/run_ragas_eval.py       # Generates: ragas_results.json, ragas_results.csv

# Step 5: View comprehensive results
python test_rag/view_results.py
```

1. LLM-as-a-Judge (judge.py)
- Uses Azure OpenAI to compare answers side-by-side
- Evaluates: accuracy, completeness, relevance, clarity
- Outputs: winner (A/B/C/Tie) + reasoning
2. RAGAS Framework (run_ragas_eval.py)
- Faithfulness: Are answers grounded in retrieved context?
- Answer Relevance: Does the answer address the question?
- Context Relevance: Is retrieved context relevant to the question?
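To make the judge step concrete, here is a hedged sketch of a pairwise LLM-as-a-judge call with Azure OpenAI. The prompt wording, function names, and environment-variable usage are illustrative; the real ones are defined in `judge.py` and the `.env` files.

```python
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_ENDPOINT"],
    api_key=os.environ["AZURE_API_KEY"],
    api_version=os.environ["AZURE_API_VERSION"],
)

def judge(question: str, answers: dict[str, str]) -> str:
    """Ask the judge model to pick the best answer (by label, or 'Tie') with reasoning."""
    labeled = "\n\n".join(f"Answer {label}:\n{text}" for label, text in answers.items())
    prompt = (
        f"Question: {question}\n\n{labeled}\n\n"
        "Compare the answers on accuracy, completeness, relevance and clarity. "
        "Reply with the winning label (or 'Tie') followed by a short justification."
    )
    response = client.chat.completions.create(
        model=os.environ["AZURE_DEPLOYMENT"],
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content
```

The RAGAS step, conceptually, boils down to an `evaluate()` call over a HuggingFace dataset with `question`, `answer`, and `contexts` columns. Metric names vary slightly between RAGAS versions (older releases expose `context_relevancy`, newer ones favour `context_precision`), so adjust the imports to the installed version; the sample rows below are invented for illustration.

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

dataset = Dataset.from_dict({
    "question": ["What does Article 5 regulate?"],
    "answer": ["Article 5 regulates working hours ..."],
    "contexts": [["Article 5: working hours are ..."]],
})
# evaluate() needs an evaluation LLM/embeddings; by default these are picked up from the
# environment, or they can be passed explicitly via the llm= / embeddings= arguments.
result = evaluate(dataset, metrics=[faithfulness, answer_relevancy])
print(result)
```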
- Python 3.10+
- Neo4j Database (for GraphRAG)
- Azure OpenAI API Key
Create .env files in std_rag/ and graph_rag/ with your credentials:
```bash
# Azure OpenAI Configuration
AZURE_ENDPOINT=
AZURE_DEPLOYMENT=
AZURE_API_VERSION=
AZURE_API_KEY=

# Neo4j Configuration (for GraphRAG only)
NEO4J_URI=
NEO4J_USERNAME=
NEO4J_PASSWORD=
```
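A minimal way to load these values at runtime (assuming `python-dotenv` is installed; the project's own config modules may read them differently):

```python
import os
from dotenv import load_dotenv

load_dotenv()  # reads the .env file from the current working directory

azure_endpoint = os.getenv("AZURE_ENDPOINT")
azure_api_key = os.getenv("AZURE_API_KEY")
neo4j_uri = os.getenv("NEO4J_URI")  # only required for the GraphRAG pipelines
```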
Key takeaways:
- There is no one-size-fits-all solution: The "best" RAG system depends on your constraints (quality vs. cost vs. development time).
- Custom implementations win on quality: GraphRAG Strict's hand-crafted schema delivers the highest metrics, but it requires domain expertise and roughly 10x more tokens.
- Vector RAG is surprisingly competitive: Custom Standard RAG achieves nearly equivalent quality at roughly one-tenth of the token cost, making it the best value proposition.
- Automatic approaches sacrifice quality: GraphRAG Open is fast to build but does not deliver production-ready quality.
- Hybrid strategies matter: GraphRAG Strict's combination of vector search, graph traversal, and Cypher queries provides the most comprehensive retrieval.
- Evaluation is critical: In ground-truth-free scenarios, using both LLM-as-a-judge and RAGAS provides complementary insights into system performance.


