Vector RAG vs Graph RAG: Quality Comparison

A comprehensive comparative study evaluating the performance of Vector-based RAG versus Graph-based RAG systems for Enterprise Knowledge Retrieval.

[Figure: Vector DB vs Graph DB visualization]

Pipelines compared: Standard Vector RAG (Simple/Custom) · GraphRAG Custom Mode · GraphRAG Open Mode

Objective

Implement and benchmark the quality and efficiency of four RAG implementations on corporate documents (Collective Bargaining Agreements, Company Regulations, and Ethical Codes):

  1. Simple Standard RAG - Basic chunking and standard retrieval with file routing
  2. Custom Standard RAG - Intelligent title-based chunking with optimized, routed retrieval
  3. GraphRAG Custom (Strict Mode) - Hand-crafted graph schema with hybrid Vector/Cypher retrieval
  4. GraphRAG Open (Automatic Mode) - Fully automated graph extraction and hybrid retrieval

📊 Key Results

After evaluating 15 questions across all four systems using LLM-as-a-judge and RAGAS metrics:

Win Distribution

Custom Standard RAG           8 wins ( 53.3%) ██████████████████████████
GraphRAG Strict Mode          5 wins ( 33.3%) ████████████████
Simple Standard RAG           2 wins ( 13.3%) ██████
GraphRAG Open Mode            0 wins (  0.0%)
Tie                           0 wins (  0.0%)

Token Efficiency (Average Input Tokens)

Custom Standard RAG            794.1 tokens/query
Simple Standard RAG           1494.4 tokens/query  (1.9x more)
GraphRAG Open Mode            7208.5 tokens/query  (9.1x more)
GraphRAG Strict Mode          7848.7 tokens/query  (9.9x more)

RAGAS Metrics

| Pipeline | Faithfulness | Answer Relevance | Context Relevance |
|---|---|---|---|
| Simple Standard RAG | 0.585 | 0.670 | 0.500 |
| Custom Standard RAG | 0.843 | 0.699 | 0.827 |
| GraphRAG Open Mode | 0.718 | 0.625 | 0.731 |
| GraphRAG Strict Mode | 0.924 | 0.701 | 0.865 |

Metric Definitions (Range: [0-1], Higher is Better):

  • Faithfulness: Measures if the answer is derived only from the retrieved context (hallucination check).
    • High: Factually accurate to source. Low: Hallucinated content.
  • Answer Relevance: Measures how pertinent the answer is to the user's question.
    • High: Directly addresses the query. Low: Vague or off-topic.
  • Context Relevance: Measures if the retrieved context contains only the necessary information (signal-to-noise ratio).
    • High: Precise retrieval. Low: Too much noise or missing info.

🔍 Analysis & Conclusions

Standard RAG: Optimization Matters

The comparison between Simple and Custom Standard RAG reveals the massive impact of optimization:

  • Custom Standard RAG is the most efficient (794 tokens) and achieved the highest win rate (53.3%). It uses optimized retrieval (routed hybrid search) and precise chunking based on extracted titles.
  • Simple Standard RAG performed significantly worse (13.3% wins) while using ~2x more tokens (1494) than the Custom version. Its low Context Relevance (0.500) suggests the retrieved contexts are too broad or irrelevant, which confuses the LLM.

Custom RAG vs GraphRAG Strict: Efficiency vs Perfection

  • GraphRAG Strict achieves the absolute highest quality scores (Faithfulness > 0.92) but at a 10x higher token cost.
  • Custom Standard RAG remains the best balanced choice for production, offering excellent quality (Faithfulness 0.84) with minimal resource usage.

Final Verdict

For enterprise knowledge retrieval:

  1. Best Quality: GraphRAG Custom Mode (custom schema + hybrid retrieval). Use it when accuracy is key and cost/latency are secondary.
  2. Best Efficiency: Custom Standard RAG (excellent quality-to-cost ratio). Production default, offering the best balance of cost, speed, and accuracy.
  3. Best for Prototyping: GraphRAG Open Mode (quick setup, acceptable quality). Use it for rapid prototyping or time-constrained proof-of-concepts.

📂 Project Structure

compare_rag/
├── std_rag/                        # Standard Vector RAG implementation
│   ├── rag.py                      # Main RAG pipeline
│   ├── retrieve.py                 # Intelligent retrieval with file routing
│   ├── paragraph_injection.py      # Vector DB injection with metadata
│   └── README.md                   # Detailed implementation docs
│
├── graph_rag/                      # Graph-based RAG implementation
│   ├── ingest.py                   # Entry point for graph ingestion
│   ├── main.py                     # Entry point for querying
│   ├── src/
│   │   ├── ingestion/              # PDF loading, chunking, extraction
│   │   ├── retrieval/              # Similarity + Cypher retrievers
│   │   ├── graph/                  # Neo4j client wrapper
│   │   └── config/                 # Settings and credentials
│   ├── prompts/                    # Extraction and retrieval prompts
│   └── README.md                   # Detailed implementation docs
│
└── test_rag/                       # Evaluation framework
    ├── compare.py                  # Query all 3 RAG systems
    ├── judge.py                    # LLM-as-a-judge evaluation
    ├── prepare_ragas_data.py       # Convert to RAGAS format
    ├── run_ragas_eval.py           # RAGAS metrics evaluation
    ├── view_results.py             # Results visualization
    ├── questions.json              # Test questions
    ├── QA.json                     # Responses from all systems
    ├── evaluation_results.json     # Judge scores + reasoning
    └── ragas_results.json          # RAGAS metrics

🛠️ Implementation Details

1. Simple Standard RAG (Baseline)

A baseline vector-based system representing a "vanilla" RAG implementation.

Key Features:

  • Naive Chunking: Fixed-size token chunking with overlap (see the sketch after this list)
  • Simple Retrieval: Standard cosine similarity search with file routing
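
A minimal sketch of what this baseline chunking might look like, assuming LangChain's token splitter and pypdf for loading (the chunk size, overlap, and file path are illustrative, not the repo's actual settings):

```python
# Hypothetical baseline chunking: fixed-size token windows with overlap.
# Chunk size and overlap values are illustrative, not the repo's settings.
from langchain_text_splitters import TokenTextSplitter
from pypdf import PdfReader

def load_pdf_text(path: str) -> str:
    """Concatenate the raw text of every page in a PDF."""
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

splitter = TokenTextSplitter(chunk_size=512, chunk_overlap=64)
chunks = splitter.split_text(load_pdf_text("docs/ethical_code.pdf"))
print(f"Produced {len(chunks)} chunks")
```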

Performance:

  • 1494 tokens/query
  • 2 wins in quality evaluation
  • Faithfulness: 0.585 (lowest of the four pipelines)

2. Custom Standard RAG (std_rag/)

A sophisticated vector-based retrieval system using Milvus DB and LangChain.

Key Features:

  • Intelligent File Routing: LLM-powered document detection that routes queries to the relevant files (sketched after this list)
  • Paragraph-Level Understanding: Extracts and matches paragraph titles for precise retrieval
  • Multi-Level Search:
    • Standard semantic search across all documents
    • Complete search with file routing + title matching
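
A hedged sketch of the routing idea: an LLM decides which corporate document a question concerns before any vector search runs. The prompt wording, document names, and the route_query helper are hypothetical, not the repo's actual code:

```python
# Hypothetical query router: ask the LLM which document a question belongs
# to, then restrict vector search to that file via a metadata filter.
from langchain_openai import AzureChatOpenAI

FILES = ["collective_bargaining_agreement", "company_regulations", "ethical_code"]

# Endpoint and API key are read from the environment (see Getting Started).
llm = AzureChatOpenAI(azure_deployment="gpt-4o", api_version="2024-02-01")

def route_query(question: str) -> str:
    """Return the file the query should be routed to, or 'all' as a fallback."""
    prompt = (
        f"Route the question to one of these documents: {', '.join(FILES)}.\n"
        f"Question: {question}\n"
        "Reply with the document name only."
    )
    choice = llm.invoke(prompt).content.strip()
    return choice if choice in FILES else "all"
```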

Technology Stack:

  • Vector DB: Milvus Lite (local)
  • Embeddings: paraphrase-multilingual-mpnet-base-v2
  • LLM: Azure OpenAI
  • Chunking: Token-based with overlap
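
Given the stack above, paragraph injection into Milvus Lite could look roughly like this (the collection name, metadata fields, and sample content are assumptions):

```python
# Hypothetical paragraph injection into a local Milvus Lite database,
# storing the extracted paragraph title as metadata for later title matching.
from langchain_core.documents import Document
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_milvus import Milvus

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/paraphrase-multilingual-mpnet-base-v2"
)

store = Milvus(
    embedding_function=embeddings,
    connection_args={"uri": "./milvus_rag.db"},  # Milvus Lite local file
    collection_name="corporate_docs",
)

store.add_documents([
    Document(
        page_content="Employees accrue 26 days of paid annual leave...",
        metadata={"file": "company_regulations", "title": "Annual Leave"},
    )
])
```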

Performance:

  • 794 tokens/query (most efficient)
  • 8 wins in quality evaluation (highest win rate)
  • Faithfulness: 0.843

See std_rag/README.md for detailed implementation.


3. GraphRAG Custom - Strict Mode (graph_rag/ with open_mode=False)

A custom-designed knowledge graph with hand-crafted schema and hybrid retrieval strategy.

Architecture:

  • Predefined Schema:
    • Nodes: Articolo (Article), Diritto (Right), Dovere (Duty), Argomento (Topic)
    • Relationships: MENZIONA_ARTICOLO, DEFINISCE_DIRITTO, DEFINISCE_DOVERE, HA_ARGOMENTO
  • Hybrid Retrieval (see the sketch after this list):
    1. Vector Similarity: Find relevant chunks via embeddings
    2. Graph Traversal: Expand context by following relationships (sequential chunks, related topics)
    3. Text-to-Cypher: Convert natural-language questions into Cypher queries for structural lookups (guided by few-shot examples)
  • Parallel Processing: Similarity and Cypher retrievers run concurrently
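
To make the graph-traversal step concrete, here is a minimal sketch of context expansion in Cypher. Only the relationship types and node labels come from the schema above; the Chunk label, property names, and query shape are assumptions:

```python
# Hypothetical graph expansion: after vector search returns matching chunk
# ids, follow the hand-crafted relationships to pull in related articles
# and topics as extra context. In the repo this runs alongside the
# similarity and Text-to-Cypher retrievers.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

EXPAND_QUERY = """
MATCH (c:Chunk {id: $chunk_id})-[:MENZIONA_ARTICOLO]->(a:Articolo)
OPTIONAL MATCH (a)-[:HA_ARGOMENTO]->(t:Argomento)
RETURN a.text AS article, collect(t.name) AS topics
"""

def graph_expand(chunk_id: str) -> list[dict]:
    """Return the articles a chunk mentions, plus their topics."""
    with driver.session() as session:
        return [record.data() for record in session.run(EXPAND_QUERY, chunk_id=chunk_id)]
```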

Technology Stack:

  • Graph DB: Neo4j (local or Aura)
  • Embeddings: paraphrase-multilingual-mpnet-base-v2
  • LLM: Azure OpenAI
  • Entity Extraction: LLMGraphTransformer with custom prompts

Performance:

  • 7848 tokens/query (~10x more than Custom Standard RAG)
  • 5 wins in quality evaluation
  • Highest RAGAS scores: Faithfulness 0.924, Context Relevance 0.865

See graph_rag/README.md for detailed implementation.


4. GraphRAG Open Mode (graph_rag/ with open_mode=True)

A fully automatic graph construction approach that requires no schema design.

How It Works:

  • Automatic Entity Extraction: The LLM extracts entities of any type from the text (see the sketch after this list)
  • Generic Relationships: All entities are connected via a generic HAS_ENTITY relationship
  • No Schema Constraints: Adapts to any document type
  • Same Retrieval Strategy: Vector similarity + graph traversal and text-to-Cypher (without few-shot examples)
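
A rough sketch of schema-free extraction with LangChain's LLMGraphTransformer, contrasted with a strict-mode configuration (the deployment name and sample text are placeholders; the repo's actual prompts and HAS_ENTITY wiring are not reproduced here):

```python
# Hypothetical ingestion showing open mode (no schema) vs strict mode
# (allowed node labels and relationship types fixed up front).
from langchain_core.documents import Document
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_openai import AzureChatOpenAI

llm = AzureChatOpenAI(azure_deployment="gpt-4o", api_version="2024-02-01")

open_transformer = LLMGraphTransformer(llm=llm)  # open mode: extract anything
strict_transformer = LLMGraphTransformer(        # strict mode: predefined schema
    llm=llm,
    allowed_nodes=["Articolo", "Diritto", "Dovere", "Argomento"],
    allowed_relationships=["MENZIONA_ARTICOLO", "DEFINISCE_DIRITTO",
                           "DEFINISCE_DOVERE", "HA_ARGOMENTO"],
)

docs = [Document(page_content="Art. 12 - Il lavoratore ha diritto a ...")]
graph_docs = open_transformer.convert_to_graph_documents(docs)
```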

Advantages:

  • ✅ Implementation speed: Can be set up in hours
  • ✅ No domain expertise required: No need to design a schema
  • ✅ Domain agnostic: Works on any document type

Disadvantages:

  • ❌ Lower quality: 0 wins, lowest RAGAS scores
  • ❌ High token cost: 7208 tokens/query without quality improvement
  • ❌ Generic structure: Misses domain-specific relationships

Performance:

  • 7208 tokens/query
  • 0 wins in quality evaluation
  • Faithfulness: 0.718

Evaluation Framework (test_rag/)

A comprehensive testing pipeline combining LLM-as-a-judge and RAGAS metrics.

Evaluation Workflow

# Step 1: Query all three RAG systems
python test_rag/compare.py                  # Generates: QA.json (questions + answers from all 3 systems)

# Step 2: LLM-as-a-judge evaluation
python test_rag/judge.py                    # Generates: evaluation_results.json (winner + reasoning for each question)

# Step 3: Prepare RAGAS dataset
python test_rag/prepare_ragas_data.py       # Generates: ragas_dataset.hf (HuggingFace dataset format)

# Step 4: Run RAGAS evaluation
python test_rag/run_ragas_eval.py           # Generates: ragas_results.json, ragas_results.csv

# Step 5: View comprehensive results
python test_rag/view_results.py

Metrics Used

1. LLM-as-a-Judge (judge.py)

  • Uses Azure OpenAI to compare answers side-by-side
  • Evaluates: accuracy, completeness, relevance, clarity
  • Outputs: winner (A/B/C/Tie) + reasoning
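
As a condensed illustration of this step (the prompt wording, labels, and JSON format are illustrative, not the repo's actual judge.py):

```python
# Hypothetical side-by-side judge: show the question and candidate answers,
# ask for a winner label plus a short justification.
import json
from langchain_openai import AzureChatOpenAI

llm = AzureChatOpenAI(azure_deployment="gpt-4o", api_version="2024-02-01", temperature=0)

def judge(question: str, answers: dict[str, str]) -> dict:
    listing = "\n\n".join(f"Answer {label}:\n{text}" for label, text in answers.items())
    prompt = (
        "You are an impartial judge. Compare the answers on accuracy, "
        "completeness, relevance, and clarity.\n"
        f"Question: {question}\n\n{listing}\n\n"
        'Reply as JSON: {"winner": "<label or Tie>", "reasoning": "<one sentence>"}'
    )
    return json.loads(llm.invoke(prompt).content)
```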

2. RAGAS Framework (run_ragas_eval.py)

  • Faithfulness: Are answers grounded in retrieved context?
  • Answer Relevance: Does the answer address the question?
  • Context Relevance: Is retrieved context relevant to the question?
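
A minimal sketch of how such a run can be wired up with RAGAS (this assumes a RAGAS release that still exposes context_relevancy alongside faithfulness and answer_relevancy, and uses the standard question/answer/contexts column names; the sample row is invented):

```python
# Hypothetical RAGAS evaluation over one example row of the prepared dataset.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_relevancy, faithfulness

data = Dataset.from_dict({
    "question": ["How many days of paid leave do employees get?"],
    "answer":   ["Employees are entitled to 26 days of paid leave per year."],
    "contexts": [["Art. 10: employees accrue 26 days of paid annual leave."]],
})

result = evaluate(data, metrics=[faithfulness, answer_relevancy, context_relevancy])
print(result)  # per-metric scores in [0, 1]
```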

Getting Started

Prerequisites

  • Python 3.10+
  • Neo4j Database (for GraphRAG)
  • Azure OpenAI API Key

Configure Environment

Create .env files in std_rag/ and graph_rag/ with your credentials:

# Azure OpenAI Configuration
AZURE_ENDPOINT=
AZURE_DEPLOYMENT=
AZURE_API_VERSION=
AZURE_API_KEY=

# Neo4j Configuration (for GraphRAG only)
NEO4J_URI=
NEO4J_USERNAME=
NEO4J_PASSWORD=
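
For reference, a minimal sketch of how these variables are typically consumed (the variable names match the template above; everything else is a placeholder):

```python
# Hypothetical wiring: load the .env file, then build the Azure OpenAI LLM
# and, for GraphRAG, the Neo4j driver from those variables.
import os
from dotenv import load_dotenv
from langchain_openai import AzureChatOpenAI
from neo4j import GraphDatabase

load_dotenv()

llm = AzureChatOpenAI(
    azure_endpoint=os.environ["AZURE_ENDPOINT"],
    azure_deployment=os.environ["AZURE_DEPLOYMENT"],
    api_version=os.environ["AZURE_API_VERSION"],
    api_key=os.environ["AZURE_API_KEY"],
)

driver = GraphDatabase.driver(  # GraphRAG only
    os.environ["NEO4J_URI"],
    auth=(os.environ["NEO4J_USERNAME"], os.environ["NEO4J_PASSWORD"]),
)
```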

📈 Key Takeaways

  1. There is no one-size-fits-all solution: The "best" RAG system depends on your constraints (quality vs cost vs development time).

  2. Custom implementations win on quality: GraphRAG Custom's hand-crafted schema delivers the highest metrics, but requires domain expertise and 10x more tokens.

  3. Vector RAG is surprisingly competitive: Custom Vector RAG achieves nearly equivalent quality with 10x better efficiency, making it the best value proposition.

  4. Automatic approaches sacrifice quality: GraphRAG Open is fast to build but doesn't deliver production-ready quality.

  5. Hybrid strategies matter: GraphRAG Strict's combination of vector search, graph traversal, and Cypher queries provides the most comprehensive retrieval.

  6. Evaluation is critical: In ground truth-free scenarios, using both LLM-as-a-judge and RAGAS provides complementary insights into system performance.


Check the Implementation

See std_rag/README.md and graph_rag/README.md for the detailed implementation docs of each pipeline.
