Recently, Retrieval-Augmented Generation (RAG) has achieved remarkable success in addressing the challenges of Large Language Models (LLMs) without necessitating retraining. By referencing an external knowledge base, RAG refines LLM outputs, effectively mitigating issues such as "hallucination", lack of domain-specific knowledge, and outdated information. However, the complex structure of relationships among different entities in databases presents challenges for RAG systems. In response, GraphRAG leverages structural information across entities to enable more precise and comprehensive retrieval, capturing relational knowledge and facilitating more accurate, context-aware responses. Given the novelty and potential of GraphRAG, a systematic review of current technologies is imperative. This paper provides the first comprehensive overview of GraphRAG methodologies. We formalize the GraphRAG workflow, encompassing Graph-Based Indexing, Graph-Guided Retrieval, and Graph-Enhanced Generation. We then outline the core technologies and training methods at each stage. Additionally, we examine downstream tasks, application domains, evaluation methodologies, and industrial use cases of GraphRAG. Finally, we explore future research directions to inspire further inquiries and advance progress in the field.
Paper Link: https://arxiv.org/abs/2408.08921
- [2024/9/10] We released the second version and created the repository on GitHub.
- [2024/8/15] We released the first version of our survey on arXiv.
- Overview of GraphRAG
- Graph-Based Indexing
- Graph-Guided Retrieval
- Graph-Enhanced Generation
- Downstream Tasks
- Citation
- Contact Us
We divide GraphRAG into three stages: G-Indexing, G-Retrieval, and G-Generation. We categorize the retrieval sources into open-source knowledge graphs and self-constructed graph data. Enhancement techniques such as query enhancement and knowledge enhancement may be adopted to boost the relevance of the results. Unlike RAG, which uses retrieved text directly for generation, GraphRAG must first convert the retrieved graph information into formats that generators can accept in order to improve task performance.
The construction and indexing of graph databases form the foundation of GraphRAG, where the quality of the graph database directly impacts GraphRAG's performance.
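As a rough illustration of what a graph index can look like, the sketch below builds an in-memory adjacency index from (head, relation, tail) triples. The triples and the `build_graph_index` helper are hypothetical and chosen only for illustration; real systems surveyed here typically rely on graph databases, text indexes, or vector indexes instead.

```python
from collections import defaultdict

# Hypothetical toy triples (head, relation, tail); not taken from the survey.
TRIPLES = [
    ("GraphRAG", "extends", "RAG"),
    ("RAG", "augments", "LLM"),
    ("GraphRAG", "retrieves_from", "knowledge graph"),
]


def build_graph_index(triples):
    """Map each entity to its outgoing and incoming edges."""
    index = defaultdict(lambda: {"out": [], "in": []})
    for head, relation, tail in triples:
        index[head]["out"].append((relation, tail))  # edge leaving head
        index[tail]["in"].append((relation, head))   # edge entering tail
    return dict(index)


index = build_graph_index(TRIPLES)
print(index["RAG"])
```

Storing both directions lets a retriever expand from an entity regardless of whether it appears as the head or the tail of a relation.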
In GraphRAG, the retrieval process is crucial for ensuring the quality and relevance of generated outputs by extracting pertinent and high-quality graph data from external graph databases. However, retrieving graph data presents two significant challenges: (1) Explosive Candidate Subgraphs: As the graph size increases, the number of candidate subgraphs grows exponentially, requiring heuristic search algorithms to efficiently explore and retrieve relevant subgraphs. (2) Insufficient Similarity Measurement: Accurately measuring similarity between textual queries and graph data requires algorithms capable of understanding both textual and structural information. Considerable effort has been devoted to optimizing the retrieval process to address these challenges. This survey examines several aspects of the retrieval process within GraphRAG, including the selection of the retriever, retrieval paradigm, retrieval granularity, and effective enhancement techniques.
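One simple way to keep the candidate space bounded is to start from entities mentioned in the query and expand breadth-first under a hop limit and an edge budget. The sketch below is a minimal illustration of that heuristic, not the method of any particular paper listed here. The triples, the function names, and the token-overlap matching (a stand-in for a learned text-graph similarity) are all assumptions.

```python
from collections import deque

# Hypothetical toy knowledge graph; not taken from the survey.
TRIPLES = [
    ("aspirin", "treats", "headache"),
    ("aspirin", "interacts_with", "warfarin"),
    ("warfarin", "treats", "thrombosis"),
    ("ibuprofen", "treats", "headache"),
]


def match_seeds(query, triples):
    """Return entities whose name appears as a token in the query (toy lexical match)."""
    tokens = set(query.lower().replace("?", " ").split())
    entities = {e for h, _, t in triples for e in (h, t)}
    return sorted(e for e in entities if e.lower() in tokens)


def retrieve_subgraph(query, triples, max_hops=1, max_edges=10):
    """Breadth-first expansion from seed entities, bounded by hops and an edge budget."""
    neighbors = {}
    for h, r, t in triples:
        neighbors.setdefault(h, []).append((h, r, t))
        neighbors.setdefault(t, []).append((h, r, t))

    seeds = match_seeds(query, triples)
    visited = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    selected = []
    while frontier and len(selected) < max_edges:
        node, depth = frontier.popleft()
        if depth >= max_hops:
            continue  # do not expand beyond the hop limit
        for triple in neighbors.get(node, []):
            if triple in selected:
                continue
            if len(selected) >= max_edges:
                break  # edge budget exhausted
            selected.append(triple)
            other = triple[2] if triple[0] == node else triple[0]
            if other not in visited:
                visited.add(other)
                frontier.append((other, depth + 1))
    return selected


subgraph = retrieve_subgraph("What does aspirin interact with?", TRIPLES, max_hops=1)
print(subgraph)
```

Raising `max_hops` widens the retrieved neighborhood, while `max_edges` caps the cost when the graph is dense. Both values are tuning knobs in this sketch rather than recommended settings.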
The generation stage is another crucial step in GraphRAG, aimed at integrating the retrieved graph data with the query to enhance response quality. In this stage, suitable generation models must be selected based on the downstream tasks. The retrieved graph data is then transformed into formats compatible with the generators. The generator takes both the query and the transformed graph data as inputs to produce the final response. Beyond these fundamental processes, generative enhancement techniques can further improve the output by intensifying the interaction between the query and the graph data and enriching the content generation itself.
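A common way to make retrieved graph data compatible with a text generator is to linearize it into natural-language-like lines and place those lines in the prompt alongside the query. The sketch below assumes retrieved data arrives as (head, relation, tail) triples. The helper names and the prompt wording are hypothetical, and the call to an actual generator is deliberately left out.

```python
def triples_to_text(triples):
    """Linearize (head, relation, tail) triples into one short sentence per line."""
    return "\n".join(f"{h} {r.replace('_', ' ')} {t}." for h, r, t in triples)


def build_prompt(query, triples):
    """Combine the query with the linearized graph context into a single prompt."""
    context = triples_to_text(triples)
    return (
        "Answer the question using only the facts below.\n\n"
        f"Facts:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


# Hypothetical retrieved triples; not taken from the survey.
retrieved = [
    ("aspirin", "treats", "headache"),
    ("aspirin", "interacts_with", "warfarin"),
]
prompt = build_prompt("What does aspirin interact with?", retrieved)
print(prompt)
```

Other graph formats discussed in the literature, such as adjacency tables, code-like structures, or GNN-produced embeddings, would replace `triples_to_text` while the rest of the flow stays the same.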
- ATLANTIC: Structure-Aware Retrieval-Augmented Language Model for Interdisciplinary Science, arxiv 2023, [paper].
- Graph Neural Network Enhanced Retrieval for Question Answering of LLMs, arxiv 2024, [paper].
- Knowledge Graph Prompting for Multi-Document Question Answering, AAAI 2024, [paper].
- Graph-Based Retriever Captures the Long Tail of Biomedical Knowledge, arxiv 2024, [paper].
- From Local to Global: A Graph RAG Approach to Query-Focused Summarization, arxiv 2024, [paper].
- LightRAG: Simple and Fast Retrieval-Augmented Generation, arxiv 2024, [paper].
- HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models, NeurIPS 2024, [paper].
- DALK: Dynamic Co-Augmentation of LLMs and KG to answer Alzheimer's Disease Questions with Scientific Literature, EMNLP (Findings) 2024, [paper].
- Connecting the Dots: Inferring Patent Phrase Similarity with Retrieved Phrase Graphs, NAACL (Findings) 2024, [paper].
- Retrieval-Augmented Generation with Knowledge Graphs for Customer Service Question Answering, SIGIR 2024, [paper].
- RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval, ICLR 2024, [paper].
- SiReRAG: Indexing Similar and Related Information for Multihop Reasoning, arxiv 2024, [paper].
- KG-Retriever: Efficient Knowledge Indexing for Retrieval-Augmented Large Language Models, arxiv 2024, [paper].
- HyKGE: A Hypothesis Knowledge Graph Enhanced Framework for Accurate and Reliable Medical LLMs Responses, arxiv 2023, [paper].
- Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs, ACL (Findings) 2024, [paper].
- Reasoning on Graphs: Faithful and Interpretable Large Language Model Reasoning, ICLR 2024, [paper].
- Think-on-Graph 2.0: Deep and Interpretable Large Language Model Reasoning with Knowledge Graph-guided Retrieval, arxiv 2024, [paper].
- Think-on-Graph: Deep and Responsible Reasoning of Large Language Model on Knowledge Graph, ICLR 2024, [paper].
- GrapeQA: GRaph Augmentation and Pruning to Enhance Question-Answering, WWW 2023, [paper].
- QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering, NAACL 2021, [paper].
- Graph Reasoning for Question Answering with Triplet Retrieval, ACL (Findings) 2023, [paper].
- MVP-Tuning: Multi-View Knowledge Retrieval with Prompt Tuning for Commonsense Reasoning, ACL 2023, [paper].
- UniOQA: A Unified Framework for Knowledge Graph Question Answering with Large Language Models, arxiv 2024, [paper].
- DecAF: Joint Decoding of Answers and Logical Forms for Question Answering over Knowledge Bases, ICLR 2023, [paper].
- From Local to Global: A Graph RAG Approach to Query-Focused Summarization, arxiv 2024, [paper].
- G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering, NeurIPS 2024, [paper].
- GRAG: Graph Retrieval-Augmented Generation, arxiv 2024, [paper].
- Subgraph Retrieval Enhanced by Graph-Text Alignment for Commonsense Question Answering, ECML-PKDD 2024, [paper].
- HybridRAG: Integrating Knowledge Graphs and Vector Retrieval Augmented Generation for Efficient Information Extraction, ICAIF 2024, [paper].
- EWEK-QA: Enhanced Web and Efficient Knowledge Graph Retrieval for Citation-based Question Answering Systems, ACL 2024, [paper].
- QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering, NAACL 2021, [paper].
- GrapeQA: GRaph Augmentation and Pruning to Enhance Question-Answering, WWW 2023, [paper].
- G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering, NeurIPS 2024, [paper].
- Graph-Based Retriever Captures the Long Tail of Biomedical Knowledge, arxiv 2024, [paper].
- GNN-RAG: Graph Neural Retrieval for Large Language Model Reasoning, arxiv 2024, [paper].
- HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models, NeurIPS 2024, [paper].
- Mixture-of-PageRanks: Replacing Long-Context with Real-Time, Sparse GraphRAG, arxiv 2024, [paper].
- Simple is Effective: The Roles of Graphs and Large Language Models in Knowledge-Graph-Based Retrieval-Augmented Generation, arxiv 2024, [paper].
- Graph Reasoning for Question Answering with Triplet Retrieval, ACL (Findings) 2023, [paper].
- DecAF: Joint Decoding of Answers and Logical Forms for Question Answering over Knowledge Bases, ICLR 2023, [paper].
- Enhancing Distractor Generation for Multiple-Choice Questions with Retrieval Augmented Pretraining and Knowledge Graph integration, ACL (Findings) 2024, [paper].
- Subgraph Retrieval Enhanced Model for Multi-hop Knowledge Base Question Answering, ACL 2022, [paper].
- KG-GPT: A General Framework for Reasoning on Knowledge Graphs Using Large Language Models, EMNLP (Findings) 2023, [paper].
- Text-To-KG Alignment: Comparing Current Methods on Classification Tasks, arxiv 2023, [paper].
- StructGPT: A General Framework for Large Language Model to Reason over Structured Data, EMNLP 2023, [paper].
- GNN-RAG: Graph Neural Retrieval for Large Language Model Reasoning, arxiv 2024, [paper].
- HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models, NeurIPS 2024, [paper].
- GRAG: Graph Retrieval-Augmented Generation, arxiv 2024, [paper].
- Graph Reasoning for Question Answering with Triplet Retrieval, ACL (Findings) 2023, [paper].
- G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering, NeurIPS 2024, [paper].
- KagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning, EMNLP 2019, [paper].
- QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering, NAACL 2021, [paper].
- GrapeQA: GRaph Augmentation and Pruning to Enhance Question-Answering, WWW 2023, [paper].
- Reasoning on Graphs: Faithful and Interpretable Large Language Model Reasoning, ICLR 2024, [paper].
- KG-GPT: A General Framework for Reasoning on Knowledge Graphs Using Large Language Models, EMNLP (Findings) 2023, [paper].
- PullNet: Open Domain Question Answering with Iterative Retrieval on Knowledge Bases and Text, EMNLP 2019, [paper].
- Knowledge Graph Prompting for Multi-Document Question Answering, AAAI 2024, [paper].
- Retrieve-Rewrite-Answer: A KG-to-Text Enhanced LLMs Framework for Knowledge Graph Question Answering, arxiv 2023, [paper].
- KnowledgeNavigator: Leveraging Large Language Models for Enhanced Reasoning over Knowledge Graph, arxiv 2023, [paper].
- Think-on-Graph 2.0: Deep and Interpretable Large Language Model Reasoning with Knowledge Graph-guided Retrieval, arxiv 2024, [paper].
- Think-on-Graph: Deep and Responsible Reasoning of Large Language Model on Knowledge Graph, ICLR 2024, [paper].
- Subgraph Retrieval Enhanced Model for Multi-hop Knowledge Base Question Answering, ACL 2022, [paper].
- Plan-on-Graph: Self-Correcting Adaptive Planning of Large Language Model on Knowledge Graphs, NeurIPS 2024, [paper].
- StructGPT: A General Framework for Large Language Model to Reason over Structured Data, EMNLP 2023, [paper].
- KG-Agent: An Efficient Autonomous Agent Framework for Complex Reasoning over Knowledge Graph, arxiv 2024, [paper].
- Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs, ACL (Findings) 2024, [paper].
- GeAR: Graph-enhanced Agent for Retrieval-augmented Generation, arxiv 2024, [paper].
- ODA: Observation-Driven Agent for integrating LLMs and Knowledge Graphs, ACL (Findings) 2024, [paper].
- KnowledGPT: Enhancing Large Language Models with Retrieval and Storage Access on Knowledge Bases, arxiv 2023, [paper].
- Generate-on-Graph: Treat LLM as both Agent and KG for Incomplete Knowledge Graph Question Answering, EMNLP 2024, [paper].
- Reasoning on Efficient Knowledge Paths: Knowledge Graph Guides Large Language Model for Domain Question Answering, ICKG 2024, [paper].
- A Graph-Guided Reasoning Approach for Open-ended Commonsense Question Answering, arxiv 2023, [paper].
- GNN-RAG: Graph Neural Retrieval for Large Language Model Reasoning, arxiv 2024, [paper].
- ATLANTIC: Structure-Aware Retrieval-Augmented Language Model for Interdisciplinary Science, arxiv 2023, [paper].
- Graph Neural Network Enhanced Retrieval for Question Answering of LLMs, arxiv 2024, [paper].
- Knowledge Graph Prompting for Multi-Document Question Answering, AAAI 2024, [paper].
- Explore then Determine: A GNN-LLM Synergy Framework for Reasoning over Knowledge Graph, arxiv 2024, [paper].
- HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models, NeurIPS 2024, [paper].
- KG-Rank: Enhancing Large Language Models for Medical QA with Knowledge Graphs and Ranking Techniques, BioNLP@ACL 2024, [paper].
- MVP-Tuning: Multi-View Knowledge Retrieval with Prompt Tuning for Commonsense Reasoning, ACL 2023, [paper].
- Graph Reasoning for Question Answering with Triplet Retrieval, ACL (Findings) 2023, [paper].
- UniOQA: A Unified Framework for Knowledge Graph Question Answering with Large Language Models, arxiv 2024, [paper].
- Keqing: knowledge-based question answering is a nature chain-of-thought mentor of LLM, arxiv 2024, [paper].
- Reasoning on Efficient Knowledge Paths: Knowledge Graph Guides Large Language Model for Domain Question Answering, ICKG 2024, [paper].
- Contextual Path Retrieval: A Contextual Entity Relation Embedding-based Approach, TOIS 2023, [paper].
- HyKGE: A Hypothesis Knowledge Graph Enhanced Framework for Accurate and Reliable Medical LLMs Responses, arxiv 2023, [paper].
- Think-on-Graph 2.0: Deep and Interpretable Large Language Model Reasoning with Knowledge Graph-guided Retrieval, arxiv 2024, [paper].
- Think-on-Graph: Deep and Responsible Reasoning of Large Language Model on Knowledge Graph, ICLR 2024, [paper].
- Reasoning on Graphs: Faithful and Interpretable Large Language Model Reasoning, ICLR 2024, [paper].
- Retrieve-Rewrite-Answer: A KG-to-Text Enhanced LLMs Framework for Knowledge Graph Question Answering, arxiv 2023, [paper].
- KnowledgeNavigator: Leveraging Large Language Models for Enhanced Reasoning over Knowledge Graph, arxiv 2023, [paper].
- GNN-RAG: Graph Neural Retrieval for Large Language Model Reasoning, arxiv 2024, [paper].
- Connecting the Dots: Inferring Patent Phrase Similarity with Retrieved Phrase Graphs, NAACL (Findings) 2024, [paper].
- QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering, NAACL 2021, [paper].
- Scalable Multi-Hop Relational Reasoning for Knowledge-Aware Question Answering, EMNLP 2020, [paper].
- GrapeQA: GRaph Augmentation and Pruning to Enhance Question-Answering, WWW 2023, [paper].
- GRAG: Graph Retrieval-Augmented Generation, arxiv 2024, [paper].
- MindMap: Knowledge Graph Prompting Sparks Graph of Thoughts in Large Language Models, ACL 2024, [paper].
- DALK: Dynamic Co-Augmentation of LLMs and KG to answer Alzheimer's Disease Questions with Scientific Literature, EMNLP (Findings) 2024, [paper].
- A Graph-Guided Reasoning Approach for Open-ended Commonsense Question Answering, arxiv 2023, [paper].
- Subgraph Retrieval Enhanced by Graph-Text Alignment for Commonsense Question Answering, ECML-PKDD 2024, [paper].
- Reasoning on Graphs: Faithful and Interpretable Large Language Model Reasoning, ICLR 2024, [paper].
- Multi-hop Question Answering under Temporal Knowledge Editing, arxiv 2024, [paper].
- HyKGE: A Hypothesis Knowledge Graph Enhanced Framework for Accurate and Reliable Medical LLMs Responses, arxiv 2023, [paper].
- KG-GPT: A General Framework for Reasoning on Knowledge Graphs Using Large Language Models, EMNLP (Findings) 2023, [paper].
- Complex Logical Reasoning over Knowledge Graphs using Large Language Models, arxiv 2023, [paper].
- KnowledgeNavigator: Leveraging Large Language Models for Enhanced Reasoning over Knowledge Graph, arxiv 2023, [paper].
- Subgraph Retrieval Enhanced Model for Multi-hop Knowledge Base Question Answering, ACL 2022, [paper].
- MindMap: Knowledge Graph Prompting Sparks Graph of Thoughts in Large Language Models, ACL 2024, [paper].
- DALK: Dynamic Co-Augmentation of LLMs and KG to answer Alzheimer's Disease Questions with Scientific Literature, EMNLP (Findings) 2024, [paper].
If you find this survey useful for your research or development, please cite our paper:
@misc{peng2024graphragsurvey,
title={Graph Retrieval-Augmented Generation: A Survey},
author={Boci Peng and Yun Zhu and Yongchao Liu and Xiaohe Bo and Haizhou Shi and Chuntao Hong and Yan Zhang and Siliang Tang},
year={2024},
eprint={2408.08921},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2408.08921},
}
If you have any questions or suggestions, please feel free to contact us via:
Email: bcpeng@stu.pku.edu.cn