Skip to content

[RFC] Implement a graphLookup command to demonstrate multi-hop queries #5088

@LantaoJin

Description

@LantaoJin

Problem Statement
We can functionally implement a PPL graphLookup command that is semantically equivalent to MongoDB’s $graphLookup pipeline aggregation query, which is to demonstrate multi-hop queries based on a graph model.

Current State
Lack of multi-hop ability in query

Long-Term Goals

  • Describe the ideal outcome or future state you aim to achieve.
    • A new PPL command to support multi-hop query with high performance.
  • What are the primary objectives of this solution should fulfill in the long run?
    • Support GraphRag query with PPL

Proposal
PPL syntax:

source = <vertex_index> 
| graphlookup
  <edge_index>
  startWith=<expression>
  fromField=<field>
  toField=<field>
  [maxDepth=<number>]
  [match=<expression>]
  as <new_field>

Approach
Approach 1
Implement with Calcite Recursive CTE: POC here


                    ┌──────────────┐                                     
                    │ Graph G=(V,E)│                                     
                    └──────┬───────┘                                     
                           │ represented as                              
                           ▼                                             
                    ┌──────────────┐                                     
                    │edges(src,dst)│                                     
                    └──────┬───────┘                                     
                           │                                             
           ┌───────────────┼───────────────┐                             
           │               │               │                             
           ▼               ▼               ▼                             
      ┌─────────┐    ┌───────────┐    ┌───────────┐                      
      │  JOIN   │    │ UNION ALL │    │ RECURSIVE │                      
      │=traverse│    │=accumulate│    │= iterate  │                      
      │one edge │    │           │    │           │                      
      └─────────┘    └───────────┘    └───────────┘                      
           │               │               │                             
           └───────────────┼───────────────┘                             
                           │                                             
                           ▼                                             
                    ┌───────────────┐                                    
                    │ Graph Search  │                                    
                    │   Result      │                                    
                    │(reachable set)│                                    
                    └───────────────┘   

Approach 2 (perferred)
Provide a native Enumerable operator to implement high performance multi-hop traverse.

Metadata

Metadata

Labels

PPLPiped processing language

Type

No type

Projects

Status

Not Started

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions