LangChain integration for BigQuery Graph. Provides a GraphStore implementation and two retriever classes for building Graph RAG applications with BigQuery.
- `BigQueryGraphStore` -- `GraphStore` interface for BigQuery Graph (schema management, GQL queries, graph document ingestion)
- `BigQueryGraphVectorContextRetriever` -- Vector similarity search on graph nodes with optional multi-hop neighborhood expansion
- `BigQueryGraphTextToGQLRetriever` -- LLM-powered natural language to GQL translation with optional few-shot examples
```bash
pip install langchain-bigquery-graph
```

For development:

```bash
pip install "langchain-bigquery-graph[dev]"
```

Requirements:

- Python 3.10+
- A Google Cloud project with the BigQuery API enabled
- Authentication configured:
  ```bash
  gcloud auth application-default login
  ```
- To use Vertex AI with Application Default Credentials (no API key required), set:
  ```bash
  export GOOGLE_GENAI_USE_VERTEXAI=true
  ```
- A BigQuery dataset must exist before you use `BigQueryGraphStore`. Tables and property graphs are created automatically, but the dataset is not:
  ```bash
  bq --location=us-central1 mk --dataset YOUR_PROJECT_ID:YOUR_DATASET
  ```
Quick start:

```python
from langchain_community.graphs.graph_document import GraphDocument, Node, Relationship
from langchain_core.documents import Document
from langchain_bigquery_graph import BigQueryGraphStore

store = BigQueryGraphStore(
    project_id="my-project",
    dataset_id="my_dataset",
    graph_name="knowledge_graph",
    location="us-central1",
)

# Define nodes and relationships
alice = Node(id="alice", type="Person", properties={"name": "Alice", "age": 30})
bob = Node(id="bob", type="Person", properties={"name": "Bob", "age": 25})
acme = Node(id="acme", type="Company", properties={"name": "Acme Corp"})
works_at = Relationship(source=alice, target=acme, type="WORKS_AT")
knows = Relationship(source=alice, target=bob, type="KNOWS")

doc = GraphDocument(
    nodes=[alice, bob, acme],
    relationships=[works_at, knows],
    source=Document(page_content="Alice works at Acme Corp and knows Bob."),
)

# This creates the tables and the property graph, then inserts the data
store.add_graph_documents([doc])

# Query with GQL (ORDER BY makes the row order deterministic)
results = store.query(
    "GRAPH `my_dataset`.`knowledge_graph` MATCH (p:Person) RETURN p.name AS name ORDER BY name"
)
print(results)
# [{'name': 'Alice'}, {'name': 'Bob'}]
```

`BigQueryGraphVectorContextRetriever` searches graph nodes by vector similarity and can optionally expand the results by traversing each match's graph neighborhood. Vector search runs as SQL on the node's base table, because BigQuery property graphs don't support ARRAY properties such as embeddings; graph traversal uses GQL.
```python
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_bigquery_graph import BigQueryGraphVectorContextRetriever

embeddings = GoogleGenerativeAIEmbeddings(model="gemini-embedding-001")

# Return specific properties from matching nodes.
# Assumes the Person rows already carry an "embedding" column with node vectors.
retriever = BigQueryGraphVectorContextRetriever(
    graph_store=store,
    embedding_service=embeddings,
    label_expr="Person",
    embeddings_column="embedding",
    return_properties_list=["name", "age"],
    top_k=5,
    k=10,
)
docs = retriever.invoke("Who works at Acme?")
```

With multi-hop expansion:
```python
# Expand results by traversing 2 hops from matching nodes
retriever = BigQueryGraphVectorContextRetriever(
    graph_store=store,
    embedding_service=embeddings,
    label_expr="Person",
    expand_by_hops=2,
    top_k=5,
)
docs = retriever.invoke("Tell me about Alice")
```

You can also use the factory method:
```python
retriever = BigQueryGraphVectorContextRetriever.from_params(
    embedding_service=embeddings,
    graph_store=store,
    expand_by_hops=1,
)
```

`BigQueryGraphTextToGQLRetriever` translates natural language questions into GQL queries using an LLM, executes them, and returns the results.
```python
from langchain_google_vertexai import ChatVertexAI
from langchain_bigquery_graph import BigQueryGraphTextToGQLRetriever

llm = ChatVertexAI(model="gemini-2.5-flash")
retriever = BigQueryGraphTextToGQLRetriever.from_params(
    llm=llm,
    graph_store=store,
    k=10,
)
docs = retriever.invoke("Find all people who work at Acme Corp")
```

With few-shot examples for better GQL generation:
```python
from langchain_google_genai import GoogleGenerativeAIEmbeddings

retriever = BigQueryGraphTextToGQLRetriever.from_params(
    llm=llm,
    embedding_service=GoogleGenerativeAIEmbeddings(model="gemini-embedding-001"),
    graph_store=store,
)
retriever.add_example(
    question="Who works at Acme?",
    gql="GRAPH `my_dataset`.`knowledge_graph` MATCH (p:Person)-[:WORKS_AT]->(c:Company {name: 'Acme Corp'}) RETURN p.name AS name",
)
docs = retriever.invoke("Which people are employed by Acme Corp?")
```

`BigQueryGraphStore` parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `project_id` | `str` | required | Google Cloud project ID |
| `dataset_id` | `str` | required | BigQuery dataset ID |
| `graph_name` | `str` | required | Property graph name |
| `client` | `bigquery.Client` | `None` | Optional pre-configured client |
| `location` | `str` | `None` | BigQuery location (e.g., `us-central1`). Ignored if `client` is provided |
| `use_flexible_schema` | `bool` | `False` | Use JSON-based flexible schema |
| `static_node_properties` | `List[str]` | `[]` | Node properties stored as static columns in flexible schema |
| `static_edge_properties` | `List[str]` | `[]` | Edge properties stored as static columns in flexible schema |
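With `use_flexible_schema=True`, properties go into a JSON-based schema, except those named in `static_node_properties` or `static_edge_properties`, which are stored as regular columns. The exact table layout isn't described here, so the following is only a hypothetical sketch of that split: `split_properties` is not part of the package, and keeping the remainder as a single JSON string is an assumption.

```python
import json
from typing import Any, Dict, List, Tuple


def split_properties(
    properties: Dict[str, Any], static_keys: List[str]
) -> Tuple[Dict[str, Any], str]:
    """Hypothetical helper (not part of langchain_bigquery_graph).

    Returns (static_columns, json_blob): keys listed in static_keys become
    their own columns; every other property is serialized into one JSON string.
    """
    static = {k: v for k, v in properties.items() if k in static_keys}
    dynamic = {k: v for k, v in properties.items() if k not in static_keys}
    # sort_keys keeps the serialized output stable across runs
    return static, json.dumps(dynamic, sort_keys=True)


static_cols, json_blob = split_properties({"name": "Alice", "age": 30}, ["name"])
print(static_cols)  # {'name': 'Alice'}
print(json_blob)    # {"age": 30}
```

A design upside of this kind of split is that frequently filtered properties stay as typed columns, while rarely used ones don't require a schema change.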
Methods:

| Method | Description |
|---|---|
| `query(query, params)` | Execute a GQL query and return results |
| `get_schema` | Property graph schema as a JSON string |
| `get_structured_schema` | Schema as a Python dictionary |
| `add_graph_documents(docs)` | Create tables and graph DDL, then insert data |
| `refresh_schema()` | Reload schema from `INFORMATION_SCHEMA` |
| `cleanup()` | Drop the property graph and all associated tables |
`BigQueryGraphVectorContextRetriever` parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `graph_store` | `BigQueryGraphStore` | required | The graph store to search |
| `embedding_service` | `Embeddings` | required | Embedding model for vectorizing queries |
| `label_expr` | `str` | `"%"` | Label expression to filter nodes |
| `return_properties_list` | `List[str]` | `[]` | Specific properties to return (mutually exclusive with `expand_by_hops`) |
| `embeddings_column` | `str` | `"embedding"` | Column name storing node embeddings |
| `distance_strategy` | `DistanceStrategy` | `COSINE` | `COSINE` or `EUCLIDEAN` |
| `top_k` | `int` | `3` | Number of vector similarity matches |
| `expand_by_hops` | `int` | `-1` | Hops to traverse for neighborhood expansion (mutually exclusive with `return_properties_list`) |
| `k` | `int` | `10` | Max number of graph results to return |
Note: Exactly one of `return_properties_list` or `expand_by_hops` must be provided. With `return_properties_list`, results come from a direct SQL query on the base table. With `expand_by_hops`, vector search finds matching node IDs via SQL, then GQL traverses the graph neighborhood.
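To make the `expand_by_hops` mode concrete, here is a pure-Python, in-memory sketch of the two phases: rank nodes by cosine distance to the query embedding, keep the `top_k` closest, then collect every node within N hops. It does not call BigQuery and is not the retriever's implementation; per the note above, the real first phase is SQL and the second is GQL. The toy 2-D embeddings, the helper names `cosine_distance`, `top_k_nodes`, and `expand`, and treating edges as undirected are all illustrative assumptions.

```python
import math
from collections import deque
from typing import Dict, List, Set, Tuple


def cosine_distance(a: List[float], b: List[float]) -> float:
    """1 - cosine similarity, the quantity BigQuery's COSINE_DISTANCE returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def top_k_nodes(query: List[float], embeddings: Dict[str, List[float]], top_k: int) -> List[str]:
    """Phase 1 (SQL in the real retriever): the top_k nodes closest to the query."""
    return sorted(embeddings, key=lambda node: cosine_distance(query, embeddings[node]))[:top_k]


def expand(seeds: List[str], edges: List[Tuple[str, str]], hops: int) -> Set[str]:
    """Phase 2 (GQL in the real retriever): breadth-first walk up to `hops` steps."""
    neighbors: Dict[str, Set[str]] = {}
    for src, dst in edges:  # this sketch ignores edge direction
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)
    seen = set(seeds)
    frontier = deque((node, 0) for node in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # reached the hop limit on this path
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen


# Toy data mirroring the quick-start graph; the embeddings are made up.
embeddings = {"alice": [1.0, 0.0], "bob": [0.0, 1.0], "acme": [0.7, 0.7]}
edges = [("alice", "acme"), ("alice", "bob")]  # WORKS_AT, KNOWS

seeds = top_k_nodes([0.0, 1.0], embeddings, top_k=1)
print(seeds)                            # ['bob']
print(sorted(expand(seeds, edges, 1)))  # ['alice', 'bob']
print(sorted(expand(seeds, edges, 2)))  # ['acme', 'alice', 'bob']
```

Each extra hop can grow the result set quickly on dense graphs, which is one reason a cap such as `k` on returned graph results is useful.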
`BigQueryGraphTextToGQLRetriever` parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `graph_store` | `BigQueryGraphStore` | required | The graph store to query |
| `llm` | `BaseLanguageModel` | required | LLM for GQL generation |
| `k` | `int` | `10` | Max number of results to return |
| `selector` | `SemanticSimilarityExampleSelector` | `None` | Few-shot example selector (auto-created via `from_params`) |
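The text-to-GQL flow selects few-shot examples similar to the question, asks the LLM for GQL, and runs the result. The sketch below walks through those steps with no LangChain, LLM, or BigQuery dependency. It is an illustration under stated assumptions, not the package's code: a toy bag-of-words "embedding" stands in for a real model, the prompt layout in `build_prompt` is invented, the LLM reply is hard-coded, and `extract_gql` is only a guess at the kind of cleanup a GQL extraction step performs.

```python
import math
import re
from collections import Counter
from typing import Dict, List

FENCE = "`" * 3  # built this way so the snippet can sit inside a Markdown fence
_CODE_BLOCK = re.compile(FENCE + r"(?:gql|sql)?\s*(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: bag-of-words term counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_examples(question: str, examples: List[Dict[str, str]], k: int = 1) -> List[Dict[str, str]]:
    """Keep the k stored examples whose questions are most similar to `question`."""
    q = toy_embed(question)
    ranked = sorted(examples, key=lambda ex: cosine_similarity(q, toy_embed(ex["question"])), reverse=True)
    return ranked[:k]


def build_prompt(question: str, schema: str, shots: List[Dict[str, str]]) -> str:
    """Hypothetical prompt layout: schema, Question/GQL pairs, then the new question."""
    pairs = "\n\n".join(f"Question: {ex['question']}\nGQL: {ex['gql']}" for ex in shots)
    return f"Schema:\n{schema}\n\n{pairs}\n\nQuestion: {question}\nGQL:"


def extract_gql(llm_output: str) -> str:
    """Take the first fenced block if present, else the whole reply; drop a trailing ';'."""
    match = _CODE_BLOCK.search(llm_output)
    query = match.group(1) if match else llm_output
    return query.strip().rstrip(";").strip()


examples = [
    {"question": "Who works at Acme?",
     "gql": "GRAPH `my_dataset`.`knowledge_graph` MATCH (p:Person)-[:WORKS_AT]->(c:Company {name: 'Acme Corp'}) RETURN p.name AS name"},
    {"question": "How old is Bob?",
     "gql": "GRAPH `my_dataset`.`knowledge_graph` MATCH (p:Person {name: 'Bob'}) RETURN p.age AS age"},
]
question = "Which people work at Acme Corp?"
shots = select_examples(question, examples, k=1)
prompt = build_prompt(question, "<graph schema JSON>", shots)

# A made-up LLM reply; a real retriever would send `prompt` to the configured llm.
reply = "Sure:\n" + FENCE + "gql\n" + examples[0]["gql"] + ";\n" + FENCE
print(shots[0]["question"])                      # Who works at Acme?
print(extract_gql(reply) == examples[0]["gql"])  # True
```

Selecting examples by similarity, rather than sending every stored example, keeps the prompt short while still showing the model the query shapes most likely to apply.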
Distance strategies:

```python
from langchain_bigquery_graph import DistanceStrategy

DistanceStrategy.COSINE     # COSINE_DISTANCE
DistanceStrategy.EUCLIDEAN  # EUCLIDEAN_DISTANCE
```

To run the examples:

```bash
cd examples
cp .env.example .env
# Edit .env with your project settings
pip install -r requirements.txt
# or: pip install -e "..[examples]"
python basic_usage.py
python basic_usage.py --cleanup  # remove graph and tables after running
```

Project structure:

```
langchain-bigquery-graph/
├── src/langchain_bigquery_graph/
│   ├── __init__.py          # Public exports
│   ├── graph_store.py       # BigQueryGraphStore, BigQueryGraphSchema, ElementSchema
│   ├── graph_retriever.py   # BigQueryGraphVectorContextRetriever,
│   │                        # BigQueryGraphTextToGQLRetriever, DistanceStrategy
│   ├── graph_utils.py       # GQL syntax fixing and extraction
│   └── prompts.py           # GQL generation prompt templates
├── tests/
│   ├── test_graph_store.py
│   └── test_graph_retriever.py
├── examples/                # Jupyter notebooks and Python example scripts
└── pyproject.toml
```
Design decisions:

| Decision | Approach | Rationale |
|---|---|---|
| Upsert | `MERGE INTO ... WHEN MATCHED / NOT MATCHED` | BigQuery lacks `INSERT OR UPDATE` |
| Primary key | `NOT ENFORCED` | BigQuery primary keys are advisory only |
| Graph DDL | `CREATE OR REPLACE PROPERTY GRAPH` | Idempotent creation; supports schema evolution on existing graphs |
| Vector search | SQL on the base table with `COSINE_DISTANCE` / `EUCLIDEAN_DISTANCE` | Property graphs don't support ARRAY properties, so vector search uses SQL and graph traversal uses GQL |
| JSON conversion | `TO_JSON` | BigQuery equivalent of Spanner's `SAFE_TO_JSON` |
| Schema evolution | `ALTER TABLE ADD COLUMN IF NOT EXISTS` | Idempotent; safely adds new properties even if the table already exists |
| ARRAY properties | Stored in the table, excluded from `LABEL PROPERTIES` | BigQuery property graphs don't support ARRAY types; the data is stored but not exposed as graph properties |
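Most of these decisions correspond to a specific BigQuery statement. The sketch below only assembles SQL strings so the shape of each statement is visible; it is not the SQL the package actually generates. The helper names, the `id` key column, and the `@query_embedding` parameter name are assumptions, and table and column names are interpolated without validation, so real code would need to check or quote identifiers.

```python
from typing import Dict, List


def create_node_table(table: str, columns: Dict[str, str]) -> str:
    """CREATE TABLE with an advisory primary key (BigQuery does not enforce it)."""
    cols = ",\n  ".join(f"{name} {typ}" for name, typ in columns.items())
    return (
        f"CREATE TABLE IF NOT EXISTS `{table}` (\n"
        f"  {cols},\n"
        f"  PRIMARY KEY (id) NOT ENFORCED\n"
        f")"
    )


def add_columns(table: str, new_columns: Dict[str, str]) -> List[str]:
    """One idempotent ALTER TABLE per new property (schema evolution)."""
    return [
        f"ALTER TABLE `{table}` ADD COLUMN IF NOT EXISTS {name} {typ}"
        for name, typ in new_columns.items()
    ]


def merge_node(table: str, columns: List[str]) -> str:
    """Upsert one row keyed on `id`, reading values from named query parameters."""
    source = ", ".join(f"@{c} AS {c}" for c in columns)
    updates = ", ".join(f"{c} = S.{c}" for c in columns if c != "id")
    # With only an `id` column there is nothing to update, so skip WHEN MATCHED.
    matched = f"WHEN MATCHED THEN UPDATE SET {updates}\n" if updates else ""
    return (
        f"MERGE `{table}` T\n"
        f"USING (SELECT {source}) S\n"
        f"ON T.id = S.id\n"
        f"{matched}"
        f"WHEN NOT MATCHED THEN INSERT ({', '.join(columns)}) "
        f"VALUES ({', '.join('S.' + c for c in columns)})"
    )


def graph_properties(columns: Dict[str, str]) -> List[str]:
    """Columns eligible for the graph's LABEL PROPERTIES list: ARRAY columns stay table-only."""
    return [name for name, typ in columns.items() if not typ.upper().startswith("ARRAY")]


def vector_search(table: str, embedding_col: str, top_k: int, fn: str = "COSINE_DISTANCE") -> str:
    """Rows nearest to the @query_embedding ARRAY<FLOAT64> parameter, on the base table."""
    return (
        f"SELECT id, {fn}({embedding_col}, @query_embedding) AS distance\n"
        f"FROM `{table}`\n"
        f"ORDER BY distance\n"
        f"LIMIT {top_k}"
    )


person = {"id": "STRING", "name": "STRING", "age": "INT64", "embedding": "ARRAY<FLOAT64>"}
print(create_node_table("my_dataset.Person", person))
print(add_columns("my_dataset.Person", {"email": "STRING"})[0])
print(merge_node("my_dataset.Person", ["id", "name", "age"]))
print(graph_properties(person))  # ['id', 'name', 'age']
print(vector_search("my_dataset.Person", "embedding", top_k=3))
```

To actually execute statements like these, `google.cloud.bigquery.Client.query` accepts a `job_config=bigquery.QueryJobConfig(query_parameters=[...])`, with `bigquery.ScalarQueryParameter` for values such as `@id` and `bigquery.ArrayQueryParameter("query_embedding", "FLOAT64", vector)` for the embedding.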
To run the tests:

```bash
pip install -e ".[dev]"
pytest tests/ -v
```

All tests use mocked BigQuery clients and do not require a real BigQuery connection.
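The project's test files aren't reproduced in this README. As a rough illustration of the general technique only, a BigQuery client can be replaced by a `unittest.mock.MagicMock` that imitates the `client.query(sql).result()` call chain of `google.cloud.bigquery.Client`. The `run_query` helper is hypothetical, not the package's code, and the fake rows are plain dicts standing in for BigQuery `Row` objects.

```python
from typing import Any, Dict, List
from unittest.mock import MagicMock


def run_query(client: Any, sql: str) -> List[Dict[str, Any]]:
    """Hypothetical helper: run a query and return each row as a plain dict."""
    return [dict(row.items()) for row in client.query(sql).result()]


def test_run_query_returns_rows_as_dicts() -> None:
    client = MagicMock()  # stands in for google.cloud.bigquery.Client
    # Configure client.query(...).result() to yield fake rows; no network call happens.
    client.query.return_value.result.return_value = [{"name": "Alice"}, {"name": "Bob"}]

    gql = "GRAPH `my_dataset`.`knowledge_graph` MATCH (p:Person) RETURN p.name AS name"
    rows = run_query(client, gql)

    assert rows == [{"name": "Alice"}, {"name": "Bob"}]
    client.query.assert_called_once_with(gql)


if __name__ == "__main__":
    test_run_query_returns_rows_as_dicts()
    print("ok")
```

Saved in a `test_*.py` file, the test function would be collected by `pytest` in the same way as the existing suite.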
Related projects:

- langchain-bigquery-python -- Unified `langchain-bigquery` package combining this Graph Store with the Hybrid Search vector store in a single distribution
- langchain-google-spanner-python -- Spanner Graph integration that this project is based on
- LightRAG -- Simple and Fast Retrieval-Augmented Generation that incorporates graph structures into text indexing and retrieval processes.
- lightrag-bigquery -- BigQuery storage backend for LightRAG
- PathRAG -- A Path-based Retrieval-Augmented Generation (PathRAG) library. Contributed the Google Cloud Spanner storage backend (Graph, Vector, KV) and LiteLLM/Gemini model support to the original framework.
- pathrag-bigquery -- A Google Cloud BigQuery storage plugin for PathRAG. It provides KV, Vector, and Graph storage classes as an external plugin; no modifications to the PathRAG source code are required.
MIT License. See LICENSE for details.