Oracle SQL Property GraphsΒ β¦ Β No Vector DBΒ β¦ Β No EmbeddingsΒ β¦ Β Graph-Based Reasoning
Quick StartΒ β’ Β How It WorksΒ β’ Β Live DemoΒ β’ Β SQL/PGQΒ β’ Β Graph SchemaΒ
OraclePageIndex turns documents into knowledge graphs stored natively in Oracle Database using SQL Property Graphs (SQL/PGQ). Unlike traditional RAG that chunks text and matches vectors, OraclePageIndex builds a structured graph of documents, sections, entities, and relationships β then traverses it with standard SQL to find answers.
It's an Oracle AI Database-powered fork of PageIndex by VectifyAI, replacing OpenAI + JSON files with Ollama + Oracle Property Graphs. The result: fully local, fully open-source document intelligence with the power of a real database behind it.
Traditional vector-based RAG relies on semantic similarity β but similarity β relevance. When a financial analyst searches for "Apple's supply chain risks," cosine similarity might surface paragraphs about fruit supply chains. What we need is reasoning over structure, and that's exactly what knowledge graphs provide.
OraclePageIndex builds a SQL Property Graph from each document and uses Oracle's GRAPH_TABLE with MATCH patterns to traverse relationships β the same way a human expert would flip through a table of contents, cross-reference entities, and follow citations.
| Vector RAG | OraclePageIndex (Graph RAG) | |
|---|---|---|
| How it works | Chunk text β embed β cosine similarity | Parse structure β extract entities β graph traversal |
| Storage | Vector embeddings in a vector DB | Property graph in Oracle Database |
| Retrieval | Approximate nearest-neighbor search | Exact graph traversal via SQL/PGQ |
| Relationships | Lost during chunking | First-class citizens (named edges) |
| Explainability | Opaque similarity scores | Traceable graph paths with named relationships |
| Multi-document | Separate vector spaces per doc | Unified knowledge graph with cross-document links |
| Query language | Proprietary APIs | Standard SQL with GRAPH_TABLE |
| Multi-hop reasoning | Requires re-ranking hacks | Native recursive path expressions (->+) |
- No Vector DB: Uses document structure and LLM reasoning for retrieval, powered by Oracle SQL Property Graphs.
- No Chunking: Documents are organized into natural sections, not arbitrary fixed-size chunks.
- Graph-Based Reasoning: Retrieval follows named relationships (MENTIONS, PARENT_OF, RELATED_TO) β not opaque similarity scores.
- Standard SQL: Every query is pure SQL with
GRAPH_TABLE. No proprietary graph query language to learn. - Fully Local: Ollama for LLM inference, Oracle Free for storage. No API keys, no cloud dependencies.
- Entity Extraction: LLM-powered extraction of people, organizations, technologies, locations, concepts, and more.
- Interactive Visualization: D3.js force-directed graph with color-coded nodes, searchable entities, and multiple layouts.
We indexed Apple's 121-page annual report (SEC 10-K filing, fiscal year 2024) end-to-end. Here's what Oracle's property graph captured:
| Metric | Count |
|---|---|
| Document | 1 (Apple 10-K 2024) |
| Sections | 121 page-level sections with summaries |
| Entities | 686 unique entities extracted via LLM |
| Mention Edges | 951 section β entity connections |
| Entity Relationships | 33 cross-entity links |
| D3.js Graph | 808 nodes, 1,105 edges |
Entity Types Extracted
The LLM extracted structured entities across categories:
- Organizations: Apple Inc., SEC, Standard & Poor's, Bank of New York Mellon
- Products: iPhone 16 Pro, iPhone 16, iPhone SE, Apple Watch Series 10, Mac
- Technologies: iOS, macOS, Apple Intelligence, XBRL
- People: Timothy D. Cook, Luca Maestri, Arthur D. Levinson, board members
- Locations: China mainland, India, Japan, South Korea, Taiwan, Vietnam
- Concepts: Risk Factors, Cybersecurity, Human Capital, Net Sales, Tariffs
- Financial: Common Stock, Equity Awards, RSUs, Commercial Paper, Term Debt
SELECT e2.name, e2.entity_type, r.relationship
FROM GRAPH_TABLE (doc_knowledge_graph
MATCH (e1 IS entity WHERE e1.name = 'Apple Inc.')
-[r IS related_to]-> (e2 IS entity)
COLUMNS (e2.name, e2.entity_type, r.relationship)
);Result:
| Entity | Type | Relationship |
|---|---|---|
| Form 10-K | CONCEPT | USED_BY |
| SEC | ORGANIZATION | RELATED_TO |
| Securities Exchange Act of 1934 | LAW | RELATED_TO |
| Net Sales | METRIC | RELATED_TO |
| Product Introductions | CONCEPT | RELATED_TO |
| Human Capital | CONCEPT | RELATED_TO |
| China mainland | LOCATION | OPERATES_IN |
| India | LOCATION | OPERATES_IN |
| Japan | LOCATION | OPERATES_IN |
| South Korea | LOCATION | OPERATES_IN |
| Taiwan | LOCATION | OPERATES_IN |
| Vietnam | LOCATION | OPERATES_IN |
$ python run.py query "What are the main risk factors for Apple?"Answer: Based on the knowledge graph, Apple's main risk factors include:
- Supply Chain Risks: Heavy reliance on manufacturing in China, South Korea, Taiwan, India, and Vietnam
- Financial Risks: Commercial paper interest rate exposure, term debt obligations, currency exchange fluctuations
- Product Risks: Dependence on successful product introductions; component shortages
- Legal & Regulatory: Subject to Securities Exchange Act; SEC reporting requirements
Sources: Pages 3, 9, 11, 27, 45, 76, 81, 98, 99, 108, 110, 121
Related Entities: Apple Inc., China mainland, India, Japan, South Korea, Taiwan, Vietnam, Net Sales, Product Introductions, Tariffs, SEC
OraclePageIndex processes documents in two phases: indexing (PDF β Property Graph) and querying (question β graph traversal β answer).
PDF Document
β
βΌ
βββββββββββββββ
β 1. Parse β Extract text per page, detect structure
ββββββββ¬βββββββ
βΌ
βββββββββββββββ
β 2. Structure β Ollama builds hierarchical section tree
ββββββββ¬βββββββ
βΌ
ββββββββββββββββ
β 3. Summarize β Generate concise summary per section
ββββββββ¬βββββββ
βΌ
ββββββββββββββββββββ
β 4. Extract β Ollama identifies PERSON, ORG, TECH, CONCEPT...
β Entities β
ββββββββ¬ββββββββββββ
βΌ
ββββββββββββββββββββ
β 5. Store in β INSERT vertices + edges into Property Graph
β Oracle β
ββββββββ¬ββββββββββββ
βΌ
ββββββββββββββββββββ
β 6. Link β Ollama discovers cross-entity relationships
β Entities β
ββββββββββββββββββββ
Result: A rich Oracle Property Graph with SQL/PGQ access
User Question: "What products does Apple sell?"
β
βΌ
ββββββββββββββββββββββ
β 1. Concept β Ollama extracts: ["Apple", "products"]
β Extraction β
ββββββββ¬ββββββββββββββ
βΌ
ββββββββββββββββββββββ
β 2. Graph Traversal β SQL/PGQ finds matching entities + sections
ββββββββ¬ββββββββββββββ
βΌ
ββββββββββββββββββββββ
β 3. Context β Retrieve text from graph-matched sections
β Assembly β
ββββββββ¬ββββββββββββββ
βΌ
ββββββββββββββββββββββ
β 4. LLM Reasoning β Ollama reasons over graph-retrieved context
ββββββββββββββββββββββ
β Answer with sources and related entities from the graph
OraclePageIndex Architecture
+----------+ +---------+ +βββββββββββββββββββββββββββ+
β β β β β Oracle Database 26ai β
β PDF ββββββββΊβ Ollama ββββββββΊβ β
β Document β β LLM β β βββββββββββββββββββββββ β
+----------+ +---------+ β β Property Graph β β
β parse β β β β β
β extract β β β [documents]ββββ β β
β reason β β β [sections]βββββ€ β β
+--------+ β β β [entities]βββββ€ β β
β β β β [relationships] β β
βΌ βΌ β βββββββββββββββββββββββ β
+---------+ +βββββββββββββ¬ββββββββββββββ+
β β β
β FastAPI βββββββββββββββββββββ
β Server β SQL/PGQ queries
+----β¬----+
β
βΌ
βββββββββββββββββ
β D3.js β
β Visualization β
β (Force Graph) β
βββββββββββββββββ
- Docker (for Oracle Database 26ai Free)
- Python 3.11+
- Ollama with a model pulled (e.g.,
ollama pull gemma3orollama pull llama3.1)
docker compose up -d
# Wait ~2 minutes for initialization
docker compose logs -f oracle-db # watch until "DATABASE IS READY TO USE"docker exec -i <container-name> sqlplus -s "/ as sysdba" <<'EOF'
ALTER SESSION SET CONTAINER = FREEPDB1;
ALTER PROFILE DEFAULT LIMIT PASSWORD_VERIFY_FUNCTION NULL;
CREATE USER pageindex IDENTIFIED BY pageindex;
GRANT CONNECT, RESOURCE TO pageindex;
GRANT UNLIMITED TABLESPACE TO pageindex;
GRANT CREATE PROPERTY GRAPH TO pageindex;
EXIT;
EOFThe key grant is CREATE PROPERTY GRAPH β this enables Oracle's SQL/PGQ capabilities.
pip install oracledb httpx pyyaml tiktoken PyPDF2 PyMuPDF fastapi uvicorn# Initialize the schema (creates 6 tables + Property Graph)
python run.py initThis creates the Oracle Property Graph doc_knowledge_graph with:
- 3 vertex tables:
documents,sections,entities - 3 edge tables:
section_hierarchy,section_entities,entity_relationships
python run.py index /path/to/document.pdfExample output
Indexing complete.
Document: apple-10k-2024.pdf
Sections: 121
Entities: 686
Relationships: 33
python run.py query "What are the key financial risks?"The query engine:
- Extracts concepts from your question via Ollama
- Traverses the Oracle Property Graph to find matching sections
- Enriches context with related entities
- Reasons over the graph-retrieved context for a grounded answer
python run.py serve
# Open http://localhost:8000Interactive D3.js force-directed graph with:
- Color-coded nodes: Documents (blue), Sections (green), Entities (orange)
- Edge types: Hierarchy (solid), Mentions (dashed), Relationships (dotted)
- Click any node for details, search for entities, toggle layouts
Optional CLI parameters
# Override model or Oracle DSN
python run.py --model llama3.1 --oracle-dsn myhost:1521/MYPDB index doc.pdf
# Verbose logging
python run.py -v query "What is the revenue?"
# Custom server host/port
python run.py serve --host 0.0.0.0 --port 9000The entire knowledge graph is queryable with standard SQL using GRAPH_TABLE. This is what makes Oracle special β no proprietary graph query language, just SQL.
SELECT s.title, e.name AS entity, d.doc_name
FROM GRAPH_TABLE (doc_knowledge_graph
MATCH (e IS entity WHERE e.name = 'iPhone')
<-[m IS mentions]- (s IS section)
COLUMNS (s.title, e.name, s.doc_id)
) gt
JOIN documents d ON d.doc_id = gt.doc_id;SELECT e1.name AS source, r.relationship, e2.name AS target, e2.entity_type
FROM GRAPH_TABLE (doc_knowledge_graph
MATCH (e1 IS entity WHERE e1.name = 'Apple Inc.')
-[r IS related_to]-> (e2 IS entity)
COLUMNS (e1.name, r.relationship, e2.name, e2.entity_type)
);-- Find ALL descendants of a section (multi-hop!)
SELECT child.title, child.summary
FROM GRAPH_TABLE (doc_knowledge_graph
MATCH (parent IS section WHERE parent.title = 'Introduction')
-[h IS parent_of]->+ (child IS section)
COLUMNS (child.title, child.summary)
);The ->+ syntax is SQL/PGQ's recursive path expression β it traverses the hierarchy to any depth in a single query. No recursive CTEs needed.
-- Find all documents and sections that discuss a concept
SELECT d.doc_name, s.title, m.relevance, e.entity_type
FROM GRAPH_TABLE (doc_knowledge_graph
MATCH (e IS entity WHERE e.name = 'Cybersecurity')
<-[m IS mentions]- (s IS section)
COLUMNS (e.entity_type, m.relevance, s.title, s.doc_id)
) gt
JOIN documents d ON d.doc_id = gt.doc_id
ORDER BY gt.relevance;CREATE PROPERTY GRAPH doc_knowledge_graph
VERTEX TABLES (
documents KEY (doc_id) LABEL document PROPERTIES ALL COLUMNS,
sections KEY (section_id) LABEL section PROPERTIES ALL COLUMNS,
entities KEY (entity_id) LABEL entity PROPERTIES ALL COLUMNS
)
EDGE TABLES (
section_hierarchy -- parent_of: section β section
KEY (edge_id)
SOURCE KEY (parent_id) REFERENCES sections (section_id)
DESTINATION KEY (child_id) REFERENCES sections (section_id)
LABEL parent_of PROPERTIES ALL COLUMNS,
section_entities -- mentions: section β entity
KEY (edge_id)
SOURCE KEY (section_id) REFERENCES sections (section_id)
DESTINATION KEY (entity_id) REFERENCES entities (entity_id)
LABEL mentions PROPERTIES ALL COLUMNS,
entity_relationships -- related_to: entity β entity
KEY (edge_id)
SOURCE KEY (source_entity) REFERENCES entities (entity_id)
DESTINATION KEY (target_entity) REFERENCES entities (entity_id)
LABEL related_to PROPERTIES ALL COLUMNS
);Relational tables (under the Property Graph)
Oracle SQL Property Graphs are defined on top of standard relational tables. You always retain full relational SQL access to the same data.
-- Vertex tables
CREATE TABLE documents (
doc_id NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
doc_name VARCHAR2(500) NOT NULL,
doc_description CLOB,
source_path VARCHAR2(1000),
created_at TIMESTAMP DEFAULT SYSTIMESTAMP
);
CREATE TABLE sections (
section_id NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
doc_id NUMBER NOT NULL REFERENCES documents(doc_id),
node_id VARCHAR2(10),
title VARCHAR2(1000),
summary CLOB,
text_content CLOB,
start_index NUMBER,
end_index NUMBER,
depth_level NUMBER DEFAULT 0
);
CREATE TABLE entities (
entity_id NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
name VARCHAR2(500) NOT NULL,
entity_type VARCHAR2(100),
description CLOB,
CONSTRAINT uq_entity UNIQUE (name, entity_type)
);
-- Edge tables
CREATE TABLE section_hierarchy (
edge_id NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
parent_id NUMBER NOT NULL REFERENCES sections(section_id),
child_id NUMBER NOT NULL REFERENCES sections(section_id)
);
CREATE TABLE section_entities (
edge_id NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
section_id NUMBER NOT NULL REFERENCES sections(section_id),
entity_id NUMBER NOT NULL REFERENCES entities(entity_id),
relevance VARCHAR2(20) DEFAULT 'MENTIONS'
);
CREATE TABLE entity_relationships (
edge_id NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
source_entity NUMBER NOT NULL REFERENCES entities(entity_id),
target_entity NUMBER NOT NULL REFERENCES entities(entity_id),
relationship VARCHAR2(100) DEFAULT 'RELATED_TO'
);Edit oracle_pageindex/config.yaml:
ollama:
base_url: "http://localhost:11434"
model: "gemma3" # Any Ollama model
temperature: 0
oracle:
user: "pageindex"
password: "pageindex"
dsn: "localhost:1521/FREEPDB1"
pool_min: 1
pool_max: 5The project includes 30 unit tests covering all core modules. All tests use mocked Oracle and Ollama connections β no running services needed.
pytest tests/ -vtests/test_db.py 3 tests (connection pool, schema init, close)
tests/test_entity_extractor.py 3 tests (entity extraction, relationships, edge cases)
tests/test_graph.py 19 tests (all CRUD operations + SQL/PGQ queries)
tests/test_llm.py 5 tests (sync/async chat, JSON extraction)
OraclePageIndex/
oracle_pageindex/
cli.py # CLI: init, index, query, serve
config.yaml # Ollama + Oracle configuration
db.py # Oracle connection pool + schema init
entity_extractor.py # LLM-powered entity extraction
graph.py # Property Graph CRUD + SQL/PGQ queries
indexer.py # Full indexing pipeline
llm.py # Ollama API client
parser.py # PDF parsing + section tree builder
query.py # Graph-powered query engine
utils.py # Config, tokens, tree manipulation
api/
server.py # FastAPI + D3.js visualization server
viz/
index.html # D3.js single-page app
graph.js # Force-directed graph rendering
style.css # Dark-theme styling
tests/ # 30 tests (pytest)
setup_schema.sql # Oracle DDL + Property Graph definition
docker-compose.yml # Oracle 26ai Free container
run.py # CLI entry point
| PageIndex | OraclePageIndex | |
|---|---|---|
| Storage | JSON files on disk | Oracle Property Graph (SQL/PGQ) |
| LLM | OpenAI GPT-4o | Ollama (fully local, open-source) |
| Retrieval | LLM-driven tree navigation | SQL/PGQ graph traversal + LLM reasoning |
| Entities | None | Full named entity extraction + cross-doc linking |
| Relationships | None | Entity relationship discovery (RELATED_TO, PART_OF, OPERATES_IN, ...) |
| Visualization | None | Interactive D3.js force-directed graph |
| Multi-document | Separate JSON per doc | Unified knowledge graph across all documents |
| API | None | FastAPI serving graph data + queries |
| Query interface | None | Natural-language queries with source citations |
| Component | Technology |
|---|---|
| Database | Oracle Database 26ai Free (SQL Property Graphs, SQL/PGQ) |
| Graph Model | CREATE PROPERTY GRAPH with ISO SQL:2023 GRAPH_TABLE |
| LLM | Ollama (gemma3, llama3.1, or any supported model) |
| Backend | Python 3.11+, FastAPI, python-oracledb |
| Visualization | D3.js v7 (interactive force-directed graph) |
| PDF Parsing | PyMuPDF, PyPDF2 |
| Tokenization | tiktoken |
| Container | Docker Compose |
- PageIndex by VectifyAI β the original vectorless, reasoning-based RAG framework that inspired this project.
- Oracle Database β SQL Property Graphs (SQL/PGQ) provide the graph storage and query foundation.
- Ollama β local open-source LLM inference.
- D3.js β interactive graph visualization.
Leave a star if you find this useful!