Note: Full article write-up coming soon! For now, feel free to browse the code notebook
- Large Language Models (LLMs) have ushered in a new era for the use of natural language in question answering over enterprise data, particularly in SQL databases (aka Text-to-SQL).
- One powerful approach is the use of Knowledge Graphs, which provide an enriched semantic representation of the databases (beyond basic schemas) that can significantly enhance the accuracy of SQL queries generated.
- The notion of “semantic layers” builds on the longstanding idea of assigning clear definitions to columns in order to facilitate analytics.
- Experiments have shown that a GPT-4 system configured with a zero-shot prompt and connected to a Knowledge Graph representation of an enterprise SQL database yields results up to three times more accurate than those produced with only a raw SQL schema. These results reinforce the critical role domain semantics can play in bridging the gap between complex database structures and effective, user-friendly queries.
- Knowledge engineering — the practice of mapping and governing an organization's conceptual understanding of data — often remains an implicit and underappreciated function spread across various roles. Data engineers, data stewards, analytics engineers, and analysts may each apply their own assumptions to data definitions in an ad hoc manner, leading to inconsistent or fragmented semantic layers.
- As organizations mature, they increasingly recognize the need to transition from basic semantic layers toward more comprehensive solutions like Knowledge Graphs and ontologies. By doing so, enterprises can not only improve data consistency and understanding but also position themselves to harness LLM technology more effectively.
- In this project, we demonstrate how Text-to-SQL with GraphRAG leverages a Knowledge Graph's semantic representation of an SQL database to improve the accuracy and reliability of natural language question answering.
- The goal is to showcase how a well-defined semantic layer — coupled with robust knowledge engineering — can unlock LLMs’ full potential for enterprise-scale data insights.
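To make the contrast between a raw schema and a semantic layer more concrete, here is a minimal sketch of how the two kinds of context might be assembled into a zero-shot prompt. The `loan` table, its column definitions, and the `ColumnInfo`, `TableInfo`, `render_schema`, and `build_prompt` names are all hypothetical illustrations, not code from the notebook, and no LLM call is made.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnInfo:
    name: str
    sql_type: str
    description: str = ""  # business definition supplied by the semantic layer


@dataclass
class TableInfo:
    name: str
    columns: list[ColumnInfo] = field(default_factory=list)
    description: str = ""


def render_schema(table: TableInfo, with_semantics: bool) -> str:
    """Render one table as prompt context, optionally with its definitions."""
    lines = [f"Table {table.name}"]
    if with_semantics and table.description:
        lines[0] += f": {table.description}"
    for col in table.columns:
        line = f"  - {col.name} ({col.sql_type})"
        if with_semantics and col.description:
            line += f": {col.description}"
        lines.append(line)
    return "\n".join(lines)


def build_prompt(question: str, tables: list[TableInfo], with_semantics: bool = True) -> str:
    """Assemble a zero-shot Text-to-SQL prompt from the question and schema context."""
    context = "\n".join(render_schema(t, with_semantics) for t in tables)
    return (
        "Write a single SQLite query that answers the question.\n\n"
        f"Schema:\n{context}\n\n"
        f"Question: {question}\nSQL:"
    )


# Illustrative table; the names and definitions below are assumptions.
loan = TableInfo(
    name="loan",
    description="One row per loan granted on an account",
    columns=[
        ColumnInfo("loan_id", "INTEGER", "Unique loan identifier"),
        ColumnInfo("amount", "INTEGER", "Total amount of the loan"),
        ColumnInfo("status", "TEXT", "Repayment status code"),
    ],
)

question = "What is the average loan amount?"
raw_prompt = build_prompt(question, [loan], with_semantics=False)  # schema only
rich_prompt = build_prompt(question, [loan])  # schema plus definitions
```

In a GraphRAG setup the definitions would presumably be retrieved from the Knowledge Graph rather than hard-coded; the sketch only shows where that extra context ends up in the prompt.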
- Neo4j: A graph database that enables efficient storage and querying of knowledge graphs, providing the semantic representation of the SQL database for improved Text-to-SQL accuracy. We will also be using the `neo4j-graphrag` package for easy integration with Neo4j's GraphRAG features.
- OpenAI GPT-4o model: The LLM that processes natural language questions and generates SQL queries, leveraging knowledge graph-enhanced context for higher precision.
- SQLAlchemy: A Python SQL toolkit and ORM that facilitates interaction with SQL databases, enabling execution and management of generated SQL queries. In particular, we will be using SQLAlchemy to access the SQLite database (refer to `data/czech_financial.sqlite`).
- DBeaver: A universal database management tool that provides a user-friendly interface for exploring, debugging, and validating SQL queries across different database systems.
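As a rough illustration of how a SQL schema could be turned into graph form, the sketch below reads table, column, and foreign-key metadata from a SQLite database and emits Cypher `MERGE` statements. It uses the standard-library `sqlite3` module instead of SQLAlchemy to stay dependency-free, and it only builds the statements as strings without connecting to Neo4j. The `Table` and `Column` labels, the `HAS_COLUMN` and `REFERENCES` relationship types, and the `schema_to_cypher` helper are assumptions made for this example, not the project's actual graph model.

```python
import sqlite3


def schema_to_cypher(conn: sqlite3.Connection) -> list[str]:
    """Return Cypher MERGE statements describing tables, columns and foreign keys."""
    statements = []
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    for table in tables:
        statements.append(f"MERGE (:Table {{name: '{table}'}})")
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        for _cid, col, col_type, *_ in conn.execute(f'PRAGMA table_info("{table}")'):
            statements.append(
                f"MATCH (t:Table {{name: '{table}'}}) "
                f"MERGE (t)-[:HAS_COLUMN]->"
                f"(:Column {{name: '{col}', table: '{table}', type: '{col_type}'}})"
            )
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
        for _id, _seq, ref_table, from_col, to_col, *_ in conn.execute(
            f'PRAGMA foreign_key_list("{table}")'
        ):
            statements.append(
                f"MATCH (a:Table {{name: '{table}'}}), (b:Table {{name: '{ref_table}'}}) "
                f"MERGE (a)-[:REFERENCES {{from_column: '{from_col}', "
                f"to_column: '{to_col}'}}]->(b)"
            )
    return statements


# Tiny in-memory schema standing in for the real database (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE account (account_id INTEGER PRIMARY KEY, frequency TEXT);
    CREATE TABLE loan (
        loan_id INTEGER PRIMARY KEY,
        account_id INTEGER REFERENCES account(account_id),
        amount INTEGER
    );
    """
)
cypher = schema_to_cypher(conn)
for stmt in cypher:
    print(stmt)
```

String interpolation keeps the example short; when running statements against a real Neo4j instance, query parameters would be the safer choice. Business definitions for each table and column could then be attached as extra node properties to form the semantic layer.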
- Download the Czech Financial Dataset from here: http://sorry.vse.cz/~berka/challenge/pkdd1999/data_berka.zip
- More info here: https://sorry.vse.cz/~berka/challenge/pkdd1999/chall.htm
- Then unzip the raw .ASC files into the `data/raw/` folder.
- In PowerShell, run `$env:PYTHONPATH="$env:PYTHONPATH;C:\Users\<username>\<folder>\Text-to-SQL-with-Neo4j-GraphRAG\src"` to append the `src` directory to the `PYTHONPATH` environment variable, enabling Python to locate and import modules from that directory.
- Run `python .\src\utils\convert_asc_to_sqlite.py` to convert the ASC files into a SQLite DB file.
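For readers curious what such a conversion involves, here is a minimal sketch using only the standard library. It is not the contents of `convert_asc_to_sqlite.py`; the `load_asc` helper is hypothetical, and the sketch assumes each .ASC file is semicolon-delimited text with a quoted header row, which should be checked against the downloaded files.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def load_asc(conn: sqlite3.Connection, asc_path, table=None) -> int:
    """Load one delimited text file into a SQLite table and return the row count."""
    asc_path = Path(asc_path)
    table = table or asc_path.stem  # default table name: file name without extension
    with asc_path.open(newline="", encoding="utf-8") as fh:
        # Assumption: ';' as delimiter and '"' as quote character.
        reader = csv.reader(fh, delimiter=";", quotechar='"')
        header = next(reader)
        rows = [row for row in reader if row]  # skip blank lines
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    # Columns are created without declared types, so every value is stored as text.
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


# Demo on a small made-up file rather than the real dataset.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "account.asc"
    sample.write_text('"account_id";"district_id"\n1;18\n2;5\n', encoding="utf-8")
    conn = sqlite3.connect(":memory:")
    loaded = load_asc(conn, sample)
    first = conn.execute(
        "SELECT district_id FROM account WHERE account_id = '1'"
    ).fetchone()
```

A fuller converter would also assign column types and declare primary and foreign keys, since those constraints are what a schema-to-graph step can later pick up.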
- Use of DBeaver
- DBeaver Community Edition (CE) is a free cross-platform database tool for developers, database administrators, analysts, and everyone working with data. It supports all popular SQL databases like MySQL, MariaDB, PostgreSQL, SQLite, Apache Family, and more.
- Download from here: https://dbeaver.io/download/
- You can find the slides from my recent presentation at the Neo4j Meetup Tech Talk (May 2025) in the `presentations/` folder.
- https://neo4j.com/blog/developer/enhancing-hybrid-retrieval-graphrag-python-package/
- https://www.linkedin.com/blog/engineering/ai/practical-text-to-sql-for-data-analytics
- https://www.sciencedirect.com/science/article/pii/S1570826824000441
- https://medium.com/@ianormy/microsoft-graphrag-with-an-rdf-knowledge-graph-part-3-328f85d7dab2