Binary file modified asciidoc/courses/genai-graphrag-python/banner.png
59 changes: 53 additions & 6 deletions asciidoc/courses/genai-graphrag-python/course.adoc
@@ -1,8 +1,55 @@
= Constructing Knowledge Graphs with Neo4j GraphRAG Python
:categories: llms:99
= Constructing Knowledge Graphs with Neo4j GraphRAG for Python
:categories: llms:10, advanced:7, processing:5, generative-ai:4
:status: draft
:duration: 2 hours
:caption: Learn how to use Python and LLMs to convert unstructured data into knowledge graphs.
:usecase: blank-sandbox
:key-points: Create a knowledge graph using Neo4j GraphRAG for Python, Model a knowledge graph of structured and unstructured data, Query a knowledge graph using retrievers, Customize the knowledge graph build process
:repository: neo4j-graphacademy/genai-graphrag-python
:banner-style: light

In this course, you will learn how to:
== Course Description

* Use the `neo4j_graphrag` Python package to build graph retrieval augmented generation (GraphRAG) applications.
* Build pipelines to construct knowledge graphs from unstructured text.
* Combine semantic search and relationships to improve the quality of LLM generated responses.
In this hands-on course, you will learn how to create knowledge graphs using link:https://neo4j.com/docs/neo4j-graphrag-python/current/[Neo4j GraphRAG for Python^].

You will:

* Use the `neo4j_graphrag` Python package to build knowledge graphs from unstructured data.
* Add structured data to the knowledge graph to improve LLM responses.
* Create retrievers to search the knowledge graph.
* Learn how you can customize the build process to suit your data and use case.

Finally, you will use what you have learned to build a knowledge graph from your data.

=== Prerequisites

This is an advanced course and you should:

* Understand graph and Neo4j fundamental concepts - link:/courses/neo4j-fundamentals[Neo4j and Graph Fundamentals^].
* Have an understanding of how Generative AI, LLMs, and vector indexes are related to Neo4j - link:/courses/genai-fundamentals[Neo4j & GenerativeAI Fundamentals^].
* Be able to read and write simple Cypher queries - link:/courses/cypher-fundamentals[Cypher Fundamentals^].
* Understand how you can use an LLM to generate a knowledge graph - link:/courses/llm-knowledge-graph-construction[https://graphacademy.neo4j.com/courses/llm-knowledge-graph-construction/^].
* Have experience with programming in Python.

=== Duration

{duration}

=== What you will learn

You will learn how to:

* Use the Neo4j GraphRAG for Python package to create a knowledge graph from unstructured data.
* Enhance a knowledge graph by adding structured data.
* Create retrievers to search a knowledge graph.
* Customize the knowledge graph build process to suit your data and use case.
* Model a knowledge graph of both structured and unstructured data.



[.includes]
== This course includes

* [lessons]#16 lessons#
* [challenges]#7 hands-on challenges#
* [quizes]#8 simple quizzes to support your learning#
159 changes: 159 additions & 0 deletions asciidoc/courses/genai-graphrag-python/illustration.svg
@@ -0,0 +1,169 @@
= Constructing Knowledge Graphs
:type: lesson
:order: 1

In this lesson you will review the process of constructing knowledge graphs from unstructured text using an LLM.

== The construction process

Typically, you would follow these steps:

. Gather the data
. Chunk the data
. _Vectorize_ the data
. Pass the data to an LLM to extract nodes and relationships
. Use the output to generate the graph

=== Gather your data sources

The first step is to gather your unstructured data.
The data can be in the form of text documents, PDFs, publicly available data, or any other source of information.

Depending on the source, you may need to convert the data into a format (typically plain text) that the LLM can process.

The data sources should contain the information you want to include in your knowledge graph.

=== Chunk the data

The next step is to break down the data into _right-sized_ parts.
This process is known as _chunking_.

The size of the chunks depends on the LLM you are using, the complexity of the data, and what you want to extract from the data.

You may not need to chunk the data if the LLM can process the entire document at once and it fits your requirements.
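
As a minimal sketch of chunking, assuming a fixed character size with a small overlap between neighbouring chunks, you could split text like this. The function name `chunk_text` and its default sizes are illustrative assumptions, not part of the `neo4j_graphrag` package.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most `chunk_size` characters.

    Consecutive chunks share `overlap` characters, so text cut at a
    boundary keeps some of its surrounding context in the next chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks
```

Splitting on characters is the simplest strategy. Splitting on sentences or paragraphs usually keeps related facts together, at the cost of uneven chunk sizes.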

=== Vectorize the data

Depending on your requirements for querying and searching the data, you may need to create *vector embeddings*.
You can use any embedding model to create embeddings for each data chunk, but the same model must be used for all embeddings.

Placing these vectors into a link:https://neo4j.com/docs/cypher-manual/current/indexes/semantic-indexes/vector-indexes/[Vector index^] allows you to perform semantic searches, similarity searches, and clustering on the data.

[TIP]
.Chunking, Vectors, and Similarity Search
You can learn more about chunking documents, vectors, embeddings, and similarity search in the GraphAcademy course link:https://graphacademy.neo4j.com/courses/llm-vectors-unstructured/1-introduction/2-semantic-search/[Introduction to Vector Indexes and Unstructured Data^].
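
To illustrate what a similarity search does, the following sketch compares a query vector with each chunk vector using cosine similarity. It is a brute-force illustration in plain Python, not how a vector index is implemented, and the function names and the three-dimensional toy vectors are assumptions made for the example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def most_similar(query: list[float], chunk_vectors: dict[str, list[float]]) -> str:
    """Return the key of the chunk whose vector is closest to the query."""
    return max(chunk_vectors, key=lambda key: cosine_similarity(query, chunk_vectors[key]))


# Toy vectors; real embeddings have hundreds or thousands of dimensions.
chunks = {
    "chunk-1": [1.0, 0.0, 0.0],
    "chunk-2": [0.0, 1.0, 0.0],
}
closest = most_similar([0.9, 0.1, 0.0], chunks)
```

This comparison is only meaningful when every vector comes from the same embedding model, which is why the model must not change between chunks.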

=== Extract nodes and relationships

The next step is to pass the unstructured text data to the LLM to extract the nodes and relationships.

You should provide a suitable prompt that will instruct the LLM to:

- Identify the entities in the text.
- Extract the relationships between the entities.
- Format the output so you can use it to generate the graph, for example, as JSON or another structured format.

Optionally, you may also provide additional context or constraints for the extraction, such as the type of entities or relationships you are interested in extracting.
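
The sketch below shows one way to build such a prompt and parse the response. The prompt wording, the JSON keys (`entities`, `relationships`, `source`, `target`), and the function names are all assumptions chosen for illustration. The call to the LLM itself is left out, because it depends on the provider you use.

```python
import json

# Hypothetical prompt template; adjust the wording to suit your LLM and data.
PROMPT_TEMPLATE = """Your task is to identify the entities and relations in the text below.
Return only JSON with two keys:
  "entities": a list of objects with "name" and "type"
  "relationships": a list of objects with "source", "type" and "target"

Allowed entity types: {entity_types}
Allowed relationship types: {relationship_types}

Text:
{text}
"""


def build_prompt(text: str, entity_types: list[str], relationship_types: list[str]) -> str:
    """Fill the template with the text and the optional constraints."""
    return PROMPT_TEMPLATE.format(
        entity_types=", ".join(entity_types),
        relationship_types=", ".join(relationship_types),
        text=text,
    )


def parse_extraction(raw: str) -> tuple[list[dict], list[dict]]:
    """Parse the LLM response and drop relationships to unknown entities."""
    data = json.loads(raw)
    entities = data.get("entities", [])
    names = {entity["name"] for entity in entities}
    relationships = [
        rel
        for rel in data.get("relationships", [])
        if rel.get("source") in names and rel.get("target") in names
    ]
    return entities, relationships
```

Filtering out relationships whose endpoints were never extracted is a simple guard against the LLM returning inconsistent output.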


=== Generate the graph

Finally, you can use the output from the LLM to generate the graph by creating the nodes and relationships within Neo4j.

The entity and relationship types would become labels and relationship types in the graph.
The _names_ would be the node and relationship identifiers.
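
As a rough sketch of this step, the functions below turn an extracted entity or relationship into a Cypher statement and its parameters. Cypher parameters cannot stand in for labels or relationship types, so the sketch validates those names before placing them in the query text and passes the values as parameters. The function names and the `id` property are assumptions for illustration.

```python
import re

# Only allow simple identifiers, because labels and relationship types
# are written directly into the query text.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _checked(identifier: str) -> str:
    """Return the identifier unchanged, or raise if it is not safe to embed."""
    if not _IDENTIFIER.fullmatch(identifier):
        raise ValueError(f"unsafe label or relationship type: {identifier!r}")
    return identifier


def node_statement(label: str, name: str) -> tuple[str, dict]:
    """Build a MERGE statement that creates the node if it does not exist."""
    return f"MERGE (n:{_checked(label)} {{id: $id}})", {"id": name}


def relationship_statement(
    source_label: str, source: str, rel_type: str, target_label: str, target: str
) -> tuple[str, dict]:
    """Build a statement that connects two existing nodes."""
    query = (
        f"MATCH (a:{_checked(source_label)} {{id: $source}}) "
        f"MATCH (b:{_checked(target_label)} {{id: $target}}) "
        f"MERGE (a)-[:{_checked(rel_type)}]->(b)"
    )
    return query, {"source": source, "target": target}
```

Each returned pair can be passed to whichever Neo4j client you use to run the query with its parameters.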

== Example

If you wanted to construct a knowledge graph based on the link:https://en.wikipedia.org/wiki/Neo4j[Neo4j Wikipedia page^], you would:

. **Gather** the text from the page. +
+
image::images/neo4j-wiki.png["A screenshot of the Neo4j wiki page"]
. Split the text into **chunks**.
+
Neo4j is a graph database management system (GDBMS) developed
by Neo4j Inc.
+
{sp}
+
The data elements Neo4j stores are nodes, edges connecting them,
and attributes of nodes and edges...

. Generate **embeddings** and **vectors** for each chunk.
+
[0.21972137987, 0.12345678901, 0.98765432109, ...]

. **Extract** the entities and relationships using an **LLM**.
+
Send the text to the LLM with an appropriate prompt, for example:
+
Your task is to identify the entities and relations requested
with the user prompt from a given text. You must generate the
output in a JSON format containing a list with JSON objects.

Text:
{text}
+
Parse the entities and relationships output by the LLM.
+
[source, json]
----
{
"node_types": [
{
"label": "GraphDatabase",
"properties": [
{
"name": "Neo4j", "type": "STRING"
}
]
},
{
"label": "Company",
"properties": [
{
"name": "Neo4j Inc", "type": "STRING"
}
]
},
{
"label": "ProgrammingLanguage",
"properties": [
{
"name": "Java", "type": "STRING"
}
]
}
],
"relationship_types": [
{
"label": "DEVELOPED_BY"
},
{
"label": "IMPLEMENTED_IN"
}
],
"patterns": [
["Neo4j", "DEVELOPED_BY", "Neo4j Inc"],
["Neo4j", "IMPLEMENTED_IN", "Java"],
]
}
----
. **Generate** the graph.
+
Use the data to construct the graph in Neo4j by creating nodes and relationships based on the entities and relationships extracted by the LLM.
+
[source, cypher, role=noplay nocopy]
.Generate the graph
----
MERGE (neo4jInc:Company {id: 'Neo4j Inc'})
MERGE (neo4j:GraphDatabase {id: 'Neo4j'})
MERGE (java:ProgrammingLanguage {id: 'Java'})
MERGE (neo4j)-[:DEVELOPED_BY]->(neo4jInc)
MERGE (neo4j)-[:IMPLEMENTED_IN]->(java)
----



[.quiz]
== Check your understanding

include::questions/1-steps.adoc[leveloffset=+1]

[.summary]
== Lesson Summary

In this lesson, you learned how to construct a knowledge graph from unstructured text using an LLM.

In the next lesson, you will set up your development environment to build knowledge graphs using Python and Neo4j.
@@ -0,0 +1,27 @@
[.question]
= 1. Knowledge graph construction steps

Which of the following steps could be considered **optional**?

* [ ] Gather your data sources
* [x] Chunk the data
* [x] _Vectorize_ the data
* [ ] Pass the data to an LLM to extract nodes and relationships
* [ ] Use the output to generate the graph

[TIP,role=hint]
.Hint
====
The essential parts of the process are obtaining the data to pass to the LLM and using the output to generate the graph.
====

[TIP,role=solution]
.Solution
====
The optional steps are:

* Chunk the data
* _Vectorize_ the data

It may not be necessary to chunk the data or vectorize it depending on the LLM you are using, the complexity of the data, and your requirements.
====
@@ -1,11 +1,10 @@
= Set up your development environment
:order: 0
:type: lesson
:lab: {repository-link}
:disable-cache: true
:branch: main
:order: 2
:branch: new-course

You will use the `neo4j-graphrag` package to create retrievers and implement simple applications that use GraphRAG to provide context to LLM queries.

In this module, you will use Python, LangChain, and OpenAI to create a knowledge graph from unstructured data.
You must set up a development environment to run the code examples and exercises.

include::../../../../../../shared/courses/codespace/get-started.adoc[]
@@ -17,18 +16,18 @@ You will need link:https://python.org[Python] installed and the ability to insta

You may want to set up a virtual environment using link:https://docs.python.org/3/library/venv.html[`venv`^] or link:https://virtualenv.pypa.io/en/latest/[`virtualenv`^] to keep your dependencies separate from other projects.

Clone the link:{repository-link}[github.com/neo4j-graphacademy/llm-knowledge-graph-construction] repository:
Clone the link:{repository-link}[github.com/neo4j-graphacademy/genai-graphrag-python] repository:

[source,bash]
----
git clone https://github.com/neo4j-graphacademy/llm-knowledge-graph-construction
git clone https://github.com/neo4j-graphacademy/genai-graphrag-python
----

Install the required packages using `pip` and download the required data:
Install the required packages using `pip`:

[source,bash]
----
cd llm-knowledge-graph
cd genai-graphrag-python
pip install -r requirements.txt
----

@@ -45,20 +44,34 @@ Fill in the required values.
[source]
.Create a .env file
----
include::{repository-raw}/{branch}/.env.example[]
# Create a copy of this file and name it .env
OPENAI_API_KEY="sk-..."
NEO4J_URI="{instance-scheme}://{instance-ip}:{instance-boltPort}"
NEO4J_USERNAME="{instance-username}"
NEO4J_PASSWORD="{instance-password}"
NEO4J_DATABASE="{instance-database}"
----
// include::{repository-raw}/{branch}/.env.example[]

Add your OpenAI API key (`OPENAI_API_KEY`), which you can get from link:https://platform.openai.com[platform.openai.com^].

Update the Neo4j sandbox connection details:
ifeval::[{course-completed}==true]

.Course completed
[IMPORTANT]
====
You have completed this course.

The Neo4j sandbox instance is no longer available. You can create a Neo4j cloud instance using link:https://console.neo4j.io[Neo4j AuraDB^].
====

endif::[]


NEO4J_URI:: [copy]#bolt://{instance-ip}:{instance-boltPort}#
NEO4J_USERNAME:: [copy]#{instance-username}#
NEO4J_PASSWORD:: [copy]#{instance-password}#

== Test your setup

You can test your setup by running `llm-knowledge_graph/test_environment.py` - this will attempt to connect to the Neo4j sandbox and the OpenAI API.
You can test your setup by running `genai-graphrag-python/test_environment.py` - this will attempt to connect to the Neo4j sandbox and the OpenAI API.

You will see an `OK` message if you have set up your environment correctly. If any tests fail, check the contents of the `.env` file.
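
If a test fails, a quick check like the sketch below can show which settings are missing. It is not the repository's `test_environment.py`. It assumes the values from `.env` have already been loaded into the process environment, and the name `missing_settings` is hypothetical.

```python
import os

# The settings described in the .env file above.
REQUIRED_SETTINGS = ("OPENAI_API_KEY", "NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def missing_settings(env=None) -> list[str]:
    """Return the names of required settings that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All required settings are present")
```

This only confirms that the values exist. It does not verify that the credentials are valid or that the Neo4j instance is reachable.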

@@ -68,9 +81,13 @@ When you are ready, you can move on to the next task.

read::Success - let's get started![]



read::Continue[]

[.summary]
== Summary
== Lesson Summary

You have set up your environment and are ready to start this module.
In this lesson, you learned about ..

In the next lesson, you will explore a strategy for storing unstructured data in a graph.
In the next lesson, you will learn about ..
@@ -0,0 +1,15 @@
= Introduction
:order: 1

Welcome to Constructing Knowledge Graphs with Neo4j GraphRAG for Python.

== Module Overview

In this module, you will:

* Review the process of creating knowledge graphs from unstructured text.
* Set up a development environment to build your own knowledge graph.

If you are ready, let's get going!

link:./1-knowledge-graph-construction/[Ready? Let's go →, role=btn]