# Knowledge Graph Extraction

In this guide, we use [outlines](https://outlines-dev.github.io/outlines/) to extract a knowledge graph from unstructured text.

We will run the model with [llama.cpp](https://github.com/ggerganov/llama.cpp) through the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) library. Outlines supports llama-cpp-python, but we need to install it ourselves:

```shell
pip install llama-cpp-python
```

We download a quantized GGUF model. In this guide we use [Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF) by [NousResearch](https://nousresearch.com/) from [HuggingFace](https://huggingface.co/):

```shell
wget https://hf.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/resolve/main/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf
```
We initialize the model:

```python
import llama_cpp
from llama_cpp import Llama
from outlines import generate, models

llm = Llama(
    "/path/to/model/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf",
    tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained(
        "NousResearch/Hermes-2-Pro-Llama-3-8B"
    ),
    n_gpu_layers=-1,
    flash_attn=True,
    n_ctx=8192,
    verbose=False
)
model = models.LlamaCpp(llm)
```
## Knowledge Graph Extraction

We first need to define a Pydantic class for each node and each edge of the knowledge graph:

```python
from pydantic import BaseModel, Field


class Node(BaseModel):
    """Node of the Knowledge Graph"""

    id: int = Field(..., description="Unique identifier of the node")
    label: str = Field(..., description="Label of the node")
    property: str = Field(..., description="Property of the node")


class Edge(BaseModel):
    """Edge of the Knowledge Graph"""

    source: int = Field(..., description="Unique source of the edge")
    target: int = Field(..., description="Unique target of the edge")
    label: str = Field(..., description="Label of the edge")
    property: str = Field(..., description="Property of the edge")
```
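As a quick sanity check, independent of the model, Pydantic can validate a raw dict against the `Node` schema. This is a minimal sketch using a hand-written dict in place of model output:

```python
from pydantic import BaseModel, Field

class Node(BaseModel):
    """Node of the Knowledge Graph"""
    id: int = Field(..., description="Unique identifier of the node")
    label: str = Field(..., description="Label of the node")
    property: str = Field(..., description="Property of the node")

# Validate a raw dict against the schema (Pydantic v2 API)
node = Node.model_validate({"id": 1, "label": "Alice", "property": "Person"})
print(node.label)  # Alice
```

If the dict is missing a field or has the wrong type, `model_validate` raises a `ValidationError` instead of returning a partially-filled object.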
We then define the Pydantic class for the knowledge graph and get its JSON schema:

```python
from typing import List


class KnowledgeGraph(BaseModel):
    """Generated Knowledge Graph"""

    nodes: List[Node] = Field(..., description="List of nodes of the knowledge graph")
    edges: List[Edge] = Field(..., description="List of edges of the knowledge graph")


schema = KnowledgeGraph.model_json_schema()
```
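To get a feel for what `model_json_schema()` returns, here is a sketch with a minimal stand-in model (the `Pair` class is hypothetical, for illustration only):

```python
from pydantic import BaseModel

class Pair(BaseModel):
    """Hypothetical two-field model, standing in for KnowledgeGraph"""
    source: int
    target: int

# model_json_schema() returns a plain dict describing the JSON schema
schema_example = Pair.model_json_schema()
print(schema_example["type"])                # object
print(sorted(schema_example["properties"]))  # ['source', 'target']
```

It is this dict that Outlines uses to constrain generation, and that we embed into the prompt below.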
We then need to adapt our prompt to the [Hermes prompt format for JSON schema](https://github.com/NousResearch/Hermes-Function-Calling?tab=readme-ov-file#prompt-format-for-json-mode--structured-outputs):

```python
def generate_hermes_prompt(user_prompt):
    return (
        "<|im_start|>system\n"
        "You are a world class AI model who answers questions in JSON. "
        f"Here's the json schema you must adhere to:\n<schema>\n{schema}\n</schema><|im_end|>\n"
        "<|im_start|>user\n"
        + user_prompt
        + "<|im_end|>"
        + "\n<|im_start|>assistant\n"
        "<schema>"
    )
```
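We can check the template without loading the model by calling it with a stand-in schema (here a placeholder string; in the guide `schema` comes from `KnowledgeGraph.model_json_schema()`):

```python
schema = '{"type": "object"}'  # stand-in for KnowledgeGraph.model_json_schema()

def generate_hermes_prompt(user_prompt):
    return (
        "<|im_start|>system\n"
        "You are a world class AI model who answers questions in JSON. "
        f"Here's the json schema you must adhere to:\n<schema>\n{schema}\n</schema><|im_end|>\n"
        "<|im_start|>user\n"
        + user_prompt
        + "<|im_end|>"
        + "\n<|im_start|>assistant\n"
        "<schema>"
    )

prompt = generate_hermes_prompt("Alice loves Bob.")
print(prompt.endswith("<|im_start|>assistant\n<schema>"))  # True
```

Note that the prompt deliberately ends mid-turn, after the assistant header, so the model's continuation is the JSON answer itself.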
For a given user prompt, for example:

```python
user_prompt = "Alice loves Bob and she hates Charlie."
```

We can use `generate.json` by passing the Pydantic class we previously defined, and call the generator with the Hermes prompt:

```python
generator = generate.json(model, KnowledgeGraph)
prompt = generate_hermes_prompt(user_prompt)
response = generator(prompt, max_tokens=1024, temperature=0, seed=42)
```
We obtain the nodes and edges of the knowledge graph:

```python
print(response.nodes)
print(response.edges)
# [Node(id=1, label='Alice', property='Person'),
#  Node(id=2, label='Bob', property='Person'),
#  Node(id=3, label='Charlie', property='Person')]
# [Edge(source=1, target=2, label='love', property='Relationship'),
#  Edge(source=1, target=3, label='hate', property='Relationship')]
```
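Because the response is a typed Pydantic object, post-processing is straightforward, e.g. flattening the graph into (subject, predicate, object) triples. A minimal sketch, using hand-built `Node`/`Edge` objects in place of a real model response (field descriptions omitted for brevity):

```python
from pydantic import BaseModel

class Node(BaseModel):
    id: int
    label: str
    property: str

class Edge(BaseModel):
    source: int
    target: int
    label: str
    property: str

# Hand-built objects standing in for response.nodes / response.edges
nodes = [
    Node(id=1, label="Alice", property="Person"),
    Node(id=2, label="Bob", property="Person"),
]
edges = [Edge(source=1, target=2, label="love", property="Relationship")]

# Map node ids to labels, then flatten each edge into a triple
labels = {n.id: n.label for n in nodes}
triples = [(labels[e.source], e.label, labels[e.target]) for e in edges]
print(triples)  # [('Alice', 'love', 'Bob')]
```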
## (Optional) Visualizing the Knowledge Graph

We can use the [Graphviz library](https://graphviz.readthedocs.io/en/stable/) to visualize the generated knowledge graph. For detailed installation instructions, see [here](https://graphviz.readthedocs.io/en/stable/#installation).

```python
from graphviz import Digraph

dot = Digraph()
for node in response.nodes:
    dot.node(str(node.id), node.label, shape='circle', width='1', height='1')
for edge in response.edges:
    dot.edge(str(edge.source), str(edge.target), label=edge.label)

dot.render('knowledge-graph.gv', view=True)
```

![Image of the Extracted Knowledge Graph](./images/knowledge-graph-extraction.png)

This example was originally contributed by [Alonso Silva](https://github.com/alonsosilvaallende).