Add Knowledge Graph Extraction example
alonsosilvaallende authored and rlouf committed Jul 18, 2024
1 parent 9ce0df3 commit 1bf23be
Showing 3 changed files with 138 additions and 0 deletions.
docs/cookbook/images/knowledge-graph-extraction.png (binary file not shown)
137 changes: 137 additions & 0 deletions docs/cookbook/knowledge_graph_extraction.md
@@ -0,0 +1,137 @@
# Knowledge Graph Extraction

In this guide, we use [outlines](https://outlines-dev.github.io/outlines/) to extract a knowledge graph from unstructured text.

We will run the model with [llama.cpp](https://github.com/ggerganov/llama.cpp) through the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) library. Outlines supports llama-cpp-python, but we need to install it ourselves:

```shell
pip install llama-cpp-python
```

We download the quantized GGUF weights of [Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF) by [NousResearch](https://nousresearch.com/) from [HuggingFace](https://huggingface.co/):

```shell
wget https://hf.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/resolve/main/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf
```

We initialize the model:

```python
import llama_cpp
from llama_cpp import Llama
from outlines import generate, models

# Load the GGUF weights; the HF tokenizer ensures the chat template is tokenized correctly
llm = Llama(
    "/path/to/model/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf",
    tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained(
        "NousResearch/Hermes-2-Pro-Llama-3-8B"
    ),
    n_gpu_layers=-1,  # offload all layers to the GPU if available
    flash_attn=True,
    n_ctx=8192,
    verbose=False,
)
model = models.LlamaCpp(llm)
```
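
Optionally, we can sanity-check the setup with a short unconstrained generation before adding any structure (a minimal sketch; the prompt is arbitrary):

```python
# Quick smoke test: plain text generation with the wrapped model
text_generator = generate.text(model)
print(text_generator("The capital of France is", max_tokens=8))
```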

## Knowledge Graph Extraction

We first need to define Pydantic classes for the nodes and edges of the knowledge graph:

```python
from pydantic import BaseModel, Field

class Node(BaseModel):
    """Node of the Knowledge Graph"""

    id: int = Field(..., description="Unique identifier of the node")
    label: str = Field(..., description="Label of the node")
    property: str = Field(..., description="Property of the node")


class Edge(BaseModel):
    """Edge of the Knowledge Graph"""

    source: int = Field(..., description="Unique source of the edge")
    target: int = Field(..., description="Unique target of the edge")
    label: str = Field(..., description="Label of the edge")
    property: str = Field(..., description="Property of the edge")
```
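
These behave like ordinary Pydantic models, so we can sanity-check them with made-up values before involving the LLM:

```python
# Pydantic validates the field types on construction
node = Node(id=1, label="Alice", property="Person")
print(node)
# id=1 label='Alice' property='Person'
```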

We then define the Pydantic class for the knowledge graph and get its JSON schema:

```python
from typing import List

class KnowledgeGraph(BaseModel):
    """Generated Knowledge Graph"""

    nodes: List[Node] = Field(..., description="List of nodes of the knowledge graph")
    edges: List[Edge] = Field(..., description="List of edges of the knowledge graph")

schema = KnowledgeGraph.model_json_schema()
```
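
If you want to inspect exactly what the generator will be constrained to, `schema` is a plain dictionary and can be pretty-printed:

```python
import json

# Pretty-print the JSON schema derived from the Pydantic classes
print(json.dumps(schema, indent=2))
```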

We then need to adapt our prompt to the [Hermes prompt format for JSON schema](https://github.com/NousResearch/Hermes-Function-Calling?tab=readme-ov-file#prompt-format-for-json-mode--structured-outputs):

```python
def generate_hermes_prompt(user_prompt):
    return (
        "<|im_start|>system\n"
        "You are a world class AI model who answers questions in JSON. "
        f"Here's the json schema you must adhere to:\n<schema>\n{schema}\n</schema><|im_end|>\n"
        "<|im_start|>user\n"
        + user_prompt
        + "<|im_end|>"
        + "\n<|im_start|>assistant\n"
        "<schema>"
    )
```

Consider a user prompt such as:

```python
user_prompt = "Alice loves Bob and she hates Charlie."
```

We create a generator with `generate.json` by passing the Pydantic class we previously defined, then call it with the Hermes prompt:

```python
generator = generate.json(model, KnowledgeGraph)
prompt = generate_hermes_prompt(user_prompt)
response = generator(prompt, max_tokens=1024, temperature=0, seed=42)
```

We obtain the nodes and edges of the knowledge graph:

```python
print(response.nodes)
print(response.edges)
# [Node(id=1, label='Alice', property='Person'),
# Node(id=2, label='Bob', property='Person'),
# Node(id=3, label='Charlie', property='Person')]
# [Edge(source=1, target=2, label='love', property='Relationship'),
# Edge(source=1, target=3, label='hate', property='Relationship')]
```
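
Since `response` is a `KnowledgeGraph` instance, the usual Pydantic serialization methods apply. For example, we can persist the graph as JSON for downstream use (the filename is arbitrary):

```python
# Save the extracted graph to disk
with open("knowledge-graph.json", "w") as f:
    f.write(response.model_dump_json(indent=2))
```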

## (Optional) Visualizing the Knowledge Graph

We can use the [Graphviz library](https://graphviz.readthedocs.io/en/stable/) to visualize the generated knowledge graph. For detailed installation instructions, see [here](https://graphviz.readthedocs.io/en/stable/#installation).

```python
from graphviz import Digraph

dot = Digraph()
# Add one Graphviz node per extracted node, labeled by name
for node in response.nodes:
    dot.node(str(node.id), node.label, shape='circle', width='1', height='1')
# Add one directed edge per extracted relationship
for edge in response.edges:
    dot.edge(str(edge.source), str(edge.target), label=edge.label)

dot.render('knowledge-graph.gv', view=True)
```
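
If you are working on a headless machine, or prefer an image file to a viewer window, `render` also accepts an output format (this assumes the Graphviz system binaries are installed):

```python
# Write knowledge-graph.png without opening a viewer
dot.render('knowledge-graph', format='png', view=False)
```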

![Image of the Extracted Knowledge Graph](./images/knowledge-graph-extraction.png)

This example was originally contributed by [Alonso Silva](https://github.com/alonsosilvaallende).
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -107,6 +107,7 @@ nav:
- Playing chess: cookbook/models_playing_chess.md
- Perspective-taking prompting: cookbook/simtom.md
- Question-answering with citations: cookbook/qa-with-citations.md
- Knowledge Graph Extraction: cookbook/knowledge_graph_extraction.md
- Run on the cloud:
- BentoML: cookbook/deploy-using-bentoml.md
- Cerebrium: cookbook/deploy-using-cerebrium.md
