Feature/weaviate memory #424
Merged
BillSchumacher
merged 39 commits into
Significant-Gravitas:master
from
cs0lar:feature/weaviate-memory
Apr 16, 2023
Changes from 31 commits
986d32c added support for multiple memory provider and added weaviate integra… (csolar)
da4ba3c added factory tests (csolar)
1e63bc5 Merge branch 'master' into feature/weaviate-memory (cs0lar)
0ce0c55 the three memory related commands memory_add, memory_del, memory_ovr … (cs0lar)
97ac802 resolved conflicts between master and feature/weaviate-memory (csolar)
76a1462 moved pinecone api config settings into provider class (csolar)
5fe784a added weaviate to the supported vector memory providers (cs0lar)
786ee60 fixed formatting (csolar)
3c7767f fixed formatting (csolar)
96c5e92 added support for weaviate embedded (cs0lar)
453b428 added support for weaviate embedded (csolar)
75c4132 Merge pull request #1 from cs0lar/feature/weaviate-embedded (cs0lar)
f2a6ac5 fixed order and removed dupes (csolar)
e3aea6d added weaviate embedded section in README (csolar)
67b84b5 added client install (csolar)
b9a4f97 resolved latest conflicts (csolar)
415c1cb fixed quotes (csolar)
35ecd95 removed unnecessary flush() (csolar)
b7d0cc3 removed the extra class property (csolar)
5308946 added support of API key based auth (csolar)
5592dbd resolved latest conflicts (csolar)
855de18 Merge branch 'master' into feature/weaviate-memory (cs0lar)
067e697 fixed weaviate test and fixed conflicts (cs0lar)
2f8cf68 fixed conflicts (cs0lar)
0c3562f fixed config bug (cs0lar)
a94b93b fixed conflicts (csolar)
4c7deef merged master and resolved conflicts (cs0lar)
b987cff Merge branch 'master' into feature/weaviate-memory (cs0lar)
005be02 fixed typo (cs0lar)
b2bfd39 fixed formatting (cs0lar)
2678a5a fixed merge conflicts (cs0lar)
8916b76 fixed change request (csolar)
899c815 fixed auth code (csolar)
5122422 fixed merge conflicts (cs0lar)
03d2032 merged master and resolved conflicts (cs0lar)
23b89b8 merged master and resolved conflicts (cs0lar)
4cd412c Update requirements.txt (BillSchumacher)
37a1dc1 Merge branch 'master' into feature/weaviate-memory (BillSchumacher)
b865e2c Fix README (BillSchumacher)
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -301,6 +301,28 @@ export PINECONE_ENV="Your pinecone region" # something like: us-east4-gcp | |
export MEMORY_BACKEND="pinecone" | ||
``` | ||
|
||
## Weaviate Setup | ||
|
||
[Weaviate](https://weaviate.io/) is an open-source vector database. It stores data objects and vector embeddings from ML models and scales to billions of data objects. [An instance of Weaviate can be created locally (using Docker), on Kubernetes, or using Weaviate Cloud Services](https://weaviate.io/developers/weaviate/quickstart). | ||
Although still experimental, [Embedded Weaviate](https://weaviate.io/developers/weaviate/installation/embedded) is also supported, which lets the Auto-GPT process itself start a Weaviate instance. To enable it, set `USE_WEAVIATE_EMBEDDED` to `True` and make sure you `pip install "weaviate-client>=3.15.4"`. | ||
|
||
#### Setting up environment variables | ||
|
||
In your `.env` file set the following: | ||
|
||
``` | ||
MEMORY_BACKEND=weaviate | ||
WEAVIATE_HOST="127.0.0.1" # the IP or domain of the running Weaviate instance | ||
WEAVIATE_PORT="8080" | ||
WEAVIATE_PROTOCOL="http" | ||
We should add something like "set |
||
WEAVIATE_USERNAME="your username" | ||
WEAVIATE_PASSWORD="your password" | ||
WEAVIATE_API_KEY="your weaviate API key if you have one" | ||
WEAVIATE_EMBEDDED_PATH="/home/me/.local/share/weaviate" # this is optional and indicates where the data should be persisted when running an embedded instance | ||
USE_WEAVIATE_EMBEDDED=False # set to True to run Embedded Weaviate | ||
MEMORY_INDEX="Autogpt" # name of the index to create for the application | ||
``` | ||
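The variables above could be read back along the following lines. This is a minimal sketch, not Auto-GPT's actual `Config` class: `WeaviateSettings` and `load_weaviate_settings` are hypothetical names, and the defaults simply mirror the example values shown above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class WeaviateSettings:
    """Hypothetical container for the settings listed above."""
    protocol: str
    host: str
    port: int
    use_embedded: bool
    index: str

    @property
    def url(self) -> str:
        # Same shape as the URL the provider builds: protocol://host:port
        return f"{self.protocol}://{self.host}:{self.port}"


def load_weaviate_settings(env: Optional[Mapping[str, str]] = None) -> WeaviateSettings:
    """Read the WEAVIATE_* variables; the defaults are an assumption taken from the example."""
    env = os.environ if env is None else env
    return WeaviateSettings(
        protocol=env.get("WEAVIATE_PROTOCOL", "http"),
        host=env.get("WEAVIATE_HOST", "127.0.0.1"),
        port=int(env.get("WEAVIATE_PORT", "8080")),
        # Environment values are strings, so "True"/"False" must be parsed explicitly.
        use_embedded=env.get("USE_WEAVIATE_EMBEDDED", "False").strip().lower() == "true",
        index=env.get("MEMORY_INDEX", "Autogpt"),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to exercise in tests without touching the real environment.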
|
||
## Setting Your Cache Type | ||
|
||
By default, Auto-GPT uses LocalCache instead of Redis or Pinecone. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,110 @@ | ||
from autogpt.config import Config | ||
from autogpt.memory.base import MemoryProviderSingleton, get_ada_embedding | ||
import uuid | ||
import weaviate | ||
from weaviate import Client | ||
from weaviate.embedded import EmbeddedOptions | ||
from weaviate.util import generate_uuid5 | ||
|
||
|
||
def default_schema(weaviate_index): | ||
return { | ||
"class": weaviate_index, | ||
"properties": [ | ||
{ | ||
"name": "raw_text", | ||
"dataType": ["text"], | ||
"description": "original text for the embedding" | ||
} | ||
], | ||
} | ||
|
||
|
||
class WeaviateMemory(MemoryProviderSingleton): | ||
def __init__(self, cfg): | ||
auth_credentials = self._build_auth_credentials(cfg) | ||
|
||
url = f'{cfg.weaviate_protocol}://{cfg.weaviate_host}:{cfg.weaviate_port}' | ||
|
||
if cfg.use_weaviate_embedded: | ||
self.client = Client(embedded_options=EmbeddedOptions( | ||
hostname=cfg.weaviate_host, | ||
port=int(cfg.weaviate_port), | ||
persistence_data_path=cfg.weaviate_embedded_path | ||
)) | ||
|
||
print(f"Weaviate Embedded running on: {url} with persistence path: {cfg.weaviate_embedded_path}") | ||
else: | ||
self.client = Client(url, auth_client_secret=auth_credentials) | ||
|
||
self.index = cfg.memory_index | ||
self._create_schema() | ||
|
||
def _create_schema(self): | ||
schema = default_schema(self.index) | ||
if not self.client.schema.contains(schema): | ||
self.client.schema.create_class(schema) | ||
|
||
def _build_auth_credentials(self, cfg): | ||
if cfg.weaviate_username and cfg.weaviate_password: | ||
return weaviate.auth.AuthClientPassword(cfg.weaviate_username, cfg.weaviate_password) | ||
if cfg.weaviate_api_key: | ||
return weaviate.auth.AuthApiKey(api_key=cfg.weaviate_api_key) | ||
else: | ||
return None | ||
|
||
def add(self, data): | ||
vector = get_ada_embedding(data) | ||
|
||
doc_uuid = generate_uuid5(data, self.index) | ||
data_object = { | ||
'raw_text': data | ||
} | ||
|
||
with self.client.batch as batch: | ||
batch.add_data_object( | ||
uuid=doc_uuid, | ||
data_object=data_object, | ||
class_name=self.index, | ||
vector=vector | ||
) | ||
|
||
return f"Inserting data into memory at uuid: {doc_uuid}:\n data: {data}" | ||
|
||
def get(self, data): | ||
return self.get_relevant(data, 1) | ||
|
||
def clear(self): | ||
self.client.schema.delete_all() | ||
|
||
# weaviate does not yet have a neat way to just remove the items in an index | ||
# without removing the entire schema, therefore we need to re-create it | ||
# after a call to delete_all | ||
self._create_schema() | ||
|
||
return 'Obliterated' | ||
|
||
def get_relevant(self, data, num_relevant=5): | ||
query_embedding = get_ada_embedding(data) | ||
try: | ||
results = self.client.query.get(self.index, ['raw_text']) \ | ||
.with_near_vector({'vector': query_embedding, 'certainty': 0.7}) \ | ||
.with_limit(num_relevant) \ | ||
.do() | ||
|
||
if len(results['data']['Get'][self.index]) > 0: | ||
return [str(item['raw_text']) for item in results['data']['Get'][self.index]] | ||
else: | ||
return [] | ||
|
||
except Exception as err: | ||
print(f'Unexpected error {err=}, {type(err)=}') | ||
return [] | ||
|
||
def get_stats(self): | ||
result = self.client.query.aggregate(self.index) \ | ||
.with_meta_count() \ | ||
.do() | ||
class_data = result['data']['Aggregate'][self.index] | ||
|
||
return class_data[0]['meta'] if class_data else {} |
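`clear()` above calls `schema.delete_all()`, which removes every class on the Weaviate instance, not only the one named by `MEMORY_INDEX`. On a shared instance a narrower reset may be preferable. The sketch below is one possible approach, not the PR's code: `reset_index` is a hypothetical helper that relies only on the `schema.delete_class` and `schema.create_class` calls already used in this PR, and `_InMemorySchema` is a stand-in so the example runs without a server.

```python
from types import SimpleNamespace


def reset_index(client, schema: dict) -> None:
    """Drop and re-create a single class, leaving other classes untouched."""
    client.schema.delete_class(schema["class"])
    client.schema.create_class(schema)


class _InMemorySchema:
    """Stand-in for `client.schema`, for illustration only (not the weaviate API)."""

    def __init__(self):
        self.classes = {}

    def create_class(self, schema: dict) -> None:
        self.classes[schema["class"]] = schema

    def delete_class(self, class_name: str) -> None:
        self.classes.pop(class_name, None)


fake_client = SimpleNamespace(schema=_InMemorySchema())
fake_client.schema.create_class({"class": "Autogpt", "properties": []})
fake_client.schema.create_class({"class": "OtherApp", "properties": []})

reset_index(fake_client, {"class": "Autogpt", "properties": []})
print(sorted(fake_client.schema.classes))  # both classes are still registered
```

Whether deleting a single class behaves acceptably on a given Weaviate version should be checked against a real instance before relying on it.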
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -16,6 +16,7 @@ pymilvus==2.2.4 | |
redis | ||
orjson | ||
Pillow | ||
weaviate-client==3.15.5 | ||
selenium | ||
webdriver-manager | ||
coverage | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,117 @@ | ||
import unittest | ||
from unittest import mock | ||
import sys | ||
import os | ||
|
||
from weaviate import Client | ||
from weaviate.util import get_valid_uuid | ||
from uuid import uuid4 | ||
|
||
from autogpt.config import Config | ||
from autogpt.memory.weaviate import WeaviateMemory | ||
from autogpt.memory.base import get_ada_embedding | ||
|
||
|
||
@mock.patch.dict(os.environ, { | ||
"WEAVIATE_HOST": "127.0.0.1", | ||
"WEAVIATE_PROTOCOL": "http", | ||
"WEAVIATE_PORT": "8080", | ||
"WEAVIATE_USERNAME": "", | ||
"WEAVIATE_PASSWORD": "", | ||
"MEMORY_INDEX": "AutogptTests" | ||
}) | ||
class TestWeaviateMemory(unittest.TestCase): | ||
cfg = None | ||
client = None | ||
|
||
@classmethod | ||
def setUpClass(cls): | ||
# only create the connection to weaviate once | ||
cls.cfg = Config() | ||
|
||
if cls.cfg.use_weaviate_embedded: | ||
from weaviate.embedded import EmbeddedOptions | ||
|
||
cls.client = Client(embedded_options=EmbeddedOptions( | ||
hostname=cls.cfg.weaviate_host, | ||
port=int(cls.cfg.weaviate_port), | ||
persistence_data_path=cls.cfg.weaviate_embedded_path | ||
)) | ||
else: | ||
cls.client = Client(f"{cls.cfg.weaviate_protocol}://{cls.cfg.weaviate_host}:{cls.cfg.weaviate_port}") | ||
|
||
""" | ||
In order to run these tests you will need a local instance of | ||
Weaviate running. Refer to https://weaviate.io/developers/weaviate/installation/docker-compose | ||
for creating local instances using docker. | ||
Alternatively, set the following environment variables in your .env file to run Weaviate embedded (see: https://weaviate.io/developers/weaviate/installation/embedded): | ||
|
||
USE_WEAVIATE_EMBEDDED=True | ||
WEAVIATE_EMBEDDED_PATH="/home/me/.local/share/weaviate" | ||
""" | ||
def setUp(self): | ||
try: | ||
self.client.schema.delete_class(self.cfg.memory_index) | ||
except Exception:  # the class may not exist yet on a fresh instance | ||
pass | ||
|
||
self.memory = WeaviateMemory(self.cfg) | ||
|
||
def test_add(self): | ||
doc = 'You are a Titan named Thanos and you are looking for the Infinity Stones' | ||
self.memory.add(doc) | ||
result = self.client.query.get(self.cfg.memory_index, ['raw_text']).do() | ||
actual = result['data']['Get'][self.cfg.memory_index] | ||
|
||
self.assertEqual(len(actual), 1) | ||
self.assertEqual(actual[0]['raw_text'], doc) | ||
|
||
def test_get(self): | ||
doc = 'You are an Avenger and swore to defend the Galaxy from a menace called Thanos' | ||
|
||
with self.client.batch as batch: | ||
batch.add_data_object( | ||
uuid=get_valid_uuid(uuid4()), | ||
data_object={'raw_text': doc}, | ||
class_name=self.cfg.memory_index, | ||
vector=get_ada_embedding(doc) | ||
) | ||
|
||
batch.flush() | ||
|
||
actual = self.memory.get(doc) | ||
|
||
self.assertEqual(len(actual), 1) | ||
self.assertEqual(actual[0], doc) | ||
|
||
def test_get_stats(self): | ||
docs = [ | ||
'You are now about to count the number of docs in this index', | ||
'And then you are about to find out if you can count correctly' | ||
] | ||
|
||
[self.memory.add(doc) for doc in docs] | ||
|
||
stats = self.memory.get_stats() | ||
|
||
self.assertTrue(stats) | ||
self.assertTrue('count' in stats) | ||
self.assertEqual(stats['count'], 2) | ||
|
||
def test_clear(self): | ||
docs = [ | ||
'Shame this is the last test for this class', | ||
'Testing is fun when someone else is doing it' | ||
] | ||
|
||
[self.memory.add(doc) for doc in docs] | ||
|
||
self.assertEqual(self.memory.get_stats()['count'], 2) | ||
|
||
self.memory.clear() | ||
|
||
self.assertEqual(self.memory.get_stats()['count'], 0) | ||
|
||
|
||
if __name__ == '__main__': | ||
unittest.main() |
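As the docstring in the test file notes, these tests need a reachable Weaviate instance. One way to make that requirement explicit is to skip the suite when nothing is listening. The following is a sketch under assumptions: `weaviate_reachable` and `TestNeedsWeaviate` are hypothetical names, and a plain TCP connect only shows that a port is open, not that Weaviate itself is healthy.

```python
import socket
import unittest


def weaviate_reachable(host: str = "127.0.0.1", port: int = 8080, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


@unittest.skipUnless(weaviate_reachable(), "no Weaviate instance listening on 127.0.0.1:8080")
class TestNeedsWeaviate(unittest.TestCase):
    def test_placeholder(self):
        # Real assertions against the running instance would go here.
        self.assertTrue(weaviate_reachable())
```

With `USE_WEAVIATE_EMBEDDED=True` the instance is started by the client itself, so a check like this would need to be bypassed in that mode.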
@cs0lar should also mention embedded weaviate here.
good spot, thanks! This is now done.