Feature/weaviate memory #424

Merged
Changes from 15 commits
39 commits:

986d32c added support for multiple memory provider and added weaviate integra… (csolar, Apr 7, 2023)
da4ba3c added factory tests (csolar, Apr 7, 2023)
1e63bc5 Merge branch 'master' into feature/weaviate-memory (cs0lar, Apr 8, 2023)
0ce0c55 the three memory related commands memory_add, memory_del, memory_ovr … (cs0lar, Apr 8, 2023)
97ac802 resolved conflicts between master and feature/weaviate-memory (csolar, Apr 8, 2023)
76a1462 moved pinecone api config settings into provider class (csolar, Apr 8, 2023)
5fe784a added weaviate to the supported vector memory providers (cs0lar, Apr 11, 2023)
786ee60 fixed formatting (csolar, Apr 11, 2023)
3c7767f fixed formatting (csolar, Apr 11, 2023)
96c5e92 added support for weaviate embedded (cs0lar, Apr 12, 2023)
453b428 added support for weaviate embedded (csolar, Apr 12, 2023)
75c4132 Merge pull request #1 from cs0lar/feature/weaviate-embedded (cs0lar, Apr 12, 2023)
f2a6ac5 fixed order and removed dupes (csolar, Apr 12, 2023)
e3aea6d added weaviate embedded section in README (csolar, Apr 12, 2023)
67b84b5 added client install (csolar, Apr 12, 2023)
b9a4f97 resolved latest conflicts (csolar, Apr 12, 2023)
415c1cb fixed quotes (csolar, Apr 12, 2023)
35ecd95 removed unnecessary flush() (csolar, Apr 12, 2023)
b7d0cc3 removed the extra class property (csolar, Apr 12, 2023)
5308946 added support of API key based auth (csolar, Apr 12, 2023)
5592dbd resolved latest conflicts (csolar, Apr 12, 2023)
855de18 Merge branch 'master' into feature/weaviate-memory (cs0lar, Apr 13, 2023)
067e697 fixed weaviate test and fixed conflicts (cs0lar, Apr 13, 2023)
2f8cf68 fixed conflicts (cs0lar, Apr 13, 2023)
0c3562f fixed config bug (cs0lar, Apr 13, 2023)
a94b93b fixed conflicts (csolar, Apr 13, 2023)
4c7deef merged master and resolved conflicts (cs0lar, Apr 15, 2023)
b987cff Merge branch 'master' into feature/weaviate-memory (cs0lar, Apr 15, 2023)
005be02 fixed typo (cs0lar, Apr 15, 2023)
b2bfd39 fixed formatting (cs0lar, Apr 15, 2023)
2678a5a fixed merge conflicts (cs0lar, Apr 15, 2023)
8916b76 fixed change request (csolar, Apr 15, 2023)
899c815 fixed auth code (csolar, Apr 15, 2023)
5122422 fixed merge conflicts (cs0lar, Apr 15, 2023)
03d2032 merged master and resolved conflicts (cs0lar, Apr 15, 2023)
23b89b8 merged master and resolved conflicts (cs0lar, Apr 16, 2023)
4cd412c Update requirements.txt (BillSchumacher, Apr 16, 2023)
37a1dc1 Merge branch 'master' into feature/weaviate-memory (BillSchumacher, Apr 16, 2023)
b865e2c Fix README (BillSchumacher, Apr 16, 2023)
15 changes: 12 additions & 3 deletions .env.template
@@ -7,9 +7,18 @@ FAST_LLM_MODEL=gpt-3.5-turbo
GOOGLE_API_KEY=
CUSTOM_SEARCH_ENGINE_ID=
USE_AZURE=False
OPENAI_AZURE_API_BASE=your-base-url-for-azure
OPENAI_AZURE_API_VERSION=api-version-for-azure
OPENAI_AZURE_DEPLOYMENT_ID=deployment-id-for-azure
OPENAI_API_BASE=your-base-url-for-azure
OPENAI_API_VERSION=api-version-for-azure
OPENAI_DEPLOYMENT_ID=deployment-id-for-azure
IMAGE_PROVIDER=dalle
HUGGINGFACE_API_TOKEN=
Contributor: Undo the relocation of the HUGGINGFACE line.

Contributor Author: Thanks! There were also some dupes, which I have now removed.

USE_MAC_OS_TTS=False
WEAVIATE_HOST="127.0.0.1"
WEAVIATE_PORT=8080
WEAVIATE_PROTOCOL="http"
USE_WEAVIATE_EMBEDDED=False
WEAVIATE_EMBEDDED_PATH="/home/me/.local/share/weaviate"
WEAVIATE_USERNAME=
WEAVIATE_PASSWORD=
MEMORY_INDEX="auto-gpt"
MEMORY_BACKEND="local"
20 changes: 20 additions & 0 deletions README.md
@@ -204,6 +204,26 @@ export PINECONE_ENV="Your pinecone region" # something like: us-east4-gcp

```

## Weaviate Setup

[Weaviate](https://weaviate.io/) is an open-source vector database. It allows you to store data objects and vector embeddings from ML models and scales seamlessly to billions of data objects. [An instance of Weaviate can be created locally (using Docker), on Kubernetes or using Weaviate Cloud Services](https://weaviate.io/developers/weaviate/quickstart).
Reviewer: @cs0lar should also mention embedded Weaviate here.

Contributor Author: Good spot, thanks! This is now done.

Although still experimental, [Embedded Weaviate](https://weaviate.io/developers/weaviate/installation/embedded) is supported, which allows the Auto-GPT process itself to start a Weaviate instance. To enable it, set `USE_WEAVIATE_EMBEDDED` to `True` and make sure you `pip install "weaviate-client>=3.15.4"`.
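Because embedded mode depends on `weaviate-client>=3.15.4`, a startup check could fail fast with a clear message when an older client is installed. The sketch below is a minimal illustration, assuming plain dotted numeric versions (any non-digit suffix such as `rc1` is cut off); `version_at_least` and `installed_version` are hypothetical helpers, not part of Auto-GPT or the Weaviate client. It compares versions numerically, so `3.9.0` correctly sorts below `3.15.4`.

```python
from importlib import metadata
from typing import Optional, Tuple


def _parse(version: str) -> Tuple[int, ...]:
    """Turn '3.15.4' into (3, 15, 4), keeping only the leading digits of each part."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def version_at_least(installed: str, minimum: str) -> bool:
    """Numeric comparison of dotted versions, padding the shorter one with zeros."""
    a, b = _parse(installed), _parse(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(dist_name: str = "weaviate-client") -> Optional[str]:
    """Return the installed distribution's version, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
```

At startup, one could refuse to enable embedded mode when `version_at_least(installed_version() or "0", "3.15.4")` is false. For anything beyond simple release numbers, the `packaging` library's version parsing is the more robust choice.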

Reviewer, suggested change (adds the missing closing quote in the `pip install` command):

Although still experimental, [Embedded Weaviate](https://weaviate.io/developers/weaviate/installation/embedded) is supported which allows the Auto-GPT process itself to start a Weaviate instance. To enable it, set `USE_WEAVIATE_EMBEDDED` to `True` and make sure you `pip install "weaviate-client>=3.15.4"`.


#### Setting up environment variables
Reviewer: Followed the instructions and ... it did not work 😂 Because it is missing the Weaviate client. So, perhaps edit to say something like "First, run `pip install weaviate-client`"?

Contributor Author: @hsm207 HA! I had assumed `pip install -r requirements.txt` would have been run. But you raise a good point. Have added to README.


In your `.env` file, set the following:

```
MEMORY_BACKEND=weaviate
WEAVIATE_HOST="127.0.0.1" # the IP or domain of the running Weaviate instance
WEAVIATE_PORT="8080"
WEAVIATE_PROTOCOL="http"
# optional: username/password auth is used only when both values are set
WEAVIATE_USERNAME="your username"
WEAVIATE_PASSWORD="your password"
WEAVIATE_EMBEDDED_PATH="/home/me/.local/share/weaviate" # this is optional and indicates where the data should be persisted when running an embedded instance
# set to True to have Auto-GPT start an embedded Weaviate instance
USE_WEAVIATE_EMBEDDED=False
MEMORY_INDEX="Autogpt" # name of the index to create for the application
```

Reviewer: We should add something like "set USE_WEAVIATE_EMBEDDED=True if you want to use embedded weaviate".
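The `.env` variables above map onto a connection roughly the way `scripts/config.py` and `WeaviateMemory.__init__` in this PR read them: the URL is `{protocol}://{host}:{port}`, embedded mode is enabled only by the exact string `True`, and username/password auth is used only when both values are non-empty. The sketch below restates that logic without importing the Weaviate client so it can be checked in isolation. `WeaviateSettings` and `from_env` are hypothetical names, and expanding `~` in the persistence path is an added assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class WeaviateSettings:
    host: Optional[str]
    port: Optional[str]
    protocol: str
    username: Optional[str]
    password: Optional[str]
    embedded: bool
    embedded_path: str
    index: Optional[str]

    @classmethod
    def from_env(cls, env: Mapping[str, str]) -> "WeaviateSettings":
        return cls(
            host=env.get("WEAVIATE_HOST"),
            port=env.get("WEAVIATE_PORT"),
            protocol=env.get("WEAVIATE_PROTOCOL", "http"),
            # Empty strings (as in the template) are treated as "not set".
            username=env.get("WEAVIATE_USERNAME") or None,
            password=env.get("WEAVIATE_PASSWORD") or None,
            # Only the exact string "True" enables embedded mode, as in config.py.
            embedded=env.get("USE_WEAVIATE_EMBEDDED", "False") == "True",
            # Assumption: expand "~" so the path is not taken relative to the CWD.
            embedded_path=os.path.expanduser(
                env.get("WEAVIATE_EMBEDDED_PATH", "~/.local/share/weaviate")
            ),
            index=env.get("MEMORY_INDEX"),
        )

    @property
    def url(self) -> str:
        return f"{self.protocol}://{self.host}:{self.port}"

    @property
    def uses_password_auth(self) -> bool:
        # Mirrors _build_auth_credentials: both values must be non-empty.
        return bool(self.username and self.password)
```

Passing a plain mapping (for example `dict(os.environ)`) keeps the parsing testable without touching the real environment. Note that a lowercase `true` leaves embedded mode off under this rule.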

## View Memory Usage

1 change: 1 addition & 0 deletions requirements.txt
@@ -15,3 +15,4 @@ pinecone-client==2.2.1
redis
orjson
Pillow
weaviate-client==3.15.5
9 changes: 9 additions & 0 deletions scripts/config.py
@@ -64,6 +64,15 @@ def __init__(self):
        self.pinecone_api_key = os.getenv("PINECONE_API_KEY")
        self.pinecone_region = os.getenv("PINECONE_ENV")

        self.weaviate_host = os.getenv("WEAVIATE_HOST")
        self.weaviate_port = os.getenv("WEAVIATE_PORT")
        self.weaviate_protocol = os.getenv("WEAVIATE_PROTOCOL", "http")
        self.weaviate_username = os.getenv("WEAVIATE_USERNAME", None)
        self.weaviate_password = os.getenv("WEAVIATE_PASSWORD", None)
        self.weaviate_scopes = os.getenv("WEAVIATE_SCOPES", None)
        # expanduser() so the default "~" is not handed to the embedded server literally
        self.weaviate_embedded_path = os.path.expanduser(
            os.getenv("WEAVIATE_EMBEDDED_PATH", "~/.local/share/weaviate"))
        self.use_weaviate_embedded = os.getenv("USE_WEAVIATE_EMBEDDED", "False") == "True"

        self.image_provider = os.getenv("IMAGE_PROVIDER")
        self.huggingface_api_token = os.getenv("HUGGINGFACE_API_TOKEN")

14 changes: 13 additions & 1 deletion scripts/memory/__init__.py
@@ -11,6 +11,11 @@
    print("Pinecone not installed. Skipping import.")
    PineconeMemory = None

try:
    from memory.weaviate import WeaviateMemory
except ImportError:
    print("Weaviate not installed. Skipping import.")
    WeaviateMemory = None

def get_memory(cfg, init=False):
    memory = None
@@ -28,7 +33,13 @@ def get_memory(cfg, init=False):
                  " use Redis as a memory backend.")
        else:
            memory = RedisMemory(cfg)

    elif cfg.memory_backend == "weaviate":
        if not WeaviateMemory:
            print("Error: Weaviate is not installed. Please install weaviate-client to"
                  " use Weaviate as a memory backend.")
        else:
            memory = WeaviateMemory(cfg)

    if memory is None:
        memory = LocalCache(cfg)
        if init:
@@ -41,4 +52,5 @@ def get_memory(cfg, init=False):
    "LocalCache",
    "RedisMemory",
    "PineconeMemory",
    "WeaviateMemory"
]
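`get_memory` follows an optional-import pattern: each backend import is attempted once, a missing dependency leaves the name bound to `None`, and selection falls back to `LocalCache`. The sketch below restates that pattern as a lookup table with stand-in classes so it runs on its own. `BACKENDS`, `select_backend`, and the stub classes are hypothetical, and the real function additionally handles the `init` flag.

```python
from typing import Callable, Dict, Optional


class LocalCache:  # stand-in for memory.local.LocalCache
    def __init__(self, cfg):
        self.cfg = cfg


class WeaviateMemory:  # stand-in for memory.weaviate.WeaviateMemory
    def __init__(self, cfg):
        self.cfg = cfg


# A backend whose import failed is recorded as None, as in memory/__init__.py.
BACKENDS: Dict[str, Optional[Callable]] = {
    "local": LocalCache,
    "weaviate": WeaviateMemory,
    "pinecone": None,  # e.g. pinecone-client is not installed
}


def select_backend(name: str, cfg, backends=BACKENDS):
    """Build the requested backend, or fall back to LocalCache."""
    factory = backends.get(name)
    if factory is None:
        if name in backends:
            # Known backend whose dependency is missing: warn, then fall back.
            print(f"Error: {name} is not installed; falling back to LocalCache.")
        return LocalCache(cfg)
    return factory(cfg)
```

A table keeps the fallback rule in one place as backends are added. The `if`/`elif` chain in the PR behaves the same way for the names it handles.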
1 change: 0 additions & 1 deletion scripts/memory/pinecone.py
@@ -1,4 +1,3 @@

Contributor: And this

import pinecone

from memory.base import MemoryProviderSingleton, get_ada_embedding
110 changes: 110 additions & 0 deletions scripts/memory/weaviate.py
@@ -0,0 +1,110 @@
from config import Config
from memory.base import MemoryProviderSingleton, get_ada_embedding
import uuid
import weaviate
from weaviate import Client
from weaviate.embedded import EmbeddedOptions
from weaviate.util import generate_uuid5

def default_schema(weaviate_index):
    return {
        "class": weaviate_index,
        "properties": [
            {
                "name": "raw_text",
                "dataType": ["text"],
                "description": "original text for the embedding"
            }
        ],
    }

class WeaviateMemory(MemoryProviderSingleton):
    def __init__(self, cfg):
        auth_credentials = self._build_auth_credentials(cfg)
hsm207 (Apr 12, 2023): @cs0lar I think adding support for API keys would be immensely useful for users getting started with Weaviate's free sandbox environment, so they can have a similar experience to Pinecone, i.e. just provide an API key and URL:

https://weaviate.io/developers/weaviate/client-libraries/python#api-key-authentication

Contributor Author: added

        url = f'{cfg.weaviate_protocol}://{cfg.weaviate_host}:{cfg.weaviate_port}'

        if cfg.use_weaviate_embedded:
            self.client = Client(embedded_options=EmbeddedOptions(
                hostname=cfg.weaviate_host,
                port=int(cfg.weaviate_port),
                persistence_data_path=cfg.weaviate_embedded_path
            ))

            print(f"Weaviate Embedded running on: {url} with persistence path: {cfg.weaviate_embedded_path}")
        else:
            self.client = Client(url, auth_client_secret=auth_credentials)

        self.index = cfg.memory_index
        self._create_schema()

    def _create_schema(self):
        schema = default_schema(self.index)
        if not self.client.schema.contains(schema):
            self.client.schema.create_class(schema)

    def _build_auth_credentials(self, cfg):
        if cfg.weaviate_username and cfg.weaviate_password:
            return weaviate.AuthClientPassword(cfg.weaviate_username, cfg.weaviate_password)
Reviewer: @cs0lar this line will throw an error as `weaviate_auth` is not defined anymore in this module.

Reviewer, suggested change: replace `return weaviate_auth.AuthClientPassword(cfg.weaviate_username, cfg.weaviate_password)` with `return weaviate.AuthClientPassword(cfg.weaviate_username, cfg.weaviate_password)`.

Reviewer: @cs0lar I think you missed this change.

        else:
            return None

    def add(self, data):
        vector = get_ada_embedding(data)

        doc_uuid = generate_uuid5(data, self.index)
        data_object = {
            'class': self.index,
            'raw_text': data
        }

        with self.client.batch as batch:
Reviewer: @cs0lar is data always going to be uploaded one at a time from Auto-GPT to Weaviate?

            batch.add_data_object(
                uuid=doc_uuid,
                data_object=data_object,
                class_name=self.index,
                vector=vector
            )

            batch.flush()
hsm207 (Apr 12, 2023): @cs0lar this call is unnecessary since you are using the batch context manager.


        return f"Inserting data into memory at uuid: {doc_uuid}:\n data: {data}"
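`add` derives the object ID with `generate_uuid5(data, self.index)`, so the same text stored under the same index always maps to the same UUID. A repeated `add` of identical text therefore targets the existing object ID instead of creating a second object. The standard-library sketch below assumes the helper builds a name-based UUIDv5 (DNS namespace) over the index string concatenated with the text; check `weaviate.util.generate_uuid5` for the exact recipe. `deterministic_id` and `upsert` are hypothetical names, and the dict-based `upsert` only models an assumed overwrite-on-same-ID behavior.

```python
import uuid


def deterministic_id(text: str, index: str) -> str:
    """Name-based UUIDv5 over index + text (assumed to match generate_uuid5(text, index))."""
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, str(index) + str(text)))


def upsert(store: dict, text: str, index: str) -> str:
    """Insert or overwrite by deterministic ID, so repeated text is stored only once."""
    doc_id = deterministic_id(text, index)
    store[doc_id] = {"raw_text": text}
    return doc_id
```

One consequence of plain concatenation is that `("gpt", "Auto")` and `("pt", "Autog")` produce the same ID, because both concatenate to `"Autogpt"`. With a fixed index per deployment, as here, this does not arise in practice.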

    def get(self, data):
        return self.get_relevant(data, 1)

    def clear(self):
        # Drop only this class: schema.delete_all() would also remove every
        # other class stored in the same Weaviate instance.
        self.client.schema.delete_class(self.index)

        # weaviate does not yet have a neat way to just remove the items in an index
        # without removing the class itself, therefore we need to re-create it
        # after deleting it
        self._create_schema()

        return 'Obliterated'

    def get_relevant(self, data, num_relevant=5):
        query_embedding = get_ada_embedding(data)
        try:
            results = self.client.query.get(self.index, ['raw_text']) \
                .with_near_vector({'vector': query_embedding, 'certainty': 0.7}) \
                .with_limit(num_relevant) \
                .do()

            if len(results['data']['Get'][self.index]) > 0:
                return [str(item['raw_text']) for item in results['data']['Get'][self.index]]
            else:
                return []

        except Exception as err:
            print(f'Unexpected error {err=}, {type(err)=}')
            return []
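`get_relevant` asks Weaviate for at most `num_relevant` objects whose certainty is at least 0.7. For cosine distance, Weaviate defines certainty as `1 - distance / 2` with `distance = 1 - cosine_similarity`, so a certainty of 0.7 corresponds to a cosine similarity of 0.4. The brute-force sketch below reproduces that filter-then-limit behavior over an in-memory list for intuition only; Weaviate itself searches an approximate nearest-neighbor index rather than scanning. `near_vector` is a hypothetical name, and non-zero vectors are assumed.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def certainty(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine distance is 1 - similarity; certainty rescales it into [0, 1].
    distance = 1.0 - cosine_similarity(a, b)
    return 1.0 - distance / 2.0


def near_vector(query: Sequence[float],
                items: List[Tuple[str, Sequence[float]]],
                min_certainty: float = 0.7,
                limit: int = 5) -> List[str]:
    """Return up to `limit` texts with certainty >= min_certainty, best first."""
    scored = [(certainty(query, vec), text) for text, vec in items]
    kept = [pair for pair in scored if pair[0] >= min_certainty]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in kept[:limit]]
```

Raising the threshold trades recall for precision. With `num_relevant=1` (as used by `get`), only the single best match above 0.7 is returned.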

    def get_stats(self):
        result = self.client.query.aggregate(self.index) \
            .with_meta_count() \
            .do()
        class_data = result['data']['Aggregate'][self.index]

        return class_data[0]['meta'] if class_data else {}
113 changes: 113 additions & 0 deletions tests/test_weaviate_memory.py
@@ -0,0 +1,113 @@
import unittest
from unittest import mock
import sys
import os

from weaviate import Client
from weaviate.util import get_valid_uuid
from uuid import uuid4

sys.path.append(os.path.abspath('./scripts'))
from config import Config
from memory.weaviate import WeaviateMemory
from memory.base import get_ada_embedding

@mock.patch.dict(os.environ, {
    "WEAVIATE_HOST": "127.0.0.1",
    "WEAVIATE_PROTOCOL": "http",
    "WEAVIATE_PORT": "8080",
    "WEAVIATE_USERNAME": "",
    "WEAVIATE_PASSWORD": "",
    "MEMORY_INDEX": "AutogptTests"
})
class TestWeaviateMemory(unittest.TestCase):
    """
    In order to run these tests you will need a local instance of
    Weaviate running. Refer to https://weaviate.io/developers/weaviate/installation/docker-compose
    for creating local instances using docker.
    Alternatively, in your .env file set the following environment variables to run Weaviate embedded (see: https://weaviate.io/developers/weaviate/installation/embedded):

        USE_WEAVIATE_EMBEDDED=True
        WEAVIATE_EMBEDDED_PATH="/home/me/.local/share/weaviate"
    """
    def setUp(self):
        self.cfg = Config()

        if self.cfg.use_weaviate_embedded:
            from weaviate.embedded import EmbeddedOptions

            self.client = Client(embedded_options=EmbeddedOptions(
                hostname=self.cfg.weaviate_host,
                port=int(self.cfg.weaviate_port),
                persistence_data_path=self.cfg.weaviate_embedded_path
            ))
        else:
            self.client = Client(f"{self.cfg.weaviate_protocol}://{self.cfg.weaviate_host}:{self.cfg.weaviate_port}")

        try:
            self.client.schema.delete_class(self.cfg.memory_index)
        except Exception:
            # The class may not exist yet on a fresh instance.
            pass

        self.memory = WeaviateMemory(self.cfg)

    def test_add(self):
        doc = 'You are a Titan named Thanos and you are looking for the Infinity Stones'
        self.memory.add(doc)
        result = self.client.query.get(self.cfg.memory_index, ['raw_text']).do()
        actual = result['data']['Get'][self.cfg.memory_index]

        self.assertEqual(len(actual), 1)
        self.assertEqual(actual[0]['raw_text'], doc)

    def test_get(self):
        doc = 'You are an Avenger and swore to defend the Galaxy from a menace called Thanos'

        with self.client.batch as batch:
            batch.add_data_object(
                uuid=get_valid_uuid(uuid4()),
                data_object={'raw_text': doc},
                class_name=self.cfg.memory_index,
                vector=get_ada_embedding(doc)
            )

            batch.flush()

        actual = self.memory.get(doc)

        self.assertEqual(len(actual), 1)
        self.assertEqual(actual[0], doc)

    def test_get_stats(self):
        docs = [
            'You are now about to count the number of docs in this index',
            'And then you are about to find out if you can count correctly'
        ]

        for doc in docs:
            self.memory.add(doc)

        stats = self.memory.get_stats()

        self.assertTrue(stats)
        self.assertIn('count', stats)
        self.assertEqual(stats['count'], 2)

    def test_clear(self):
        docs = [
            'Shame this is the last test for this class',
            'Testing is fun when someone else is doing it'
        ]

        for doc in docs:
            self.memory.add(doc)

        self.assertEqual(self.memory.get_stats()['count'], 2)

        self.memory.clear()

        self.assertEqual(self.memory.get_stats()['count'], 0)


if __name__ == '__main__':
    unittest.main()
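The tests above need a live Weaviate instance (or embedded mode) plus an OpenAI key for `get_ada_embedding`. For logic that only reshapes the client's responses, such as `get_stats`, a stubbed client may be enough. Below is a minimal sketch using `unittest.mock.MagicMock`; `StatsOnly` is a hypothetical stand-in that copies the body of `WeaviateMemory.get_stats`, so nothing from Auto-GPT or the Weaviate client is imported, and the canned payload only mimics the response shape the PR code indexes into.

```python
import unittest
from unittest import mock


class StatsOnly:
    """Hypothetical stand-in holding just the logic of WeaviateMemory.get_stats."""

    def __init__(self, client, index):
        self.client = client
        self.index = index

    def get_stats(self):
        result = (self.client.query.aggregate(self.index)
                  .with_meta_count()
                  .do())
        class_data = result['data']['Aggregate'][self.index]
        return class_data[0]['meta'] if class_data else {}


class TestGetStatsWithStub(unittest.TestCase):
    def _stub(self, payload):
        client = mock.MagicMock()
        # Each chained call returns the next mock; .do() returns the canned payload.
        client.query.aggregate.return_value.with_meta_count.return_value.do.return_value = payload
        return client

    def test_count(self):
        client = self._stub({'data': {'Aggregate': {'AutogptTests': [{'meta': {'count': 2}}]}}})
        self.assertEqual(StatsOnly(client, 'AutogptTests').get_stats(), {'count': 2})
        client.query.aggregate.assert_called_once_with('AutogptTests')

    def test_empty_class(self):
        client = self._stub({'data': {'Aggregate': {'AutogptTests': []}}})
        self.assertEqual(StatsOnly(client, 'AutogptTests').get_stats(), {})
```

Such stubs run with `python -m unittest` and no network access. They complement the integration tests rather than replace them, since they cannot catch changes in the real server's response format.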