Skip to content

Commit

Permalink
Merge pull request Significant-Gravitas#424 from cs0lar/feature/weavi…
Browse files Browse the repository at this point in the history
…ate-memory

Feature/weaviate memory
  • Loading branch information
BillSchumacher committed Apr 16, 2023
2 parents a2d73dc + 1947df3 commit f36f08a
Show file tree
Hide file tree
Showing 7 changed files with 328 additions and 22 deletions.
21 changes: 21 additions & 0 deletions .env.template
Original file line number Diff line number Diff line change
Expand Up @@ -74,6 +74,27 @@ REDIS_PASSWORD=
WIPE_REDIS_ON_START=False
MEMORY_INDEX=auto-gpt

### WEAVIATE
# MEMORY_BACKEND - Use 'weaviate' to use Weaviate vector storage
# WEAVIATE_HOST - Weaviate host IP
# WEAVIATE_PORT - Weaviate host port
# WEAVIATE_PROTOCOL - Weaviate host protocol (e.g. 'http')
# USE_WEAVIATE_EMBEDDED - Whether to use Embedded Weaviate
# WEAVIATE_EMBEDDED_PATH - File system path were to persist data when running Embedded Weaviate
# WEAVIATE_USERNAME - Weaviate username
# WEAVIATE_PASSWORD - Weaviate password
# WEAVIATE_API_KEY - Weaviate API key if using API-key-based authentication
# MEMORY_INDEX - Name of index to create in Weaviate
WEAVIATE_HOST="127.0.0.1"
WEAVIATE_PORT=8080
WEAVIATE_PROTOCOL="http"
USE_WEAVIATE_EMBEDDED=False
WEAVIATE_EMBEDDED_PATH="/home/me/.local/share/weaviate"
WEAVIATE_USERNAME=
WEAVIATE_PASSWORD=
WEAVIATE_API_KEY=
MEMORY_INDEX=AutoGpt

### MILVUS
# MILVUS_ADDR - Milvus remote address (e.g. localhost:19530)
# MILVUS_COLLECTION - Milvus collection,
Expand Down
77 changes: 56 additions & 21 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,7 @@ Development of this free, open-source project is made possible by all the <a hre
- [Redis Setup](#redis-setup)
- [🌲 Pinecone API Key Setup](#-pinecone-api-key-setup)
- [Milvus Setup](#milvus-setup)
- [Weaviate Setup](#weaviate-setup)
- [Setting up environment variables](#setting-up-environment-variables-1)
- [Setting Your Cache Type](#setting-your-cache-type)
- [View Memory Usage](#view-memory-usage)
Expand Down Expand Up @@ -267,7 +268,19 @@ export GOOGLE_API_KEY="YOUR_GOOGLE_API_KEY"
export CUSTOM_SEARCH_ENGINE_ID="YOUR_CUSTOM_SEARCH_ENGINE_ID"
```

## Redis Setup
## Setting Your Cache Type

By default, Auto-GPT is going to use LocalCache instead of redis or Pinecone.

To switch to either, change the `MEMORY_BACKEND` env variable to the value that you want:

* `local` (default) uses a local JSON cache file
* `pinecone` uses the Pinecone.io account you configured in your ENV settings
* `redis` will use the redis cache that you configured
* `milvus` will use the milvus cache that you configured
* `weaviate` will use the weaviate cache that you configured

### Redis Setup
> _**CAUTION**_ \
This is not intended to be publicly accessible and lacks security measures. Therefore, avoid exposing Redis to the internet without a password or at all
1. Install docker desktop
Expand Down Expand Up @@ -306,20 +319,6 @@ Pinecone enables the storage of vast amounts of vector-based memory, allowing fo
2. Choose the `Starter` plan to avoid being charged.
3. Find your API key and region under the default project in the left sidebar.

### Milvus Setup

[Milvus](https://milvus.io/) is a open-source, high scalable vector database to storage huge amount of vector-based memory and provide fast relevant search.

- setup milvus database, keep your pymilvus version and milvus version same to avoid compatible issues.
- setup by open source [Install Milvus](https://milvus.io/docs/install_standalone-operator.md)
- or setup by [Zilliz Cloud](https://zilliz.com/cloud)
- set `MILVUS_ADDR` in `.env` to your milvus address `host:ip`.
- set `MEMORY_BACKEND` in `.env` to `milvus` to enable milvus as backend.
- optional
- set `MILVUS_COLLECTION` in `.env` to change milvus collection name as you want, `autogpt` is the default name.

### Setting up environment variables

In the `.env` file set:
- `PINECONE_API_KEY`
- `PINECONE_ENV` (example: _"us-east4-gcp"_)
Expand All @@ -343,16 +342,52 @@ export PINECONE_ENV="<YOUR_PINECONE_REGION>" # e.g: "us-east4-gcp"
export MEMORY_BACKEND="pinecone"
```

## Setting Your Cache Type
### Milvus Setup

By default, Auto-GPT is going to use LocalCache instead of redis or Pinecone.
[Milvus](https://milvus.io/) is a open-source, high scalable vector database to storage huge amount of vector-based memory and provide fast relevant search.

To switch to either, change the `MEMORY_BACKEND` env variable to the value that you want:
- setup milvus database, keep your pymilvus version and milvus version same to avoid compatible issues.
- setup by open source [Install Milvus](https://milvus.io/docs/install_standalone-operator.md)
- or setup by [Zilliz Cloud](https://zilliz.com/cloud)
- set `MILVUS_ADDR` in `.env` to your milvus address `host:ip`.
- set `MEMORY_BACKEND` in `.env` to `milvus` to enable milvus as backend.
- optional
- set `MILVUS_COLLECTION` in `.env` to change milvus collection name as you want, `autogpt` is the default name.

* `local` (default) uses a local JSON cache file
* `pinecone` uses the Pinecone.io account you configured in your ENV settings
* `redis` will use the redis cache that you configured

### Weaviate Setup
[Weaviate](https://weaviate.io/) is an open-source vector database. It allows to store data objects and vector embeddings from ML-models and scales seamlessly to billion of data objects. [An instance of Weaviate can be created locally (using Docker), on Kubernetes or using Weaviate Cloud Services](https://weaviate.io/developers/weaviate/quickstart).
Although still experimental, [Embedded Weaviate](https://weaviate.io/developers/weaviate/installation/embedded) is supported which allows the Auto-GPT process itself to start a Weaviate instance. To enable it, set `USE_WEAVIATE_EMBEDDED` to `True` and make sure you `pip install "weaviate-client>=3.15.4"`.

#### Setting up environment variables

In your `.env` file set the following:

```
MEMORY_BACKEND=weaviate
WEAVIATE_HOST="127.0.0.1" # the IP or domain of the running Weaviate instance
WEAVIATE_PORT="8080"
WEAVIATE_PROTOCOL="http"
WEAVIATE_USERNAME="your username"
WEAVIATE_PASSWORD="your password"
WEAVIATE_API_KEY="your weaviate API key if you have one"
WEAVIATE_EMBEDDED_PATH="/home/me/.local/share/weaviate" # this is optional and indicates where the data should be persisted when running an embedded instance
USE_WEAVIATE_EMBEDDED=False # set to True to run Embedded Weaviate
MEMORY_INDEX="Autogpt" # name of the index to create for the application
```

### Milvus Setup

[Milvus](https://milvus.io/) is a open-source, high scalable vector database to storage huge amount of vector-based memory and provide fast relevant search.

- setup milvus database, keep your pymilvus version and milvus version same to avoid compatible issues.
- setup by open source [Install Milvus](https://milvus.io/docs/install_standalone-operator.md)
- or setup by [Zilliz Cloud](https://zilliz.com/cloud)
- set `MILVUS_ADDR` in `.env` to your milvus address `host:ip`.
- set `MEMORY_BACKEND` in `.env` to `milvus` to enable milvus as backend.
- optional
- set `MILVUS_COLLECTION` in `.env` to change milvus collection name as you want, `autogpt` is the default name.

## View Memory Usage

1. View memory usage by using the `--debug` flag :)
Expand Down
10 changes: 10 additions & 0 deletions autogpt/config/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -66,6 +66,16 @@ def __init__(self) -> None:
self.pinecone_api_key = os.getenv("PINECONE_API_KEY")
self.pinecone_region = os.getenv("PINECONE_ENV")

self.weaviate_host = os.getenv("WEAVIATE_HOST")
self.weaviate_port = os.getenv("WEAVIATE_PORT")
self.weaviate_protocol = os.getenv("WEAVIATE_PROTOCOL", "http")
self.weaviate_username = os.getenv("WEAVIATE_USERNAME", None)
self.weaviate_password = os.getenv("WEAVIATE_PASSWORD", None)
self.weaviate_scopes = os.getenv("WEAVIATE_SCOPES", None)
self.weaviate_embedded_path = os.getenv("WEAVIATE_EMBEDDED_PATH")
self.weaviate_api_key = os.getenv("WEAVIATE_API_KEY", None)
self.use_weaviate_embedded = os.getenv("USE_WEAVIATE_EMBEDDED", "False") == "True"

# milvus configuration, e.g., localhost:19530.
self.milvus_addr = os.getenv("MILVUS_ADDR", "localhost:19530")
self.milvus_collection = os.getenv("MILVUS_COLLECTION", "autogpt")
Expand Down
13 changes: 13 additions & 0 deletions autogpt/memory/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,12 @@
print("Pinecone not installed. Skipping import.")
PineconeMemory = None

try:
from autogpt.memory.weaviate import WeaviateMemory
except ImportError:
print("Weaviate not installed. Skipping import.")
WeaviateMemory = None

try:
from autogpt.memory.milvus import MilvusMemory
except ImportError:
Expand Down Expand Up @@ -48,6 +54,12 @@ def get_memory(cfg, init=False):
)
else:
memory = RedisMemory(cfg)
elif cfg.memory_backend == "weaviate":
if not WeaviateMemory:
print("Error: Weaviate is not installed. Please install weaviate-client to"
" use Weaviate as a memory backend.")
else:
memory = WeaviateMemory(cfg)
elif cfg.memory_backend == "milvus":
if not MilvusMemory:
print(
Expand Down Expand Up @@ -77,4 +89,5 @@ def get_supported_memory_backends():
"PineconeMemory",
"NoMemory",
"MilvusMemory",
"WeaviateMemory"
]
110 changes: 110 additions & 0 deletions autogpt/memory/weaviate.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,110 @@
from autogpt.config import Config
from autogpt.memory.base import MemoryProviderSingleton, get_ada_embedding
import uuid
import weaviate
from weaviate import Client
from weaviate.embedded import EmbeddedOptions
from weaviate.util import generate_uuid5


def default_schema(weaviate_index):
return {
"class": weaviate_index,
"properties": [
{
"name": "raw_text",
"dataType": ["text"],
"description": "original text for the embedding"
}
],
}


class WeaviateMemory(MemoryProviderSingleton):
def __init__(self, cfg):
auth_credentials = self._build_auth_credentials(cfg)

url = f'{cfg.weaviate_protocol}://{cfg.weaviate_host}:{cfg.weaviate_port}'

if cfg.use_weaviate_embedded:
self.client = Client(embedded_options=EmbeddedOptions(
hostname=cfg.weaviate_host,
port=int(cfg.weaviate_port),
persistence_data_path=cfg.weaviate_embedded_path
))

print(f"Weaviate Embedded running on: {url} with persistence path: {cfg.weaviate_embedded_path}")
else:
self.client = Client(url, auth_client_secret=auth_credentials)

self.index = cfg.memory_index
self._create_schema()

def _create_schema(self):
schema = default_schema(self.index)
if not self.client.schema.contains(schema):
self.client.schema.create_class(schema)

def _build_auth_credentials(self, cfg):
if cfg.weaviate_username and cfg.weaviate_password:
return weaviate.AuthClientPassword(cfg.weaviate_username, cfg.weaviate_password)
if cfg.weaviate_api_key:
return weaviate.AuthApiKey(api_key=cfg.weaviate_api_key)
else:
return None

def add(self, data):
vector = get_ada_embedding(data)

doc_uuid = generate_uuid5(data, self.index)
data_object = {
'raw_text': data
}

with self.client.batch as batch:
batch.add_data_object(
uuid=doc_uuid,
data_object=data_object,
class_name=self.index,
vector=vector
)

return f"Inserting data into memory at uuid: {doc_uuid}:\n data: {data}"

def get(self, data):
return self.get_relevant(data, 1)

def clear(self):
self.client.schema.delete_all()

# weaviate does not yet have a neat way to just remove the items in an index
# without removing the entire schema, therefore we need to re-create it
# after a call to delete_all
self._create_schema()

return 'Obliterated'

def get_relevant(self, data, num_relevant=5):
query_embedding = get_ada_embedding(data)
try:
results = self.client.query.get(self.index, ['raw_text']) \
.with_near_vector({'vector': query_embedding, 'certainty': 0.7}) \
.with_limit(num_relevant) \
.do()

if len(results['data']['Get'][self.index]) > 0:
return [str(item['raw_text']) for item in results['data']['Get'][self.index]]
else:
return []

except Exception as err:
print(f'Unexpected error {err=}, {type(err)=}')
return []

def get_stats(self):
result = self.client.query.aggregate(self.index) \
.with_meta_count() \
.do()
class_data = result['data']['Aggregate'][self.index]

return class_data[0]['meta'] if class_data else {}
2 changes: 1 addition & 1 deletion requirements.txt
Original file line number Diff line number Diff line change
Expand Up @@ -27,4 +27,4 @@ isort
gitpython==3.1.31
pytest
pytest-mock
tweepy
tweepy
Loading

0 comments on commit f36f08a

Please sign in to comment.