
Pinecone Vectorized Memory #122

Merged

Conversation

@dschonholtz (Contributor) commented Apr 4, 2023

What this does:

  1. Updates the README to show how to get started with Pinecone.
  2. Removes the local memory in favor of Pinecone.
  3. Shows memory usage and prints it if the debug flag is on.
  4. Saves everything to memory.
  5. Queries the memory on every chat and retrieves the relevant results.
  6. Adds memory until the prompt + memory takes up 2500 tokens (see the sketch below).
  7. Removes all memory management.
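
For illustration, a minimal sketch of the token-budget idea in item 6. This is not the PR's actual code; `count_tokens` and the memory list are hypothetical stand-ins:

```python
# Hypothetical sketch: fill the prompt with relevant memories up to a token budget.
MAX_CONTEXT_TOKENS = 2500  # leaves ~1500 tokens of a 4k window for the response

def build_context(prompt: str, relevant_memories: list[str], count_tokens) -> str:
    """Append memories to the prompt until the combined size would exceed the budget."""
    context = prompt
    for memory in relevant_memories:
        candidate = context + "\n" + memory
        if count_tokens(candidate) > MAX_CONTEXT_TOKENS:
            break  # budget exhausted; stop adding memories
        context = candidate
    return context
```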

@dschonholtz dschonholtz marked this pull request as draft April 4, 2023 00:45
@ryanpeach

This is totally necessary imo.

@waynehamadi (Contributor)

yeah thank you very much

@waynehamadi (Contributor) commented Apr 4, 2023

@Torantulino could you create a Pinecone account for AutoGPT? That way you can put the API key in the repo settings (I think only you have access to them right now) as a secret, and we can use it in our integration tests in GitHub Actions.
We will need tests for this Pinecone integration, because I expect a lot of people will want to add things.
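
For illustration, a rough sketch of what such a test could look like, assuming the secret is exposed to the workflow as a `PINECONE_API_KEY` environment variable. The import path and the `add`/`get_relevant` method names are assumptions, not the PR's confirmed API:

```python
import os

import pytest

from memory import PineconeMemory  # hypothetical import path

@pytest.mark.skipif(os.getenv("PINECONE_API_KEY") is None,
                    reason="Pinecone secret not configured")
def test_memory_roundtrip():
    # Constructor signature follows the PR's PineconeMemory(memory_name, num_relevant).
    memory = PineconeMemory("autogpt-ci-test", 1)
    memory.add("The sky is blue.")  # hypothetical add() method
    results = memory.get_relevant("What color is the sky?")  # hypothetical query method
    assert any("blue" in r for r in results)
```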

@jantic commented Apr 4, 2023

This is really nice and clean code. And it's an exciting update!

Three thoughts:

  • Do we really need/want delete/overwrite? Seems a bit unnecessary and I don't really see the benefit considering the scope of what will be stored (not that huge, right?)
  • Down the road, perhaps things like building in a sort of "recency bias" will help to compile even better memory retrievals.
  • Perhaps memory retrievals should keep filling the context until token capacity is reached rather than a fixed number?

scripts/memory.py (outdated review thread, resolved)
@dschonholtz dschonholtz force-pushed the pinecone-memory branch 2 times, most recently from c258b57 to f60bd62 on April 4, 2023 at 21:24
@dschonholtz dschonholtz changed the title Pinecone first commit Pinecone Vectorized Memory Apr 4, 2023
@dschonholtz (Contributor, Author) commented Apr 4, 2023

> This is really nice and clean code. And it's an exciting update!
>
> Three thoughts:
>
>   • Do we really need/want delete/overwrite? Seems a bit unnecessary and I don't really see the benefit considering the scope of what will be stored (not that huge, right?)
>   • Down the road, perhaps things like building in a sort of "recency bias" will help to compile even better memory retrievals.
>   • Perhaps memory retrievals should keep filling the context until token capacity is reached rather than a fixed number?

I was debating whether or not to leave in delete/overwrite. I wanted to keep the interface the same across the different memory classes (see the sketch below).
The simple memory agent definitely needs it. If we want to enforce Pinecone usage, I can just scrap the delete/overwrite code along with the simple agent class.

As for memory retrievals: I really like that idea.
For the sake of getting this in, I would argue we should make that a solid starter ticket for an excited new dev.
@jantic
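
For illustration, a minimal sketch of the kind of shared interface being described. The class and method names here are hypothetical, not the PR's actual code:

```python
from abc import ABC, abstractmethod

class MemoryProvider(ABC):
    """Hypothetical common interface so Pinecone and the simple local memory stay swappable."""

    @abstractmethod
    def add(self, data: str) -> None: ...

    @abstractmethod
    def get_relevant(self, query: str, num_relevant: int = 5) -> list[str]: ...

    # The two methods under debate: only the simple local memory really needs them.
    @abstractmethod
    def overwrite(self, key: str, data: str) -> None: ...

    @abstractmethod
    def delete(self, key: str) -> None: ...
```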

@dschonholtz (Contributor, Author)

Ok, this should be working, and it neatly pulls data out of memory.
I have attempted to tag the relevant issues, but I'll do another sweep in a few.

It might be interesting to remove the explicit memory-add tool and just save all history to memory every couple of messages. But I'm going to leave that for a future PR, as I think the value of having working, queryable Pinecone is enough for one.

.env.template (outdated)
```
@@ -1,3 +1,5 @@
PINECONE_API_KEY=your-pinecone-api-key
PINECONE_REGION=your-pinecone-region
```
@yousefissa (Contributor) commented Apr 4, 2023


This should be PINECONE_ENV, or we should update the docs/code to reflect the new name.
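
For context, a minimal sketch of how these variables would typically be read, which is where the naming mismatch would bite (illustrative only):

```python
import os

# Names follow the .env.template hunk above; whether the second variable is
# PINECONE_REGION or PINECONE_ENV is exactly the inconsistency flagged here.
pinecone_api_key = os.getenv("PINECONE_API_KEY")
pinecone_region = os.getenv("PINECONE_REGION")

if pinecone_api_key is None or pinecone_region is None:
    raise EnvironmentError("Set PINECONE_API_KEY and PINECONE_REGION in your .env")
```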

@jantic left a comment

Exciting!

The bummer part: overwrite calls are failing. It complains that the "data" parameter is missing, but it looks like the real problem is that the function signature is inconsistent with SimpleMemory.

My two cents: I personally think the memory mechanism is overcomplicated and shouldn't be subject to "memory management" by the model. Just save everything and retrieve what's relevant. If getting rid of "SimpleMemory" makes life easier on this front, I think that would be a good thing. Just stick with the much better memory implementation (Pinecone) and save yourself the maintenance and reasoning headaches.

@Torantulino I know the feeling of having a flood of pull requests and endless requests coming your way with a new viral open-source project. So I feel for you. But if you have the time, this might be something worth thinking about.

@dschonholtz (Contributor, Author)

Sounds good.

I was a bit afraid of that.
I'll torch the memory management and simple memory and re-push. Unfortunately, that will have to be tomorrow.

@lolxdmainkaisemaanlu

Hope this gets merged soon; it would be a huge upgrade.

@dschonholtz (Contributor, Author)

So I've been working on this. I deleted a bunch of code :)
For "saving everything": I save everything every message, and then we do a lookup on every message too. I cap the memory + prompt token usage at 2500 tokens, leaving 1500 tokens for the response.

On another note: you would really like the agent to be able to traverse its memories when there isn't enough context, so adding a tool for thinking about something, which lets it traverse its own memories, could be useful.
Something like: memories are stored in SQL; the agent notices a memory that is interesting with respect to a future goal, decides it should explore that memory further, and can then pull the memories that were saved around it. You might get this for free if you save memories in an overlapping way, though. You also might just be able to fetch neighboring vector IDs, and then you wouldn't need SQL at all (see the sketch below).

I am trying to keep these changes relatively simple so we can merge today.
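
To make the neighboring-IDs idea concrete, a hypothetical sketch. It assumes memories are upserted with sequential integer IDs and that the raw text lives in a `raw_text` metadata field; both are assumptions, as is the exact shape of the fetch response:

```python
def get_neighboring_memories(index, memory_id: int, window: int = 2) -> list[str]:
    """Fetch memories stored just before/after a given memory via adjacent IDs.

    With monotonically increasing integer IDs, temporal neighbors are simply
    adjacent IDs; no SQL table needed.
    """
    neighbor_ids = [str(i) for i in range(memory_id - window, memory_id + window + 1)
                    if i >= 0 and i != memory_id]
    response = index.fetch(ids=neighbor_ids)  # Pinecone fetch-by-ID call
    return [v["metadata"]["raw_text"]
            for v in response["vectors"].values()
            if "raw_text" in v.get("metadata", {})]
```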

requirements.txt (outdated review thread, resolved)
@jantic commented Apr 5, 2023

> You sometimes get overlap associated with message history and memories, but for now I think that is fine.

Yeah, even with the overlap you cite, I'm pretty sure this will be a big improvement over what was in place previously, based on what I've observed. On my end at least (using GPT-3.5), it was really underutilizing the memory when the model was managing it; it'd only save very occasionally. Granted, I'd bet memory management works better with GPT-4, but I'd really like not to unnecessarily handicap 3.5. And I'm pretty sure it'll be better to just unburden the model from this altogether.

> On another note: you would really like the agent to be able to traverse its memories if there isn't enough context, so maybe adding a tool to think about something so it can traverse its own memories would be useful.

That sounds potentially very cool. It could be expensive and slow without limits, but it's definitely worth thinking about, especially if there's going to be the option of using local models.

There's also the option of squeezing more into memory. Just recently there was a bunch of buzz on Twitter about an apparent internal compressed "language" of sorts that the GPT-4 model has going on. It's lossy, but perhaps good enough? What I'm wondering is whether the bulk of the processing could be done with this: https://twitter.com/gfodor/status/1643303237578293249

@dschonholtz (Contributor, Author)

I'm extremely skeptical of the compression stuff. That is the kind of thing that seems likely to be lossy in subtle ways and have many non-obvious failure modes.
Definitely worth exploring later, even if we just summarize the existing memories rather than having it compress stuff.

@dschonholtz (Contributor, Author) commented Apr 5, 2023

This still formally needs a review.
Please review if you are able, and upvote the associated item in the Discord #pull-requests channel to see this get merged. Remember, this is pull #122.
I'm also on EST, so I will only be able to click merge or make changes for the next three-ish hours. If this needs minor changes to merge while I'm unavailable, feel free to make them and merge.

@Torantulino (Member)

Excellent work!

@alkeryn commented Apr 5, 2023

Looks like a cool PR, but why not use an open-source, self-hostable vector database like Milvus or another, so as to depend as little as possible on external services?

@dschonholtz (Contributor, Author)

Milvus looks cool: https://milvus.io/docs/example_code.md

I honestly picked something that a few other people were looking at, inspired by the babyAGI library. It seemed like the simplest option to get started with.

Milvus might be nearly as easy, but it requires additional infra. For now, let's get this in and working; if we find people aren't able to operate within the free tier, or we want to deploy our own vector DB infra, we can switch over. It looks like it should be pretty close to a drop-in replacement.

@jantic left a comment

Great job on this. Thank you for doing this!

Some notes:

get_ada_embedding could be @cached I think
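
A minimal sketch of that caching idea, assuming `get_ada_embedding` takes a plain string (hashable, so safe to memoize) and uses the pre-1.0 openai client that this era of the codebase targets:

```python
from functools import lru_cache

import openai

@lru_cache(maxsize=None)  # memoize: identical texts won't trigger repeat API calls
def get_ada_embedding(text: str) -> tuple[float, ...]:
    text = text.replace("\n", " ")
    response = openai.Embedding.create(input=[text], model="text-embedding-ada-002")
    # Return a tuple so the cached value is immutable.
    return tuple(response["data"][0]["embedding"])
```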

> ~4000 word limit for short term memory. Your short term memory is short, so immediately save important information to files.

This originally just said to save code to files. Now it reads like "save all important memories to files" even though we have the Pinecone memory. Seems like this may need to be reworded... not sure...

> If you are unsure how you previously did something or want to recall past events, thinking about similar events will help you remember.

I like this!

@jantic commented Apr 6, 2023

> I'm extremely skeptical of the compression stuff. That is the kind of thing that seems likely to be lossy in subtle ways and have many non-obvious failure modes. Definitely worth exploring later, even if we just summarize the existing memories rather than having it compress stuff.

Yeah, I'm skeptical too, to be clear. It seems like it might be an interesting experimental mode, perhaps.

@waynehamadi previously approved these changes Apr 6, 2023
```python
try:
    pinecone_mem = PineconeMemory(memory_name, num_relevant)
    # test the openai api since we need it to make embeddings.
    get_ada_embedding("test")
```
Contributor:

Is this intentional?

Contributor:

Thanks @dschonholtz.

Contributor (Author):

Hey, sorry. Yes, we want to make sure we can make embeddings. It will throw an error otherwise.

Contributor:

No, I was talking about the hard-coded "test"; it's been fixed.

### Setting up environment variables
For Windows Users:
```
setx PINECONE_API_KEY "YOUR_GOOGLE_API_KEY"
```
Member:

Google?

Member:

Suggested change:
```diff
-setx PINECONE_API_KEY "YOUR_GOOGLE_API_KEY"
+setx PINECONE_API_KEY "YOUR_PINECONE_API_KEY"
```

For macOS and Linux users:
```
export PINECONE_API_KEY="YOUR_GOOGLE_API_KEY"
```
Member:

Suggested change:
```diff
-export PINECONE_API_KEY="YOUR_GOOGLE_API_KEY"
+export PINECONE_API_KEY="YOUR_PINECONE_API_KEY"
```

```
2. No user assistance
3. Exclusively use the commands listed in double quotes e.g. "command name"
1. ~4000 word limit for short term memory. Your short term memory is short, so immediately save important information to files.
2. If you are unsure how you previously did something or want to recall past events, thinking about similar events will help you remember.
```
Member:

🔥

@Torantulino Torantulino merged commit 48451e3 into Significant-Gravitas:master Apr 6, 2023
@dschonholtz dschonholtz deleted the pinecone-memory branch April 6, 2023 11:29
@scruffynerf

Might want to consider https://github.com/featureform/featureform (specifically the EmbeddingHub piece). Pure Python, pip-installable.

@cvarrichio

I realize I'm about 10 hours late to the game, but why do the memory in Pinecone? If you need a vector database, use something like FAISS. If you need unstructured storage, use pickle or Elasticsearch; for SQL, sqlite3. If you need ada embeddings, use OpenAIEmbeddings(). This introduces additional calls and adds a requirement for a Pinecone API key. Can someone explain the advantages?
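
For comparison, a minimal local-only sketch of the FAISS alternative being suggested (assumes 1536-dimensional ada embeddings and the faiss-cpu and numpy packages; the function names are illustrative):

```python
import faiss
import numpy as np

DIM = 1536  # dimensionality of text-embedding-ada-002 vectors

index = faiss.IndexFlatL2(DIM)  # exact L2 search, no external service required
texts: list[str] = []           # keep the raw text alongside the vectors

def add_memory(text: str, embedding: list[float]) -> None:
    index.add(np.array([embedding], dtype=np.float32))
    texts.append(text)

def get_relevant(query_embedding: list[float], k: int = 5) -> list[str]:
    _, ids = index.search(np.array([query_embedding], dtype=np.float32), k)
    return [texts[i] for i in ids[0] if i != -1]
```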

@pmb2 commented Apr 8, 2023

This might be useful to get large chunks of data in and out:
[screenshot omitted]

@irthomasthomas commented Apr 8, 2023 via email

@cvarrichio

FAISS and Chroma are also open source and available on GitHub and PyPI. I do not know of a specific feature comparison between them.

@alreadydone commented Apr 9, 2023

Note that #372 has been merged, implementing memory based on the open-source Redis (it's not known as a vector database, but it apparently supports vector similarity search to some extent; not claiming it's the best). BTW, Pinecone seems to use FAISS.

@Koobah commented Apr 10, 2023

Did you consider storing all intermediate steps in memory as well? I'm particularly thinking about the chunks generated during web browsing. It would be great to store and embed those chunks so GPT can refer to them later.
