Add PostgreSQL support for long-term memory storage #2892
base: main
Conversation
This PR adds comprehensive PostgreSQL integration for CrewAI's long-term memory system:

- Create `LTMPostgresStorage` class supporting a PostgreSQL 16+ database backend
- Implement connection pooling with psycopg for improved performance
- Add a storage factory pattern via `LTMStorageFactory` for backend selection
- Support environment variable configuration for simplified deployment
- Add a context manager pattern for improved resource management
- Implement robust validation and security measures:
  - Input validation for all database parameters
  - Protection against SQL injection
  - Secure connection string handling
  - Comprehensive error types for better debugging
- Add CI-compatible tests with mocks
- Create detailed documentation with examples and best practices
- Add an optional dependency on `psycopg[pool]` in `pyproject.toml`
Force-pushed from fc7fe6d to fb099b8
@joaomdmoura can your review crew do this one too?

@gvieira Do you have any feedback on this one?
memory=True,                        # Enable memory system
long_term_memory=long_term_memory,  # Use PostgreSQL for long-term memory
entity_memory=EntityMemory()        # Required for automatic memory saving
For your use case, I'd recommend encouraging users to set only `long_term_memory` instead of configuring all the individual memory attributes.
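A minimal sketch of the suggested configuration. The `Crew` and `LongTermMemory` classes below are simplified stand-ins for the real crewai classes, defined here only so the example is self-contained:

```python
from dataclasses import dataclass
from typing import Optional


# Stand-in classes for illustration only; the real Crew and
# LongTermMemory live in crewai and take many more options.
@dataclass
class LongTermMemory:
    storage: object = None


@dataclass
class Crew:
    memory: bool = False
    long_term_memory: Optional[LongTermMemory] = None


# The suggestion: set only long_term_memory explicitly and let the
# framework default the other memory attributes (entity memory, etc.).
crew = Crew(memory=True, long_term_memory=LongTermMemory())
```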
### Automatic Memory Saving

**Requires both `long_term_memory` AND `entity_memory` configured**
Is there any reason to make `entity_memory` required?
)

# Save manually
crew._long_term_memory.save(memory_item)
As this is a private attribute, we advise against relying on it directly. We can consider adding another public access point, but not a private one.
```python
# Get the storage from your memory instance
postgres_storage = crew.memory.storage
```
Are you assuming that `memory` is long-term memory? It might be any other kind. If you want long-term memory's storage, I'd recommend exposing it in our API instead.
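One way to expose it publicly, as the comment suggests, is a read-only property on the memory class. `LongTermMemorySketch` is a hypothetical illustration, not the actual crewai API:

```python
class LongTermMemorySketch:
    """Hypothetical sketch: expose the backing storage through a public,
    read-only property instead of having callers reach into a private
    attribute like _long_term_memory."""

    def __init__(self, storage):
        self._storage = storage

    @property
    def storage(self):
        # Public read-only accessor for the configured backend.
        return self._storage
```

Callers would then write `ltm.storage` rather than touching `_storage` (or `crew._long_term_memory`) directly.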
    result = crew.kickoff()
finally:
    # Always clean up resources, even if an error occurs
    memory.cleanup()  # You can also use memory.close() for backward compatibility
I think you mean `reset`, don't you?
def __init__(
    self,
    storage=None,
    storage_type: str = "sqlite",
    path: Optional[str] = None,
    postgres_connection_string: Optional[str] = None,
    postgres_schema: Optional[str] = None,
    postgres_table_name: Optional[str] = None,
    postgres_min_pool_size: Optional[int] = None,
    postgres_max_pool_size: Optional[int] = None,
    postgres_use_connection_pool: Optional[bool] = None,
):
I like that `storage_type` can be used as a factory parameter. However, I think this class should remain storage-type agnostic. We could rely on `**kwargs` instead of mapping all supported attributes for each storage type here. IMO, the factory (such as the PostgresStorageFactory) should be responsible for handling those mappings, not this class.
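A sketch of what that refactor could look like. All names here (`make_storage`, the storage stubs, `StorageAgnosticLTM`) are invented for illustration; the point is only the division of responsibility between the factory and the memory class:

```python
class SqliteStorageStub:
    """Stand-in backend; the real class would wrap SQLite."""

    def __init__(self, path: str = "ltm.db"):
        self.path = path


class PostgresStorageStub:
    """Stand-in backend; the real class would wrap psycopg."""

    def __init__(self, connection_string: str, schema: str = "public"):
        self.connection_string = connection_string
        self.schema = schema


def make_storage(storage_type: str, **kwargs):
    """Toy factory: it, not the memory class, knows which kwargs each
    backend's constructor accepts."""
    backends = {"sqlite": SqliteStorageStub, "postgres": PostgresStorageStub}
    if storage_type not in backends:
        raise ValueError(f"Unsupported storage type: {storage_type!r}")
    return backends[storage_type](**kwargs)


class StorageAgnosticLTM:
    """The memory class stays backend-agnostic: it holds no
    postgres_*-prefixed parameters and just forwards **storage_kwargs."""

    def __init__(self, storage=None, storage_type: str = "sqlite", **storage_kwargs):
        self.storage = storage or make_storage(storage_type, **storage_kwargs)
```

With this shape, adding a new backend touches only the factory, and the memory class signature never grows.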
# Ensure quality is in metadata (from item.quality if available)
if "quality" not in metadata and item.quality is not None:
    metadata["quality"] = item.quality

# Check if quality is available
if "quality" not in metadata:
    raise ValueError(
        "Memory quality must be provided either in item.quality "
        "or item.metadata['quality']"
    )
Are we sending this info as metadata? Is it exclusive to the Postgres long-term storage?
self.storage.reset()

def cleanup(self) -> None:
I misunderstood what this method was doing. At first, I thought it was meant to reset created entries, but I realized it's actually for closing any open connections.
That said, I do have some concerns about the current API design; it feels a bit confusing. A few thoughts off the top of my head:
- If users want to use long-term memory with Postgres, we should always encourage them to use context managers.
- Let's consider removing this `cleanup` method, as its name could easily be confused with something like `reset`. If we decide to keep it, we should rename it to something clearer, maybe just `close`.
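The suggested shape, as a sketch: `close` (rather than `cleanup`) releases resources, and the context manager guarantees it runs. `PostgresStorageCtxSketch` is a hypothetical stand-in; in the real storage, `close()` would close the psycopg connection pool:

```python
class PostgresStorageCtxSketch:
    """Hypothetical sketch of the suggested API surface."""

    def __init__(self):
        self.closed = False

    def close(self) -> None:
        # Real implementation: close the psycopg connection pool here.
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # never swallow exceptions


# Usage: connections are released even if the body raises.
with PostgresStorageCtxSketch() as storage:
    pass  # run the crew here
```

This keeps the happy path and the error path identical, which is exactly what the `try`/`finally` example elsewhere in the docs is trying to achieve by hand.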
I'm open to discussing the API usage if you want.
@tibbon I love your suggestion!
I just dropped some comments; I didn't finish the review. I'm looking forward to the next steps!
This PR adds PostgreSQL integration for CrewAI's long-term memory system. I added this because I'm using CrewAI in AWS Lambda and can't easily share an on-disk SQLite3 database between runs for long-term memory persistence. With this, you can use RDS/PostgreSQL. You still need mem0 for Entity storage.
This is my first contribution here, but I've tried to ensure good test coverage, tried everything out locally, added documentation, and ensured that everything passed the CI tests and linting. I'm not much of a Python programmer, and I'm happy to get feedback if my code isn't properly idiomatic. I'll amend it as needed.